Amazon Route 53 DNS Service Examined

Amazon has announced a new authoritative DNS service – Route 53.

Sign up is straightforward – click a few buttons on aws.amazon.com, and a few moments later, you’ll have an email confirming your access to the service. If you dig into the Getting Started Guide, you’ll note that “Part of the sign-up procedure involves receiving a phone call and entering a PIN using the phone keypad,” however, that wasn’t necessary for me. Perhaps it’s only for new AWS accounts?

There is no user interface in the AWS Console although there are indications one is on its way. The Route 53 developer tools are fairly bare-bones at this point – four Perl scripts. Those scripts require relatively uncommon Perl modules, not included in the default Ubuntu (Lucid) repositories, although they are available through CPAN.

However, the third-party Boto Python interface to Amazon Web Services already includes support, and presumably other tools are also rapidly adding support, if they don’t have it already.

Using the Perl tools, I created a zone for an example domain – gearlister.org – and was given four name servers:

ns-1945.awsdns-51.co.uk (205.251.199.153)
ns-39.awsdns-04.com (205.251.192.39)
ns-690.awsdns-22.net (205.251.194.178)
ns-1344.awsdns-40.org (205.251.197.64)

Continue reading

Adding Swap to an EC2 Micro Instance

EC2 micro instances come with no swap by default – at least every micro instance that I’ve ever launched does, I’m not sure if it’s theoretically possible to launch an instance with swap. The lack of swap is probably a side-effect of the limited memory combined with EBS-only storage and concomitant risk of high EBS charges if you swap heavily.

However, if you’re willing to accept the risk of unexpected high EBS I/O costs, it’s straightforward to add swap:

# /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
# /sbin/mkswap /var/swap.1
# /sbin/swapon /var/swap.1

Or, if you prefer Puppet:

class swapfile {

  exec { "create swap file":
    command => "/bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024",
    creates => "/var/swap.1",
  }

  exec { "attach swap file":
    command => "/sbin/mkswap /var/swap.1 && /sbin/swapon /var/swap.1",
    require => Exec["create swap file"],
    unless => "/sbin/swapon -s | grep /var/swap.1",
  }
  
}

VMware Tools Upgrade on CentOS Enables Host Time Sync (plus fix)

After bringing some CentOS guests from an ESX 3.5 environment to an ESXi 4.1 environment and performing a VMware Tools upgrade, I noticed log messages on the VMs similar to the following:

Nov 12 09:07:18 node01 ntpd[2574]: time reset +175.995101 s

Along with console messages about the cmos clock such as:

time.c can't update cmos clock from 0 to 59

Inspecting the affected VMs, the clock appeared to be losing almost a second each second, despite ntpd being up and running and kernel options set appropriately. Further investigation revealed that “Synchronize guest time with host” had been silently enabled for the guest during the Tools upgrade, contrary to VMware’s Timekeeping best practices.

To be fair, I don’t know how widespread this problem is – it could be particular to CentOS, ESX 3.5 to 4.1 migrations, the fact that the virtual hardware hasn’t yet been upgraded from version 4 to version 7, or even my method of upgrading the tools. However, once you know to look for this issue, the resolution is simple: Disable host time sync. You can do this manually, or, if you use Puppet to manage your Linux VMs, the following manifest snippet will automate this for you (assuming you have a “vmware-tools” Service):

exec { "Disable host time sync":
  onlyif => "/usr/bin/test `/usr/bin/vmware-toolbox-cmd timesync status` = 'Enabled'",
  command => "/usr/bin/vmware-toolbox-cmd timesync disable",
  require => Service["vmware-tools"],
}

Put Down the Saw and Get the Glue: Working Around VMware KB1022751

VMware KB article 1022751 lays out the details of an interesting bug in ESXi 4.0 and 4.1 pretty plainly:

When trying to team NICs using EtherChannel, the network connectivity is disrupted on an ESXi host. This issue occurs because NIC teaming properties do not propagate to the Management Network portgroup in ESXi. When you configure the ESXi host for NIC teaming by setting the Load Balancing to Route based on ip hash, this configuration is not propagated to Management Network portgroup.

(Note that load balancing by IP hash is the only supported option for EtherChannel link aggregation.)

Unfortunately, the KB article’s workaround – there is no patch that I’m aware of – requires network connectivity to the host via the vSphere Client. But what do you do if you’ve just sawed off the branch you’re sitting on network-wise, and can no longer connect with the vSphere client?
Continue reading

Installing the F5 FirePass VPN Client on Ubuntu 10.04 AMD64

Disclaimer: I am not a FirePass administrator; only an end-user and have no other relationship with F5. There may be better methods to address this issue; please comment if you know of one.

See also: f5vpn-login.py, described here, and brought to my attention by sh4k3sph3r3. A CLI FirePass client is quite likely a better solution than separate browser instances, etc.

Preliminaries: Although the F5 FirePass SSL VPN product supports Linux, as best as I can tell, that support is somewhat limited: My understanding is that they officially claim support for 32-bit installs only, and they do not appear to track new distribution releases particularly aggressively. F5 has also been somewhat slow in supporting new browser versions: They announced support for Firefox 3 on October 6, 2008, nearly four months after its release and with only two months to go before Firefox 2 was end-of-lifed. For Firefox 3.6 support, a comment on the post linked above states that you need to request a special hot fix from F5 (which my site has not applied). There is no Google Chrome support that I am aware of.

Further, F5’s automated client installation tools have unfortunately never worked for me on Linux, even when the architecture and browser are in their support matrix. The manual download instruction links are also broken on the FirePass install I connect to.

Solution: Install a dedicated, 32-bit version of Firefox in a supported version; create a single-purpose Firefox profile for VPN use. Add the FirePass client to that browser and the operating system.
Continue reading

Test Driving Google Public DNS (Updated with OpenDNS comparison)

Google announced its Public DNS service this morning, claiming enhanced performance and security; I took it for a brief test drive with the following results.

(See bottom of post for an update running similar tests on OpenDNS.)

Methods: I searched Google for keywords that I believed fell somewhere between obscure and common and collected the first ten hostnames printed on the screen. I then used local installations of dig to query a collection of DNS servers for the hostnames’ A records and collected the response times. The different resolvers used were:

  • A local BIND installation (127.0.0.1, cache empty) with Comcast Internet connectivity;
  • A Comcast DNS server (68.87.69.150) via Comcast Internet connectivity;
  • My employer’s internal caching DNS;
  • Google (8.8.8.8) via my employer’s Internet connectivity (mostly Level 3);
  • Google (8.8.8.8) via Comcast; and
  • Google (8.8.8.8) via an Amazon EC2 instance in us-east-1a.

Anticipating a bimodal distribution of results, I assumed high latency responses were cache misses, while low latency responses were cache hits, and categorized results correspondingly.
Continue reading

Interesting Linux VM Crash Pattern

I’ve just begun to pull together some interesting data on a series of Linux VM crashes I’ve seen. I don’t have a resolution yet, but some interesting patterns have emerged.

Crash Symptoms

A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:

CentOS 4.x:

[<f883b299>] .text.lock.scsi_error+0x19/0x34 [scsi_mod]
[<f88c19ce>] mptscsih_io_done+0x5ee/0x608 [mptscsi] (…)
[<c02de564>] common_interrupt+0x18/0x20
[<c02ddb54>] system_call+0x0/0x30

CentOS 5.x:

RIP  [<ffffffff8014c562>] list_del+0x48/0x71 RSP <ffffffff80425d00> <0>Kernel Panic - not syncing: Fatal exception

A hard reset (i.e. pressing the reset button on the VM’s console) is required to reboot the guest.
Continue reading

Keeping your RHEL VMs from crushing your storage at 4:02am

Running a lot of Red Hat VMs in your virtual infrastructure, on shared storage? CentOS, Scientific Linux, both versions 4 and 5, they count for these purposes; Fedora should likely be included too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPMs installed? If you’re uncertain, check using the following:

> rpm -q slocate
slocate-2.7-13.el4.8.i386

or

> rpm -q mlocate
mlocate-0.15-1.el5.2.x86_64

If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am shared storage load and latency spike for you. Before being addressed, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to cause a Windows VM to crash with I/O errors. Ouch.
Continue reading