vmware – Thinking Sysadmin

VMware Tools Upgrade on CentOS Enables Host Time Sync (plus fix)

anleonard — Fri, 12 Nov 2010 18:53:22 +0000

After bringing some CentOS guests from an ESX 3.5 environment to an ESXi 4.1 environment and performing a VMware Tools upgrade, I noticed log messages on the VMs similar to the following:

Nov 12 09:07:18 node01 ntpd[2574]: time reset +175.995101 s

Along with console messages about the cmos clock such as:

time.c can't update cmos clock from 0 to 59

Inspecting the affected VMs, the clock appeared to be losing almost a second each second, despite ntpd being up and running and kernel options set appropriately. Further investigation revealed that “Synchronize guest time with host” had been silently enabled for the guest during the Tools upgrade, contrary to VMware’s Timekeeping best practices.

To be fair, I don’t know how widespread this problem is – it could be particular to CentOS, ESX 3.5 to 4.1 migrations, the fact that the virtual hardware hasn’t yet been upgraded from version 4 to version 7, or even my method of upgrading the tools. However, once you know to look for this issue, the resolution is simple: Disable host time sync. You can do this manually, or, if you use Puppet to manage your Linux VMs, the following manifest snippet will automate this for you (assuming you have a “vmware-tools” Service):

exec { "Disable host time sync":
  onlyif => "/usr/bin/test `/usr/bin/vmware-toolbox-cmd timesync status` = 'Enabled'",
  command => "/usr/bin/vmware-toolbox-cmd timesync disable",
  require => Service["vmware-tools"],
}
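If you don't use Puppet, you can run the same check-then-disable logic by hand or from a post-upgrade script. This is a minimal sketch. It assumes vmware-toolbox-cmd prints exactly "Enabled" or "Disabled" for `timesync status`, which is the same assumption the Puppet onlyif makes. TOOLBOX is a hypothetical override variable, not part of VMware Tools.

```sh
#!/bin/sh
# Sketch: disable VMware Tools host time sync on a Linux guest, but only if
# vmware-toolbox-cmd reports it as enabled. TOOLBOX is a hypothetical
# override so the binary path can be changed (or stubbed for testing).
TOOLBOX="${TOOLBOX:-/usr/bin/vmware-toolbox-cmd}"

disable_host_timesync() {
    status=$("$TOOLBOX" timesync status) || return 1
    if [ "$status" = "Enabled" ]; then
        # Same action as the Puppet exec's command.
        "$TOOLBOX" timesync disable
    else
        echo "host time sync already ${status:-unknown}; nothing to do"
    fi
}
```

Run it as root with `disable_host_timesync`. Running it a second time is harmless, because it only acts when the status check says "Enabled".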

Put Down the Saw and Get the Glue: Working Around VMware KB1022751

anleonard — Thu, 23 Sep 2010 22:18:53 +0000

VMware KB article 1022751 lays out the details of an interesting bug in ESXi 4.0 and 4.1 pretty plainly:

When trying to team NICs using EtherChannel, the network connectivity is disrupted on an ESXi host. This issue occurs because NIC teaming properties do not propagate to the Management Network portgroup in ESXi. When you configure the ESXi host for NIC teaming by setting the Load Balancing to Route based on ip hash, this configuration is not propagated to Management Network portgroup.

(Note that load balancing by IP hash is the only supported option for EtherChannel link aggregation.)

Unfortunately, the KB article’s workaround – there is no patch that I’m aware of – requires network connectivity to the host via the vSphere Client. But what do you do if you’ve just sawed off the branch you’re sitting on network-wise, and can no longer connect with the vSphere client?

Enable Local Tech Support Mode on the ESXi host, log in as root and run the following command:

vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_ip vSwitch0 "Management Network"

(Replace “vSwitch0” and “Management Network” with the appropriate vSwitch and portgroup as necessary.)

You may also find that while both NICs are active for the vSwitch, one will be in “standby” for the portgroup – a configuration not supported for IP hash load balancing. It would be reasonable to think that you could fix this with the following, but you can’t (see error on lines 2-8):

~ # vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic0,vmnic1 vSwitch0 "Management Network"
(vmodl.fault.InvalidArgument) {
   dynamicType = , 
   faultCause = (vmodl.MethodFault) null, 
   invalidProperty = , 
   msg = "A specified parameter was not correct. 
", 
}

VMware appears to have known about this bug for a while now – try searching the VMware Communities for some workarounds dating back to the 3.x days, including some from VMware employees – so resolving it is presumably either extremely difficult or not currently a high priority. However, you will likely be able to reach the ESXi host using the vSphere Client after fixing the portgroup NIC teaming policy, so you can fix this issue in the GUI.

If you find yourself attempting to automate an ESXi install with Kickstart and don’t want to make fixing the portgroup through the vSphere Client part of your install process, consider not using EtherChannel at all for the Management Network – just use active and standby NICs, perhaps in a configuration similar to Kendrick Coleman’s ESXi 4.1 Kickstart Install blog post.

Interesting Linux VM Crash Pattern

anleonard — Fri, 20 Nov 2009 19:09:16 +0000

I’ve just begun to pull together some interesting data on a series of Linux VM crashes I’ve seen. I don’t have a resolution yet, but some interesting patterns have emerged.

Crash Symptoms

A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:

CentOS 4.x:

[] .text.lock.scsi_error+0x19/0x34 [scsi_mod] [] mptscsih_io_done+0x5ee/0x608 [mptscsi] (…) [] common_interrupt+0x18/0x20 [] system_call+0x0/0x30

CentOS 5.x:

RIP [] list_del+0x48/0x71 RSP <0>Kernel Panic - not syncing: Fatal exception

A hard reset (i.e. pressing the reset button on the VM’s console) is required to reboot the guest.

Further Details

Five different VMs have encountered this issue, running at a mix of close-to-current CentOS 4.x and 5.x patch levels. Guest kernel versions when the crash occurred were 2.6.18-128.7.1.el5 and 2.6.18-128.1.10.el5 (5.x) and 2.6.9-89.0.9.ELsmp (4.x). Memory allocations on affected guests range from 512MB to 3072MB. Notably, all affected VMs are using SMP – each has 2 vCPUs – having been created before our in-house practices followed VMware guidelines and discouraged use of SMP on ESX guests when unnecessary. One VM was created via P2V; the rest were created de novo on virtual hardware.

All crashes have happened on a single node in an ESX 3.5 HA cluster composed of four Dell PowerEdge 1950s. ESX hosts have tracked the latest VMware patches closely. COS memory on the ESX host in question was increased from the default to 800MB prior to the three most recent crashes; in other words, the COS memory increase appears to have had no effect on the crashes. DRS is in use, set to “fully automated” and “apply recommendations with three or more stars” and no virtual machine rules have been created to control DRS host placement.

All guests are on the same NFS data store, served from a NetApp filer running ONTAP 7.2.x. One guest had its vmswap placed separately on an iSCSI data store; the rest have their swap stored on NFS with the VM. No log messages were seen on the filer during the event, although a log message similar to the following has been seen several times on the ESX host:

vmkernel: 43:07:27:51.725 cpu2:2185)WARNING: NFS: 4590: Can't find call with serial number -2146566055

Curiously, all crashes have happened in the evening, in the 10 o’clock hour, after nightly backups have been completed. Backups are created using a combination of VMware and NetApp snapshots via a script similar to one detailed on vmwaretips.com. No substantial load or latency has been recorded on the NetApp during the crashes, and weeks have passed between events.

Speculation

Explanations I’m leaning towards, ranked by my judgment of their likelihood:

1) Hardware issue. Assuming a random distribution of VMs – recall that DRS is in use and no virtual machine rules are in place – the odds of all five crashes happening on this particular host are slim: (1/4)^5, or 1 in 1024. Even the odds of all five landing on any one of the four hosts are only four times that, 1 in 256. Unfortunately, by all measures we’ve used, including the VI Client’s “Health Status” and Dell OMSA, there are no hardware issues with the host.

Further, the distribution of VMs is not truly random. DRS migrations are infrequent in this cluster, and the largest determinant of guest location is migration following hosts being placed into maintenance mode for patching.

If it is a hardware issue, it’s subtle, and possibly only brought to the fore by the following issues.

2) Red Hat Enterprise Linux bug – which, by extension, is typically equivalent to a CentOS bug. In fact, this issue appears to have been raised with Red Hat already in bugs 197158 and 228108 – but, according to the bug reports, the issue is resolved, and the patches have since been ported downstream to CentOS. However, perhaps the issue is not truly resolved – see comment 35 in 228108.

3) vSMP Bug. The majority of our Linux VMs are uniprocessor and appear so far to be immune to this issue; it is striking that the crash has only occurred on dual processor guests. I cannot articulate a mechanism for multiple vCPUs causing this crash, however.

4) NetApp issue. This appears to be a storage issue at some level, considering the mptscsi and NFS messages noted above, so performance of the NetApp filer would be a natural place for further investigation. However, we monitor the performance of our filer relatively closely, using the ONTAP SDK and Cacti, and nothing unusual was recorded during any crash. It seems unusual that all VMs reside on the same data store, but that data store shares an aggregate with multiple other unaffected data stores, and several LUNs are served from the same aggregate to non-ESX machines without complaint.

I have not yet opened a case with VMware on this issue – or Dell, or NetApp, for that matter – but if and when I do, I’ll update here to the extent possible.

Update 11/20/2009: Prompted by a helpful comment from nate below, I looked up and verified the NFS settings across the cluster. They are the same across all hosts, and are as follows:

NFS.IndirectSend 0
NFS.DiskFileLockUpdate 10
NFS.LockUpdateTimeout 5
NFS.LockRenewMaxFailureNumber 3
NFS.LockDisable 0
NFS.HeartbeatFrequency 12
NFS.HeartbeatTimeout 5
NFS.HeartbeatDelta 5
NFS.HeartbeatMaxFailures 10
NFS.MaxVolumes 8
NFS.SendBufferSize 264
NFS.ReceiveBufferSize 128
NFS.VolumeRemountFrequency 30
NFS.UDPRetransmitDelay 700

The only values that are changed from default are HeartbeatFrequency and HeartbeatMaxFailures, to match NetApp’s recommendations in TR-3428.
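An easy way to check settings like these across a cluster is to dump them in the same format on every host and diff the results. This is a minimal sketch for the ESX service console. It assumes `esxcfg-advcfg -g /NFS/OptionName` prints a line ending in the value (for example "Value of HeartbeatFrequency is 12"). ADVCFG is a hypothetical override variable.

```sh
#!/bin/sh
# Sketch: print the NFS advanced settings listed above, one per line, in a
# format that can be diffed between ESX hosts. ADVCFG is a hypothetical
# override for the esxcfg-advcfg command (handy for stubbing).
ADVCFG="${ADVCFG:-esxcfg-advcfg}"
NFS_OPTS="IndirectSend DiskFileLockUpdate LockUpdateTimeout
LockRenewMaxFailureNumber LockDisable HeartbeatFrequency HeartbeatTimeout
HeartbeatDelta HeartbeatMaxFailures MaxVolumes SendBufferSize
ReceiveBufferSize VolumeRemountFrequency UDPRetransmitDelay"

dump_nfs_settings() {
    for opt in $NFS_OPTS; do
        # Assumed output format: a sentence whose last field is the value,
        # so keep only the last whitespace-separated field.
        value=$("$ADVCFG" -g "/NFS/$opt" | awk '{ print $NF }')
        echo "NFS.$opt $value"
    done
}
```

Run `dump_nfs_settings > /tmp/nfs-$(hostname).txt` on each host, gather the files in one place, and `diff` them. Any line that differs points to a host that has drifted from the others.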

Keeping your RHEL VMs from crushing your storage at 4:02am

anleonard — Thu, 19 Nov 2009 19:39:30 +0000

Running a lot of Red Hat VMs in your virtual infrastructure, on shared storage? CentOS and Scientific Linux, versions 4 and 5, count for these purposes; Fedora should likely be included too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPMs installed? If you’re uncertain, check using the following:

> rpm -q slocate
slocate-2.7-13.el4.8.i386

> rpm -q mlocate
mlocate-0.15-1.el5.2.x86_64

If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am shared storage load and latency spike for you. Before being addressed, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to cause a Windows VM to crash with I/O errors. Ouch.

Details and ideas for resolution:

By default, a line in /etc/crontab runs the scripts within /etc/cron.daily at 4:02am each morning:

02 4 * * * root run-parts /etc/cron.daily

One of those scripts – mlocate.cron or slocate.cron, depending on your OS version – launches updatedb; as the man page says, “updatedb creates or updates a database used by locate(1).” (The “locate” binary is a filesystem search tool, see “man locate” for more information.) Updatedb refreshes its database by walking the filesystem, generating a fair amount of I/O on a single system. Imagine upwards of thirty of these running in parallel through VMDKs on one shared storage system carrying out internal maintenance at the same time, and you’re pretty much picturing the problem my employer had.

I see three options for addressing this issue:

1) Uninstall mlocate or slocate. If you don’t currently use “locate” and you’re not interested in learning to use a tool that will likely make you more effective at your job (again, see “man locate”), this is probably the best option. (Yeah, I know, people that fit this bill generally don’t read blogs more technical than this one, so I could probably have skipped it here. Consider it an option for completeness, or if you really need to strip down an install.)

2) Disable the scheduled job by removing mlocate.cron or slocate.cron from /etc/cron.daily. This keeps locate available for your use, but requires that you update locate’s database ad-hoc and interactively by running the following as root:

# updatedb

This will take a few minutes to return, depending on the size of your file systems.

I don’t recommend this option either; at least it doesn’t fit the way I work. I often find myself using locate in high-pressure situations in which I need to quickly get a file location on a system. Waiting minutes for updatedb to return is extra painful when every second counts.

3) Stagger when updatedb runs by inserting a random delay into the script. This is my preferred alternative; locate’s database is kept current automatically, and your storage doesn’t have to bear a sudden spike in load. I implemented this by adding lines 2-7 of the following mlocate.cron:

#!/bin/sh
# sleep up to two hours before launching job:
value=$RANDOM
while [ $value -gt 7200 ] ; do
    value=$RANDOM
done
sleep $value
nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')
renice +19 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "$nodevs"

The added code inserts a pseudo-random sleep delay of up to two hours before updatedb runs. The key is Bash’s built-in $RANDOM variable, which expands to a pseudo-random integer between 0 and 32767. The loop discards any value above 7200 seconds and draws again. This relies on /bin/sh being Bash, as it is on RHEL. In our environment, this removed a 2000 IOPS spike at 4:02am and eliminated a corresponding jump in filer latency. Obviously, adjust the delay period as appropriate for your environment. Additionally, be sure to add this change to your configuration management or installation management tools so that all of your RHEL and RHEL-derived VMs get the updated script.
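If you want the same staggering for other cron jobs, you can wrap the delay in a reusable function. This is a sketch, not the script above. It uses a slightly different rejection-sampling step: it discards only the incomplete top slice of $RANDOM's range and then takes the remainder. The result stays uniform, and the loop rarely repeats even for small limits. random_upto and jittered are hypothetical names, and the block needs Bash for $RANDOM.

```bash
#!/bin/bash
# Sketch: uniform random integer in 0..MAX using $RANDOM (0..32767).
# Values at or above the largest multiple of (MAX+1) that fits in 32768
# are redrawn; the remainder of what is left is then uniform.
random_upto() {
    local max=$1 range limit value
    if [ "$max" -lt 0 ] || [ "$max" -gt 32767 ]; then
        echo "random_upto: MAX must be between 0 and 32767" >&2
        return 1
    fi
    range=$((max + 1))
    limit=$((32768 - 32768 % range))
    value=$RANDOM
    while [ "$value" -ge "$limit" ]; do
        value=$RANDOM
    done
    echo $((value % range))
}

# Hypothetical wrapper: sleep a random 0..MAX seconds, then run a command.
jittered() {
    local max=$1 delay
    shift
    delay=$(random_upto "$max") || return 1
    sleep "$delay"
    "$@"
}
```

To use it, paste the functions into a cron script that runs under Bash. Lines 2-7 of mlocate.cron could then shrink to `sleep "$(random_upto 7200)"`. Alternatively, you could launch the job itself as `jittered 7200 /usr/bin/updatedb -f "$nodevs"`.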

Using $RANDOM to avoid this variant of the thundering herd problem also works nicely for a range of similar problems; I believe I first saw it at Moundalexis.com.

(This problem may apply to other Linux distributions being run as VMs, and FreeBSD does something equivalent – weekly – with /etc/periodic/weekly/310.locate. A similar solution can be applied to these environments, if necessary.)

VMware/NFS/NetApp SnapRestore/Linux LVM Single File Recovery Notes

anleonard — Mon, 01 Jun 2009 21:55:54 +0000

There have been a few posts elsewhere discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions.

Here’s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers.

Prerequisites

An existing VMware ESX infrastructure, connected to a NetApp filer NFS datastore; SnapRestore speeds the recovery process but is not mandatory – see discussion below.
A backup script or system which coordinates VMware snapshots with NetApp snapshots – perhaps something along the lines of Rick Scherer’s script.
A dedicated Linux restore VM, at a similar version level to the rest of your Linux VM infrastructure. This VM should have LVM support, but should not have any volume groups (VGs) or logical volumes (LVs) configured – volume group and logical volume names on the VMDK you are restoring from must not conflict with VGs and LVs already in use on the restore system; the simplest way to guarantee this is to simply not have any VGs or LVs.

Restore Procedure

Restore the VMDK file from the appropriate snapshot to a new location in the datastore. With SnapRestore, this can be done as follows (one line in the filer CLI, restoring from snapshot sv_daily.0 to a new file). Be extremely careful not to overwrite the current version of the VMDK in your datastore; consider restoring to an entirely different directory in the FlexVol:
snap restore -t file -s sv_daily.0 -r /vol/vmware04_sis/system.example.com/system.example.com-restore.vmdk /vol/vmware04_sis/system.example.com/system.example.com.vmdk

Follow the prompts, verifying the restore path is correct and is not the path to your existing VMDK. Do the same for the flat VMDK file (again, one line, and, as before, use caution to make sure you do not clobber an existing file):

snap restore -t file -s sv_daily.0 -r /vol/vmware04_sis/system.example.com/system.example.com-restore-flat.vmdk /vol/vmware04_sis/system.example.com/system.example.com-flat.vmdk

Without SnapRestore, you can simply mount the NFS export of the datastore on a Linux machine and use “cp” to copy the files out of the snapshot. For flat VMDK files, expect this copy to take substantially longer than the nearly instant recovery SnapRestore offers.
Manually edit the line below “# Extent description” in the recovered .vmdk file to match the path to the recovered flat VMDK. In this case, it would look something like this:
# Extent description
RW 20971520 VMFS "system.example.com-restore-flat.vmdk"
Attach the recovered VMDK to your powered-off restore host. Boot the restore host.
Once your restore host is up, use “pvscan”, “vgscan” and “lvscan” (each without arguments) as root to examine available LVM components. Then, use the “lvchange” command to activate the necessary volume group (in this case, “VolGroup00”):
# lvchange -ay VolGroup00
Mount the appropriate logical volume – for example, LogVol00 in VolGroup00:
mount -o ro /dev/VolGroup00/LogVol00 /mnt
Restore files by copying them out of /mnt.
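Without SnapRestore, you can combine the “cp” step and the descriptor edit into one command. This is a minimal sketch with several assumptions. The datastore's NFS export must already be mounted on a Linux host, with snapshots visible under .snapshot. It needs GNU sed, and it follows the file naming used above. copy_vmdk_from_snapshot and all example paths are hypothetical. In keeping with the warnings above, it refuses to overwrite existing files.

```sh
#!/bin/sh
# Sketch: copy a VMDK pair out of a snapshot under new "-restore" names and
# repoint the descriptor's extent line at the restored flat file.
#   $1  snapshot copy of the VM directory (hypothetical example:
#       /mnt/vmware04_sis/.snapshot/sv_daily.0/system.example.com)
#   $2  live VM directory (e.g. /mnt/vmware04_sis/system.example.com)
#   $3  VM base name (e.g. system.example.com)
copy_vmdk_from_snapshot() {
    snapdir=$1 vmdir=$2 name=$3
    desc="$vmdir/$name-restore.vmdk"
    flat="$vmdir/$name-restore-flat.vmdk"
    # Never clobber anything: the restore copies must be brand-new files.
    for f in "$desc" "$flat"; do
        if [ -e "$f" ]; then
            echo "refusing to overwrite $f" >&2
            return 1
        fi
    done
    cp "$snapdir/$name.vmdk" "$desc" || return 1
    cp "$snapdir/$name-flat.vmdk" "$flat" || return 1
    # Same edit as the manual "# Extent description" step (GNU sed -i).
    sed -i "s|\"$name-flat.vmdk\"|\"$name-restore-flat.vmdk\"|" "$desc"
}
```

After it returns, inspect the restored descriptor before you attach it to the restore host. Then continue with the attach, LVM and mount steps as usual.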

Cleanup

Shut down the Linux restore host.
Remove the recovery VMDK – the files restored with SnapRestore or by “cp” above – from the restore host in the VMware Infrastructure Client.
Delete the recovery .vmdk and -flat.vmdk files in the NFS datastore. Don’t screw up here: Be sure to delete the recovery files only, not the working VMDK.

ESX VM swap on NFS: If it crashes, try something else

anleonard — Thu, 05 Feb 2009 00:58:42 +0000

I’ve written previously about running VMware ESX with VM swap on an NFS datastore, specifically about whether it was supported or recommended.

After writing the second post, I thought the issue was pretty much resolved: From multiple sources, the consensus seemed to be that running ESX with VM swap on NFS would be fine. Imagine my surprise (and disappointment) at seeing the following VMware KB article 1008091, updated yesterday: An ESX virtual machine on NFS fails with swap errors. Further details are in the article itself, but VMware’s KB site is throwing intermittent errors for me at the moment, so I’ll provide the money quote:

The reliability of the virtual machine can be improved by relocating the swap file location to a non-NFS datastore. Either SAN or local storage datastores improve virtual machine stability.

VMware: Not kidding about VMotion GigE Requirement

anleonard — Wed, 04 Feb 2009 00:42:06 +0000

In case you’re curious/adventurous/broke enough to try configuring your VMotion network on Fast Ethernet instead of Gigabit Ethernet, here’s what you can expect.

First, a warning from your VI client that you’re venturing into unsupported territory:

[Image: A friendly warning]

And then, if you go ahead with the VMotion, a slight pause on the VM in question. The following is output from running while true; do date; sleep 1; done on a Linux guest during the VMotion:

Tue Feb  3 13:23:17 PST 2009
Tue Feb  3 13:23:18 PST 2009
Tue Feb  3 13:23:19 PST 2009
Tue Feb  3 13:23:20 PST 2009
Tue Feb  3 13:23:21 PST 2009
Tue Feb  3 13:23:22 PST 2009
Tue Feb  3 13:24:12 PST 2009
Tue Feb  3 13:24:13 PST 2009
Tue Feb  3 13:24:14 PST 2009
Tue Feb  3 13:24:15 PST 2009
Tue Feb  3 13:24:16 PST 2009

Note the fifty-second pause between 13:23:22 and 13:24:12? Ouch…
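Instead of eyeballing the date output for the stall, you can let a small filter find it. This is a sketch that assumes you log Unix timestamps (date +%s) rather than human-readable dates. report_gaps is a hypothetical name.

```sh
#!/bin/sh
# Sketch: read Unix timestamps (one per line) on stdin and report every
# gap between consecutive lines that exceeds THRESHOLD seconds.
report_gaps() {
    awk -v t="$1" '
        NR > 1 && $1 - prev > t { printf "gap of %d s after %d\n", $1 - prev, prev }
        { prev = $1 }'
}
```

In the guest, run `while true; do date +%s; sleep 1; done > stamps.txt` during the VMotion and stop it afterwards. Then run `report_gaps 2 < stamps.txt`. On the run above, it would print a single line showing a 50-second gap.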

VMware about ESX swap on NFS: It’s okay

anleonard — Mon, 24 Nov 2008 18:32:42 +0000

Paul Manning, from VMware, in response to a question I asked in the VI:OPS forums:

The current best practice for NFS is to not separate the VM swap space from the VMhome directory on a NFS datastore. The reason for the original recommendation was just good old fashioned conservativeness.

More at the forum post, including more on the reasoning for the old recommendation of separating swap when using NFS – thanks, Paul, you made my day.

Quick and Dirty VMware ESX Patching

anleonard — Thu, 31 Jul 2008 20:41:09 +0000

On the ESX console, do the following:

Read the documentation for each patch.
Group patches that can be installed together into a directory, possibly an NFS mount available on all your ESX hosts.
Cd into the patch directory and untar the patches:
for i in *.tgz; do tar -xvzf "$i"; done
Install the patches:
for i in *; do
    if [ -d "$i" ]; then
        cd "$i"
        esxupdate --noreboot update
        cd ..
    fi
done
Reboot.
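The install loop above keeps going even when an esxupdate run fails. A slightly more defensive variant stops at the first failure, so you can investigate before rebooting. This is a sketch that assumes esxupdate exits with a non-zero status when a patch fails. ESXUPDATE is a hypothetical override variable.

```sh
#!/bin/sh
# Sketch: fail-fast version of the install loop. Runs esxupdate in each
# extracted patch directory (glob order) and stops at the first failure.
ESXUPDATE="${ESXUPDATE:-esxupdate}"

install_patches() {
    for dir in */; do
        # If nothing matches, the glob stays literal; skip it.
        [ -d "$dir" ] || continue
        echo "installing ${dir%/}"
        # Subshell keeps the cd from leaking, even if esxupdate fails.
        if ! (cd "$dir" && "$ESXUPDATE" --noreboot update); then
            echo "esxupdate failed in ${dir%/}; stopping" >&2
            return 1
        fi
    done
}
```

Run `install_patches` from the patch directory after untarring. Reboot only if it finishes without reporting a failure.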

VMware’s Comparison of Storage Protocol Performance

anleonard — Sat, 09 Feb 2008 04:18:49 +0000

VMware has just released a paper entitled Comparison of Storage Protocol Performance (seen at Scale the Mind and blog.scottlowe.org); maybe this will help deflate some of the too-often repeated speculation that NFS is too slow for VMware ESX.

VMware’s findings match well with what I’ve seen. On some in-house application-specific benchmarking that I’ve done, I actually saw overall better performance with an NFS datastore than with a software iSCSI datastore on the same filer. I won’t get into details, because the benchmark was specific to us (and I’d probably need a lawyer to review the EULAs before “publishing” anything…), but NFS was equal to or slightly faster than iSCSI across the board on this specific set of tests in our specific environment. Given the management and deployment advantages of NFS, that’s huge.

Of course, I wouldn’t recommend basing your whole ESX/NetApp deployment strategy off of one unsubstantiated benchmark-related post on my blog; if you’re using ESX with NetApp storage, I would strongly recommend testing NFS datastores if you haven’t already, though.