thinking sysadmin

qstat -u aleonard -s z

Archive for the ‘vmware’ tag

Interesting Linux VM Crash Pattern

3 comments

I’ve just begun to pull together some interesting data on a series of Linux VM crashes I’ve seen. I don’t have a resolution yet, but some interesting patterns have emerged.

Crash Symptoms

A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:

CentOS 4.x:

[<f883b299>] .text.lock.scsi_error+0x19/0x34 [scsi_mod]
[<f88c19ce>] mptscsih_io_done+0x5ee/0x608 [mptscsi] (…)
[<c02de564>] common_interrupt+0x18/0x20
[<c02ddb54>] system_call+0x0/0x30

CentOS 5.x:

RIP  [<ffffffff8014c562>] list_del+0x48/0x71
 RSP <ffffffff80425d00>
<0>Kernel Panic - not syncing: Fatal exception

A hard reset (i.e. pressing the reset button on the VM’s console) is required to reboot the guest.

Written by Andy

November 20th, 2009 at 12:09 pm

Keeping your RHEL VMs from crushing your storage at 4:02am

3 comments

Running a lot of Red Hat VMs in your virtual infrastructure on shared storage? CentOS and Scientific Linux, versions 4 and 5 alike, count for these purposes; Fedora should probably be included too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPMs installed? If you’re uncertain, check using the following:

> rpm -q slocate
slocate-2.7-13.el4.8.i386

or

> rpm -q mlocate
mlocate-0.15-1.el5.2.x86_64

If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am load and latency spike on your shared storage. Before we addressed it, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to cause a Windows VM to crash with I/O errors. Ouch.
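To see whether a given guest is part of the 4:02am pile-up, a rough sketch like the following can help. It assumes the stock layout on these releases: cron.daily jobs launched from /etc/crontab, and locate job scripts named slocate.cron or mlocate.cron. The function names are made up for illustration; adjust the paths if your systems differ.

```shell
#!/bin/sh
# List locate-database cron jobs in a cron.daily-style directory.
# ASSUMPTION: the script names slocate.cron / mlocate.cron match a stock
# RHEL/CentOS 4.x or 5.x install; pass another directory as $1 if needed.
find_locate_jobs() {
    dir="${1:-/etc/cron.daily}"
    for f in "$dir"/slocate.cron "$dir"/mlocate.cron; do
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Print the crontab line(s) that schedule cron.daily, so you can see
# the start time. ASSUMPTION: the schedule lives in /etc/crontab.
show_daily_schedule() {
    crontab_file="${1:-/etc/crontab}"
    grep 'cron\.daily' "$crontab_file" 2>/dev/null
}

find_locate_jobs
show_daily_schedule || true
```

If both functions print something on every guest, all of those guests rebuild their locate databases at the same minute.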

Written by Andy

November 19th, 2009 at 12:39 pm

VMware/NFS/NetApp SnapRestore/Linux LVM Single File Recovery Notes

There have been a few posts elsewhere discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions.

Here’s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers.

Written by Andy

June 1st, 2009 at 2:55 pm

ESX VM swap on NFS: If it crashes, try something else

5 comments

I’ve written about running VMware ESX with VM swap on an NFS datastore previously – specifically whether or not it was supported/recommended.

After writing the second post, I thought the issue was pretty much resolved: from multiple sources, the consensus seemed to be that running ESX with VM swap on NFS would be fine. Imagine my surprise (and disappointment) at seeing VMware KB article 1008091, updated yesterday: “An ESX virtual machine on NFS fails with swap errors.” Further details are in the article itself, but VMware’s KB site is throwing intermittent errors for me at the moment, so I’ll provide the money quote:

The reliability of the virtual machine can be improved by relocating the swap file location to a non-NFS datastore. Either SAN or local storage datastores improve virtual machine stability.

Written by Andy

February 4th, 2009 at 5:58 pm

Posted in virtualization

VMware: Not kidding about VMotion GigE Requirement

one comment

In case you’re curious/adventurous/broke enough to try configuring your VMotion network on Fast Ethernet instead of Gigabit Ethernet, here’s what you can expect.

First, a warning from your VI client that you’re venturing into unsupported territory:

A friendly warning

And then, if you go ahead with the VMotion, a slight pause on the VM in question. The following is output from running this loop on a Linux guest during the VMotion:

> while true; do date; sleep 1; done

Tue Feb  3 13:23:17 PST 2009
Tue Feb  3 13:23:18 PST 2009
Tue Feb  3 13:23:19 PST 2009
Tue Feb  3 13:23:20 PST 2009
Tue Feb  3 13:23:21 PST 2009
Tue Feb  3 13:23:22 PST 2009
Tue Feb  3 13:24:12 PST 2009
Tue Feb  3 13:24:13 PST 2009
Tue Feb  3 13:24:14 PST 2009
Tue Feb  3 13:24:15 PST 2009
Tue Feb  3 13:24:16 PST 2009

Note the fifty-second pause between 13:23:22 and 13:24:12? Ouch…
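If you’d rather not eyeball the timestamps, the same check can be scripted. This is only a sketch: report_gaps and sample_clock are hypothetical helper names, date +%s is a GNU date extension, and it relies on the guest clock jumping forward after the stall, as it did in the output above.

```shell
#!/bin/sh
# Read one epoch timestamp (seconds) per line on stdin and report any
# gap between consecutive samples that exceeds max_gap seconds.
report_gaps() {
    max_gap="${1:-2}"
    awk -v max="$max_gap" '
        NR > 1 && ($1 - prev) > max {
            printf "gap of %d seconds after sample %d\n", $1 - prev, NR - 1
        }
        { prev = $1 }
    '
}

# Print the current epoch time once a second, $1 times.
sample_clock() {
    n="$1"
    while [ "$n" -gt 0 ]; do
        date +%s
        sleep 1
        n=$((n - 1))
    done
}

# Example: sample for two minutes during a VMotion and flag stalls
# longer than two seconds.
# sample_clock 120 | report_gaps 2
```

Fed the equivalent of the output above, it would report a single gap of 50 seconds.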

Written by Andy

February 3rd, 2009 at 5:42 pm

Posted in virtualization

Fishworks on the VMware HCL

I was checking out VMware’s new online searchable HCL and I noticed that the new Sun Unified Storage Systems were on it. That was fast – and now I’m really curious as to how the systems with flash drives perform as storage for ESX.

Written by Andy

December 11th, 2008 at 12:29 pm

Posted in storage, virtualization

VMware about ESX swap on NFS: It’s okay

one comment

Paul Manning, from VMware, in response to a question I asked in the VI:OPS forums:

The current best practice for NFS is to not separate the VM swap space from the VMhome directory on a NFS datastore. The reason for the original recommendation was just good old fashioned conservativeness.

More at the forum post, including more on the reasoning for the old recommendation of separating swap when using NFS – thanks, Paul, you made my day.

Written by Andy

November 24th, 2008 at 11:32 am

Posted in virtualization

Links, 9/10/2008

  • Timekeeping best practices for Linux – “This article presents best practices for Linux timekeeping. These recommendations include specifics on the particular kernel command line options to use for the Linux operating system of interest. There is also a description of the recommended settings and usage for NTP time sync, configuration of VMware Tools time synchronization, and Virtual Hardware Clock configuration, to achieve best timekeeping results.” Where has this document been since I started deploying VMware? Oh, wait, looks like it may have been written on August 19th… Still, thanks, VMware – exactly what I wanted!
  • VI:OPS – A new VMware site: “We created VI:OPS to widen the discussion beyond pure, deep technical by adding five topics that VMware staff, partners and customers talk about all the time but where there is no online collaboration facility for these topics.” I found the above link through a post on this site.

Written by Andy

September 10th, 2008 at 12:57 pm

Posted in link dump

Links, 8/30/2008: Usable space, licensing Windows, multiprotocol VMware storage

  • Your Usable Capacity May Vary – Chuck conducts a thought deployment comparing EMC, HP and NetApp usable space for a 120-disk Exchange deployment. And while he glosses over a couple of perhaps non-minor issues (RAID-5 vs RAID-DP and whether EMC’s snapshots are adequately performant), he does hit one of NetApp’s weak spots dead on: usable capacity, particularly on LUNs if you follow the 100% space reservation recommendation. (Being a NetApp admin these days, I can’t really comment on what he writes about HP – it’s been a long time since I’ve touched that StorageWorks stuff – and I can only repeat what I’ve heard others say about EMC.) More Chuck on this here.
  • How to License Windows VMs in a Non Microsoft Virtual Environment: Why Windows Server 2008 Datacenter Edition may be the best choice. (Seen at blog.scottlowe.org.)
  • Welcome – My friend, NetApp’s Vaughan Stewart: Chad Sakac highlights some flaws in NetApp’s TR-3697 (“Performance Report: Multiprotocol Performance Test of VMware® ESX 3.5 on NetApp Storage Systems”):

    What’s the scoop with:

    * 4K/8K IO size only
    * 2Gbps FC
    * You guys have “throughput/IOPs” shown only in relative, not in absolute.
    * 84 144GB drives with 16 VMs driving the IOMeter workloads with 10GB of data each on them = 1.3% utilization (rounding up!).
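For what it’s worth, the utilization figure in that last bullet is easy to sanity-check. A minimal sketch, assuming raw capacity (no RAID, spare, or right-sizing overhead) and a hypothetical helper named utilization_pct:

```shell
#!/bin/sh
# Percentage of raw disk capacity occupied by the test data set.
# Arguments: drive count, drive size (GB), VM count, data per VM (GB).
# ASSUMPTION: raw capacity only; RAID and spare overhead are ignored.
utilization_pct() {
    awk -v drives="$1" -v size="$2" -v vms="$3" -v data="$4" \
        'BEGIN { printf "%.1f\n", (vms * data) / (drives * size) * 100 }'
}

# 16 VMs x 10GB = 160GB of data on 84 x 144GB = 12096GB of raw disk.
utilization_pct 84 144 16 10
```

That works out to 160GB of data on roughly 12TB of raw disk, which is where the quoted percentage comes from.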

Written by Andy

August 30th, 2008 at 8:47 pm

8/14/2008 Link Dump

Written by Andy

August 14th, 2008 at 2:44 pm

Posted in link dump
