Tagged: rhel

November 20, 2009

Interesting Linux VM Crash Pattern

I’ve just begun pulling together data on a series of Linux VM crashes I’ve seen. I don’t have a resolution yet, but some interesting patterns have emerged.

Crash Symptoms

A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:

CentOS 4.x:

[<f883b299>] .text.lock.scsi_error+0x19/0x34 [scsi_mod]
[<f88c19ce>] mptscsih_io_done+0x5ee/0x608 [mptscsi]
(…)
[<c02de564>] common_interrupt+0x18/0x20
[<c02ddb54>] system_call+0x0/0x30

CentOS 5.x:

RIP [<ffffffff8014c562>] list_del+0x48/0x71
RSP <ffffffff80425d00>
<0>Kernel Panic - not syncing: Fatal exception

A hard reset (i.e. pressing the reset button on the VM’s console) is required to reboot the guest.

November 19, 2009

Keeping your RHEL VMs from crushing your storage at 4:02am

Running a lot of Red Hat VMs in your virtual infrastructure, on shared storage? CentOS and Scientific Linux, versions 4 and 5, count for these purposes, and Fedora likely should too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPM installed? If you’re uncertain, check using the following:

> rpm -q slocate
slocate-2.7-13.el4.8.i386

> rpm -q mlocate
mlocate-0.15-1.el5.2.x86_64
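If you have many guests to check, the two queries above can be wrapped in a small helper. This is only a rough sketch, not something from the original setup: the function name check_locate_pkgs and its optional argument (a substitute query command, there purely so the function can be exercised on a machine without rpm) are made up for illustration. It relies on rpm -q exiting non-zero when a package is not installed.

```shell
# Hypothetical helper: report whether slocate and mlocate are installed.
# The optional first argument is an assumed testing hook that replaces "rpm -q".
check_locate_pkgs() {
    query_cmd="${1:-rpm -q}"
    for pkg in slocate mlocate; do
        # rpm -q exits non-zero when the package is absent
        if $query_cmd "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: not installed"
        fi
    done
}
```

Running check_locate_pkgs with no arguments on a guest prints one line per package, which is easy to collect over ssh from a list of VMs.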

If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am shared storage load and latency spike for you. Before we addressed it, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to crash a Windows VM with I/O errors. Ouch.