<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Interesting Linux VM Crash Pattern</title>
	<atom:link href="http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/</link>
	<description>qstat -u aleonard -s z</description>
	<lastBuildDate>Mon, 11 Jan 2010 17:10:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Andy</title>
		<link>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/comment-page-1/#comment-257</link>
		<dc:creator>Andy</dc:creator>
		<pubDate>Fri, 20 Nov 2009 23:37:48 +0000</pubDate>
		<guid isPermaLink="false">http://andyleonard.com/?p=343#comment-257</guid>
		<description>I&#039;ve posted the NFS settings above; unfortunately, they don&#039;t appear to deviate from default/recommended settings.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve posted the NFS settings above; unfortunately, they don&#8217;t appear to deviate from default/recommended settings.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy</title>
		<link>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/comment-page-1/#comment-256</link>
		<dc:creator>Andy</dc:creator>
		<pubDate>Fri, 20 Nov 2009 23:13:20 +0000</pubDate>
		<guid isPermaLink="false">http://andyleonard.com/?p=343#comment-256</guid>
		<description>Thanks for the input and suggestions, Nate - much appreciated.

I failed to mention in the post that we are running another cluster of ESX 3.5 hosts off the same filer (and same aggregate) using a Fibre Channel LUN without issue, FWIW.  Guests on that cluster include Linux VMs, but all without vSMP.  That suggests it&#039;s not the filer itself, but it could still be the storage protocol, or the host hardware, or vSMP.

(And now that I&#039;ve written that, I&#039;m curious about verifying the VMware NFS settings on the problem host.  I&#039;ll post an update on what I find there.)</description>
		<content:encoded><![CDATA[<p>Thanks for the input and suggestions, Nate &#8211; much appreciated.</p>
<p>I failed to mention in the post that we are running another cluster of ESX 3.5 hosts off the same filer (and same aggregate) using a Fibre Channel LUN without issue, FWIW.  Guests on that cluster include Linux VMs, but all without vSMP.  That suggests it&#8217;s not the filer itself, but it could still be the storage protocol, or the host hardware, or vSMP.</p>
<p>(And now that I&#8217;ve written that, I&#8217;m curious about verifying the VMware NFS settings on the problem host.  I&#8217;ll post an update on what I find there.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nate</title>
		<link>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/comment-page-1/#comment-254</link>
		<dc:creator>nate</dc:creator>
		<pubDate>Fri, 20 Nov 2009 22:54:32 +0000</pubDate>
		<guid isPermaLink="false">http://andyleonard.com/?p=343#comment-254</guid>
		<description>Just for reference, I run about 100 VMs with 2.6.18-128.1.10.el5 on CentOS 5.2 (5.3 kernel on 5.2 distro) with the SMP kernel, though there is only 1 vCPU per VM. All of them are on local storage on Dell R610. I also have about a dozen VMs on the same kernel running with 2 vCPUs.

I run about 40 physical servers on the same kernel on native R610 with 8 cores; no panics there either.

I run the SMP kernel on everything even if it&#039;s only 1 vCPU, because you never know if you may need to upgrade to 2 or more CPUs, and if you do I don&#039;t want to have to change kernels. Though now that I think about it, I believe RHEL/CentOS 5.x kernels are all SMP now, whereas 4.x had separate SMP and UP kernels, but I&#039;m not 100% sure.

Never had a panic of any sort on any of them.

To me your problem looks related to storage, so I would look to NetApp or VMware&#039;s NFS stuff. I would also consider hooking up at least one host via Fibre Channel or iSCSI and running some of the VMs off of that to see what you get.

My own infrastructure is split into two classes. The first is our edge web servers, which are at several physical locations; all of them run off of local storage. The other class is most of our back-end stuff, QA, and internal IT, all of which is Fibre Channel connected (most of it is boot from SAN, with the exception of some older ESXi 3.5 systems that don&#039;t support that) to our 3PAR T400 storage system.

In case it helps you in tracing the issue.</description>
		<content:encoded><![CDATA[<p>Just for reference, I run about 100 VMs with 2.6.18-128.1.10.el5 on CentOS 5.2 (5.3 kernel on 5.2 distro) with the SMP kernel, though there is only 1 vCPU per VM. All of them are on local storage on Dell R610. I also have about a dozen VMs on the same kernel running with 2 vCPUs.</p>
<p>I run about 40 physical servers on the same kernel on native R610 with 8 cores; no panics there either.</p>
<p>I run the SMP kernel on everything even if it&#8217;s only 1 vCPU, because you never know if you may need to upgrade to 2 or more CPUs, and if you do I don&#8217;t want to have to change kernels. Though now that I think about it, I believe RHEL/CentOS 5.x kernels are all SMP now, whereas 4.x had separate SMP and UP kernels, but I&#8217;m not 100% sure.</p>
<p>Never had a panic of any sort on any of them.</p>
<p>To me your problem looks related to storage, so I would look to NetApp or VMware&#8217;s NFS stuff. I would also consider hooking up at least one host via Fibre Channel or iSCSI and running some of the VMs off of that to see what you get.</p>
<p>My own infrastructure is split into two classes. The first is our edge web servers, which are at several physical locations; all of them run off of local storage. The other class is most of our back-end stuff, QA, and internal IT, all of which is Fibre Channel connected (most of it is boot from SAN, with the exception of some older ESXi 3.5 systems that don&#8217;t support that) to our 3PAR T400 storage system.</p>
<p>In case it helps you in tracing the issue.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
