<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>thinking sysadmin &#187; iscsi</title>
	<atom:link href="http://andyleonard.com/tag/iscsi/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyleonard.com</link>
	<description>qstat -u aleonard -s z</description>
	<lastBuildDate>Sun, 22 Jan 2012 03:46:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Interesting Linux VM Crash Pattern</title>
		<link>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/</link>
		<comments>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 19:09:16 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[virtualization]]></category>
		<category><![CDATA[centos]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[kernel panic]]></category>
		<category><![CDATA[mptscsi]]></category>
		<category><![CDATA[netapp]]></category>
		<category><![CDATA[nfs]]></category>
		<category><![CDATA[rhel]]></category>
		<category><![CDATA[vmware]]></category>
		<category><![CDATA[vmware esx]]></category>
		<category><![CDATA[vsmp]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=343</guid>
		<description><![CDATA[I&#8217;ve just begun to pull together some interesting data on a series of Linux VM crashes I&#8217;ve seen. I don&#8217;t have a resolution yet, but some interesting patterns have emerged. Crash Symptoms A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console: CentOS 4.x: [&#60;f883b299&#62;] .text.lock.scsi_error+0x19/0x34 [scsi_mod] [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just begun to pull together some interesting data on a series of Linux VM crashes I&#8217;ve seen.  I don&#8217;t have a resolution yet, but some interesting patterns have emerged.</p>
<p><strong>Crash Symptoms</strong></p>
<p>A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:</p>
<p>CentOS 4.x:</p>
<p><code>[&lt;f883b299&gt;] .text.lock.scsi_error+0x19/0x34 [scsi_mod]<br />
[&lt;f88c19ce&gt;] mptscsih_io_done+0x5ee/0x608 [mptscsi] (…)<br />
[&lt;c02de564&gt;] common_interrupt+0x18/0x20<br />
[&lt;c02ddb54&gt;] system_call+0x0/0x30</code></p>
<p>CentOS 5.x:</p>
<p><code>RIP  [&lt;ffffffff8014c562&gt;] list_del+0x48/0x71 RSP &lt;ffffffff80425d00&gt; &lt;0&gt;Kernel Panic - not syncing: Fatal exception</code></p>
<p>A hard reset (i.e. pressing the reset button on the VM&#8217;s console) is required to reboot the guest.<br />
<span id="more-343"></span><br />
<strong>Further Details</strong></p>
<p>Five different VMs have encountered this issue, running at a mix of close-to-current CentOS 4.x and 5.x patch levels.  Guest kernel versions when the crash occurred were 2.6.18-128.7.1.el5 and 2.6.18-128.1.10.el5 (5.x) and 2.6.9-89.0.9.ELsmp (4.x).  Memory allocations on affected guests range from 512MB to 3072MB.  Notably, all affected VMs are using SMP &#8211; each has 2 vCPUs &#8211; having been created before our in-house practices followed <a href="http://blogs.vmware.com/performance/2008/06/esx-scheduler-s.html">VMware guidelines</a> and discouraged use of SMP on ESX guests when unnecessary.  One VM was created via P2V; the rest were created <em>de novo</em> on virtual hardware.</p>
<p>All crashes have happened on a single node in an ESX 3.5 HA cluster composed of four Dell PowerEdge 1950s.  ESX hosts have tracked the latest VMware patches closely.  COS memory on the ESX host in question was increased from the default to 800MB prior to the three most recent crashes; in other words, the COS memory increase appears to have had no effect on the crashes.  DRS is in use, set to &#8220;fully automated&#8221; and &#8220;apply recommendations with three or more stars&#8221; and no virtual machine rules have been created to control DRS host placement.</p>
<p>All guests are on the same NFS data store, served from a NetApp filer running ONTAP 7.2.x.  One guest had its vmswap placed separately on an iSCSI data store; the rest have their swap stored on NFS with the VM.  No log messages were seen on the filer during the event, although a log message similar to the following has been seen several times on the ESX host:</p>
<p><code>vmkernel: 43:07:27:51.725 cpu2:2185)WARNING: NFS: 4590: Can't find call with serial number -2146566055</code></p>
<p>Curiously, all crashes have happened in the evening, in the 10 o&#8217;clock hour, after nightly backups have been completed.  Backups are created using a combination of VMware and NetApp snapshots via a script similar to one detailed on <a href="http://vmwaretips.com/wp/2008/12/05/netapp-snapshots-in-esx-take-2/">vmwaretips.com</a>.  No substantial load or latency has been recorded on the NetApp during the crashes, and weeks have passed between events.</p>
<p><strong>Speculation</strong></p>
<p>Explanations I&#8217;m leaning towards, ranked by my judgment of their likelihood:</p>
<p>1) <strong>Hardware issue.</strong>  Assuming a random distribution of VMs &#8211; recall that DRS is in use and no virtual machine rules are in place &#8211; the odds of all five crashes happening on this particular host out of four are slim: (1/4)^5, or 1 in 1024.  (The odds of all five landing on any single host, whichever it is, are 1 in 256.)  Unfortunately, by all measures we&#8217;ve used, including the VI Client&#8217;s &#8220;Health Status&#8221; and Dell OMSA, there are no hardware issues with the host.</p>
<p>Further, the distribution of VMs is not truly random.  DRS migrations are infrequent in this cluster, and the largest determinant of guest location is migration following hosts being placed into maintenance mode for patching.</p>
<p>If it is a hardware issue, it&#8217;s subtle, and possibly only brought to the fore by the following issues.</p>
<p>2) <strong>Red Hat Enterprise Linux bug</strong> &#8211; which, by extension, is typically equivalent to a CentOS bug.  In fact, this issue appears to have been raised with Red Hat already in bugs <a href="https://bugzilla.redhat.com/show_bug.cgi?id=197158">197158</a> and <a href="https://bugzilla.redhat.com/show_bug.cgi?id=228108">228108</a> &#8211; but, according to the bug reports, the issue is resolved, and the patches have since been ported downstream to CentOS.  However, perhaps the issue is not truly resolved &#8211; see <a href="https://bugzilla.redhat.com/show_bug.cgi?id=228108#c35">comment 35</a> in 228108.</p>
<p>3) <strong>vSMP Bug.</strong>  The majority of our Linux VMs are uniprocessor and appear so far to be immune to this issue; it is striking that the crash has only occurred on dual processor guests.  I cannot articulate a mechanism for multiple vCPUs causing this crash, however.</p>
<p>4) <strong>NetApp issue.</strong>  This appears to be a storage issue at some level, considering the mptscsi and NFS messages noted above, so performance of the NetApp filer would be a natural place for further investigation.  However, we monitor the performance of our filer relatively closely, using the ONTAP SDK and Cacti, and nothing unusual was recorded during any crash.  It is notable that all affected VMs reside on the same data store, but that data store shares an aggregate with multiple other unaffected data stores, and several LUNs are served from the same aggregate to non-ESX machines without complaint.</p>
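<p>The odds quoted under the hardware hypothesis can be checked with a few lines of Python.  This is only a sketch, and it assumes each crash lands independently and uniformly on one of the four hosts, which (as noted above) is not strictly true in this cluster.  The function names are made up for illustration.</p>

```python
from fractions import Fraction

HOSTS = 4    # nodes in the ESX cluster (from the post)
CRASHES = 5  # crashes observed so far (from the post)


def p_all_on_named_host(hosts, crashes):
    """Chance that every crash lands on one host chosen in advance."""
    return Fraction(1, hosts) ** crashes


def p_all_on_any_one_host(hosts, crashes):
    """Chance that every crash lands on the same host, whichever it is."""
    return hosts * p_all_on_named_host(hosts, crashes)


print(p_all_on_named_host(HOSTS, CRASHES))    # 1/1024
print(p_all_on_any_one_host(HOSTS, CRASHES))  # 1/256
```

<p>Either way the chance is small, but the 1 in 256 figure is the fairer one when the suspect host was only identified after the crashes had been observed.</p>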
<p>I have not yet opened a case with VMware on this issue &#8211; or Dell, or NetApp, for that matter &#8211; but if and when I do, I&#8217;ll update here to the extent possible.</p>
<p><strong>Update 11/20/2009:</strong> Prompted by a helpful comment from nate below, I looked up and verified the NFS settings across the cluster.  They are the same across all hosts, and are as follows:</p>
<p><code>NFS.IndirectSend 0<br />
NFS.DiskFileLockUpdate 10<br />
NFS.LockUpdateTimeout 5<br />
NFS.LockRenewMaxFailureNumber 3<br />
NFS.LockDisable 0<br />
NFS.HeartbeatFrequency 12<br />
NFS.HeartbeatTimeout 5<br />
NFS.HeartbeatDelta 5<br />
NFS.HeartbeatMaxFailures 10<br />
NFS.MaxVolumes 8<br />
NFS.SendBufferSize 264<br />
NFS.ReceiveBufferSize 128<br />
NFS.VolumeRemountFrequency 30<br />
NFS.UDPRetransmitDelay 700</code></p>
<p>The only values that are changed from default are HeartbeatFrequency and HeartbeatMaxFailures, to match NetApp&#8217;s recommendations in <a href="http://media.netapp.com/documents/tr-3428.pdf">TR-3428</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2009/11/20/interesting-linux-vm-crash-pattern/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>First Thoughts about Fishworks</title>
		<link>http://andyleonard.com/2008/11/11/first-thoughts-about-fishworks/</link>
		<comments>http://andyleonard.com/2008/11/11/first-thoughts-about-fishworks/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 00:20:37 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[storage]]></category>
		<category><![CDATA[adm]]></category>
		<category><![CDATA[comstar]]></category>
		<category><![CDATA[fishworks]]></category>
		<category><![CDATA[hsm]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[nfs]]></category>
		<category><![CDATA[sun]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=113</guid>
		<description><![CDATA[With surprisingly little buzz (outside of sun.com) &#8211; must be that darned economy &#8211; Sun launched its new Fishworks product line yesterday: Three hardware products, several of them with flash drives, and an impressive looking user interface, which appears at first glance to surpass anything NetApp offers. Here&#8217;s a quick rundown of features from Mike [...]]]></description>
			<content:encoded><![CDATA[<p>With surprisingly little buzz (outside of sun.com) &#8211; must be that darned economy &#8211; Sun launched its new Fishworks product line yesterday: <a href="http://www.sun.com/7110/">Three</a> <a href="http://www.sun.com/7210/">hardware</a> <a href="http://www.sun.com/7410/">products</a>, several of them with flash drives, and an impressive looking <a href="http://www.sun.com/storage/disk_systems/unified_storage/features.jsp">user interface</a>, which appears at first glance to surpass anything NetApp offers.  Here&#8217;s a quick rundown of features from Mike Shapiro on <a href="http://blogs.sun.com/mws/">blogs.sun.com</a>:</p>
<ul>
<li>NFS v3 and v4</li>
<li>CIFS</li>
<li>iSCSI</li>
<li>HTTP</li>
<li>WebDAV</li>
<li>FTP</li>
<li>RAID-Z (RAID-5 and RAID-6), Mirrored, and Striped disk configurations</li>
<li>Unlimited Read-only and Read-write Snapshots, with Snapshot Schedules</li>
<li>Built-in Data Compression</li>
<li>Remote Replication of data for Disaster Recovery</li>
<li>Active-Active Clustering (in the Sun Storage 7410) for High Availability</li>
<li>Thin Provisioning of iSCSI LUNs</li>
<li>Virus Scanning and Quarantine</li>
<li>NDMP Backup and Restore</li>
</ul>
<p>A few comments: Looks like all of the usual ZFS features are there, with a few additions &#8211; in particular, I wasn&#8217;t aware that the virus scanning project existed, and I didn&#8217;t know that NDMP was far enough along to be included in a production release.  Additionally, from looking at various Sun blogs, I believe that the remote replication feature is zfs send/recv, not <a href="http://opensolaris.org/os/project/avs/">AVS</a>.  Finally, from the nomenclature (&#8220;2008.11&#8221;), I&#8217;d guess that the software is based on the forthcoming release of OpenSolaris, not the recently released update to Solaris 10.<br />
<span id="more-113"></span><br />
What&#8217;s missing?  Off the top of my head:</p>
<ul>
<li>Fibre Channel &#8211; <a href="http://opensolaris.org/os/project/comstar/">COMSTAR</a> is coming, presumably.</li>
<li>HSM &#8211; <a href="http://opensolaris.org/os/project/adm/">ADM</a> is also presumably on its way in a future release.</li>
<li>HCL entries for various products like VMware, but again, I have to believe that Sun is working hard on this as well.</li>
</ul>
<p>My first impression from the launch materials: Neat, but the price seems high.  Looking at list prices for the models, and doing some quick calculations for RAID-Z2 configurations with at least one hot spare, the price per usable TB ranges from $3999 and $3933 for a 7210 with 250GB and 1TB drives, respectively, to $11,209 for a single head 7410.  Compare this to the hardware that the 7210/250GB is based on, the X4540, where you pay $2513.71 per usable TB.  Now, as far as I know, Sun isn&#8217;t offering flash drives with their non-Fishworks hardware, so it makes direct comparisons of the price of the Fishworks Special Sauce impossible for most of the rest of the line, but that may change later this fall.</p>
<p>Other thoughts: Why not use the UltraSPARC T2 instead of Opterons?  I&#8217;d expect better performance from the UltraSPARCs, especially when using 10GbE.  Is it a cost issue?</p>
<p>One final note: Sun is making their simulator (a VMware image) available for <a href="http://www.sun.com/storage/disk_systems/unified_storage/resources.jsp">download</a> &#8211; nice touch.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/11/11/first-thoughts-about-fishworks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>8/14/2008 Link Dump</title>
		<link>http://andyleonard.com/2008/08/14/8142008-link-dump/</link>
		<comments>http://andyleonard.com/2008/08/14/8142008-link-dump/#comments</comments>
		<pubDate>Thu, 14 Aug 2008 21:44:23 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[esx]]></category>
		<category><![CDATA[fibre channel]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[netapp]]></category>
		<category><![CDATA[nfs]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=62</guid>
		<description><![CDATA[Performance Report: Multiprotocol Performance Test of VMware® ESX 3.5 on NetApp Storage Systems: A complementary whitepaper to VMware&#8217;s own work comparing Fibre Channel, iSCSI and NFS as storage protocols for VMware ESX. (Seen at blog.scottlowe.org.)]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://media.netapp.com/documents/tr-3697.pdf">Performance Report: Multiprotocol Performance Test of VMware® ESX 3.5 on NetApp Storage Systems</a>: A complementary whitepaper to <a href="/2008/02/08/vmwares-comparison-of-storage-protocol-performance/">VMware&#8217;s own work comparing Fibre Channel, iSCSI and NFS</a> as storage protocols for VMware ESX.  (Seen at <a href="http://blog.scottlowe.org/2008/08/14/storage-protocol-performance-whitepaper-from-netapp/">blog.scottlowe.org</a>.)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/08/14/8142008-link-dump/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Link Dump, 7/17/2008</title>
		<link>http://andyleonard.com/2008/07/17/link-dump-7172008/</link>
		<comments>http://andyleonard.com/2008/07/17/link-dump-7172008/#comments</comments>
		<pubDate>Thu, 17 Jul 2008 19:50:44 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[avs]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[comstar]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[it consumerization]]></category>
		<category><![CDATA[miasma computing]]></category>
		<category><![CDATA[ndmp]]></category>
		<category><![CDATA[opensolaris]]></category>
		<category><![CDATA[sam-qfs]]></category>
		<category><![CDATA[sco]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=49</guid>
		<description><![CDATA[Elektronkind: OpenSolaris 2008.11 &#8211; A Preview For The Storage Admin &#8211; A look at upcoming storage technologies in OpenSolaris 2008.11, including ZFS, iSCSI, NDMP, COMSTAR, AVS and SAM-QFS. These products really set OpenSolaris apart from Linux distributions, although I wonder how official this list is, and have some doubts about the status of some of [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://elektronkind.org/2008/07/opensolaris-2008-11-storage">Elektronkind: OpenSolaris 2008.11 &#8211; A Preview For The Storage Admin</a> &#8211; A look at upcoming storage technologies in OpenSolaris 2008.11, including ZFS, iSCSI, NDMP, COMSTAR, AVS and SAM-QFS.  These products really set OpenSolaris apart from Linux distributions, although I wonder how official this list is, and have some doubts about the status of some of the projects.  For example, there doesn&#8217;t appear to be much activity on the <a href="http://opensolaris.org/os/project/samqfs/">SAM-QFS</a> OpenSolaris project, although maybe I&#8217;m just looking in the wrong place.  (Seen at <a href="http://www.c0t0d0s0.org/archives/4638-The-upcoming-Opensolaris-2008.11-for-the-storage-admin.html">c0t0d0s0.org</a>.)</li>
<li><a href="http://arstechnica.com/news.ars/post/20080716-ruling-sco-owes-novell-2-54-million-from-sco-sun-svrx-deal.html">Ruling: SCO owes Novell $2.54 million from SCO-Sun SVRX deal</a> &#8211; Interesting excerpt: &#8220;Judge Kimball also reviewed SCO&#8217;s agreement with Sun and found that some of the terms exceeded SCO&#8217;s licensing authority. Through the agreement, SCO lifted the confidentiality provisions of Sun&#8217;s 1994 SVRX deal with Novell even though SCO was not permitted to do so without Novell&#8217;s explicit consent. The judge concluded that lifting of the SVRX confidentiality provisions was not incidental to a UnixWare license and was consequently not permissible. This raises some intriguing legal questions about OpenSolaris, which includes SVRX code that we now know SCO clearly had no right to let Sun open.&#8221;  I wonder if we&#8217;ll be hearing more about this in the coming months.</li>
<li><a href="http://arstechnica.com/articles/culture/interview-it-consumerization.ars">Interview: IT consumerization and the future of higher ed</a> &#8211; Another interesting piece on Ars Technica from today, an interview with Oren Sreebny of the University of Washington, whose best bits obliquely refer to the challenges of <a href="http://news.bbc.co.uk/2/hi/technology/7421099.stm">miasma computing</a> and information security.  Quotes: &#8220;Lately we&#8217;ve been looking at Google and Microsoft offerings for commodity stuff, and one of the things we deal with in some of our research [departments] is government regulations about &#8216;exporting munitions.&#8217; So one of the manifestations of those government regulations is that you cannot store your data outside the US if you&#8217;re working on some types of government-funded projects.  Google has said, &#8216;We can&#8217;t guarantee that anybody&#8217;s stuff in particular won&#8217;t be in a datacenter that&#8217;s located outside the US, so don&#8217;t bring that stuff to us,&#8217; which is exactly what I&#8217;d be saying if I was them. So we have to figure out, as we start to move in those directions, what we do about that.&#8221;  Also: &#8220;[Separate identity principals for people who are working on sensitive data] is an interesting conversation because, in many ways we&#8217;ve spent the last decade trying to integrate people&#8217;s identity, and do single-sign-on, and not make them have lots of separate accounts in separate places. And in many ways it really goes against the grain to step back from that, but maybe it&#8217;s time to do that.&#8221;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/07/17/link-dump-7172008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VMware&#8217;s Comparison of Storage Protocol Performance</title>
		<link>http://andyleonard.com/2008/02/08/vmwares-comparison-of-storage-protocol-performance/</link>
		<comments>http://andyleonard.com/2008/02/08/vmwares-comparison-of-storage-protocol-performance/#comments</comments>
		<pubDate>Sat, 09 Feb 2008 04:18:49 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[storage]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[esx]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[netapp]]></category>
		<category><![CDATA[nfs]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://andyleonard.com/2008/02/08/vmwares-comparison-of-storage-protocol-performance/</guid>
		<description><![CDATA[VMware has just released a paper entitled Comparison of Storage Protocol Performance (seen at Scale the Mind and blog.scottlowe.org); maybe this will help deflate some of the too-often repeated speculation that NFS is too slow for VMware ESX. VMware&#8217;s findings match well with what I&#8217;ve seen. On some in-house application-specific benchmarking that I&#8217;ve done, I [...]]]></description>
			<content:encoded><![CDATA[<p>VMware has just released a paper entitled <a href="http://www.vmware.com/files/pdf/storage_protocol_perf.pdf">Comparison of Storage Protocol Performance</a> (seen at <a href="http://scalethemind.com/2008/02/vmware-releases-storage-protocol-performance-white-paper/">Scale the Mind</a> and <a href="http://blog.scottlowe.org/2008/02/08/virtualization-short-take-1/">blog.scottlowe.org</a>); maybe this will help deflate some of the too-often repeated speculation that NFS is too slow for VMware ESX.<br />
<span id="more-15"></span><br />
VMware&#8217;s findings match well with what I&#8217;ve seen.  On some in-house application-specific benchmarking that I&#8217;ve done, I actually saw overall better performance with an NFS datastore than with a software iSCSI datastore on the same filer.  I won&#8217;t get into details, because the benchmark was specific to us (and I&#8217;d probably need a lawyer to review the EULAs before &#8220;publishing&#8221; anything&#8230;), but NFS was equal to or slightly faster than iSCSI across the board on this specific set of tests in our specific environment.  Given the management and deployment advantages of NFS, that&#8217;s huge.</p>
<p>Of course, I wouldn&#8217;t recommend basing your whole ESX/NetApp deployment strategy on one unsubstantiated benchmark-related post on my blog; if you&#8217;re using ESX with NetApp storage, though, I would strongly recommend testing NFS datastores if you haven&#8217;t already.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/02/08/vmwares-comparison-of-storage-protocol-performance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

