<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>thinking sysadmin &#187; virtualization</title>
	<atom:link href="http://andyleonard.com/tag/virtualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyleonard.com</link>
	<description>qstat -u aleonard -s z</description>
	<lastBuildDate>Tue, 28 Feb 2012 04:47:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Keeping your RHEL VMs from crushing your storage at 4:02am</title>
		<link>http://andyleonard.com/2009/11/19/keeping-your-rhel-vms-from-crushing-your-storage-at-402am/</link>
		<comments>http://andyleonard.com/2009/11/19/keeping-your-rhel-vms-from-crushing-your-storage-at-402am/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 19:39:30 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[operating systems]]></category>
		<category><![CDATA[centos]]></category>
		<category><![CDATA[locate]]></category>
		<category><![CDATA[mlocate]]></category>
		<category><![CDATA[rhel]]></category>
		<category><![CDATA[scientific linux]]></category>
		<category><![CDATA[slocate]]></category>
		<category><![CDATA[updatedb]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=315</guid>
		<description><![CDATA[Running a lot of Red Hat VMs in your virtual infrastructure, on shared storage? CentOS, Scientific Linux, both versions 4 and 5, they count for these purposes; Fedora should likely be included too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPMs installed? If you&#8217;re uncertain, check using the following: [...]]]></description>
			<content:encoded><![CDATA[<p>Running a lot of Red Hat VMs in your virtual infrastructure, on shared storage?  CentOS, Scientific Linux, both versions 4 and 5, they count for these purposes; Fedora should likely be included too.  Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPMs installed?  If you&#8217;re uncertain, check using the following:</p>
<p><code>> rpm -q slocate<br />
slocate-2.7-13.el4.8.i386</code></p>
<p>or</p>
<p><code>> rpm -q mlocate<br />
mlocate-0.15-1.el5.2.x86_64</code></p>
<p>If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am shared storage load and latency spike for you.  Before being addressed, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to cause a Windows VM to crash with I/O errors.  Ouch.<br />
<span id="more-315"></span><br />
<strong>Details and ideas for resolution:</strong></p>
<p>By default, a line in /etc/crontab runs the scripts within /etc/cron.daily at 4:02am each morning:</p>
<p><code>02 4 * * * root run-parts /etc/cron.daily</code></p>
<p>One of those scripts &#8211; mlocate.cron or slocate.cron, depending on your OS version &#8211; launches updatedb; as the man page says, &#8220;updatedb  creates  or  updates  a  database  used by locate(1).&#8221;  (The &#8220;locate&#8221; binary is a filesystem search tool, see &#8220;man locate&#8221; for more information.)  Updatedb refreshes its database by walking the filesystem, generating a fair amount of I/O on a single system.  Imagine upwards of thirty of these running in parallel through VMDKs on one shared storage system carrying out internal maintenance at the same time, and you&#8217;re pretty much picturing the problem my employer had.</p>
<p>I see <strong>three options</strong> for addressing this issue:</p>
<p><strong>1) Uninstall mlocate or slocate.</strong>  If you don&#8217;t currently use &#8220;locate&#8221; and you&#8217;re not interested in learning to use a tool that will likely make you more effective at your job (again, see &#8220;man locate&#8221;), this is probably the best option.  (Yeah, I know, people that fit this bill generally don&#8217;t read blogs more technical than <a href="http://perezhilton.com/">this one</a>, so I could probably have skipped it here.  Consider it an option for completeness, or if you really need to strip down an install.)</p>
<p><strong>2) Disable the scheduled job by removing mlocate.cron or slocate.cron from /etc/cron.daily.</strong>  This keeps locate available for your use, but requires that you update locate&#8217;s database ad-hoc and interactively by running the following as root:</p>
<p><code># updatedb</code></p>
<p>This will take a few minutes to return, depending on the size of your file systems.</p>
<p>I don&#8217;t recommend this option either; at least it doesn&#8217;t fit the way I work.  I often find myself using locate in high-pressure situations in which I need to quickly get a file location on a system.  Waiting minutes for updatedb to return is extra painful when every second counts.</p>
<p><strong>3) Stagger when updatedb runs by inserting a random delay into the script.</strong>.  This is my preferred alternative; locate&#8217;s database is kept current automatically, and your storage doesn&#8217;t have to bear a sudden spike in load.  I implemented this by adding the lines in <strong>bold</strong> (lines 2-7 if your browser doesn&#8217;t display the bold text clearly): </p>
<p><code>#!/bin/sh<br />
<strong># sleep up to two hours before launching job:<br />
value=$RANDOM<br />
while [ $value -gt 7200 ] ; do<br />
  value=$RANDOM<br />
done<br />
sleep $value</strong><br />
nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')<br />
renice +19 -p $$ >/dev/null 2>&#038;1<br />
/usr/bin/updatedb -f "$nodevs"<br />
</code></p>
<p>The added code inserts a pseudo-random sleep delay of up to two hours before updatedb runs, with the key being the built-in Bash function <a href="http://tldp.org/LDP/abs/html/randomvar.html">$RANDOM</a>.  In our environment, this removed a 2000 IOPS spike at 4:02am, and eliminated a corresponding jump in filer latency.  Obviously, adjust the delay period as appropriate for your environment.  Additionally, be sure to add this change to your configuration management or installation management tools so that all of your RHEL and RHEL-derived VMs get the updated script.</p>
<p>Using $RANDOM to avoid this variant of the <a href="http://en.wikipedia.org/wiki/Thundering_herd_problem">thundering herd problem</a> also works nicely for a range of similar problems; I believe I first saw it at <a href="http://www.moundalexis.com/archives/000076.php">Moundalexis.com</a>.</p>
<p>(This problem may apply to other Linux distributions being run as VMs, and FreeBSD does something equivalent &#8211; weekly &#8211; with /etc/periodic/weekly/310.locate.  A similar solution can be applied to these environments, if necessary.)</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2009/11/19/keeping-your-rhel-vms-from-crushing-your-storage-at-402am/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>VMware/NFS/NetApp SnapRestore/Linux LVM Single File Recovery Notes</title>
		<link>http://andyleonard.com/2009/06/01/vmwarenfsnetapp-snaprestorelinux-lvm-single-file-recovery-notes/</link>
		<comments>http://andyleonard.com/2009/06/01/vmwarenfsnetapp-snaprestorelinux-lvm-single-file-recovery-notes/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 21:55:54 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[virtualization]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[netapp]]></category>
		<category><![CDATA[nfs]]></category>
		<category><![CDATA[snaprestore]]></category>
		<category><![CDATA[vmware]]></category>
		<category><![CDATA[vmware esx]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=241</guid>
		<description><![CDATA[There have been a few posts elsewhere discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions. Here&#8217;s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers. Prerequisites An existing VMware ESX infrastructure, connected to a NetApp filer [...]]]></description>
			<content:encoded><![CDATA[<p>There have been a few posts <a href="http://storagefoo.blogspot.com/2007/10/vmware-over-nfs-backup-trickscontinued.html">elsewhere</a> discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions.</p>
<p>Here&#8217;s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers.<br />
<span id="more-241"></span><br />
<strong>Prerequisites</strong></p>
<ul>
<li>An existing VMware ESX infrastructure, connected to a NetApp filer NFS datastore; SnapRestore speeds the recovery process but is not mandatory &#8211; see discussion below.</li>
<li>A backup script or system which coordinates VMware snapshots with NetApp snapshots &#8211; perhaps something along the lines of <a href="http://vmwaretips.com/wp/2008/12/05/netapp-snapshots-in-esx-take-2/">Rick Scherer&#8217;s script</a>.</li>
<li>A dedicated Linux restore VM, at a similar version level to the rest of your Linux VM infrastructure.  This VM should have LVM support, but <em>should not have any volume groups (VGs) or logical volumes (LVs) configured</em> &#8211; volume group and logical volume names on the VMDK you are restoring from must not conflict with VGs and LVs already in use on the restore system; the simplest way to guarantee this is to simply not have any VGs or LVs.</li>
</ul>
<p><strong>Restore Procedure</strong></p>
<ul>
<li>Restore the VMDK file from the appropriate snapshot to a <em>new location</em> in the datastore.  With SnapRestore, this can be done as follows (one line in the filer CLI, restoring from snapshot sv_daily.0 to a new file &#8211; again, <strong>be extremely careful not to overwrite the current version of the VMDK in your datastore</strong>, consider restoring to an entirely different directory in the FlexVol):
<p><code>snap restore -t file -s sv_daily.0<br />
-r /vol/vmware04_sis/system.example.com/system.example.com-restore.vmdk<br />
/vol/vmware04_sis/system.example.com/system.example.com.vmdk</code></p>
<p>Follow the prompts, verifying the restore path is correct and is not the path to your existing VMDK.  Do the same for the flat VMDK file (again, one line, and, as before, <strong>use caution to make sure you do not clobber an existing file</strong>):</p>
<p><code>snap restore -t file -s sv_daily.0<br />
-r /vol/vmware04_sis/system.example.com/system.example.com-restore-flat.vmdk<br />
/vol/vmware04_sis/system.example.com/system.example.com-flat.vmdk</code></p>
<p>Without SnapRestore, you can simply mount the NFS export of the datastore on a Linux machine and use &#8220;cp&#8221; to copy the files out of the snapshot.  For flat VMDK files, expect this copy to run for a substantial amount of time compared the nearly-instant recovery SnapRestore offers.</li>
<li>Manually edit the line below &#8220;# Extent description&#8221; in the recovered .vmdk file to match the path to the recovered flat VMDK.  In this case, it would look something like this:<br />
<code># Extent description<br />
RW 20971520 VMFS "system.example.com-restore-flat.vmdk"</code></li>
<li>Attach the recovered VMDK to your powered-off restore host.  Boot the restore host.</li>
<li>Once your restore host is up, use &#8220;pvscan&#8221;, &#8220;vgscan&#8221; and &#8220;lvscan&#8221; (each without arguments) as root to examine available LVM components.  Then, use the &#8220;lvchange&#8221; command to activate the necessary volume group (in this case, &#8220;VolGroup00&#8243;):<br />
<code># lvchange -ay VolGroup00</code></li>
<li>Mount the appropriate logical volume &#8211; for example, LogVol00 in VolGroup00:<br />
<code>mount -o ro /dev/VolGroup00/LogVol00 /mnt</code><br />
Restore files by copying them out of /mnt.</li>
</ul>
<p><strong>Cleanup</strong></p>
<ul>
<li>Shut down the Linux restore host.</li>
<li>Remove the recovery VMDK &#8211; the files restored with SnapRestore or by &#8220;cp&#8221; above &#8211; from the restore host in the VMware Infrastructure Client.</li>
<li>Delete the recovery .vmdk and -flat.vmdk files in the NFS datastore.  <strong>Don&#8217;t screw up here: Be sure to delete the recovery files only, not the working VMDK.</strong></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2009/06/01/vmwarenfsnetapp-snaprestorelinux-lvm-single-file-recovery-notes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on a VMware ESX Deployment</title>
		<link>http://andyleonard.com/2007/12/16/thoughts-on-a-vmware-esx-deployment/</link>
		<comments>http://andyleonard.com/2007/12/16/thoughts-on-a-vmware-esx-deployment/#comments</comments>
		<pubDate>Sun, 16 Dec 2007 15:32:41 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[virtualization]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://andyleonard.com/2007/12/16/thoughts-on-a-vmware-esx-deployment/</guid>
		<description><![CDATA[Given my last post, it&#8217;s pretty ironic that I&#8217;m right now in the middle of a VMware ESX deployment at work. VMware seems to have this reality distortion field around it that makes tech management think that &#8211; despite its substantial overhead &#8211; it&#8217;s the only &#8220;real&#8221; virtualization product out there: The rest are just [...]]]></description>
			<content:encoded><![CDATA[<p>Given my last post, it&#8217;s pretty ironic that I&#8217;m right now in the middle of a VMware ESX deployment at work.   VMware seems to have this reality distortion field around it that makes tech management think that &#8211; despite its substantial overhead &#8211; it&#8217;s the only &#8220;real&#8221; virtualization product out there: The rest are just hacks, so we bought VMware.  Now that I&#8217;m knee-deep in working with it, a few other thoughts:<br />
<span id="more-11"></span></p>
<ul>
<li>Wow, the VI client is really buggy and slow.  It crashes for me at least once a day.  We haven&#8217;t upgraded from 3.0.2 to 3.5 yet, hopefully some of this is fixed in the new version.</li>
<li>There also seems to be an IP storage bug on reboot where ESX server doesn&#8217;t reconnect to iSCSI or NFS storage &#8211; possibly something to do with a transient name resolution or default route problem in ESX.  I haven&#8217;t gotten any good suggestions out of VMware support about this, but the case is still open.</li>
<li>VMware does a good job of pricing ESX so it&#8217;s expensive but affordable for non-profits.</li>
</ul>
<p>All in all, I have to admit, I&#8217;m pretty pleased with ESX so far, but I also know that the honeymoon isn&#8217;t over &#8211; we only have one virtual machine in quasi-production so far.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2007/12/16/thoughts-on-a-vmware-esx-deployment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

