Running NetApp’s aggrSpaceCheck without turning on RSH

When upgrading a NetApp filer from a pre-7.3 release to 7.3, metadata is apparently moved from within the FlexVol into the containing aggregate. If your aggregate is tight on space – 97% full or more – NetApp requires extra verification steps before you proceed with the upgrade. From the Data ONTAP® 7.3.1.1 Release Notes (NOW login required):

If you suspect that your system has almost used all of its free space, or if you use thin provisioning, you should check the amount of space in use by each aggregate. If any aggregate is 97 percent full or more, do not proceed with the upgrade until you have used the Upgrade Advisor or aggrSpaceCheck tools to determine your system capacity and plan your upgrade.

Upgrade Advisor is a great tool, and I heartily recommend you use it for your upgrade. However, it doesn’t give you a lot of visibility into what’s being checked for here. Lucky for us, NetApp offers an alternative tool: aggrSpaceCheck (NOW login required).
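
Before reaching for either tool, it’s worth a quick look at how full your aggregates actually are. A minimal check, assuming you have ssh (or console) access to the filer; the hostname and aggregate name below are placeholders:

    # The "capacity" column is the figure the release notes care about:
    ssh filer01 df -A
    # Per-volume breakdown of space inside one aggregate:
    ssh filer01 aggr show_space -h aggr0
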
Continue reading

VMware/NFS/NetApp SnapRestore/Linux LVM Single File Recovery Notes

There have been a few posts elsewhere discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions.

Here’s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers.
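
The full post walks through the exact steps; as a rough, generic sketch of the shape such a recovery can take without FlexClone (this is not our verbatim procedure, and every hostname, path, and LVM name below is a placeholder):

    # 1. On the filer: single-file SnapRestore the VM's flat vmdk out of a
    #    snapshot to an alternate path, leaving the running VM untouched.
    ssh filer01 snap restore -t file -s nightly.0 \
        -r /vol/nfs_ds1/recover/myvm-flat.vmdk /vol/nfs_ds1/myvm/myvm-flat.vmdk
    # 2. On a helper Linux machine with the NFS export mounted: expose the
    #    copy's partitions through a loop device.
    losetup /dev/loop0 /mnt/nfs_ds1/recover/myvm-flat.vmdk
    kpartx -av /dev/loop0
    # 3. The copy carries the same VG name and UUIDs as the original, so scan
    #    and activate it on a host where that VG isn't already active (or
    #    rename it first), then mount the LV and copy out the files you need.
    vgscan
    vgchange -ay VolGroup00
    mount /dev/VolGroup00/LogVol00 /mnt/recover
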
Continue reading

Duplicity to Amazon S3 on FreeBSD: Building on the work of others

(This post adds only a couple small details to work described at randys.org and cenolan.com – go there for background on this post and useful scripts for automated Duplicity backup to S3.)

First off, if you want to use Duplicity installed from FreeBSD Ports to back up to Amazon S3, be sure to also install the devel/py-boto and security/pinentry-curses ports.
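
For reference, a minimal sketch of that setup; the bucket name, paths, and credentials are placeholders, and the S3 URL syntax should be checked against your Duplicity version’s man page:

    # Install Duplicity plus the two extra ports:
    cd /usr/ports/sysutils/duplicity && make install clean
    cd /usr/ports/devel/py-boto && make install clean
    cd /usr/ports/security/pinentry-curses && make install clean

    # Minimal manual run to S3; duplicity reads the AWS credentials and the
    # GnuPG passphrase from the environment:
    export AWS_ACCESS_KEY_ID=AKIA...
    export AWS_SECRET_ACCESS_KEY=...
    export PASSPHRASE=...
    duplicity /usr/home/data s3+http://my-backup-bucket/hostname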

If you attempt to run the backup script described at randys.org or cenolan.com from cron, you may run into an error similar to the following:
Continue reading

ESX VM swap on NFS: If it crashes, try something else

I’ve written about running VMware ESX with VM swap on an NFS datastore previously – specifically whether or not it was supported/recommended:

After writing the second post, I thought the issue was pretty much resolved: from multiple sources, the consensus seemed to be that running ESX with VM swap on NFS would be fine. Imagine my surprise (and disappointment) at seeing VMware KB article 1008091, updated yesterday: An ESX virtual machine on NFS fails with swap errors. Further details are in the article itself, but VMware’s KB site is throwing intermittent errors for me at the moment, so I’ll provide the money quote:

The reliability of the virtual machine can be improved by relocating the swap file location to a non-NFS datastore. Either SAN or local storage datastores improve virtual machine stability.
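
For a single VM, one way to do that relocation on ESX 3.x, assuming the VM is powered off, is to point its per-VM swap directory at a SAN or local VMFS datastore with a line like the following in the .vmx file (the datastore path is a placeholder; the VI client also exposes cluster- and host-level swapfile settings, so check the KB article and your version’s documentation for the supported procedure):

    sched.swap.dir = "/vmfs/volumes/local_vmfs1/myvm"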

VMware: Not kidding about VMotion GigE Requirement

In case you’re curious/adventurous/broke enough to try configuring your VMotion network on Fast Ethernet instead of Gigabit Ethernet, here’s what you can expect.

First, a warning from your VI client that you’re venturing into unsupported territory:

A friendly warning

And then, if you go ahead with the VMotion, a slight pause on the VM in question. The following is output from running “while true; do date; sleep 1; done” on a Linux guest during the VMotion:

Tue Feb  3 13:23:17 PST 2009
Tue Feb  3 13:23:18 PST 2009
Tue Feb  3 13:23:19 PST 2009
Tue Feb  3 13:23:20 PST 2009
Tue Feb  3 13:23:21 PST 2009
Tue Feb  3 13:23:22 PST 2009
Tue Feb  3 13:24:12 PST 2009
Tue Feb  3 13:24:13 PST 2009
Tue Feb  3 13:24:14 PST 2009
Tue Feb  3 13:24:15 PST 2009
Tue Feb  3 13:24:16 PST 2009

Notice the fifty-second pause between 13:23:22 and 13:24:12? Ouch…
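
If you want to confirm ahead of time which physical NIC the VMotion vmkernel port is using, and at what speed it has linked, the service console can tell you; a quick sketch (interface and vSwitch names will vary):

    # Physical NICs with negotiated link speed and duplex:
    esxcfg-nics -l
    # VMkernel ports (including the VMotion-enabled one) and their vSwitches:
    esxcfg-vmknic -l
    esxcfg-vswitch -l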

VMware about ESX swap on NFS: It’s okay

Paul Manning, from VMware, in response to a question I asked in the VI:OPS forums:

The current best practice for NFS is to not separate the VM swap space from the VM home directory on an NFS datastore. The reason for the original recommendation was just good old-fashioned conservativeness.

More at the forum post, including more on the reasoning for the old recommendation of separating swap when using NFS – thanks, Paul, you made my day.

Practical Limits of NetApp Deduplication

I’ve blogged before about the limits of NetApp’s A-SIS (Deduplication). In practical use, however, those limits can be even lower – here’s why:

Suppose, for example, that you have a FAS2050; the maximum size FlexVol that you can dedupe is 1 TB. If the volume has ever been larger than 1 TB and then shrunk below that limit, it can’t be deduped, and, of course, you can’t grow a volume with A-SIS enabled beyond 1 TB. Fair enough, you say – but consider those limitations in the case of a volume where you aren’t sure how large it will eventually grow.

If you think your volume could eventually grow beyond 1 TB (deduped), and you’re getting a healthy 50% savings from dedupe, you’ll actually need to undo A-SIS at 500 GB. If you let your deduped data approach filling a 1 TB volume, you will not be able to run “sis undo” – you’ll run out of space. TR-3505 has this to say about it:

Note that if sis undo starts processing and then there is not enough space to undeduplicate, it will stop, complain with a message about insufficient space, and leave the flexible volume dense. All data is still accessible, but some block sharing is still occurring. Use “df -s” to understand how much free space you really have and then either grow the flexible volume or delete data or Snapshot copies to provide the needed free space.
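
As an illustration of that check (the volume name and numbers below are made up, not real output):

    filer> df -s /vol/myvol
    Filesystem                used      saved       %saved
    /vol/myvol/          520000000  480000000          48%

Here used plus saved is already about 1 TB, so on a FAS2050 a “sis undo” would have nowhere to put the re-expanded blocks.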

Bottom line: Either be absolutely sure you won’t ever need to grow your volume beyond the A-SIS limitations of your hardware platform, or run “sis undo” before the sum of the “used” and “saved” columns of “df -s” reaches the volume limit.

Postscript: If you were thinking – like I was – that ONTAP 7.3 would up the A-SIS limitations, apparently you need to think again.

Postscript 2: See also NOW KB35784, as pointed out by Dan C on Scott Lowe’s blog.

Amazon Elastic Block Store is out!

Amazon’s much-awaited Elastic Block Store for EC2 is out this morning; I’m excited to give this a try. A couple downers from the announcement: The pricing is somewhat high – $0.10 per allocated GB per month plus $0.10 per 1 million I/O requests – and the reliability isn’t where I’d like it to be. Specifically, Amazon notes:

Volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.

Because Amazon EBS servers are replicated within a single Availability Zone, mirroring data across multiple Amazon EBS volumes in the same Availability Zone will not significantly improve volume durability.

That last sentence makes it sound like there is a 0.1%–0.5% chance of catastrophic data loss across many distinct EBS volumes in an availability zone. If that’s the case, that’s scary – off the top of my head, I’d say your run-of-the-mill “Enterprise” SAN doesn’t have a one-in-two-hundred risk of catastrophic failure per year.
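
A quick sanity check of those numbers, as arithmetic only:

    # 4% / 0.5% = 8x and 4% / 0.1% = 40x, so "10 times more reliable" is in the
    # right ballpark; and a 0.5% AFR is indeed a 1-in-200 annual loss rate.
    awk 'BEGIN { printf "%.0fx to %.0fx; 1 in %.0f\n", 4/0.5, 4/0.1, 100/0.5 }'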

More links, not all of which I’ve had a chance to fully digest yet:

Quick and Dirty VMware ESX Patching

On the ESX console, do the following:

  • Read the documentation for each patch.
  • Group patches that can be installed together into a directory, possibly an NFS mount available on all your ESX hosts.
  • Change to the patch directory and untar the patches:

    # Unpack every patch bundle; each one extracts into its own subdirectory:
    for i in *.tgz; do
        tar -xvzf "$i"
    done

  • Install the patches:

    # Run esxupdate in each unpacked patch directory, holding the reboot for later:
    for i in *; do
        if [ -d "$i" ]; then
            cd "$i"
            esxupdate --noreboot update
            cd ..
        fi
    done

  • Reboot.
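
A hedged extra: esxupdate also has a query mode that lists installed patches, which makes a handy sanity check after the reboot (confirm the syntax against your ESX version’s esxupdate documentation):

    # List installed patches/bundles to confirm everything applied:
    esxupdate query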