Automatic ZFS Snapshot Rotation on FreeBSD
OpenSolaris has ZFS Automatic Snapshots; FreeBSD, while it has ZFS, doesn’t have a comparable feature that I’m aware of. So I wrote my own, zfs-snapshot.sh.
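For readers who only want the general shape of snapshot rotation, here is a minimal sketch. It is not the zfs-snapshot.sh script mentioned above; the function names, the `auto-` snapshot prefix and the example dataset are all assumptions made for illustration. The idea is to take a new snapshot, list the existing ones oldest first, and destroy everything older than the newest N.

```shell
#!/bin/sh
# Hypothetical sketch of snapshot rotation; NOT the author's zfs-snapshot.sh.

# Read snapshot names on stdin (oldest first, one per line) and print
# the ones that fall outside the newest $1 entries.
snapshots_to_prune() {
    keep="$1"
    awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Take a new snapshot of dataset $1, then destroy all but the newest $2
# snapshots carrying the (assumed) "auto-" prefix.
rotate_snapshots() {
    dataset="$1"
    keep="$2"
    prefix="${dataset}@auto-"
    zfs snapshot "${prefix}$(date +%Y%m%d-%H%M%S)" || return 1
    # -H: no header, -s creation: sort oldest first
    zfs list -H -t snapshot -o name -s creation -r "$dataset" |
        awk -v p="$prefix" 'index($0, p) == 1' |
        snapshots_to_prune "$keep" |
        while read -r snap; do
            zfs destroy "$snap"
        done
}

# Example (placeholder dataset name), e.g. from cron:
#   rotate_snapshots tank/home 7
```

Keeping the pruning decision in its own function means it can be checked against a plain list of names before anything is allowed to call `zfs destroy`.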
Drupal Deployment Sysadmin Best Practices
Drupal is a popular open source CMS reportedly used on tens of thousands of sites ranging from personal blogs to whitehouse.gov; for readers of this blog, it probably requires no further introduction.
Despite its many desirable features and continuing popularity, Drupal is not without its shortcomings, as many readers are also likely aware. Although Drupal has an active and responsive security team, the software has a long track record of requiring frequent security patches – Secunia has seven 2009 advisories for Drupal 6.x listed as of this writing. Although this is by its nature an apples-to-oranges comparison, it ranks Drupal behind similarly large and complex PHP projects such as WordPress 2.x (5) and Gallery 2.x (0) – and the number for Drupal does not include dozens of additional advisories for Drupal modules. Further, Drupal has lagged in supporting PHP 5.3.x, suggesting to this outside observer that the project is having difficulties maintaining its codebase.
All that being said, I do not personally believe that the above issues rule out using Drupal; the benefits outweigh the shortcomings. So, assuming the question is not whether to deploy Drupal, but how to do so most securely and efficiently, my recommendations from a systems administration perspective are below.
Test Driving Google Public DNS (Updated with OpenDNS comparison)
Google announced its Public DNS service this morning, claiming enhanced performance and security; I took it for a brief test drive with the following results.
(See bottom of post for an update running similar tests on OpenDNS.)
Methods: I searched Google for keywords that I believed fell somewhere between obscure and common and collected the first ten hostnames printed on the screen. I then used local installations of dig to query a collection of DNS servers for the hostnames’ A records and collected the response times. The different resolvers used were:
- A local BIND installation (127.0.0.1, cache empty) with Comcast Internet connectivity;
- A Comcast DNS server (68.87.69.150) via Comcast Internet connectivity;
- My employer’s internal caching DNS;
- Google (8.8.8.8) via my employer’s Internet connectivity (mostly Level 3);
- Google (8.8.8.8) via Comcast; and
- Google (8.8.8.8) via an Amazon EC2 instance in us-east-1a.
Anticipating a bimodal distribution of results, I assumed high latency responses were cache misses, while low latency responses were cache hits, and categorized results correspondingly.
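The measurement described above can be sketched roughly as follows. This is an illustration rather than the exact commands used for the post: the function names are made up, and the 20 ms cutoff between a cache hit and a cache miss is an assumed value that would need tuning against the observed distribution.

```shell
#!/bin/sh
# Hypothetical sketch of the timing method; names and threshold are assumptions.

# Print the query time in milliseconds for one A-record lookup of host $2
# against resolver $1, taken from dig's ";; Query time: N msec" stats line.
query_time_ms() {
    resolver="$1"
    host="$2"
    dig "@${resolver}" "$host" A +noall +stats |
        awk '/Query time:/ { print $4 }'
}

# Label a latency as a cache hit or miss.
# $1 = latency in ms, $2 = cutoff in ms (default 20, an assumed value).
classify_latency() {
    ms="$1"
    threshold="${2:-20}"
    if [ "$ms" -lt "$threshold" ]; then
        echo hit
    else
        echo miss
    fi
}

# Example (needs network access; www.example.com is a placeholder hostname):
#   t="$(query_time_ms 8.8.8.8 www.example.com)"
#   echo "$t ms: $(classify_latency "$t")"
```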
Migrating from self-hosted email to Google Apps for Domains
I recently moved my personal email from a self-managed Exim/Cyrus setup on a dedicated FreeBSD server to Gmail (Google Apps for Domains). This migration was motivated by a desire to reduce expenses and the time spent managing mail software, and by the importance of email (for me, personally) dropping to a level where I was willing to accept the risks inherent in outsourcing it. Details of the exact process I used to migrate mail are below.
Assumptions: An IMAP interface to your current email, basic competency at managing DNS, and the ability to run the imapsync Perl script (built via FreeBSD ports in my case, but installation should be straightforward on most UNIX or Linux systems).
Interesting Linux VM Crash Pattern
I’ve just begun to pull together some interesting data on a series of Linux VM crashes I’ve seen. I don’t have a resolution yet, but some interesting patterns have emerged.
Crash Symptoms
A CentOS 4.x or 5.x guest will crash with a message similar to the following on its console:
CentOS 4.x:
[<f883b299>] .text.lock.scsi_error+0x19/0x34 [scsi_mod]
[<f88c19ce>] mptscsih_io_done+0x5ee/0x608 [mptscsi] (…)
[<c02de564>] common_interrupt+0x18/0x20
[<c02ddb54>] system_call+0x0/0x30
CentOS 5.x:
RIP [<ffffffff8014c562>] list_del+0x48/0x71
RSP <ffffffff80425d00>
<0>Kernel Panic - not syncing: Fatal exception
A hard reset (i.e. pressing the reset button on the VM’s console) is required to reboot the guest.
Keeping your RHEL VMs from crushing your storage at 4:02am
Running a lot of Red Hat VMs on shared storage in your virtual infrastructure? CentOS and Scientific Linux, versions 4 and 5 alike, count for these purposes; Fedora should likely be included too. Do you have the slocate (version 4.x and earlier) or mlocate (version 5.x) RPM installed? If you’re uncertain, check using the following:
> rpm -q slocate
slocate-2.7-13.el4.8.i386
or
> rpm -q mlocate
mlocate-0.15-1.el5.2.x86_64
If so, multiple RHEL VMs plus mlocate or slocate may be adding up to an array-crushing 4:02am shared storage load and latency spike for you. Before being addressed, this spike was bad enough at my place of employment (when combined with a NetApp Sunday-morning disk scrub) to cause a Windows VM to crash with I/O errors. Ouch.
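The 4:02am timing comes from the stock /etc/crontab on these releases, which runs everything in /etc/cron.daily at 02 4 * * *. As a quick complement to the rpm queries, a sketch like the following reports whether the locate database job is actually scheduled on a guest. The function name is made up for illustration; the two file names are the ones the slocate and mlocate packages normally install, but treat that as an assumption and verify on your own systems.

```shell
#!/bin/sh
# Hypothetical helper: list the locate database cron jobs found in a
# cron.daily directory. $1 = directory to inspect (default /etc/cron.daily).
locate_cron_jobs() {
    dir="${1:-/etc/cron.daily}"
    for job in "$dir"/slocate.cron "$dir"/mlocate.cron; do
        # Print only the job files that actually exist.
        [ -e "$job" ] && echo "$job"
    done
    return 0
}

# Example:
#   locate_cron_jobs
#   grep cron.daily /etc/crontab    # shows the schedule for the daily jobs
```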
Running NetApp’s aggrSpaceCheck without turning on RSH
When upgrading a NetApp filer from a pre-7.3 release to 7.3, metadata is apparently moved from within the FlexVol into the containing aggregate. If your aggregate is tight on space – more than 96% full – NetApp requires that you complete extra verification steps to ensure that you can complete the upgrade. From the Data ONTAP® 7.3.1.1 Release Notes (NOW login required):
If you suspect that your system has almost used all of its free space, or if you use thin provisioning, you should check the amount of space in use by each aggregate. If any aggregate is 97 percent full or more, do not proceed with the upgrade until you have used the Upgrade Advisor or aggrSpaceCheck tools to determine your system capacity and plan your upgrade.
Upgrade Advisor is a great tool, and I heartily recommend you use it for your upgrade. However, it doesn’t give you a lot of visibility into what’s being checked for here. Lucky for us, NetApp offers an alternative tool: aggrSpaceCheck (NOW login required).
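Before reaching for either tool, a rough first pass is simply to look for aggregates at or above the 97 percent mark quoted in the release notes. The sketch below filters the output of the filer's `df -A` command; it is not a substitute for aggrSpaceCheck, and it assumes the usual five-column layout (Aggregate, kbytes, used, avail, capacity) with the capacity shown as a percentage. The function name and the saved file name are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: read `df -A` output on stdin and print each aggregate
# whose capacity is at or above the limit ($1, default 97 percent).
# Assumes columns: Aggregate kbytes used avail capacity
full_aggregates() {
    limit="${1:-97}"
    awk -v limit="$limit" '
        $1 == "Aggregate"   { next }   # skip the header line
        $1 ~ /\.snapshot$/  { next }   # skip snapshot reserve lines
        {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $1, $5
        }
    '
}

# Example, with the df -A output saved to a placeholder file name:
#   full_aggregates < df-A.txt
```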
NetApp FAS2020 aggregate capacity on ONTAP 7.3.1 – now 16TB
My NetApp FAS 2020 Sizing post remains popular nearly a year after I wrote it. However, with ONTAP 7.3.1 (and later releases) out, it’s also out of date. Here’s current information from p. 33 of the ONTAP 7.3.1.1 release notes (NOW login required):
Beginning with Data ONTAP 7.3.1, FAS2020 systems support aggregates up to 16 TB raw capacity, provided that the root volume is hosted in a dedicated aggregate (that is, one that contains only the root volume and no user data).
The release notes go on to point out an alternative to the dedicated root aggregate – having two spare disks per controller.
It’s nice to see the FAS2020 finally getting a maximum aggregate size on par with the rest of NetApp’s product line. However, in an era where 2TB drives are available from Western Digital – and presumably other manufacturers before too long – ONTAP’s 16TB aggregate limit grows increasingly anachronistic.
SnapManager for Exchange/SnapVault Integration Requirements
Update: NetApp has a KB article in NOW addressing this: Using SnapVault to Archive SnapManager for Exchange Backups Sets. Bottom line: You do not necessarily need ONTAP 7.3, Protection Manager and DataFabric Manager to send SnapManager for Exchange snapshots to a SnapVault secondary.
We recently acquired SnapManager for Exchange (SME) at my place of employment. We have an existing NetApp deployment consisting of two primary filers in a SnapVault arrangement with a third filer. The SME install is part of an upgrade from Exchange 2003 (on DAS) to 2007 (on Fibre Channel storage).
What we missed prior to purchasing SME: If you want to use SnapVault with SME, you need two additional pieces of software: Protection Manager and NetApp Management Console (part of DataFabric Manager, apparently). Here’s what p. 408 of the SnapManager® 5.0 for Microsoft® Exchange Installation and Administration Guide (NOW login required) says:
The following are the software dependencies for integrating SnapManager with data set and SnapVault:
◆ Protection Manager 3.7 and later
◆ NetApp Management Console 3.7 and later
◆ SnapDrive for Windows 6.0 and later
◆ Data ONTAP 7.3 or later
Wish I’d known that sooner.
(This is the point where some random NetApp fanboy pops down to the comments and fires off something about how NetApp is the greatest storage company ever, and if I’d done appropriate due diligence, I wouldn’t have missed this requirement. My advice: Spare us, smart guy. I’m writing this post to make it easier for other NetApp customers to do their “due diligence”.)
VMware/NFS/NetApp SnapRestore/Linux LVM Single File Recovery Notes
There have been a few posts elsewhere discussing file-level recovery for Linux VMs on NetApp NFS datastores, but none that have dealt specifically with Linux LVM-encapsulated partitions.
Here’s our in-house procedure for recovery; note that we do not have FlexClone licensed on our filers.