(Note that this presentation has my employer’s logo on it, but this site certainly does not speak for them.)
Presentation: When ACLs Attack – Cross-Platform File Permissions
Filling in the Missing Parts of NetApp’s API
Late last year, NetApp released long-overdue Python and Ruby support in their SDK, officially known as the NetApp Manageability SDK. The SDK download is, oddly and unfortunately, still buried behind a registration wall: you have to submit a web form describing how you plan to use it before you can download it, but otherwise it's available to all.
But perhaps there's good reason for hiding the download away: There are still large gaps in the API. For instance, if you want to change the security style of a qtree, you're out of luck. (Makes one wonder how NetApp implements this functionality in OnCommand System Manager: are they eating their own dogfood?)
That said, if you're willing to venture off the beaten (and supported) path, you can use the undocumented system-cli API call. I'm using it in a Python wrapper I'm working on that makes the SDK feel a little bit less like handling thinly varnished XML.
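The wrapper itself isn't reproduced in this excerpt. As a rough, hypothetical illustration of what a system-cli request might carry, here is a minimal Python sketch that builds the XML body for the 7-Mode command `qtree security <path> unix|ntfs|mixed`. Because system-cli is undocumented, the `<args>`/`<arg>` child layout is an assumption, and the helper names and the qtree path `/vol/vol1/qtree1` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Security styles accepted by the 7-Mode "qtree security" command.
VALID_STYLES = ("unix", "ntfs", "mixed")


def build_system_cli(args):
    """Build a <system-cli> request with one <arg> element per CLI word.

    ASSUMPTION: the undocumented system-cli call takes its command line as
    <args><arg>...</arg></args>; verify against your ONTAP release.
    """
    root = ET.Element("system-cli")
    args_elem = ET.SubElement(root, "args")
    for arg in args:
        ET.SubElement(args_elem, "arg").text = arg
    return root


def qtree_security_request(qtree_path, style):
    """Hypothetical helper: request body for 'qtree security <path> <style>'."""
    if style not in VALID_STYLES:
        raise ValueError("style must be one of %s" % ", ".join(VALID_STYLES))
    return build_system_cli(["qtree", "security", qtree_path, style])


if __name__ == "__main__":
    request = qtree_security_request("/vol/vol1/qtree1", "ntfs")
    print(ET.tostring(request, encoding="unicode"))
```

Passing each word as its own `<arg>` element keeps paths with spaces from being split. Sending the request is then a matter of handing the equivalent element to the SDK's server object, which isn't shown here.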
Git pre-commit hook for DNS zone data
If you're storing your DNS configuration in Git, a pre-commit hook that automatically runs named-checkzone before zone file changes are committed may be useful to you. The pre-commit hook I use assumes that zone files (and only zone files) are named db.<zonename> (e.g. “db.andyleonard.com”), and it only tests zone files (in particular, named-checkconf is not run against configuration files).
This pre-commit hook's structure is based heavily on a Puppet 2.7 pre-commit hook published elsewhere.
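The hook itself isn't reproduced in this excerpt. A minimal Python sketch of the same idea follows, assuming named-checkzone is on the PATH and the script is saved as `.git/hooks/pre-commit` and made executable. It checks the staged version of each `db.<zonename>` file (via `git show :<path>`) rather than the working copy; the function names are hypothetical. One known limitation: zone files that use relative `$INCLUDE` paths won't resolve from the temporary copy.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: run named-checkzone on staged db.<zone> files."""
import os
import subprocess
import sys
import tempfile


def zone_for_path(path):
    """Map '.../db.<zonename>' to '<zonename>'; return None for any other file."""
    name = os.path.basename(path)
    if name.startswith("db.") and len(name) > len("db."):
        return name[len("db."):]
    return None


def staged_files():
    """Paths added, copied, or modified in the index for this commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def check_zone(zone, path):
    """Run named-checkzone against the staged blob, not the working copy."""
    blob = subprocess.run(
        ["git", "show", ":" + path], capture_output=True, check=True
    ).stdout
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(blob)
        tmp.flush()
        return subprocess.run(["named-checkzone", zone, tmp.name]).returncode == 0


def main():
    failed = []
    for path in staged_files():
        zone = zone_for_path(path)
        if zone is not None and not check_zone(zone, path):
            failed.append(path)
    if failed:
        print("named-checkzone failed: " + ", ".join(failed), file=sys.stderr)
        return 1  # a non-zero exit status makes git abort the commit
    return 0


# Only act when git runs this file under its hook name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```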
Git-driven BIND (plus Fabric)
Step 0. Store your DNS configuration in Git. If you aren’t using some sort of version control system for your zone files and other BIND configuration, you ought to be. May I recommend Git? Put your entire configuration directory in there, but do read the “Downsides” section below for some important security considerations.
Step 1. Create a bare Git repository on your DNS server. Using Fabric, you’d do it something like this:
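```python
from fabric.api import cd, run, sudo  # Fabric 1.x API


def config_git():
    # Create bare git repo for direct DNS data pushes:
    sudo('/bin/mkdir /srv/bind.git')
    sudo('/bin/chown ubuntu:ubuntu /srv/bind.git')
    with cd('/srv/bind.git'):
        run('/usr/bin/git init --bare .')
    # Install the post-receive hook (git_post_receive() is not shown here)
    git_post_receive()
```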
(The above assumes an Ubuntu system, where the “ubuntu” user has sudo privileges, such as on EC2; adjust to your environment as needed.)
What’s Wrong With OpenDNS
First off, before I get to anything that's wrong: there's a lot that's right about OpenDNS. It's a simple, effective and flexible tool for content filtering. As a company, they're trying to improve the state of DNS for end users with tools like DNSCrypt. You can't beat their entry-level price: free. Their anycast network is good, especially if you're on the west coast of the United States, like I am (in fact, it performs better for me than Google's surely much larger 8.8.8.8 and 8.8.4.4). Their dashboard is pretty neat, too.
Second, let's get the most common complaint about OpenDNS (one that isn't going to be discussed here any further) out of the way: unless you go with one of their paid options, they show you ads in your browser for blocked or non-existent sites by answering those queries with a bogus A record, 67.215.65.132. OpenDNS is upfront about doing this, so you can decide whether the trade-off is worthwhile before you sign up, and you can quit using them any time you want.
Those two preliminaries covered, here’s a case study of what I think is a serious problem with OpenDNS, plus some thoughts on how they could fix it.
What t1.micro CPU Bursting Looks Like
Amazon’s smallest and least expensive instance type, the t1.micro “provide[s] a small amount of consistent CPU resources and allow[s] you to burst CPU capacity when additional cycles are available. [It is] well suited for lower throughput applications and web sites that consume significant compute cycles periodically.” (source)
Running a CPU-bound workload (building Perl modules) on an Ubuntu 11.10 t1.micro instance in us-west-2 tonight, I noticed the following curious CPU usage pattern (vmstat sampling every 5 seconds) of approximately 15 seconds on, 60 seconds off:
```
> vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0  38528  29524 370540    0    0    86   423   84  216 12  5 35  4
 1  0      0   6800  30288 388856    0    0  5356    26  660 1433 27 27  6 40
 5  0      0  21752  27624 378088    0    0    30   211  150  159 40 22  0  8
 6  0      0  21256  27636 378104    0    0     0    27    9    7  1  1  0  0
 7  0      0  21256  27644 378108    0    0     0    10    9    9  1  1  0  0
 7  0      0  21256  27652 378112    0    0     0     8    9    9  2  1  0  0
 7  0      0  20256  27652 378228    0    0     0     0    8   13  1  1  0  0
 8  0      0  20016  27660 378072    0    0     0   218   15   29  0  2  0  3
 6  0      0  37884  27672 378048    0    0     0    14    9   11  3  1  0  0
 4  0      0  30808  27684 378048    0    0     0    11    9   10  1  1  0  0
 4  0      0  23740  27692 378056    0    0     0    10    8    8  2  1  0  0
 4  0      0  30676  27692 378104    0    0     0     0   10   10  1  1  0  0
 5  0      0  26220  27700 378064    0    0     0     9    7   14  6  2  0  1
 5  0      0  21012  27712 378120    0    0     0    10    9   10  1  0  0  0
 5  0      0  27336  27720 378064    0    0     0    21   13   10  1  1  0  0
 1  0      0  29444  27732 378064    0    0     0    14  149   97 39 19  0  0
 1  0      0  33420  27744 378084    0    0     6    12  250  166 67 30  0  0
 2  0      0  41108  27756 378100    0    0     0    37  207  148 60 29  0  0
 6  0      0  33668  27768 378068    0    0     0    14    8    9  1  1  0  0
 5  0      0  37008  27780 378068    0    0     0    10   10   15  4  1  0  0
 4  0      0  30808  27788 378072    0    0     0    18   11    9  2  0  0  0
 5  0      0  24360  27796 378092    0    0     0     9    8    7  2  0  0  0
 2  0      0  19896  27796 378140    0    0     0     0    8    9  1  1  0  0
 6  0      0  27584  27804 378152    0    0     0     7    8   12  1  1  0  0
 6  0      0  22864  27812 378148    0    0     0     9   10   12  2  1  0  0
 7  0      0  19136  27820 378152    0    0     0    10    8    9  1  1  0  0
 6  0      0  26096  27828 378148    0    0     0    12   10    7  2  1  0  0
 6  0      0  20640  27828 378156    0    0     0    19   13    8  2  1  0  0
 6  0      0  27956  27836 378156    0    0     0    11    9   12  1  1  0  0
 6  0      0  22864  27844 378156    0    0     0     6    9   12  2  1  0  0
 6  0      0  19020  27844 378156    0    0     0     1    9    9  1  1  0  0
 2  0      0  46896  21504 368588    0    0   518    18  261  291 47 29  1  7
 1  0      0  35372  21692 368788    0    0     0    43  253  174 65 32  0  0
 1  0      0  43060  21796 368600    0    0     0    62  149  112 66 32  0  1
 5  0      0  38100  21808 368600    0    0     0    46   11   10  1  1  0  0
 5  0      0  45788  21816 368592    0    0     0     7    8   12  2  1  0  0
 7  0      0  38464  21816 368600    0    0     0     0    7    8  2  1  0  0
 7  0      0  45912  21824 368596    0    0     0    11    9    9  2  1  0  0
 7  0      0  39216  21832 368600    0    0     0     7    9    8  1  0  0  0
 4  0      0  35496  21840 368596    0    0     0    19   11    9  4  1  0  0
 5  0      0  43060  21848 368600    0    0     0    29   10   10  2  1  0  0
 5  0      0  37480  21856 368592    0    0     0    11    9   10  1  1  0  0
 5  0      0  45044  21864 368596    0    0     0     7    9   10  1  1  0  0
 5  0      0  38340  21872 368600    0    0     0     8    8    8  2  1  0  0
 4  0      0  46284  21880 368596    0    0     0    10   10   11  1  1  0  0
 6  0      0  38836  21888 368592    0    0     0     8    8    8  2  1  0  0
 1  0      0  38340  21888 368544    0    0     0    15   53   41 12  7  0  0
 1  0      0  40828  21900 368568    0    0     2    46  255  218 66 33  0  0
 1  0      0  39960  21912 368608    0    0     0    26  237  153 63 28  0  0
 3  0      0  50632  21924 368540    0    0     0    16   58   44 32 15  0  0
 4  0      0  46284  21932 368540    0    0     0     7    8   11  1  1  0  0
 4  0      0  45400  21940 368540    0    0     0     6    9   10  1  1  0  0
 5  0      0  45292  21948 368552    0    0     0    11    8   14  0  1  0  0
 6  0      0  37720  21948 368584    0    0     0    17   12    6  2  1  0  0
```
Apparently, the “small amount of consistent CPU resources” is about 3% of the CPU: between bursts, the us and sy columns usually add up to only 2–3%.
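To put numbers on the on/off pattern rather than eyeballing it, a small Python sketch like the following could parse saved vmstat output and collapse consecutive samples into busy and idle runs. It assumes the column header line starts with `r` (as above), a 5-second interval, and an arbitrary cutoff of 20% us+sy to call a sample "busy"; the function names are hypothetical.

```python
def parse_vmstat(text):
    """Turn `vmstat` output into a list of {column: int} dicts, one per sample."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":
            header = fields  # the " r  b  swpd ..." column-name line
        elif header and fields and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def phases(rows, interval=5, busy_threshold=20):
    """Collapse consecutive samples into (busy, seconds, mean us+sy) runs.

    ASSUMPTION: a sample counts as busy when us + sy >= busy_threshold.
    """
    runs = []
    for row in rows:
        cpu = row["us"] + row["sy"]
        busy = cpu >= busy_threshold
        if runs and runs[-1][0] == busy:
            runs[-1][1].append(cpu)
        else:
            runs.append((busy, [cpu]))
    return [(busy, len(s) * interval, sum(s) / len(s)) for busy, s in runs]
```

The first vmstat row reports averages since boot, so it is worth dropping before grouping, e.g. `phases(parse_vmstat(text)[1:])`.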
Moral of the story for me? Next time, pay the big bucks and launch an m1.small spot instance.
Deploying Ubuntu on Rackspace using Fog and Cloud-Init
This post is an amalgamation of Vladimir Vuksan’s Provision to cloud in 5 minutes using fog (EC2-specific) and Jeff Gran’s Bootstrapping an Ubuntu Server on Rackspace Using Cloud-Init and Fog – I contributed little more than (inexpertly) gluing them together.
Assuming you already have the Fog gem installed:
First, as a prerequisite and as Jeff Gran notes, you’ll need to create a Rackspace image with the cloud-init package installed.
Next, similar to what Vladimir Vuksan describes, create a config.rb file, and populate the following values as appropriate for your environment:
```ruby
#!/usr/bin/env ruby
@flavor_id = 3
@image_id = 1234567
@rackspace_username = 'example'
@rackspace_api_key = '1234....'
@private_key_path = './ssh/id_rsa'
@public_key_path = './ssh/id_rsa.pub'
```
The flavor_id and image_id values specify the instance size and the image you built with cloud-init installed (see the “fog” executable's “Compute[:rackspace].flavors” and “Compute[:rackspace].images”, respectively); the Rackspace username and API key can be retrieved from within the console under “Your Account: API Access.” The SSH key pair is what you will use to access the new instance as root.
Replacing a Failed NetApp Drive with an Un-zeroed Spare
Jason Boche has a post on the method he used to replace a failed drive on a filer with an un-zeroed spare (transferred from a lab machine); my procedure was a little different.
In this example, I’ll be installing a replacement drive pulled from aggr0 on another filer. Note that this procedure is not relevant for drive failures covered by a support contract, where you will receive a zeroed replacement drive directly from NetApp.
- Physically remove the failed drive and replace it with the working drive. This will generate log messages similar to the following:
```
May 27 11:02:36 filer01 [raid.disk.missing: info]: Disk 1b.51 Shelf 3 Bay 3 [NETAPP X268_SGLXY750SSX AQNZ] S/N [5QD599LZ] is missing from the system
May 27 11:03:00 filer01 [monitor.globalStatus.ok: info]: The system's global status is normal.
May 27 11:03:16 filer01 [scsi.cmd.notReadyCondition: notice]: Disk device 0a.51: Device returns not yet ready: CDB 0x12: Sense Data SCSI:not ready - Drive spinning up (0x2 - 0x4 0x1 0x0)(7715).
May 27 11:03:25 filer01 [sfu.firmwareUpToDate: info]: Firmware is up-to-date on all disk shelves.
May 27 11:03:27 filer01 [diskown.changingOwner: info]: changing ownership for disk 0a.51 (S/N P8G9SMDF) from unowned (ID -1) to filer01 (ID 135027165)
May 27 11:03:27 filer01 [raid.assim.rg.missingChild: error]: Aggregate foreign:aggr0, rgobj_verify: RAID object 0 has only 1 valid children, expected 14.
May 27 11:03:27 filer01 [raid.assim.plex.missingChild: error]: Aggregate foreign:aggr0, plexobj_verify: Plex 0 only has 0 working RAID groups (2 total) and is being taken offline
May 27 11:03:27 filer01 [raid.assim.mirror.noChild: ALERT]: Aggregate foreign:aggr0, mirrorobj_verify: No operable plexes found.
May 27 11:03:27 filer01 [raid.assim.tree.foreign: error]: raidtree_verify: Aggregate aggr0 is a foreign aggregate and is being taken offline. Use the 'aggr online' command to bring it online.
May 27 11:03:27 filer01 [raid.assim.tree.dupName: error]: Duplicate aggregate names found, an instance of foreign:aggr0 is being renamed to foreign:aggr0(1).
May 27 11:03:28 filer01 [sfu.firmwareUpToDate: info]: Firmware is up-to-date on all disk shelves.
May 27 11:04:40 filer01 [asup.smtp.sent: notice]: System Notification mail sent: System Notification from filer01 (RAID VOLUME FAILED) ERROR
May 27 11:04:42 filer01 [asup.post.sent: notice]: System Notification message posted to NetApp: System Notification from filer01 (RAID VOLUME FAILED) ERROR
```
Note line 6, where the filer identifies the newly added disk as part of “foreign:aggr0” and reports that the rest of its RAID group is missing; “foreign:aggr0” is taken offline in line 9. In line 10, “foreign:aggr0” is renamed to “foreign:aggr0(1)” because, as you might expect, the filer already has an aggr0. Be sure to note the new aggregate name, as you will need it in later steps.
- Verify aggregate status and names:
```
filer01> aggr status
           Aggr State           Status            Options
          aggr0 online          raid_dp, aggr     root
          aggr1 online          raid_dp, aggr
       aggr0(1) failed          raid_dp, aggr     diskroot, lost_write_protect=off,
                                foreign
                                partial
          aggr2 online          raid_dp, aggr     nosnap=on
```
- Double-check the name of the foreign, offline aggregate that was brought in with the replacement drive, and destroy it:
```
filer01> aggr destroy aggr0(1)
Are you sure you want to destroy this aggregate? yes
Aggregate 'aggr0(1)' destroyed.
```
- Verify that the aggregate has been removed:
```
filer01> aggr status
           Aggr State           Status            Options
          aggr0 online          raid_dp, aggr     root
          aggr1 online          raid_dp, aggr
          aggr2 online          raid_dp, aggr     nosnap=on
```
- Zero the new spare. First, confirm it is un-zeroed:
```
filer01> vol status -s

Spare disks

RAID Disk Device  HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)    Phys (MB/blks)
--------- ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0a.53   0a      3   5 FC:B -    ATA  7200  635555/1301618176 635858/1302238304 (not zeroed)
spare     0a.69   0a      4   5 FC:B -    ATA  7200  635555/1301618176 635858/1302238304
spare     1b.51   1b      3   3 FC:A -    ATA  7200  635555/1301618176 635858/1302238304 (not zeroed)
spare     1b.61   1b      3  13 FC:A -    ATA  7200  635555/1301618176 635858/1302238304
spare     1b.87   1b      5   7 FC:A -    ATA  7200  847555/1735794176 847827/1736350304
spare     1b.89   1b      5   9 FC:A -    ATA  7200  847555/1735794176 847827/1736350304
```
In this example, we actually have two un-zeroed spares – the newly replaced drive (1b.51) and another drive (0a.53). Zero them both:
filer01> disk zero spares
And verify that they have been zeroed:
```
filer01> vol status -s

Spare disks

RAID Disk Device  HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)    Phys (MB/blks)
--------- ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0a.53   0a      3   5 FC:B -    ATA  7200  635555/1301618176 635858/1302238304
spare     0a.69   0a      4   5 FC:B -    ATA  7200  635555/1301618176 635858/1302238304
spare     1b.51   1b      3   3 FC:A -    ATA  7200  635555/1301618176 635858/1302238304
spare     1b.61   1b      3  13 FC:A -    ATA  7200  635555/1301618176 635858/1302238304
spare     1b.87   1b      5   7 FC:A -    ATA  7200  847555/1735794176 847827/1736350304
spare     1b.89   1b      5   9 FC:A -    ATA  7200  847555/1735794176 847827/1736350304
```
- Done. You have replaced a failed drive with a zeroed spare.
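If you check several filers this way, spotting the “(not zeroed)” flag by eye gets tedious. A small Python sketch like the one below could pull the un-zeroed spare device names out of captured `vol status -s` output; it assumes the row layout shown above (rows beginning with `spare`, then the device name) and that you capture the output yourself, e.g. over SSH, which isn't shown. The function name is hypothetical.

```python
import re

# ASSUMPTION: a spare-disk row starts with "spare", then the device name
# (e.g. 0a.53); un-zeroed disks carry a trailing "(not zeroed)" marker.
SPARE_ROW = re.compile(r"^spare\s+(\S+)\s.*\(not zeroed\)$")


def unzeroed_spares(vol_status_output):
    """Return device names of spares that `vol status -s` flags as not zeroed."""
    found = []
    for line in vol_status_output.splitlines():
        match = SPARE_ROW.match(line.strip())
        if match:
            found.append(match.group(1))
    return found
```

An empty result after `disk zero spares` finishes is a quick confirmation that every spare is ready for use.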
HAProxy and Keepalived: Example Configuration
HAProxy is load balancer software that allows you to proxy HTTP and TCP connections to a pool of back-end servers; Keepalived – among other uses – allows you to create a redundant pair of HAProxy servers by moving an IP address between HAProxy hosts in an active-passive configuration.
S3fs, or, 256TB of Storage on the Cheap
There’s something pretty satisfying about seeing 256TB of storage available on a machine and knowing that you’re only paying pennies for what you’re using:
```
> df -h /cloud/hrc/src/
Filesystem      Size  Used Avail Use% Mounted on
s3fs-1.35       256T     0  256T   0% /cloud/hrc/src
```