thinking sysadmin

qstat -u aleonard -s z

Git pre-commit hook for DNS zone data

If you’re storing your DNS configuration in Git, a pre-commit hook that automatically runs named-checkzone before zone file changes are committed may be useful to you. The pre-commit hook I use assumes that zone files (and only zone files) are named db.<zonename> (e.g. “db.andyleonard.com”), and it only tests zone files (i.e. named-checkconf is not run against configuration files).

This pre-commit hook’s structure is based heavily on a Puppet 2.7 pre-commit hook published elsewhere.
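
As a rough illustration of the idea (this is not the hook described in this post), a check along these lines could be sketched in Python. It assumes that git and named-checkzone are on the PATH, that zone files are named db.<zonename>, and, for simplicity, that the working-tree copy of each staged file matches what is staged. The function names are my own.

```python
#!/usr/bin/env python3
"""Sketch of a Git pre-commit hook that runs named-checkzone on staged zone files."""
import os
import subprocess
import sys


def zone_from_path(path):
    """Return the zone name for a file named db.<zonename>, else None."""
    name = os.path.basename(path)
    if name.startswith("db.") and len(name) > 3:
        return name[3:]
    return None


def staged_files():
    """List files that are added, copied or modified in the index."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"]
    )
    return out.decode().splitlines()


def main():
    failed = False
    for path in staged_files():
        zone = zone_from_path(path)
        if zone is None:
            continue  # not a zone file; skip it
        # named-checkzone takes the zone name, then the zone file.
        # Assumption: the working-tree copy matches what is staged.
        if subprocess.call(["named-checkzone", zone, path]) != 0:
            failed = True
    return 1 if failed else 0  # a non-zero exit status aborts the commit
```

To try it as .git/hooks/pre-commit, add a final line `sys.exit(main())` and make the file executable.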

Written by Andy

January 21st, 2012 at 7:46 pm

Posted in dns

Git-driven BIND (plus Fabric)

Step 0. Store your DNS configuration in Git. If you aren’t using some sort of version control system for your zone files and other BIND configuration, you ought to be. May I recommend Git? Put your entire configuration directory in there, but do read the “Downsides” section below for some important security considerations.

Step 1. Create a bare Git repository on your DNS server. Using Fabric, you’d do it something like this:

from fabric.api import cd, run, sudo

def config_git():

    # Create bare git repo for direct DNS data pushes:
    sudo('/bin/mkdir /srv/bind.git')
    sudo('/bin/chown ubuntu:ubuntu /srv/bind.git')
    with cd('/srv/bind.git'):
        run('/usr/bin/git init --bare .')
    # git_post_receive() is another task in the same fabfile (not shown here):
    git_post_receive()

(The above assumes an Ubuntu system, where the “ubuntu” user has sudo privileges, such as on EC2; adjust to your environment as needed.)

Written by Andy

December 28th, 2011 at 7:46 pm

Posted in dns

What’s Wrong With OpenDNS

First off, before I get to anything that’s wrong, there’s a lot that’s right about OpenDNS: It’s a simple, effective and flexible tool for content filtering. As a company, they’re trying to improve the state of DNS for end users with tools like DNSCrypt. You can’t beat their entry-level price – free. Their anycast network is good, especially if you’re on the west coast of the United States, like I am (in fact, it’s better for me than surely-much-larger Google’s 8.8.8.8 and 8.8.4.4). Their dashboard is pretty neat, too.

Second, let’s get the most common complaint about OpenDNS – one that isn’t going to be discussed here any further – out of the way: Their practice of returning ads on blocked or non-existent sites in your browser, via a bogus A RR of 67.215.65.132 (if you don’t go with one of their paid options). OpenDNS is upfront about doing this, so you can decide if the trade-off is worthwhile before you sign up – and you can quit using them any time you want.

Those two preliminaries covered, here’s a case study of what I think is a serious problem with OpenDNS, plus some thoughts on how they could fix it.

Written by Andy

December 20th, 2011 at 5:54 am

Posted in dns

What t1.micro CPU Bursting Looks Like

Amazon’s smallest and least expensive instance type, the t1.micro “provide[s] a small amount of consistent CPU resources and allow[s] you to burst CPU capacity when additional cycles are available. [It is] well suited for lower throughput applications and web sites that consume significant compute cycles periodically.” (source)

Running a cpu-bound workload (building Perl modules) on an Ubuntu 11.10 t1.micro instance in us-west-2 tonight, I noticed the following curious CPU usage pattern of approximately 15 seconds on, 60 seconds off:

> vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0  38528  29524 370540    0    0    86   423   84  216 12  5 35  4
 1  0      0   6800  30288 388856    0    0  5356    26  660 1433 27 27  6 40
 5  0      0  21752  27624 378088    0    0    30   211  150  159 40 22  0  8
 6  0      0  21256  27636 378104    0    0     0    27    9    7  1  1  0  0
 7  0      0  21256  27644 378108    0    0     0    10    9    9  1  1  0  0
 7  0      0  21256  27652 378112    0    0     0     8    9    9  2  1  0  0
 7  0      0  20256  27652 378228    0    0     0     0    8   13  1  1  0  0
 8  0      0  20016  27660 378072    0    0     0   218   15   29  0  2  0  3
 6  0      0  37884  27672 378048    0    0     0    14    9   11  3  1  0  0
 4  0      0  30808  27684 378048    0    0     0    11    9   10  1  1  0  0
 4  0      0  23740  27692 378056    0    0     0    10    8    8  2  1  0  0
 4  0      0  30676  27692 378104    0    0     0     0   10   10  1  1  0  0
 5  0      0  26220  27700 378064    0    0     0     9    7   14  6  2  0  1
 5  0      0  21012  27712 378120    0    0     0    10    9   10  1  0  0  0
 5  0      0  27336  27720 378064    0    0     0    21   13   10  1  1  0  0
 1  0      0  29444  27732 378064    0    0     0    14  149   97 39 19  0  0
 1  0      0  33420  27744 378084    0    0     6    12  250  166 67 30  0  0
 2  0      0  41108  27756 378100    0    0     0    37  207  148 60 29  0  0
 6  0      0  33668  27768 378068    0    0     0    14    8    9  1  1  0  0
 5  0      0  37008  27780 378068    0    0     0    10   10   15  4  1  0  0
 4  0      0  30808  27788 378072    0    0     0    18   11    9  2  0  0  0
 5  0      0  24360  27796 378092    0    0     0     9    8    7  2  0  0  0
 2  0      0  19896  27796 378140    0    0     0     0    8    9  1  1  0  0
 6  0      0  27584  27804 378152    0    0     0     7    8   12  1  1  0  0
 6  0      0  22864  27812 378148    0    0     0     9   10   12  2  1  0  0
 7  0      0  19136  27820 378152    0    0     0    10    8    9  1  1  0  0
 6  0      0  26096  27828 378148    0    0     0    12   10    7  2  1  0  0
 6  0      0  20640  27828 378156    0    0     0    19   13    8  2  1  0  0
 6  0      0  27956  27836 378156    0    0     0    11    9   12  1  1  0  0
 6  0      0  22864  27844 378156    0    0     0     6    9   12  2  1  0  0
 6  0      0  19020  27844 378156    0    0     0     1    9    9  1  1  0  0
 2  0      0  46896  21504 368588    0    0   518    18  261  291 47 29  1  7
 1  0      0  35372  21692 368788    0    0     0    43  253  174 65 32  0  0
 1  0      0  43060  21796 368600    0    0     0    62  149  112 66 32  0  1
 5  0      0  38100  21808 368600    0    0     0    46   11   10  1  1  0  0
 5  0      0  45788  21816 368592    0    0     0     7    8   12  2  1  0  0
 7  0      0  38464  21816 368600    0    0     0     0    7    8  2  1  0  0
 7  0      0  45912  21824 368596    0    0     0    11    9    9  2  1  0  0
 7  0      0  39216  21832 368600    0    0     0     7    9    8  1  0  0  0
 4  0      0  35496  21840 368596    0    0     0    19   11    9  4  1  0  0
 5  0      0  43060  21848 368600    0    0     0    29   10   10  2  1  0  0
 5  0      0  37480  21856 368592    0    0     0    11    9   10  1  1  0  0
 5  0      0  45044  21864 368596    0    0     0     7    9   10  1  1  0  0
 5  0      0  38340  21872 368600    0    0     0     8    8    8  2  1  0  0
 4  0      0  46284  21880 368596    0    0     0    10   10   11  1  1  0  0
 6  0      0  38836  21888 368592    0    0     0     8    8    8  2  1  0  0
 1  0      0  38340  21888 368544    0    0     0    15   53   41 12  7  0  0
 1  0      0  40828  21900 368568    0    0     2    46  255  218 66 33  0  0
 1  0      0  39960  21912 368608    0    0     0    26  237  153 63 28  0  0
 3  0      0  50632  21924 368540    0    0     0    16   58   44 32 15  0  0
 4  0      0  46284  21932 368540    0    0     0     7    8   11  1  1  0  0
 4  0      0  45400  21940 368540    0    0     0     6    9   10  1  1  0  0
 5  0      0  45292  21948 368552    0    0     0    11    8   14  0  1  0  0
 6  0      0  37720  21948 368584    0    0     0    17   12    6  2  1  0  0

Apparently, the “small amount of consistent CPU resources” is about 3% of the CPU.
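
To put rough numbers on that, here is a small Python sketch that summarizes one full burst/throttle cycle from the vmstat output above. The (us, sy) pairs are copied from the rows starting at "1  0      0  29444"; the 20% threshold separating "burst" from "throttled" samples is my own arbitrary choice, and the function name is made up for this example.

```python
# (us, sy) columns from one full cycle of the vmstat output above,
# sampled every 5 seconds.
SAMPLES = [
    (39, 19), (67, 30), (60, 29),                              # burst
    (1, 1), (4, 1), (2, 0), (2, 0), (1, 1), (1, 1), (2, 1),
    (1, 1), (2, 1), (2, 1), (1, 1), (2, 1), (1, 1),            # throttled
]


def summarize(samples, interval=5, threshold=20):
    """Split samples into burst/throttled by busy% (us + sy) and summarize."""
    busy = [us + sy for us, sy in samples]
    burst = [b for b in busy if b >= threshold]
    throttled = [b for b in busy if b < threshold]
    return {
        "burst_seconds": len(burst) * interval,
        "throttled_seconds": len(throttled) * interval,
        "throttled_mean_busy": sum(throttled) / len(throttled) if throttled else 0.0,
        "overall_mean_busy": sum(busy) / len(busy),
    }


summary = summarize(SAMPLES)
```

For this cycle the summary works out to 15 seconds of burst, 65 seconds throttled, a throttled average of roughly 2.5% busy, and about 17% busy averaged over the whole cycle.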

Moral of the story for me? Next time, pay the big bucks and launch an m1.small spot instance.

Written by Andy

December 9th, 2011 at 10:26 pm

Posted in utility computing

Deploying Ubuntu on Rackspace using Fog and Cloud-Init

This post is an amalgamation of Vladimir Vuksan’s Provision to cloud in 5 minutes using fog (EC2-specific) and Jeff Gran’s Bootstrapping an Ubuntu Server on Rackspace Using Cloud-Init and Fog – I contributed little more than (inexpertly) gluing them together.

Assuming you already have the Fog gem installed:

First, as a prerequisite and as Jeff Gran notes, you’ll need to create a Rackspace image with the cloud-init package installed.

Next, similar to what Vladimir Vuksan describes, create a config.rb file, and populate the following values as appropriate for your environment:

#!/usr/bin/env ruby

@flavor_id = 3
@image_id = 1234567

@rackspace_username =  'example'
@rackspace_api_key = '1234....'

@private_key_path = './ssh/id_rsa'
@public_key_path = './ssh/id_rsa.pub'

The flavor_id and image_id values specify the instance size and the image you built with cloud-init installed (see the “fog” executable’s “Compute[:rackspace].flavors” and “Compute[:rackspace].images”, respectively); the Rackspace username and api_key can be retrieved from within the console under “Your Account: API Access.” The SSH key pair will be what you use to access the new instance as root.

Written by Andy

November 28th, 2011 at 1:53 pm

Replacing a Failed NetApp Drive with an Un-zeroed Spare

Jason Boche has a post on the method he used to replace a failed drive on a filer with an un-zeroed spare (transferred from a lab machine); my procedure was a little different.

In this example, I’ll be installing a replacement drive pulled from aggr0 on another filer. Note that this procedure is not relevant for drive failures covered by a support contract, where you will receive a zeroed replacement drive directly from NetApp.

  • Physically remove failed drive and replace with working drive. This will generate log messages similar to the following:
    May 27 11:02:36 filer01 [raid.disk.missing: info]: Disk 1b.51 Shelf 3 Bay 3 [NETAPP   X268_SGLXY750SSX AQNZ] S/N [5QD599LZ] is missing from the system
    May 27 11:03:00 filer01 [monitor.globalStatus.ok: info]: The system's global status is normal.
    May 27 11:03:16 filer01 [scsi.cmd.notReadyCondition: notice]: Disk device 0a.51: Device returns not yet ready: CDB 0x12: Sense Data SCSI:not ready - Drive spinning up (0x2 - 0x4 0x1 0x0)(7715).
    May 27 11:03:25 filer01 [sfu.firmwareUpToDate: info]: Firmware is up-to-date on all disk shelves.
    May 27 11:03:27 filer01 [diskown.changingOwner: info]: changing ownership for disk 0a.51 (S/N P8G9SMDF) from unowned (ID -1) to filer01 (ID 135027165)
    May 27 11:03:27 filer01 [raid.assim.rg.missingChild: error]: Aggregate foreign:aggr0, rgobj_verify: RAID object 0 has only 1 valid children, expected 14.
    May 27 11:03:27 filer01 [raid.assim.plex.missingChild: error]: Aggregate foreign:aggr0, plexobj_verify: Plex 0 only has 0 working RAID groups (2 total) and is being taken offline
    May 27 11:03:27 filer01 [raid.assim.mirror.noChild: ALERT]: Aggregate foreign:aggr0, mirrorobj_verify: No operable plexes found.
    May 27 11:03:27 filer01 [raid.assim.tree.foreign: error]: raidtree_verify: Aggregate aggr0 is a foreign aggregate and is being taken offline. Use the 'aggr online' command to bring it online.
    May 27 11:03:27 filer01 [raid.assim.tree.dupName: error]: Duplicate aggregate names found, an instance of foreign:aggr0 is being renamed to foreign:aggr0(1).
    May 27 11:03:28 filer01 [sfu.firmwareUpToDate: info]: Firmware is up-to-date on all disk shelves.
    May 27 11:04:40 filer01 [asup.smtp.sent: notice]: System Notification mail sent: System Notification from filer01 (RAID VOLUME FAILED) ERROR
    May 27 11:04:42 filer01 [asup.post.sent: notice]: System Notification message posted to NetApp: System Notification from filer01 (RAID VOLUME FAILED) ERROR
    

    Note line 6, where it identifies the newly-added disk as part of “foreign:aggr0” and missing the rest of its RAID group; “foreign:aggr0” is taken offline in line 9. In line 10, “foreign:aggr0” is renamed to “foreign:aggr0(1)” because the filer already has an aggr0, as you might expect. Be sure to note the new aggregate name, as you will need it for later steps.

  • Verify aggregate status and names:
    filer01> aggr status
               Aggr State           Status            Options
              aggr0 online          raid_dp, aggr     root
              aggr1 online          raid_dp, aggr
           aggr0(1) failed          raid_dp, aggr     diskroot, lost_write_protect=off,
                                    foreign
                                    partial
              aggr2 online          raid_dp, aggr     nosnap=on
    
  • Double-check the name of the foreign, offline aggregate that was brought in with the replacement drive, and destroy it:
    filer01> aggr destroy aggr0(1)
    Are you sure you want to destroy this aggregate? yes
    Aggregate 'aggr0(1)' destroyed.
    
  • Verify that the aggregate has been removed:
    filer01> aggr status
               Aggr State           Status            Options
              aggr0 online          raid_dp, aggr     root
              aggr1 online          raid_dp, aggr
              aggr2 online          raid_dp, aggr     nosnap=on
    
  • Zero the new spare. First, confirm it is un-zeroed:
    filer01> vol status -s
    
    Spare disks
    
    RAID Disk	Device	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------	------	------------- ---- ---- ---- ----- --------------    --------------
    Spare disks for block or zoned checksum traditional volumes or aggregates
    spare   	0a.53	0a    3   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304 (not zeroed)
    spare   	0a.69	0a    4   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	1b.51	1b    3   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (not zeroed)
    spare   	1b.61	1b    3   13  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	1b.87	1b    5   7   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
    spare   	1b.89	1b    5   9   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
    

    In this example, we actually have two un-zeroed spares – the newly replaced drive (1b.51) and another drive (0a.53). Zero them both:

    filer01> disk zero spares
    

    And verify that they have been zeroed:

    filer01> vol status -s
    
    Spare disks
    
    RAID Disk	Device	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------	------	------------- ---- ---- ---- ----- --------------    --------------
    Spare disks for block or zoned checksum traditional volumes or aggregates
    spare   	0a.53	0a    3   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	0a.69	0a    4   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	1b.51	1b    3   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	1b.61	1b    3   13  FC:A   -  ATA   7200 635555/1301618176 635858/1302238304
    spare   	1b.87	1b    5   7   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
    spare   	1b.89	1b    5   9   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
    
  • Done. You have replaced a failed drive with a zeroed spare.
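
If you capture the output of "vol status -s" to a file or variable, the "confirm it is un-zeroed" check above can also be scripted. The following is a minimal Python sketch, assuming the output format shown in this post; the function name is hypothetical, and how you collect the output from the filer is left open.

```python
def unzeroed_spares(vol_status_output):
    """Return the device names of spares flagged "(not zeroed)"."""
    devices = []
    for line in vol_status_output.splitlines():
        fields = line.split()
        # Spare rows start with "spare"; the device name is the second column.
        if (len(fields) >= 2 and fields[0] == "spare"
                and line.rstrip().endswith("(not zeroed)")):
            devices.append(fields[1])
    return devices


# Rows abridged from the "vol status -s" output above.
SAMPLE = """
Spare disks for block or zoned checksum traditional volumes or aggregates
spare    0a.53  0a    3   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304 (not zeroed)
spare    0a.69  0a    4   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
spare    1b.51  1b    3   3   FC:A   -  ATA   7200 635555/1301618176 635858/1302238304 (not zeroed)
"""

pending = unzeroed_spares(SAMPLE)
```

An empty result after "disk zero spares" has finished would indicate that all of the spares have been zeroed.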

Written by Andy

May 28th, 2011 at 5:42 am

Posted in storage

HAProxy and Keepalived: Example Configuration

HAProxy is load balancer software that allows you to proxy HTTP and TCP connections to a pool of back-end servers; Keepalived – among other uses – allows you to create a redundant pair of HAProxy servers by moving an IP address between HAProxy hosts in an active-passive configuration.

Written by Andy

February 1st, 2011 at 9:17 pm

Posted in Applications

S3fs, or, 256TB of Storage on the Cheap

There’s something pretty satisfying about seeing 256TB of storage available on a machine and knowing that you’re only paying pennies for what you’re using:

> df -h /cloud/hrc/src/
Filesystem            Size  Used Avail Use% Mounted on
s3fs-1.35             256T     0  256T   0% /cloud/hrc/src

Written by Andy

January 25th, 2011 at 6:59 am

Posted in utility computing

Dropping Tarballs with Puppet

I frequently find myself using Puppet to expand tarballs in various locations, sometimes fiddling with a directory name here or there. In fact, I do it so often that I created a “define” for it earlier this week. This could be a little more polished, but in the spirit of sharing first drafts, here goes:

# Small define to expand a tarball at a location; assumes File[$title]
# definition of tarball and installation of pax:

define baselayout::drop_tarball($dest, $dir_name, $dir_sub='') {

  # $dest: cwd in which expansion is done
  # $dir_name: name of top level directory created in $dest
  # $dir_sub: regexp to -s for pax - not supported for .zip archives

  if ($dir_sub) {
    $regexp = "-s $dir_sub"
  } else {
    $regexp = ''
  }

  # CentOS' pax doesn't support "-j" flag; therefore, run pax after
  # bzcat in a pipeline. Twiddle path to bzcat as distro-appropriate:
  case $operatingsystem {
    CentOS: {
      $bzcat = "/usr/bin/bzcat"
    }
    Ubuntu: {
      $bzcat = "/bin/bzcat"
    }
  }

  # Choose expansion method based on file suffix:
  if (($title =~ /\.tar\.gz$/) or ($title =~ /\.tgz$/)) {
    $expand = "/usr/bin/pax -rz $regexp < $title"
  } elsif (($title =~ /\.tar\.bz2$/) or ($title =~ /\.tbz$/)) {
    $expand = "$bzcat $title | /usr/bin/pax -r $regexp"
  } elsif ( $title =~ /\.zip$/ ) {
    # Double quotes, so that $title is interpolated:
    $expand = "/usr/bin/unzip $title"
  }

  exec { "drop_tarball $title":
    command => $expand,
    cwd => $dest,
    creates => "${dest}/${dir_name}",
    require => File[$title],
  }

}

The definition is written for Ubuntu and CentOS, and assumes that pax is installed on the system and that a file resource for the tarball is defined before the definition is called. Pax is used instead of tar to facilitate renaming the top-level directory of the tarball. Zipped directories are also supported, but without rename functionality.
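
For sanity-checking which command a given filename ends up with, the suffix dispatch can be mirrored outside of Puppet. This is only a Python sketch of the selection logic (with the dots in the patterns escaped); the function name, the default bzcat path and the example pax substitution are illustrative assumptions, not part of the define.

```python
import re


def expand_command(title, regexp="", bzcat="/bin/bzcat"):
    """Mirror the define's suffix dispatch; returns None for unknown suffixes."""
    if re.search(r"\.tar\.gz$|\.tgz$", title):
        return "/usr/bin/pax -rz %s < %s" % (regexp, title)
    if re.search(r"\.tar\.bz2$|\.tbz$", title):
        return "%s %s | /usr/bin/pax -r %s" % (bzcat, title, regexp)
    if re.search(r"\.zip$", title):
        # unzip has no equivalent of pax's -s, so no rename is attempted.
        return "/usr/bin/unzip %s" % title
    return None


# Hypothetical example: rename the top-level directory foo-1.0 to foo.
cmd = expand_command("/tmp/foo-1.0.tbz", "-s ,^foo-1.0,foo,")
```

A filename with an unrecognized suffix returns None here, which corresponds to $expand being left unset in the define.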

I’ll update a gist as I develop the definition.

Comments welcome.

Written by Andy

January 20th, 2011 at 8:15 pm

Posted in configuration management

Sunday Project: Installing CyanogenMod on an HTC Hero

This weekend, I finally rooted my old but functional Sprint HTC Hero, and installed CyanogenMod 6.1.0 on it. Below are my notes on the process.

(Note: This post is more of a compilation than any original work of my own; I’ve tried to reference sources for the information that I present here as best as I recall; please post any links I should have included in the comments.)

Written by Andy

January 17th, 2011 at 7:23 am