<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>thinking sysadmin &#187; s3</title>
	<atom:link href="http://andyleonard.com/tag/s3/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyleonard.com</link>
	<description>qstat -u aleonard -s z</description>
	<lastBuildDate>Sun, 22 Jan 2012 03:46:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>S3fs, or, 256TB of Storage on the Cheap</title>
		<link>http://andyleonard.com/2011/01/25/s3fs-or-256tb-of-storage-on-the-cheap/</link>
		<comments>http://andyleonard.com/2011/01/25/s3fs-or-256tb-of-storage-on-the-cheap/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 14:59:14 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[utility computing]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[puppet]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[s3fs]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=624</guid>
		<description><![CDATA[There&#8217;s something pretty satisfying about seeing 256TB of storage available on a machine and knowing that you&#8217;re only paying pennies for what you&#8217;re using: In the words of its authors, &#8220;s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s something pretty satisfying about seeing 256TB of storage available on a machine and knowing that you&#8217;re only paying <a href="http://aws.amazon.com/s3/#pricing">pennies</a> for what you&#8217;re using:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
&gt; df -h /cloud/hrc/src/
Filesystem            Size  Used Avail Use% Mounted on
s3fs-1.35             256T     0  256T   0% /cloud/hrc/src
</pre>
<p><span id="more-624"></span><br />
In the words of its authors, &#8220;<a href="http://code.google.com/p/s3fs/">s3fs</a> is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files).&#8221;</p>
<p>Now, make no mistake about it &#8211; since s3fs is backed by object storage in a remote data center, this is not for high- or even moderate-IOPS workloads.  Routine tasks like expanding tarballs containing many small files or compiling code on an s3fs file system can be painful.  But for &#8220;colder&#8221; storage applications &#8211; think online archives, or possibly some backup applications &#8211; it shines.</p>
<p>The <a href="http://code.google.com/p/s3fs/wiki/FuseOverAmazon">installation procedure</a> for s3fs is straightforward.  I&#8217;ve also put a Puppet module for installing s3fs and managing its mounts on <a href="https://github.com/anl/puppet-s3fs">GitHub</a>, although you may want to adapt it to distribute your own package of s3fs instead of building it locally on each machine.</p>
<p>S3fs is licensed under the GPL, as is my Puppet module.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2011/01/25/s3fs-or-256tb-of-storage-on-the-cheap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Duplicity to Amazon S3 on FreeBSD: Building on the work of others</title>
		<link>http://andyleonard.com/2009/03/02/duplicity-to-amazon-s3-on-freebsd-building-on-the-work-of-others/</link>
		<comments>http://andyleonard.com/2009/03/02/duplicity-to-amazon-s3-on-freebsd-building-on-the-work-of-others/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 19:47:53 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[freebsd]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[duplicity]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=226</guid>
		<description><![CDATA[(This post adds only a couple of small details to work described at randys.org and cenolan.com &#8211; go there for background on this post and useful scripts for automated Duplicity backup to S3.) First off, if you want to use Duplicity installed from FreeBSD Ports to back up to Amazon S3, be sure to also install the [...]]]></description>
			<content:encoded><![CDATA[<p>(This post adds only a couple of small details to work described at <a href="http://www.randys.org/2007/11/16/how-to-automated-backups-to-amazon-s-s3-with-duplicity/">randys.org</a> and <a href="http://www.cenolan.com/2008/12/how-to-incremental-daily-backups-amazon-s3-duplicity/">cenolan.com</a> &#8211; go there for background on this post and useful scripts for automated Duplicity backup to S3.)</p>
<p>First off, if you want to use Duplicity installed from FreeBSD Ports to back up to Amazon S3, be sure to also install the <code>devel/py-boto</code> and <code>security/pinentry-curses</code> ports.</p>
<p>If you attempt to run the backup script described at randys.org or cenolan.com from cron, you may run into an error similar to the following:<br />
<span id="more-226"></span></p>
<pre>2009-03-01_01:05:05: ... backing up filesystem
Cleanup of temporary directory /tmp/duplicity-gM4CN9-tempdir failed - this
is probably a bug.
Cleanup of temporary directory /tmp/duplicity-gM4CN9-tempdir failed - this
is probably a bug.
Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 583, in &lt;module&gt;
with_tempdir(main)
File "/usr/local/bin/duplicity", line 577, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 558, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 234, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/local/bin/duplicity", line 148, in write_multivol
globals.gpg_profile, globals.volsize)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 240,
in GPGWriteFile
bytes_to_go = data_size - get_current_size()
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 232,
in get_current_size
return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory:
'/tmp/duplicity-gM4CN9-tempdir/mktemp-iZknw0-2'

Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 583, in &lt;module&gt;
with_tempdir(main)
File "/usr/local/bin/duplicity", line 577, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 558, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 232, in full_backup
sig_outfp = get_sig_fileobj("full-sig")
File "/usr/local/bin/duplicity", line 210, in get_sig_fileobj
fh = globals.backend.get_fileobj_write(sig_filename)
File "/usr/local/lib/python2.5/site-packages/duplicity/backend.py", line
354, in get_fileobj_write
fh = dup_temp.FileobjHooked(tdp.filtered_open("wb"))
File "/usr/local/lib/python2.5/site-packages/duplicity/path.py", line 716,
return gpg.GPGFile(1, self, gpg_profile)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 112,
in __init__
'logger': self.logger_fp})
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 357,
in run
create_fhs, attach_fhs)
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 401,
in _attach_fork_exec
if process.pid == 0: self._as_child(process, gnupg_commands, args)
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 442,
in _as_child
os.execvp( command[0], command )
File "/usr/local/lib/python2.5/os.py", line 354, in execvp
_execvpe(file, args)
File "/usr/local/lib/python2.5/os.py", line 390, in _execvpe
func(fullname, *argrest)
OSError: [Errno 2] No such file or directory

Traceback (most recent call last):
File "/usr/local/bin/duplicity", line 583, in &lt;module&gt;
with_tempdir(main)
File "/usr/local/bin/duplicity", line 577, in with_tempdir
fn()
File "/usr/local/bin/duplicity", line 558, in main
full_backup(col_stats)
File "/usr/local/bin/duplicity", line 234, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/local/bin/duplicity", line 148, in write_multivol
globals.gpg_profile, globals.volsize)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 237,
in GPGWriteFile
file = GPGFile(True, path.Path(filename), profile)
File "/usr/local/lib/python2.5/site-packages/duplicity/gpg.py", line 112,
in __init__
'logger': self.logger_fp})
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 357,
in run
create_fhs, attach_fhs)
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 401,
in _attach_fork_exec
if process.pid == 0: self._as_child(process, gnupg_commands, args)
File "/usr/local/lib/python2.5/site-packages/GnuPGInterface.py", line 442,
in _as_child
os.execvp( command[0], command )
File "/usr/local/lib/python2.5/os.py", line 354, in execvp
_execvpe(file, args)
File "/usr/local/lib/python2.5/os.py", line 390, in _execvpe
func(fullname, *argrest)
OSError: [Errno 2] No such file or directory</pre>
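<p>The cause is simple: the last <code>OSError</code> in each traceback comes from <code>os.execvp</code>, so the program Duplicity tried to launch through GnuPGInterface (presumably <code>gpg</code>) was not found on cron&#8217;s minimal path.  The solution is to make sure the path includes <code>/usr/local/bin</code>, perhaps by including this at the start of the backup script:</p>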
<pre>export PATH=${PATH}:/usr/local/bin</pre>
<p>Finally, when running an incremental backup, you may get this error:</p>
<pre>Fatal Error: Neither remote nor local manifest is readable.</pre>
<p>This can be solved by setting the <code>HOME</code> environment variable to <code>/root</code>, assuming you&#8217;re running the backup as root, instead of leaving it at cron&#8217;s default of <code>/var/log</code>:</p>
<pre>export HOME=/root</pre>
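<p>Both fixes can live in one place at the top of the backup script.  What follows is only a minimal sketch, not the script from randys.org or cenolan.com: the function name <code>setup_backup_env</code> is made up for illustration, and it assumes the backup runs as root and that Duplicity and gpg were installed from Ports into <code>/usr/local/bin</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper: prepare the environment a cron-driven Duplicity run needs.
setup_backup_env() {
    # Append /usr/local/bin only if it is not already on the PATH.
    case ":${PATH}:" in
        *:/usr/local/bin:*) ;;
        *) PATH="${PATH}:/usr/local/bin" ;;
    esac
    # Assumption: the backup runs as root, so HOME should be /root.
    HOME=/root
    export PATH HOME
}

setup_backup_env

# Report whether the tools can now be resolved; a missing tool only prints a warning.
for tool in duplicity gpg; do
    if command -v "$tool" >/dev/null; then
        echo "found: $tool"
    else
        echo "warning: $tool not found on PATH"
    fi
done
```

<p>Running the check from cron once, before the real backup, shows quickly whether the environment is the problem.</p>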
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2009/03/02/duplicity-to-amazon-s3-on-freebsd-building-on-the-work-of-others/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Links, 9/18/2008</title>
		<link>http://andyleonard.com/2008/09/18/links-9182008/</link>
		<comments>http://andyleonard.com/2008/09/18/links-9182008/#comments</comments>
		<pubDate>Thu, 18 Sep 2008 20:52:07 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[amazon aws]]></category>
		<category><![CDATA[cdn]]></category>
		<category><![CDATA[hsm]]></category>
		<category><![CDATA[netapp]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=83</guid>
		<description><![CDATA[We&#8217;re Never Content &#8211; Amazon announces a forthcoming CDN layered on top of S3 with &#8220;edge locations on three continents&#8221; &#8211; presumably North America, Europe and Asia &#8211; &#8220;in order to deliver your content from the most appropriate location.&#8221; Presumably Amazon is planning to use this in-house for their digital media sales, or possibly for [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://aws.typepad.com/aws/2008/09/were-never-cont.html">We&#8217;re Never Content</a> &#8211; Amazon announces a forthcoming CDN layered on top of S3 with &#8220;edge locations on three continents&#8221; &#8211; presumably North America, Europe and Asia &#8211; &#8220;in order to deliver your content from the most appropriate location.&#8221;  Amazon is probably planning to use this in-house for their digital media sales, or possibly for static content on their website.</li>
<li><a href="http://blogs.netapp.com/extensible_netapp/2008/09/tape-roman-char.html">Tape, Roman Chariots and Data Management</a> &#8211; &#8220;But here&#8217;s where it gets insidious, we know look at the mess that tape has created, and instead of asking the question: &#8216;Is a data protection infrastructure predicated on creating whole copies on a regular basis flawed?&#8217;  We ask the question: &#8216;How can I make creating and storing full copies more efficient?&#8217;&#8221;  An interesting read &#8211; nothing new &#8211; but somehow I don&#8217;t think that the solution the author would propose involves tape in an HSM scenario.  Which is too bad, because an HSM environment using tape really can address the problems mentioned in the article, as well as other issues such as capacity and power.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/09/18/links-9182008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Links 7/27/2008: S3 Outage Post-Mortem, Update 2 for VI 3 version 3.5</title>
		<link>http://andyleonard.com/2008/07/27/links-7272008-s3-outage-post-mortem-update-2-for-vi-3-version-35/</link>
		<comments>http://andyleonard.com/2008/07/27/links-7272008-s3-outage-post-mortem-update-2-for-vi-3-version-35/#comments</comments>
		<pubDate>Sun, 27 Jul 2008 15:26:01 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[amazon aws]]></category>
		<category><![CDATA[esx]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[vss]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=56</guid>
		<description><![CDATA[Amazon S3 Availability Event: July 20, 2008 &#8211; Amazon&#8217;s post-mortem on the 7/20 S3 outage. Excerpt: &#8220;We&#8217;ve now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://status.aws.amazon.com/s3-20080720.html">Amazon S3 Availability Event: July 20, 2008</a> &#8211; Amazon&#8217;s post-mortem on the 7/20 S3 outage.  Excerpt: &#8220;We&#8217;ve now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect.&#8221;  (Seen first at <a href="http://arstechnica.com/news.ars/post/20080726-week-in-storage-cloud-storage-fumbles-tape-sets-records.html">Ars Technica</a>.)</li>
<li>VMware has released Update 2 for VMware Infrastructure 3 version 3.5 (I think that&#8217;s the Full Official Name That Only A Committee Could Love&#8230;).  <a href="http://blog.scottlowe.org/2008/07/26/vmware-releases-update-2/">Scott Lowe</a> has a good summary; release notes are <a href="http://www.vmware.com/support/vi3/doc/vi3_esx35u2_vc25u2_rel_notes.html">here</a>.  Most notable among the updates is the ability to use VSS to quiesce Windows VMs prior to snapshotting.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/07/27/links-7272008-s3-outage-post-mortem-update-2-for-vi-3-version-35/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linkage, 6/24/2008</title>
		<link>http://andyleonard.com/2008/06/24/linkage-6242008/</link>
		<comments>http://andyleonard.com/2008/06/24/linkage-6242008/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 23:19:37 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[corruption]]></category>
		<category><![CDATA[data integrity]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=39</guid>
		<description><![CDATA[S3 data corruption: &#8220;We&#8217;ve isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20. It was taken out of service at 11am PDT Sunday, 6/22. While it was in service it handled a small fraction of Amazon S3&#8217;s total requests in the US. Intermittently, under load, [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=93408#93408">S3 data corruption</a>: &#8220;We&#8217;ve isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20.  It was taken out of service at 11am PDT Sunday, 6/22.  While it was in service it handled a small fraction of Amazon S3&#8217;s total requests in the US.  Intermittently, under load, it was corrupting single bytes in the byte stream.  When the requests reached Amazon S3, if the Content-MD5 header was specified, Amazon S3 returned an error indicating the object did not match the MD5 supplied.  When no MD5 is specified, we are unable to determine if transmission errors occurred, and Amazon S3 must assume that the object has been correctly transmitted.&#8221;  (Seen at <a href="http://www.daemonology.net/blog/2008-06-24-amazon-s3-data-corruption.html">Daemonic Dispatches</a>.)</li>
</ul>
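<p>The practical takeaway is to send a Content-MD5 header with every upload.  Its value is the base64 encoding of the raw 16-byte MD5 digest of the body (RFC 1864), not of the hex string.  Here is a minimal sketch of computing it, assuming the <code>openssl</code> command-line tool is installed; the function name <code>content_md5</code> is made up for illustration, and the upload request itself is not shown.</p>

```shell
#!/bin/sh
# Print the Content-MD5 value for a file: base64 of the binary MD5 digest.
content_md5() {
    openssl dgst -md5 -binary "$1" | openssl base64
}

# Demonstration on a throwaway file containing the five bytes "hello".
sample=$(mktemp)
printf 'hello' > "$sample"
content_md5 "$sample"   # prints XUFAKrxLKna5cZ2REBfFkg==
rm -f "$sample"
```

<p>Most S3 client libraries can compute this for you, but it is worth confirming that the one you use actually sends the header.</p>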
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/06/24/linkage-6242008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Suggested Reading, 6/3/2008, Evening Edition</title>
		<link>http://andyleonard.com/2008/06/03/suggested-reading-632008-evening-edition/</link>
		<comments>http://andyleonard.com/2008/06/03/suggested-reading-632008-evening-edition/#comments</comments>
		<pubDate>Wed, 04 Jun 2008 04:52:36 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[link dump]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://andyleonard.com/?p=22</guid>
		<description><![CDATA[SkyNet Lives! (aka EC2 @ SmugMug) &#8211; Blog post about how SmugMug uses (and doesn&#8217;t use) Amazon Web Services; I found the comment that EC2 Persistent Storage &#8220;isn’t performant enough&#8221; intriguing &#8211; I&#8217;ll be interested to see what its performance characteristics are once it&#8217;s available to the public. Excerpt from the post: &#8220;Let me be [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/">SkyNet Lives! (aka EC2 @ SmugMug)</a> &#8211; Blog post about how SmugMug uses (and doesn&#8217;t use) Amazon Web Services; I found the comment that EC2 Persistent Storage &#8220;isn’t performant enough&#8221; intriguing &#8211; I&#8217;ll be interested to see what its performance characteristics are once it&#8217;s available to the public.  Excerpt from the post: &#8220;Let me be very clear here: I really don’t want to operate datacenters anymore despite the fact that we’re pretty good at it. It’s a necessary evil because we’re an Internet company, but our mission is to be the best photo sharing site. We’d rather spend our time giving our customers great service and writing great software rather than managing physical hardware. I’d rather have my awesome Ops team interacting with software remotely for 100% of their duties (and mostly just watching software like SkyNet do its thing). We’ll get there &#8211; I’m confident of that &#8211; we’re just not there yet.&#8221;  (Seen at the <a href="http://aws.typepad.com/aws/2008/06/the-forthcoming.html">Amazon Web Services Blog</a>.)</li>
<li><a href="http://www.roughtype.com/archives/2008/06/microsoft_to_pu.php">Rough Type: Microsoft to put &#8220;many millions&#8221; of servers in cloud</a><br />
 &#8211; Nicholas Carr reports on Microsoft&#8217;s cloud plans.  My thoughts: Does anyone see a latency problem with the following &#8211; think of the speed of light in glass, or the AWS Blog post mentioned above and its comments about locating services in the same cloud, or why SmugMug still runs some of its own servers: &#8220;We&#8217;re taking everything we do at the server level, and saying that we will have a service that mirrors that exactly. The simplest one of those is to say, okay, I can run Exchange on premise, or I can connect up to it as a service. But even at the BizTalk level, we&#8217;ll have BizTalk Services. For SQL, we&#8217;ll have SQL Server Data Services, and so you can connect up, build the database. It will be hosted in our cloud with the big, big data center, and geo-distributed automatically. This is kind of fascinating because it&#8217;s getting us to think about data centers at a scale that never existed before. Literally today we have, in our data center, many hundreds of thousands of servers, and in the future we&#8217;ll have many millions of those servers.&#8221;</li>
</ul>
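<p>To put a rough number on the speed-of-light point: light in fiber covers roughly 200,000 km per second (about two thirds of its speed in a vacuum), so every 100 km between you and the data center adds about 1 ms of round-trip time before any routing or queuing.  This is a back-of-the-envelope sketch only; the function name <code>fiber_rtt_ms</code> and the 4,000 km distance are made-up examples, not figures from either post.</p>

```shell
#!/bin/sh
# Lower bound on round-trip time over fiber, in whole milliseconds.
# Ignores routing, queuing and equipment delays.
fiber_rtt_ms() {
    km=$1
    # Light in fiber travels about 200 km per millisecond; double the
    # distance for the trip there and back.
    echo $(( 2 * km / 200 ))
}

fiber_rtt_ms 4000   # prints 40
```

<p>A chatty protocol that needs several round trips per operation multiplies that floor accordingly.</p>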
]]></content:encoded>
			<wfw:commentRss>http://andyleonard.com/2008/06/03/suggested-reading-632008-evening-edition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

