Archive for January, 2010

Where ARE the Network Virtual Appliances?

// January 31st, 2010 // Comments Off // Uncategorized

A good friend of mine pinged me by email on Friday, asking me about Allan Leinwand's article on GigaOM entitled "Where Are the Network Virtual Appliances?", which argues that as server virtualization moves into the enterprise and cloud data centers, networking needs to follow with virtual appliances. I'm a long-standing believer in Allan's vision for network virtual appliances. Yep. I've often taken to...

New Ubuntu 8.04.3 Hardy AMIs for Amazon EC2

// January 30th, 2010 // Comments Off // Uncategorized

Scott Moser (Canonical) built and released new Ubuntu 8.04.3 LTS Hardy images and AMIs for Amazon EC2. I also published new EBS boot AMIs using the same images. I’ve listed all of the AMI ids on http://alestic.com (pick your region at the top).

These AMIs should work better in the us-west-1 region (apt sources.list) and have updated software packages so upgrades on new instances should be faster.
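
To take one of these for a spin with the EC2 API tools, a launch would look something like this (the AMI id below is a placeholder; substitute the id for your architecture and region from Alestic.com):

ec2-run-instances ami-xxxxxxxx --region us-west-1 --instance-type m1.small --key MYKEY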

Southern California Linux Expo – February 19-21, 2010 at the Westin LAX

// January 29th, 2010 // Comments Off // Uncategorized

SCaLE 8x

The 8th Southern California Linux Expo (aka SCaLE 8x) is a community organized, non-profit event. Those words and the incredibly cheap price might lead you to believe that it is not worth going to, but if this is your first time you’ll be amazed by the size, scope, and professionalism of the event with nearly a hundred exhibits and dozens of informative talks.

Even though you’re not paying hundreds of dollars for the conference fee, it’s still worth traveling to if you’re not in Los Angeles. If you are in LA, then you have no excuse.

Just like last year at SCaLE, I will be leading another “Try-It Lab” where we’ll help folks get started with Amazon EC2 and Ubuntu Linux. More information about preparation will be posted on the SCaLE blog, so be sure to review it before attending if you’re interested in a hands-on, guided workshop experience with EC2. The lab seats “sold out” quickly last year, so make sure you get in early.

Deal for readers of Alestic.com: When you register for SCaLE, use the code “ERIC” for 50% off of the listed price. If you sign up today, that gives you a full access pass for a ridiculously low $35. Prices may go up as the weekend gets closer.

[Update 2010-02-16: Link to preparation instructions on the SCaLE blog]

Webinar Thursday Feb 4th

// January 28th, 2010 // Comments Off // Uncategorized

Join us on February 4th for a webinar on the latest features of Landscape. Ken Drachnik, the Landscape product manager, will demo the latest version and review the newest features, including cloud computing support, configuration management updates, and the latest GUI updates, and will discuss what new features can be expected from Landscape in 2010. The webinar is being offered twice on Thursday, at 1500 and 2000 UTC, so that people across the many time zones we serve can attend. To participate, please register.

Public EBS Boot AMIs for Ubuntu on Amazon EC2

// January 25th, 2010 // Comments Off // Uncategorized

If you’ve been following along, you probably know that I have been recommending that folks using EC2 switch to the official Ubuntu AMIs published by Canonical (Hardy or Karmic). I have been building and publishing Ubuntu AMIs since 2007 (including Dapper, Edgy, Feisty, Gutsy, Hardy, Intrepid, and Karmic), but over the last year my focus on this project has been to transition these responsibilities to Canonical, who have more time and resources to support the initiative.

I’m happy to say that I’ve finally followed my own advice. For my personal Amazon EC2 servers (including for the Alestic.com web site) I am using Ubuntu 9.10 Karmic images published for EC2 by Canonical.

While I was making the transition, I also switched to EBS boot AMIs. However, since it sounds like Canonical is not planning to publish EBS boot AMIs until Lucid, I decided to continue serving the community by making EBS boot AMIs available for running Ubuntu on EC2.

I have published EBS boot AMIs for Ubuntu 9.10 Karmic and Ubuntu 8.04 Hardy, both 32- and 64-bit architectures, in all current EC2 regions, for a total of a dozen new AMIs.

I chose to use the exact Ubuntu images which Canonical built for running Ubuntu on EC2. This means that these EBS boot AMIs work exactly the same as the official Canonical AMIs, including ssh to the ubuntu user. Again, even though I’m publishing the EBS boot AMIs for Karmic and Hardy, the contents of the image were built by Canonical.
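
For example, once an instance of one of these AMIs is running, you connect just as you would with an official Canonical AMI (the key pair and hostname below are placeholders):

ssh -i MYKEY.pem ubuntu@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com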

The EBS boot AMIs are listed on Alestic.com. I have restructured the table to better feature Canonical AMIs, and now you need to pick an EC2 region to see the IDs.

Give the EBS boot AMIs a spin and let me know if you run into any issues.

How to Report Bugs with Ubuntu on Amazon EC2: ubuntu-bug

// January 25th, 2010 // Comments Off // Uncategorized

The official Ubuntu AMIs published by Canonical for EC2 starting in October have proven to be solid and production-worthy. However, you may still on occasion run into an issue which deserves to be brought to the attention of the Ubuntu server team developing these AMIs and the software which enables Ubuntu integration with EC2.

The easiest, most efficient, and most complete way to report problems with Ubuntu on EC2 is to use the ubuntu-bug tool which comes pre-installed on all Ubuntu systems.

The ubuntu-bug command requires a single argument, which can be one of the following (examples below):

  1. the name of an Ubuntu software package experiencing a problem,

  2. the path to a program related to the problem,

  3. the process id of the program experiencing the problem, or

  4. the path of a crash file.
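
For example (the program path, process id, and crash file below are placeholders illustrating each form):

ubuntu-bug ec2-init                            # by package name
ubuntu-bug /usr/sbin/sshd                      # by program path
ubuntu-bug 1234                                # by process id (placeholder pid)
ubuntu-bug /var/crash/_usr_sbin_sshd.0.crash   # by crash file (placeholder path)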

When reporting EC2 startup issues with an Ubuntu instance, the involved package is generally ec2-init, so the command to run would be:

ubuntu-bug ec2-init

This command should be run on the EC2 instance that is experiencing the problem. The ubuntu-bug command will collect relevant information about the instance and file it with the bug report to assist in tracking down and correcting the issue.

If the instance with the problem is no longer running or accessible, try to run another instance of the same AMI to report the bug. This will help submit the correct AMI information with the bug report.

If ubuntu-bug reports “This is not a genuine Ubuntu package” you might have to first run

sudo apt-get update

and then try again.

Unfortunately, ubuntu-bug is an interactive program which does not accept command line options to set choices, so you will need to respond to a couple of prompts and then copy and paste a URL it provides to you. First, it asks:

What would you like to do? Your options are:
  S: Send report (1.5 KiB)
  V: View report
  K: Keep report file for sending later or copying to somewhere else
  C: Cancel
Please choose (S/V/K/C): S

Respond by hitting the “S” key because you really do want to report a problem.

ubuntu-bug then displays a URL and asks if you would like to launch a browser.

Choices:
  1: Launch a browser now
  C: Cancel
Please choose (1/C): C

Respond by hitting the “C” key, as ubuntu-bug running on the EC2 instance can’t launch the web browser on your local system and you probably don’t want to use a terminal-based browser.

Make a note of the URL displayed in:

*** To continue, you must visit the following URL:
  https://bugs.launchpad.net/ubuntu/+source/ec2-init/+filebug/LONGSTRINGHERE?

Copy the URL and paste it into your web browser. You will continue reporting the problem through your browser and the system information will be attached after you submit.

If this is the first time you have used Launchpad.net, you will be prompted to create an account. Use a valid email address as you will need to confirm it.

Launchpad will prompt you to enter a “Summary” which should be a short description of the bug. If it is not a duplicate of one of the bugs already entered, click “No, I need to report a new bug” and enter the “Further Information”. Include as much relevant information as possible. If a developer can reproduce the bug using this description, then it will be addressed more easily.

For general information on submitting bugs in Ubuntu, please see:

https://help.ubuntu.com/community/ReportingBugs

You can also see a current list of open ec2-images bugs.

If you are reporting Ubuntu on EC2 bugs directly using Launchpad without ubuntu-bug (not recommended), make sure you include the AMI id and tag the bug with “ec2-images”.
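
If you need to look up the AMI id from inside a running instance, the EC2 metadata service provides it:

curl -s http://169.254.169.254/latest/meta-data/ami-id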

Note that ubuntu-bug is not a mechanism for general support questions. One place to get help with running Ubuntu on EC2 is from the community in the ec2ubuntu Google group, and there’s always the general Amazon EC2 forum. You can occasionally get live help with Ubuntu on EC2 on the #ubuntu-server IRC channel on irc.freenode.net.

Nick Barcet: PHP libraries for EC2?

// January 12th, 2010 // Comments Off // Uncategorized

Following up on the action I accepted during last week's server meeting, I searched the web a bit for PHP libraries for EC2 (or AWS in general).

As far as I can tell, a few exist, but only a few of those seem to be maintained regularly:

As I might be missing some, if you have been using those or other PHP libraries to control EC2, could you please speak up and let us know what library you used and how you liked it?

read more

Exploring S3 based filesystems S3FS and S3Backer

// January 11th, 2010 // Comments Off // Uncategorized

In the last couple of days I've been researching Amazon S3 based filesystems, to figure out whether we could integrate them into an easy-to-use backup solution for TurnKey Linux appliances.

Note that S3 could only be a part of the solution. It wouldn't be a good idea to rely exclusively on S3-based automatic backups because of the problematic security architecture it creates. If an attacker compromises your server, he can easily subvert or destroy any S3-based automatic backups. That's bad news.

S3 performance, limitations and costs

S3 performance

  • S3 itself is faster than I realized. I've been fully saturating our server's network connection and uploading/downloading objects to S3 at 10MBytes/s.

  • Each S3 transaction comes with a fixed overhead of about 200ms for writes and about 350ms for reads.

    This means you can only access about 3 objects a second sequentially, which will of course massively impact your data throughput (e.g., if you read many 1-byte objects sequentially, you'll only get about 3 bytes a second).

S3 performance variability

S3 is usually very fast, but it's based on a complex distributed storage network behind the scenes that is known to vary in its behavior and performance characteristics.

Use it long enough and you will come across requests that take 10 seconds to complete instead of 300ms. Point is, you can't rely on the average behavior ALWAYS happening.

S3 limitations

  • Objects can be at most 5GB in size.
  • You can't update part of an object. If you want to update 1 byte in a 1GB object, you'll have to re-upload the entire GB.

S3 costs

Three components:

  1. Storage: $0.15 per GB-month
  2. Data transfer: $0.10 per GB in, $0.17 per GB out
  3. Requests: $0.01 per 1,000 PUT requests, $0.01 per 10,000 GET and other requests

A word of caution: some people using S3 based filesystems have made the mistake of focusing on just the storage costs and forgetting about other expenses, especially requests, which look so inexpensive.

You need to watch out for that, because with one filesystem under its default configuration (4KB blocks), storing 50GB of data cost $130 just in PUT request fees, more than 17X the storage fees!
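
The arithmetic behind that figure is worth spelling out (using the pricing above, and assuming one PUT per 4KB block):

  50GB / 4KB per block                = ~13 million blocks (one PUT each)
  13 million PUTs x $0.01 per 1,000   = ~$130 in PUT request fees
  50GB x $0.15 per GB-month           = $7.50/month in storage fees (more than 17X less)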

Filesystems

s3fs

Which s3fs?

It's a bit confusing but there are two working projects competing for the name "s3fs", both based on FUSE.

One is implemented in C++, last release Aug 2008:

http://code.google.com/p/s3fs/wiki/FuseOverAmazon

The other is implemented in Python, last release May 2008:

https://fedorahosted.org/s3fs/

I've only tried the C++ project, which is better known and more widely used (e.g., the Python project comes with warnings regarding data loss), so when I say s3fs I mean the C++ project on Google Code.

Description

s3fs is a direct mapping of S3 to a filesystem paradigm. Files are mapped to objects. Filesystem metadata (e.g., ownership and file modes) is stored inside the object's metadata. Filenames are keys, with "/" as the delimiter to make listing more efficient, etc.

That's significant because it means there is nothing terribly magical about a bucket being read/written to by s3fs, and in fact you can mount any bucket with s3fs to explore it as a filesystem.
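
For example, mounting a bucket is roughly this simple (the bucket name and mount point are placeholders, and I'm assuming your AWS credentials are already set up the way s3fs expects, e.g., in its password file or environment variables):

mkdir -p /mnt/mybucket
s3fs mybucket /mnt/mybucket
ls /mnt/mybucket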

s3fs's main advantage is its simplicity. There are however a few gotchas:

  • If you're using s3fs to access a bucket it didn't create, and the bucket has objects with directory-like components in their names (e.g., mypath/myfile), you'll need to create a dummy directory in order to see them (e.g., mkdir mypath).

  • The project seems to be "regretware". The last open source release was in August 2008. Since then the author seems to have continued development of new features (e.g., encryption, compression, multi-user access) under a commercial license (subcloud), and with that inherent conflict of interest the future of the GPL-licensed open source version is uncertain.

    In fact a few of the unresolved bugs (e.g., deep directory renames) in the open source version have long been fixed in the proprietary version.

  • No embedded documentation. Probably another side-effect of the proprietary version, though the available options are documented on the web site.

  • Inherits S3's limitations: no file can be over 5GB, and you can't partially update a file, so changing a single byte will re-upload the entire file.

  • Inherits S3's performance characteristics: operations on many small files are very inefficient (each is a separate S3 object, after all).

  • Though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file, you'll have to download the entire GB.

    OTOH, s3fs supports a disk cache, which can be used to mitigate this limitation.

  • Watch out: the ACL for objects/files you update/write to will be reset to s3fs's global ACL (e.g., by default "private"). So if you rely on a richer ACL configuration for objects in your bucket, you'll want to access your s3fs-mounted bucket in read-only mode.

  • By default, s3fs doesn't use SSL, but you can get that to work by using the -o url option to specify https://s3.amazonaws.com/ instead of http://s3.amazonaws.com/

    It's not documented very well of course. Cough. Proprietary version. Cough.
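
In other words, something like this (same placeholder bucket and mount point as above):

s3fs mybucket /mnt/mybucket -o url=https://s3.amazonaws.com/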

S3Backer

S3Backer is a true open source project under active development, which has a very clever design.

It is also based on FUSE, but instead of implementing a usable filesystem on top of S3, it implements a virtual loopback device on top of S3:

mountpoint/
    file       # (e.g., can be used as a virtual loopback)
    stats      # human readable statistics

Except for this simple virtual filesystem, S3Backer doesn't know anything about filesystems itself. It just maps that one virtual file to a series of dynamically allocated blocks inside S3.

The advantage of this approach is that it allows you to leverage well-tested code built into the kernel to take care of the higher-level business of storing files. For all intents and purposes, it's just a special block device (e.g., use any filesystem, LVM, software RAID, kernel encryption, etc.).
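
As a rough sketch, the workflow looks something like this (the bucket name, sizes, and mount points are placeholders, and the --blockSize/--size option names reflect my reading of S3Backer's documentation):

s3backer --blockSize=256k --size=50g mybucket /mnt/s3backer    # expose the bucket as one big virtual file
mkfs.reiserfs -f /mnt/s3backer/file                            # create a normal filesystem on that file
mount -o loop /mnt/s3backer/file /mnt/storage                  # mount it via loopback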

In practice it seems to work extremely well, thanks to a few clever performance optimizations. For example:

  • In-memory block cache: rereads of the same unchanged block don't have to go across the network if it's still cached.
  • Delayed, multi-threaded write queue: dirty blocks aren't written immediately to S3 because that can be very inefficient (e.g., even the smallest change would re-upload the entire block). Instead, changes seem to be accumulated for a couple of seconds and then written out in parallel to S3.
  • Read-ahead algorithm: will detect and try to predict sequential block reads so the data is available in your cache before you actually ask for it.

Bottom line, under the right configuration S3Backer works well enough that it can easily saturate our Linode's 100Mbit network connection. Impressive.

In my testing with Reiserfs I found the performance good enough that it would be conceivable to use it as an extension of working storage (e.g., as an EBS alternative for a system outside of EC2).

There are a few gotchas however:

  • High risk of data corruption, due to the delayed writes (e.g., if your system crashes or the connection to AWS fails). Journaling doesn't help, because as far as the filesystem is concerned the blocks have already been written (i.e., to S3Backer's cache).

    In other words, I wouldn't use this as a backup drive. You can reduce the risk by turning off some of the performance optimizations to minimize the amount of data in limbo waiting to be written to Amazon.

  • Too-small block sizes (e.g., the 4K default) can add significant request costs (e.g., the $130 in PUT fees for storing 50GB with 4K blocks mentioned above).

  • Too-large block sizes can add significant data transfer and storage fees (every small update re-uploads an entire block, and partially used blocks still take up full storage).

  • Memory usage can be prohibitive: by default S3Backer caches 1000 blocks. With the default 4K block size that's not an issue, but most users will probably want to increase the block size.

    So watch out: 1000 x 256KB blocks = 256MB.

    You can adjust the number of cached blocks to control memory usage.

Future versions of S3Backer will probably include disk-based caching, which will mitigate the data corruption and memory usage issues.

Tips:

  • Use Reiserfs. It can store multiple small files in the same blocks.

    It also doesn't populate the filesystem with too many empty blocks when the filesystem is created, which makes filesystem creation faster and more efficient.

    Also, with Reiserfs I tested expanding and shrinking the filesystem (after increasing/decreasing the size of the virtual block device) and it seemed to work just fine.

  • S3Backer supports storing multiple block devices in the same bucket using prefixes (see the sketch below).
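
A sketch of that last tip (the --prefix option name and the values are placeholders based on my reading of S3Backer's documentation):

s3backer --prefix=vol1 --blockSize=256k --size=20g mybucket /mnt/s3backer-vol1
s3backer --prefix=vol2 --blockSize=256k --size=20g mybucket /mnt/s3backer-vol2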

Conclusions

  • s3fs: safe, efficient storage of medium-large files. Perfect for backup / archiving purposes.
  • S3Backer: high performing live storage on top of S3. EBS alternative outside of EC2. Not safe for backups at this stage.

Blodget on Apple’s impending thrashing by Google …

// January 10th, 2010 // Comments Off // Uncategorized