Elastic Computing - WT8P's Notes to Self

With my products’ release imminent, I spent last weekend coaxing Amazon’s Elastic Compute Cloud (EC2) into action. EC2 is appealing for a few reasons:

It’s someone else’s high-throughput network – product distributions, especially with demo files, are in the hundreds of megabytes. The T1 at work is 1.4 Mb/s. Sharing, firewalls, and proxies reduce my desktop’s throughput to 80 KB/s. On a weekend, or at freaky hours, it’s sometimes higher. Downloading a 120 MB file would take over 20 minutes. Amazon offers a 250 Mb/s link. I couldn’t afford one of my own.
Scalable up and down. For the first few weeks of a release, there is a spike in downloads. During my last major release, there were times when we had hundreds of concurrent connections. We set up an external download site, but even that bogged down. As many hosting providers want a year-long contract, it’s not cost effective to have more than one site for the one month that we really need it. (See pricing, below.) With EC2, I can replicate “instances” within hours. When the load subsides, they’re easily removed.
I pay for what I use. Pricing is based on a formula:

$0.10 per hour for each server instance
$0.10 per GB – uploaded data
$0.18 per GB – first 10 TB / month downloaded ($0.16 next 40 TB, $0.13 over 50 TB)

Multiplied out, it’s a very reasonable way to temporarily scale up the resources. For the download peaks, I’m estimating it’ll cost me about $200 for the first month after the product release. It scales back to $80/month. And that’s for a pretty reliable network.
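To show how the formula multiplies out, here is a minimal Python sketch. The rates come from the list above; the function names, the assumption that 1 TB means 1000 GB, and the traffic numbers in the example are mine and purely illustrative, not figures from Amazon or from my actual bill.

```python
# Rates quoted above, in US dollars.
INSTANCE_PER_HOUR = 0.10
UPLOAD_PER_GB = 0.10
# Download tiers as (tier size in GB, price per GB); None means "no upper limit".
# Assumption: 1 TB = 1000 GB, which the price list doesn't spell out.
DOWNLOAD_TIERS = [(10_000, 0.18), (40_000, 0.16), (None, 0.13)]


def download_cost(gb):
    """Apply the tiered download rates to one month's downloaded gigabytes."""
    cost, remaining = 0.0, gb
    for size, rate in DOWNLOAD_TIERS:
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


def monthly_cost(instances, hours, upload_gb, download_gb):
    """Instance-hours plus upload plus tiered download, for one month."""
    return (instances * hours * INSTANCE_PER_HOUR
            + upload_gb * UPLOAD_PER_GB
            + download_cost(download_gb))


if __name__ == "__main__":
    # Hypothetical month: one instance up for 30 days, 4 GB uploaded, 500 GB downloaded.
    print(f"${monthly_cost(1, 24 * 30, 4, 500):.2f}")
```

One instance running around the clock is $72 a month before any traffic, so most of the difference between a release month and a quiet month comes from the download line.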

In comparison, the file hosting services I’ve contacted wanted $200/month for allotments rooted in a 1990s mindset: 5 GB of storage and 20 GB of transfer per month. Overage was $1.00/GB. The overage during the first month of a release would buy a new bicycle. The hosting services were nonplussed as to why I thought this was a bad deal. They were also unwilling to budge on price.
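The overage arithmetic is easy to sketch. The plan terms below ($200 flat, 20 GB included, $1.00/GB over) are the ones quoted to me; the 500 GB release-month figure is a made-up illustration, and the EC2 function covers only the first-tier download charge, not instance hours or uploads.

```python
def hosting_cost(transfer_gb, base=200.00, included_gb=20, overage_per_gb=1.00):
    """Monthly bill for the quoted hosting plan: flat fee plus per-GB overage."""
    overage_gb = max(0, transfer_gb - included_gb)
    return base + overage_gb * overage_per_gb


def ec2_download_cost(transfer_gb, per_gb=0.18):
    """Download charge at the first-tier EC2 rate (only valid below 10 TB/month)."""
    return transfer_gb * per_gb


if __name__ == "__main__":
    # Hypothetical release month with 500 GB downloaded.
    gb = 500
    print(f"hosting plan: ${hosting_cost(gb):.2f}")
    print(f"EC2 download charge: ${ec2_download_cost(gb):.2f}")
```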

For my use case, EC2 sounds great. Except the system appears to have been acquired from different companies. I needed three sets of accounts:

Amazon Web Services (AWS). Associated with this are an account ID (e.g. 5150–1984–0812), a subscription ID (optional, since I’m an old customer), and a key pair whose public key looks like “123LOVE28CHOCOLATE5.” I had one from 2002 when I had set up an AWS store, but I needed to dig out the information by logging into my separate Amazon Associates account.
Amazon S3 (Simple Storage Service). I had set up one of these accounts in March. The sample tools provided were cryptic in their use. I managed to hook it up to my web server, but found the model useless because my web server would first try to download the file before sending it to the end-user. What was amusing was being billed $0.02 a month for the test file I had sitting there. I thought they ought to just batch it up at $5 minimum, or pre-bill me in $5 increments. Writing up the expense report for the two bits is a waste of time.
EC2 (Elastic Compute Cloud). Setting this up required generating an X.509 certificate.

Keeping the accounts, logins, certificates, and keys straight made installation complicated. Steps were:

Download the command-line tools. To run them, I needed to install a newer Java Runtime Environment on my web host. The ec2 command processes kept getting killed for exceeding CPU/size limits. Fallback: my Windows desktop at work. I ensured the auto-update feature was shut off.
Set up a bunch of environment variables. This is where the instructions would have benefited from a technical writer. Just list all of the environment variables up front: JAVA_HOME, EC2_HOME, EC2_PRIVATE_KEY, EC2_CERT, and PATH.
Generate a PEM-encoded X.509 key pair. I have no idea what this means, but I do know that choosing long, unique names that only our robotic overlords would love affects usability. For example, instead of using this in every subsequent example:

/mnt/pk.Th1snAmE5UcK5A55WHyTh37734d0TH3yD007h15.pem

Try this:

/mnt/pk.pem
Using puttygen, convert the private key to the format that PuTTY (the secure shell client for Windows) uses.
Install and start a server “instance.”
Authorize network access to ports 22 (ssh) and 80 (http).
Use secure shell to log in. Since there’s no password for root, I had to manually configure PuTTY to load the private key I converted previously. As far as I can tell, this isn’t saved anywhere for reuse. (This is a limitation of PuTTY.)
Everything on the system was already running.
Uploading 4 GB of files took about twelve hours. I didn’t wait.
Make a system snapshot for future restoration and replication. Their instructions had some errors, but once corrected, the script runs for about two hours creating a virtual file system, compressing it, then splitting it into digestible subunits.
Upload the snapshot to S3. This was very quick.
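Since the environment variables were where I lost the most time, a preflight check along these lines might have helped. This is a minimal sketch, not part of Amazon’s tools: the variable names are the ones listed in the steps above, while the function names and the assumption that EC2_PRIVATE_KEY and EC2_CERT hold paths to the generated PEM files are mine.

```python
import os

# The variables named in the steps above.
REQUIRED_VARS = ("JAVA_HOME", "EC2_HOME", "EC2_PRIVATE_KEY", "EC2_CERT", "PATH")
# Assumption: these two hold paths to the PEM files from the key-pair step.
FILE_VARS = ("EC2_PRIVATE_KEY", "EC2_CERT")


def missing_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def missing_files(env):
    """Return the file-valued variables whose path does not exist on disk."""
    return [name for name in FILE_VARS
            if env.get(name) and not os.path.isfile(env[name])]


if __name__ == "__main__":
    for name in missing_vars(os.environ):
        print(f"not set: {name}")
    for name in missing_files(os.environ):
        print(f"file not found: {name}={os.environ[name]}")
```

Running it before the first ec2 command prints nothing when everything is in place, and otherwise names the variable to fix.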

Network performance with one server instance during the initial beta period has exceeded expectations. A European customer saw throughput of over 15x what they’d been used to. Best endorsement: “Do you have a server in Europe?” Accurate answer: “It’s on the Internets. Somewhere.” No one has complained about the increased download size.
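To put a throughput multiplier in terms of waiting time, here is the back-of-the-envelope arithmetic as a small Python sketch. The 120 MB file and the 80 KB/s rate are my own desktop figures from the top of this post; applying 15x to that rate is only an illustration, since I don’t know the customer’s actual baseline. The function name and the 1 MB = 1000 KB convention are my assumptions.

```python
def transfer_minutes(size_mb, rate_kb_per_s):
    """Minutes to move size_mb megabytes at rate_kb_per_s kilobytes per second.

    Uses 1 MB = 1000 KB, which is close enough for a rough estimate.
    """
    return size_mb * 1000 / rate_kb_per_s / 60


if __name__ == "__main__":
    print(f"{transfer_minutes(120, 80):.0f} minutes at 80 KB/s")
    # Hypothetical: the same file at 15 times that rate.
    print(f"{transfer_minutes(120, 80 * 15):.1f} minutes at 15x")
```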