Simulations in the Cloud

Update: Here’s another post with an alternate way of configuring SSH on your VM: “Configuring a VM for SSHing Out”.

Updated at the end to add an alternative script including password-protection of your script and transcript logging.

How I would have loved Amazon EC2 back when I was working on my Ph.D. thesis! My simulations frequently ran for hours, and would take over my laptop’s CPU and memory while they were running. I eventually invested in a Mac Mini and was able to do other work on my laptop while they were running, but simulations are one of those classic cloud use cases where you have a short-term unpredictable need for rather a lot of computing power.

So, today I want to describe a workflow for getting a simulation, or any other batch job, up and running on an Amazon EC2 instance in an automated and repeatable fashion. I am going to use Arch Linux since that’s the Linux flavour I’m most familiar with. Arch has a very nice packaging system called pacman which makes this task very smooth. I’m sure you could make this work with other flavours of Linux, and in fact they might have the advantage of a more mature selection of ready-made machine images.

The basic premise is to have a script which configures a bare virtual machine with all the software you need, and either stops there for you to step in and run your simulation, or carries on and runs your simulation for you. We are going to use Parallels to create our own Virtual Machine locally (from scratch!) and then once we have our script developed, we are going to run it on an Amazon EC2 instance in the cloud.

To start with, you’ll need to activate your Amazon account (the same one you order books with) to work with EC2. Click on the “Sign in to the AWS Console” link and enter your Amazon username and password (or create a new account if you want to keep things separate). If you’ve never done this before, you’ll be asked to verify your account by telephone. An Amazon robot will phone the number you give them, and you’ll have to enter a PIN. I had to try this twice: the first time my phone never rang, but the second time it worked perfectly. This just takes a few minutes, and then you’re ready to start spending money by renting chunks of cloud. Hooray!

The instructions you’ll see after you log in to the AWS console are geared towards the command line tools. You actually don’t need to worry about creating certificates or anything like that just yet. We’re just going to use the web interface and this will guide you step by step. If you want to, this would be a good time to download the Getting Started Guide and read through it. It’s available from the Documentation Centre along with lots of other goodies. Also, make yourself aware of the fees for Amazon EC2. There’s no charge for signing up, only when you actually launch and use an instance.

Now, we are actually going to move away from Amazon for a little while and create a Virtual Machine locally using Parallels. We will use this VM to develop our setup script. If you don’t have access to Parallels (or a 14-day free trial thereof) or any alternative, you can use an EC2 instance to develop your install script, but you’ll be paying by the hour and it won’t be as easy to reset the VM and start over. Of course, if you just want to try this out, you’re welcome to run my script in an EC2 instance. Just change or delete the WebDAV credentials, or else the script will fail.

It’s quite straightforward to create a Virtual Machine from scratch using Parallels. You’ll need to download an ISO image of the operating system you want to install. In this case, that’s the Arch Linux core i686 ISO. (I’m on my little laptop, which is only 32-bit. If you have a 64-bit machine, you can download the x86_64 ISO instead.) These are large files, over 300 MB, so be patient and use BitTorrent if possible. The download page for Arch Linux is here.

Once the ISO is downloaded, then in Parallels choose “New…” from the File menu. This will bring up the New Virtual Machine Assistant. Click on the Skip Detection button, and in the next dialog choose More Linux → Other Linux.

Finish navigating the assistant; the defaults are fine, or tweak them as you see fit. When you finish, you should have a blank machine which looks something like this (for Parallels 5):

When you start this machine up, it will tell you that it’s about to install the operating system. In the CD/DVD drive selection, choose the downloaded ISO image. The Arch Linux install should begin now. If you haven’t used Arch before, then check out the Beginner’s Guide and Install Guide for help. Parallels should take care of network connectivity for you. We don’t have a “Parallels Tools” for this VM though, so things like copying and pasting from the parent OS won’t be possible. Also, resizing the VM window won’t actually make the screen any bigger, it will just give you black space around your console. I assume these are solvable issues, but it hasn’t been worth my while to do so yet. I didn’t time the install (sorry!) but it shouldn’t take more than 10 or 15 minutes. You should be prompted to reboot when the install is finished, and then you should have a shiny new system which looks something like this:

Now, before we do anything further, we want to switch everything off. Type shutdown -h now to power down the virtual machine. When it’s off, choose “Take Snapshot…” from the Virtual Machine menu in Parallels. Name the snapshot “Clean Install”. Now, we have a fresh install snapshot which we can revert to at any time. This is what will make it easy for us to develop and test our setup script which we will eventually be putting to good use on our EC2 instance. Of course, you might also find it useful to simply run simulations in this local virtual machine in case all you want is a repeatable clean environment. Having a standard clean image like this is great if you are writing a tutorial and want to be sure you have captured all the prerequisites people need to install. It’s very easy to forget that there’s a hidden dependency on tool X or Y when you installed it 2 years ago and don’t even remember it’s there! If you test your install script on a clean VM, that sort of error is much less likely, and at least someone having difficulty can recreate the conditions you used by constructing a similar VM.

Go to “Manage Snapshots…” to double check and you should see a nice little graphic like this:

Right. The boring stuff is done and now we can get to work on our setup script. Actually, we are going to need to do one step manually. We need to install curl on our shiny new VM so that we can fetch our setup script. Type pacman -Sy curl. If you get an error message, you might need to uncomment your local mirror in /etc/pacman.d/mirrorlist. You might be told that you need to upgrade pacman first. If so, go ahead, then run pacman -Sy curl again when that’s finished. Now, if you want, take another Snapshot at this point so you don’t have to do this part again.
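If editing the mirror list by hand in the VM console is fiddly, the uncommenting step can be scripted. This is a minimal sketch, assuming GNU sed (for -i) and the usual “#Server = …” line format. The function name enable_mirror and the example pattern are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: uncomment the "#Server = ..." lines in a pacman mirrorlist that
# match a pattern, keeping a backup of the original file first.
# Assumes GNU sed (for -i) and that the pattern contains no "/" characters.
enable_mirror() {
  local file=$1 pattern=$2
  cp "$file" "$file.bak"
  sed -i "/$pattern/ s/^#[[:space:]]*Server/Server/" "$file"
}

# Example (run as root; 'kernel\.org' is only an illustrative pattern):
#   enable_mirror /etc/pacman.d/mirrorlist 'kernel\.org'
#   pacman -Sy curl
```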

Here is a sample script which illustrates some of the considerations needed to run simulations automatically. I am using the SimPy framework, and running all the examples in the SimPyModels directory as a demonstration. When finished, we are using WebDAV to export the simulation results for later analysis.

# Upgrade the system and install developer tools.
pacman --noconfirm -Sy pacman
pacman --noconfirm -Syu
pacman --noconfirm -Sy base-devel 

# Install language and language packaging system
pacman --noconfirm -Sy python

curl -O
bash setuptools-0.6c11-py2.6.egg

# Install simulation framework
easy_install simpy

# Install subversion to export examples more easily
pacman --noconfirm -Sy subversion

# Download code to run
svn export

# Create a directory to store results
mkdir results

# Execute the code
# $MODELS should hold the list of model files to run.
for m in $MODELS; do
  OUTFILE=$(basename "$m")
  python "$m" > "results/$OUTFILE.out"
done

# Bundle files we want to save for export
export FILENAME=results-`uuidgen`.tgz
tar -czvf $FILENAME results

# Install cadaver for WebDAV
pacman --noconfirm -Sy cadaver

# Create a config file for cadaver
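# Note: .netrc entries normally start with a "machine <host>" line naming the WebDAV server.
cat <<EOF > ~/.netrc
login temp
password temp
EOF
# Keep the credentials readable only by you.
chmod 600 ~/.netrc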

# Upload the tgz of results to remote webdav store
echo "put $FILENAME" | cadaver

I developed this script by trial and error, and a lot of iteration on a fresh VM to validate it. I stored the script on a remote server and ran it via:

curl -O

This way the script wouldn’t get erased when I reset the VM, and I am mimicking the procedure I will eventually use on EC2. All I need to do to test the script is to revert the VM to the post-install-curl state, then type these 2 lines of code and watch the results scroll by.

The goal should be to install as little as possible to get the desired software to run. Pacman and easy_install do a lot of the work, helped out by curl when you need to fetch a binary from somewhere and subversion to export examples from the SimPy repository.
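The aim is to install only what is needed, so it can help to fail early when a tool the script depends on has not been installed. Here is a small sketch. missing_cmds is a hypothetical helper, and the command names in the usage line are simply the ones this post happens to use:

```bash
#!/usr/bin/env bash
# Sketch: print each named command that is not on PATH, one per line.
# Returns 0 only if every command was found.
missing_cmds() {
  local c status=0
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      printf '%s\n' "$c"
      status=1
    fi
  done
  return "$status"
}

# Example, after the pacman/easy_install steps:
#   missing_cmds curl python svn cadaver || { echo "missing tools" >&2; exit 1; }
```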

The most difficult part of developing this script was deciding how to export the data out of the VM when finished. I eventually settled on WebDAV since I think it’s the simplest option, and pretty widely available. It also has the nice feature that your results can be automatically published to the web if you want to do so. I’ll discuss this further below after we’ve looked at Amazon. Incidentally, I used UUIDs rather than timestamps to differentiate the results files because VMs can lose the correct time if they are stopped and reset a lot.
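The bundling step, with UUID naming, could be wrapped up as a function. This sketch rests on two assumptions. First, uuidgen may be missing on a minimal image, in which case it falls back to the Linux kernel’s /proc/sys/kernel/random/uuid. Second, it is worth checking that the archive reads back before uploading it. The function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: pack a results directory into results-<uuid>.tgz and print the name.
# The name comes from a random UUID, so it does not depend on the VM's clock.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  else
    # Linux-only fallback: the kernel returns a fresh random UUID per read.
    cat /proc/sys/kernel/random/uuid
  fi
}

bundle_results() {
  local dir=${1:-results} name
  name="results-$(new_uuid).tgz"
  tar -czf "$name" "$dir" || return 1
  # Make sure the archive can be read back before it is uploaded.
  tar -tzf "$name" >/dev/null || return 1
  printf '%s\n' "$name"
}

# Example: FILENAME=$(bundle_results results)
```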

Right. Now to run this on EC2! Log in to your EC2 management console and click on AMIs. Set the select boxes to “All Images” and “All Platforms” and search for “arch”. Make sure the region is US-East (it doesn’t matter where in the world YOU are; this is where your EC2 instance will run, and US-East is less expensive and has more public images to choose from). I use ami-092ac960, as described in this article. This is a 64-bit AMI, which means that we will need to run a large instance, which is a few times more expensive than running a small instance (see price list). (I haven’t been able to get the 32-bit version of this to work; it would run on a small instance. I’ll post an update if/when I do.)

There are various other Arch images up there, some of them might be better options, I haven’t researched them all but please leave a comment if you have any suggestions. Remember when you launch an image that you are running someone else’s code. It’s possible, however unlikely, that someone could have uploaded a nasty image which you will be paying to let loose on the internets! Use at your own risk etc., and here are some tips from Amazon.

Okay, if you want to go through with this step, then right-click on the image and choose Launch Instance. This will bring up a Launch wizard. The first step is to create a Key Pair. Just stick in your name and click “Create & Download Your Key Pair”. You will end up with a file called “something.pem” in your Downloads directory. Just leave it there for now.

Next, you’ll be asked to set up a security group. This means you’re setting access permissions for the virtual machine. In order to SSH into this machine from outside, which you’ll probably want to do, you have to enable this explicitly. There is already a “default” security group which DOESN’T have SSH access. So, let’s create another one called ssh. Make sure the SSH box is checked and click Continue.

Now we get to the final screen (and your last chance to Cancel and not pay Amazon any money!). Enter “1” in the number of instances. Make sure the Key Pair Name and Security Groups are set to the ones you just created (they will be by default). Now, you know our nice setup script? The one we just got ready? Well, click on “Advanced Options” and you’ll see a User Data field. Paste your script right in here and as soon as your instance is launched, your script will start to run. Because we’ve tested it on a generic Arch Linux install, it will hopefully work straight away when we run it on an EC2 Arch install. That’s the idea, anyway. You can either paste your actual script, or the 2-line curl -O … script as above. Curl is already installed on this image.
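If you go with the short bootstrap rather than pasting the full script, it can be handy to generate the User Data text from the script’s URL. This sketch assumes the image runs User Data as a shell script, as described above, and that curl is present. make_user_data and the example URL are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: print a User Data script that fetches a setup script and runs it,
# capturing a transcript in script.out. Paste the output into the
# "User Data" field (or save it to a file).
make_user_data() {
  local url=$1 name
  name=$(basename "$url")
  # Unquoted EOF: $url and $name are substituted now, when generating.
  cat <<EOF
#!/bin/bash
curl -O "$url"
bash "$name" &> script.out
EOF
}

# Example: make_user_data http://example.com/sim/setup.sh > user-data.txt
```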

If you wish, go ahead and launch this instance. It can take a few minutes to launch, but pretty soon you should see a green dot in your instances list.

If you right click on this green instance, you’ll see a few useful things. If you right click and choose Connect, you’ll get instructions on how to SSH in to your virtual machine.

To make life easy, cd into your downloads directory (so you’re in the same directory as the pem file you downloaded a few minutes ago). As per the instructions, chmod 400 the pem file. Then copy and paste the ssh instructions onto your command line. You’ll get a known hosts warning message since you’ve never connected to this host before. If all has gone well, you’ll now be in the cloud. How cool is that!!

The management console also has a built-in system log viewer, so you can see if something has gone wrong or follow the progress of your startup script.

Finally, when you’re finished with everything, don’t forget to right click and terminate this instance. You’re billed in whole-hour increments.
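Partial hours are rounded up, so a run’s billable hours are the elapsed seconds divided by 3600, rounded up. For example, a run that stops after 61 minutes counts as 2 hours. Here is a small sketch of that arithmetic. The hourly rate is a parameter you would fill in from the price list, not a real figure:

```bash
#!/usr/bin/env bash
# Sketch: whole-hour billing, i.e. ceil(seconds / 3600) hours per instance.
billable_hours() {
  local secs=$1
  echo $(( (secs + 3599) / 3600 ))
}

# Cost in cents for one instance, given a (hypothetical) hourly rate in cents.
cost_cents() {
  local secs=$1 rate_cents=$2
  echo $(( $(billable_hours "$secs") * rate_cents ))
}

# billable_hours 3600   # prints 1
# billable_hours 3601   # prints 2
```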

So, hopefully you’ll actually have very little to do in the cloud since we’ve gotten all the prep work done in our local Virtual Machine and the script will be launched automatically and will WebDAV out the data when it’s finished.

You can experiment with larger instance types, such as Quadruple Extra Large or High-CPU Extra Large, and see if they improve your simulation run times. If you are going to make a habit of this, then you’ll want to be systematic and build a little minimal benchmarking into your scripts so you can assess the benefits of using a larger instance. Remember, even though it’s “the cloud”, there is still energy being used to power your CPU units, so don’t be wasteful. The good news is that this is much more efficient and cost-effective than buying a machine just for the occasional simulation run. It’s also accessible from anywhere in the world, any time.
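One minimal way to build in benchmarking is to log the wall-clock time of each model run, so that runs on different instance types can be compared afterwards. In this sketch, time_run and the timings.txt file name are hypothetical, and date +%s only gives one-second resolution:

```bash
#!/usr/bin/env bash
# Sketch: run a command, append "<command> <elapsed seconds>" to a log file,
# and pass the command's exit status through.
time_run() {
  local log=$1 start end status
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  printf '%s %s\n' "$*" "$((end - start))" >> "$log"
  return "$status"
}

# Example inside the model loop, recording the machine type as well:
#   uname -m >> results/timings.txt
#   time_run results/timings.txt python "$m" > "results/$OUTFILE.out"
```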

We have run all this via the web interface for EC2, so we were clicking on buttons and doing other interactive things. However, this process could be fully automated if you used Amazon’s command line interface for EC2. For now, I don’t need that level of automation, but it is something I plan to develop and it would be very nice to have that built in to a simulation framework. The command line tools are a little finicky to set up, but they would be a good long-term investment if you’re likely to do a lot of this. Also, you might find that developing a custom AMI would be a useful investment.

I mentioned the issue of how to get your data out of EC2 when the simulation has finished. I settled on WebDAV since I wasn’t too concerned about having WebDAV credentials exposed in a public script temporarily. The obvious alternative of using scp would mean I’d need to get a private SSH key into the virtual machine somehow, perhaps by pasting it into the user data field and writing it to some location on the new machine which a script could access. Here are some other alternatives I considered:

  • Email yourself the file. Drawbacks: (a) having to install some mail-sending capability and (b) Amazon image IP addresses have been abused by spammers, so you might never get your email.
  • FTP yourself the file. Similar enough to WebDAV.
  • scp from outside, using the pem file you downloaded. scp has the same -i option as ssh, as described above. Fine, and very secure, but it’s a manual job unless you have a cron which checks every 5 minutes for you.
  • Amazon Elastic Block Store (EBS): This is a promising option for future development. EBS is persistent storage designed for EC2. However, in order to get data OUT of an EBS volume, you need to store it to S3 (or extract it by some other means in a running EC2 instance). So, it’s a little involved. But once you had this set up, it might be the most convenient option long-term. There are fees for using EBS and S3.
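The scp variant mentioned above might look like this: a private key is handed to the instance (for example inside the User Data) and written to disk. This is a sketch only. install_key, the key file name and the host in the usage comment are hypothetical. Anything in User Data can be read back from inside the running instance, so a dedicated, limited-purpose key is the safer choice:

```bash
#!/usr/bin/env bash
# Sketch: read a private key from stdin and store it with the permissions ssh
# expects (directory 700, key file 600).
install_key() {
  local keyfile=${1:-$HOME/.ssh/id_rsa_results}
  mkdir -p "$(dirname "$keyfile")"
  chmod 700 "$(dirname "$keyfile")"
  # umask 077 in a subshell: the new file is created without group/other access.
  (umask 077; cat > "$keyfile")
  chmod 600 "$keyfile"
}

# Example (hypothetical host and file names):
#   install_key < /tmp/results_key
#   scp -i ~/.ssh/id_rsa_results "$FILENAME" me@results.example.com:sims/
```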

Okay, so there are a lot of options here, and you can even get more creative and perhaps run a web server on the remote machine so you can download or interact with your data there. Or maybe someone can come up with a nice tool accessible via curl which generates a temporary SSH key to let you scp a file somewhere secure, which you can access later. My goal was just to find one way of doing this, and WebDAV works quite nicely for me.

I think the idea of having a clean, reproducible environment based on VMs and AMIs gives you a lot of flexibility in terms of getting access to extra computing power when you need it, and sharing scripts without necessarily needing to have access to the same exact image. My script should work with any recent build of Arch Linux, not just my particular VM, as demonstrated by the fact that I developed it on one VM and then ran it on another one. Of course, you may find differences between various environments, but the fact that you are using a standardized process will make it easier to identify and compensate for such differences. I will now consider developing standard install scripts for some of my software projects, especially with regard to producing automated documentation, so that it’s easy for people to reproduce my results and to troubleshoot installation issues they may be having on their own machines.

Update: Here is another version of this script with some more ideas.

This script installs python and ruby, along with some ruby gems, and installs the bazaar version control system with paramiko (needed for ssh/sftp). It uses bazaar’s sftp facility to access a private bazaar repo via ssh. You are expected to redirect the output from this script into a file script.out, and script.out is included with the results bundle so you have a record of what happened to produce your data. The first few lines ensure that a copy of the script is included in the script.out transcript.
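The cat $0 line records the script as a whole. If you would like the transcript to show each command right next to its output and exit status, a small wrapper can do it. In this sketch, run_logged is a hypothetical helper and is not part of the script that follows:

```bash
#!/usr/bin/env bash
# Sketch: append "$ <command>", the command's stdout/stderr, and its exit
# status to a transcript file; return the command's exit status.
run_logged() {
  local log=$1 status
  shift
  printf '\n$ %s\n' "$*" >> "$log"
  "$@" >> "$log" 2>&1
  status=$?
  printf '[exit %s]\n' "$status" >> "$log"
  return "$status"
}

# Example:
#   run_logged script.out pacman --noconfirm -Sy ruby
#   run_logged script.out gem install datamapper
```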

This script is intended to be stored in a password-protected web directory, possibly the same as the WebDAV directory which results will be pushed to at the end of the script. You would call this script via:

curl -O
bash &> script.out

In interactive use, i.e. when developing and validating your script, you would probably want to do something like this:

curl -O
cat # Double check you got what you expected
bash &> script.out &
tail -f script.out # Follow what's happening
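When you follow a background run with tail -f, you still have to interrupt tail yourself and then check how the run ended. On Linux with GNU coreutils, tail --pid stops following once the given process exits, so the two steps can be combined. This sketch relies on that option; run_and_follow is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: run a script in the background with its output in a log file,
# show the log as it grows, and return the script's exit status.
# Requires GNU tail (for --pid).
run_and_follow() {
  local script=$1 log=${2:-script.out} pid
  : > "$log"                    # create/empty the log before tail opens it
  bash "$script" >> "$log" 2>&1 &
  pid=$!
  tail --pid="$pid" -f "$log"   # stops once the script has finished
  wait "$pid"                   # returns the script's own exit status
}

# Example: run_and_follow setup.sh script.out; echo "finished with status $?"
```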

Here is the script:

# Copy this script into the transcript first.
cat "$0"

printf "=================================================="
printf "\nOutput from running above script:\n\n\n"

# Upgrade the system and install developer tools.
pacman --noconfirm -Sy pacman
pacman --noconfirm -Syu
pacman --noconfirm -Sy base-devel 

# Install language and language packaging system.
pacman --noconfirm -Sy python

curl -O
bash setuptools-0.6c11-py2.6.egg

pacman --noconfirm -Sy ruby

# Install gems.
gem install do_sqlite3
gem install datamapper

# Install bazaar.
easy_install bzr
easy_install paramiko

# Download code to run.
bzr branch s

# Create a directory to store results.
mkdir results

# Execute the code.
cd simcode
for m in $MODELS; do
  OUTFILE=$(basename "$m")
  printf "\nRunning file: %s\n\n" "$m"
  ruby "$m" > "../results/$OUTFILE.out"
  # Move any files the model wrote into its own results directory.
  mkdir "../results/$OUTFILE.dir"
  mv results/* "../results/$OUTFILE.dir/"
done
cd ..

# Copy results of script.out into results.
# Assuming you run this script with:
# bash &> script.out &
# tail -f script.out
cp script.out results/

# Bundle files we want to save for export.
export FILENAME=results-`uuidgen`.tgz
tar -czvf $FILENAME results

# Install cadaver for WebDAV.
pacman --noconfirm -Sy cadaver

# Create a config file for cadaver.
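# Note: .netrc entries normally start with a "machine <host>" line naming the WebDAV server.
cat <<EOF > ~/.netrc
login webdavuser
password webdavpassword
EOF
# Keep the credentials readable only by you.
chmod 600 ~/.netrc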

# Upload the tgz of results to remote webdav store.
echo "put $FILENAME" | cadaver

Austin 12 Nov 2009

A few alternatives to your suggestions. Not necessarily better, but I will post them for others to decide. First, the alestic Ubuntu (and probably Debian) AMIs come with something called runurl, which allows you to include URLs of scripts to be run through your user-data. Check out the blog post here:

Next, as far as getting data out of an AMI, two additional ideas. You mentioned S3 and EBS; well, there is an s3sync tool that works like rsync but can sync with S3 as the source or destination, so syncing EBS to S3 is a one-liner. The other option that came to mind was to run an rsync server somewhere and rsync the data to that.