September 12, 2012 10:34 / 6 comments / aws flask python

The other day I noticed I had a couple thumbdrives kicking around with various versions of my "absolutely do not lose" files...stuff like my private keys, tax documents, zips of papers I wrote in college, etc. These USB drives were all over the house, and many contained duplicate versions of the same files. I thought it would be neat to write a little app to give me a web-based interface for storing and managing these files securely. In this post I'll talk about how I built a web-based file storage app using Flask, pycrypto, and Amazon S3.

Design goals

The goal was simple: replace my USB drives with a secure way of storing my files in the cloud. I wanted a web app so I could access my files anywhere. Furthermore, by using a service like S3, I could feel safe knowing my files were going to be there tomorrow (99.999999999% durability). Amazon S3 also provides an extra layer of versioning and server-side encryption if I want to use it.

A quick note on amazon's encryption

The server-side encryption provided by AWS ensures that the data written to disk in their data centers is encrypted using a 256-bit AES block cipher. All encryption key management is handled transparently by Amazon. I was unsure what would happen if I made an encrypted key publicly readable, so I ran a little test using boto: if I store some data using server-side encryption and then access it via a public URL (e.g. with curl), would the data still be encrypted? The answer was "no":

>>> import boto
>>> s3 = boto.connect_s3(<my access key>, <my secret key>)
>>> bucket = s3.get_bucket('media.charlesleifer.com')
>>> key = bucket.new_key('testing')

>>> # store a string and encrypt it
>>> key.set_contents_from_string('testing', encrypt_key=True)
7
>>> # make key public readable
>>> key.set_acl('public-read')

Then in another terminal I did:

$ curl http://media.charlesleifer.com/testing
testing

As you can see, the encryption only applies to the data while it sits on Amazon's hardware; by the time it comes back, it has been decrypted. This is great for ensuring my data is safe at rest in Amazon's data centers, but it does nothing to ensure that only I can decrypt my files when I need them.

If you're using the AWS Java SDK, there is support for client-side encryption using "envelope encryption", which combines one-time-use data keys with a private master key that you generate. Full details are here. This is what I was looking for, but since boto did not support it I ended up needing to look elsewhere for a library.
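
To make the envelope pattern concrete, here is a toy sketch (Python 3 syntax). The XOR keystream stands in for a real cipher and offers no actual security; the point is only the structure: encrypt the payload with a random one-time data key, then wrap that data key with your long-term master key and store the two together.

```python
import hashlib
import os

def _keystream_xor(key, data):
    # Toy XOR "cipher" built from SHA-256 -- for illustrating the
    # envelope structure only, NOT real cryptography.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + counter.to_bytes(8, 'big')).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

def envelope_encrypt(master_key, plaintext):
    # 1. generate a random one-time "data key"
    data_key = os.urandom(32)
    # 2. encrypt the payload with the data key
    ciphertext = _keystream_xor(data_key, plaintext)
    # 3. wrap the data key with the long-term master key; the wrapped
    #    key is stored alongside the ciphertext
    wrapped_key = _keystream_xor(master_key, data_key)
    return wrapped_key, ciphertext

def envelope_decrypt(master_key, wrapped_key, ciphertext):
    # unwrap the data key, then decrypt the payload with it
    data_key = _keystream_xor(master_key, wrapped_key)
    return _keystream_xor(data_key, ciphertext)

master = os.urandom(32)
wrapped, ct = envelope_encrypt(master, b'my secret tax documents')
assert envelope_decrypt(master, wrapped, ct) == b'my secret tax documents'
```

Because only the wrapped data key ever leaves your machine, the server never sees anything it can decrypt on its own.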

Setting up a virtualenv

If you want to replicate the setup I will describe below, I suggest creating a virtualenv and installing the libraries used in this post: flask, peewee, boto, pycrypto, and beefish.
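
A setup along these lines should work; the package names correspond to the libraries used later in the post:

```shell
# create and activate a fresh virtualenv
virtualenv s3files
cd s3files
source bin/activate

# install the libraries used in this post
pip install flask peewee boto pycrypto beefish
```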

In the interests of this post not being a "wall of code" I'm only going to include the relevant parts and will assume that you can fill-in-the-gaps to include templating and such.

Start with the web frontend and file model

Let's start with the web app. I really like using Flask for things like this -- it's lightweight, stays out of the way, and has great documentation. For simplicity, there will only be three views: an index view that lists the stored files, an add view that handles uploads, and a download view that looks a file up by ID and returns its contents.

In order to keep track of what files have been uploaded, I used peewee to create a lightweight file model. The File model might look something like this:

import datetime
import mimetypes

from peewee import *

class File(Model):
    filename = CharField()
    created_date = DateTimeField(default=datetime.datetime.now)
    encrypted = BooleanField(default=False)

    def mimetype(self):
        # guess the content type from the filename's extension
        return mimetypes.guess_type(self.filename)[0]
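
The mimetype() method is just a thin wrapper over the standard library: mimetypes.guess_type() looks only at the filename's extension and returns a (type, encoding) tuple.

```python
import mimetypes

# guess_type() inspects only the extension, returning (type, encoding)
print(mimetypes.guess_type('taxes-2011.pdf')[0])   # application/pdf
print(mimetypes.guess_type('backup.tar.gz'))       # ('application/x-tar', 'gzip')
print(mimetypes.guess_type('mystery-file')[0])     # None -- unknown extension
```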

Displaying a list of files

The index view is very simple and just displays a list of File instances:

from flask import render_template

@app.route('/')
def index():
    return render_template('index.html', files=File.select().order_by('filename'))

A simple upload view and handler

Adding a file is also easy. The add view calls a little "upload_handler" which, for now, just stashes the file on disk; in a little bit we'll replace it with code that uploads to S3.

from flask import redirect, render_template, request, url_for
from werkzeug import secure_filename

@app.route('/add/', methods=['GET', 'POST'])
def add():
    if request.method == 'POST':
        file_obj = request.files['file']
        instance = File(filename=secure_filename(file_obj.filename))
        upload_handler(instance, file_obj)
        instance.save()
        return redirect(url_for('index'))
    return render_template('add.html')

Here is how we might implement a very simple file-based upload handler. This is really just for reference as in the next section we'll be replacing it and using boto to upload to S3:

import os

def upload_handler(instance, file_obj):
    dest_filename = os.path.join('/var/www/media/', instance.filename)
    with open(dest_filename, 'wb') as fh:
        fh.write(file_obj.read())
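
To sanity-check this handler in isolation, you can hand it any file-like object. Here is a standalone variant that writes to a temporary directory instead of the hard-coded /var/www/media path (the standalone signature is mine, for testing purposes):

```python
import os
import tempfile
from io import BytesIO  # StringIO on Python 2

def upload_handler(dest_dir, filename, file_obj):
    # write the uploaded file-like object to dest_dir under the
    # given (already-sanitized) filename
    dest_filename = os.path.join(dest_dir, filename)
    with open(dest_filename, 'wb') as fh:
        fh.write(file_obj.read())
    return dest_filename

media_root = tempfile.mkdtemp()
path = upload_handler(media_root, 'resume.pdf', BytesIO(b'%PDF-1.4 ...'))
with open(path, 'rb') as fh:
    assert fh.read() == b'%PDF-1.4 ...'
```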

Download view

Lastly, here is a glimpse at a minimal download view. It looks the file up by ID to determine the filename, then redirects to a static media server. Later we will add logic for handling decryption:

import os
from flask import abort, redirect

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)

    return redirect(os.path.join('/media/', file.filename))

Hopefully the workings of these three views are clear: the index displays a list of File objects, the add view handles uploads, and the download view looks up a file by ID and redirects to the appropriate location.

Uploading to S3

Before covering the encryption/decryption logic, let's look into uploading the files to S3. If this is your first time reading about S3 I recommend looking at Amazon's S3 guide.

First let's write a little helper function to retrieve our S3 bucket. This is where all of our files will be stored. The credentials can be found on your AWS "Security Credentials" page.

import boto

def get_bucket(access_key_id, secret_access_key, bucket_name):
    conn = boto.connect_s3(access_key_id, secret_access_key)
    return conn.get_bucket(bucket_name)

Modifying the upload handler to store data in S3

The first function we will modify is the "upload_handler". Now, instead of writing the uploaded file to the local filesystem we will put it in our S3 bucket:

def upload_handler(instance, file_obj):
    # access our S3 bucket and create a new key to store the file data
    bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
    key = bucket.new_key(instance.filename)
    key.set_metadata('Content-Type', instance.mimetype())

    # seek to the beginning of the file and read it into the key
    file_obj.seek(0)
    key.set_contents_from_file(file_obj)

    # make the key publicly available
    key.set_acl('public-read')

Modifying the download view to redirect to the file on S3

The next change will be to the download view. Since the file is going to be hosted on S3, we will need to redirect to the URL of the file inside the bucket.

from flask import abort, redirect
from urlparse import urljoin

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)

    # redirect to the url of the file hosted on S3
    return redirect(urljoin(<bucket url>, file.filename))
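
One gotcha with urljoin: the bucket URL needs a trailing slash, or the last path segment gets replaced rather than appended. A quick illustration with a made-up bucket URL (on Python 2 urljoin lives in urlparse; on Python 3, in urllib.parse):

```python
from urllib.parse import urljoin  # urlparse.urljoin on Python 2

bucket_url = 'https://s3.amazonaws.com/my-files/'  # note the trailing slash

# with the trailing slash, the filename is appended as expected
print(urljoin(bucket_url, 'taxes-2011.pdf'))
# https://s3.amazonaws.com/my-files/taxes-2011.pdf

# without it, urljoin *replaces* the last path segment
print(urljoin('https://s3.amazonaws.com/my-files', 'taxes-2011.pdf'))
# https://s3.amazonaws.com/taxes-2011.pdf
```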

The fun part, encrypting and decrypting the file

In order to perform the encryption on the client side, I chose to use the pycrypto library. Parts of it are written in C, so installing it requires building a C extension. The API is fairly low-level, so I ended up writing a small wrapper library on top called "beefish". beefish provides functions for encrypting and decrypting file-like objects using the Blowfish cipher.
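
beefish exposes encrypt(in_file, out_file, key) and a matching decrypt(), both operating on file-like objects. Here is a sketch of that buffer-shuffling pattern with zlib standing in for the Blowfish transform, so it runs without pycrypto (the names transform/untransform are mine, not beefish's):

```python
import zlib
from io import BytesIO  # StringIO.StringIO on Python 2

def transform(in_file, out_file):
    # stand-in for beefish.encrypt(in_file, out_file, key): read the
    # source file-like object and write a transformed copy
    out_file.write(zlib.compress(in_file.read()))

def untransform(in_file, out_file):
    # stand-in for beefish.decrypt(in_file, out_file, key)
    out_file.write(zlib.decompress(in_file.read()))

original = BytesIO(b'contents of a sensitive document')

# "encrypt" into an in-memory buffer, then rewind before reading it back
buf = BytesIO()
transform(original, buf)
buf.seek(0)

# ...buf would be uploaded to S3 here; on download, reverse the transform
restored = BytesIO()
untransform(buf, restored)
assert restored.getvalue() == b'contents of a sensitive document'
```

The seek(0) calls matter: after writing into a buffer, the file position sits at the end, so it must be rewound before anything can read from it.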

Adding a password to the upload form

I only care about encrypting some of my files, so I added a "password" field to the upload form. When a password is present, the file will be encrypted using that password. The "add file" form looks something like this:

File upload form

Once again, let's modify the upload handler. This time we'll check to see if the submitted data contains a password, and if so encrypt the uploaded file before sending it off to S3. I'm using StringIO to store the encrypted contents of the file in memory:

from beefish import encrypt
from flask import request
from StringIO import StringIO

def upload_handler(instance, file_obj):
    bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
    key = bucket.new_key(instance.filename)
    key.set_metadata('Content-Type', instance.mimetype())

    # new code here:

    if request.form.get('password'):
        # we received a password, so will need to encrypt the file data
        # before sending it off to S3
        password = request.form['password']
        instance.encrypted = True
        output_buffer = StringIO()
        encrypt(file_obj, output_buffer, password)
        file_obj = output_buffer
    else:
        instance.encrypted = False

    # end of new code

    file_obj.seek(0)
    key.set_contents_from_file(file_obj)
    key.set_acl('public-read')

Decrypting the file in the download view

Since the file is encrypted on the server, our download view will now need to become a little smarter. The logic changes so that unencrypted files redirect straight to S3 as before, while encrypted files first display a password form; when the form is submitted, the view fetches the encrypted data from S3, decrypts it in memory, and sends it back as an attachment.

Here is how my download form looks:

File download form

Here is the modified download view:

from beefish import decrypt
from flask import abort, redirect, render_template, request, send_file
from StringIO import StringIO
from urlparse import urljoin

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)

    if not file.encrypted:
        return redirect(urljoin(<bucket url>, file.filename))

    # new logic:
    if request.method == 'POST' and request.form.get('password'):
        # fetch the encrypted file contents from S3 into an in-memory buffer
        bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
        key_obj = bucket.get_key(file.filename)

        # read the contents of the key into an in-memory file
        enc_buffer = StringIO()
        key_obj.get_contents_to_file(enc_buffer)
        enc_buffer.seek(0)

        # decrypt contents and store in dec_buffer
        dec_buffer = StringIO()
        decrypt(enc_buffer, dec_buffer, request.form['password'])
        dec_buffer.seek(0)

        # efficiently send the decrypted file as an attachment
        return send_file(
            dec_buffer,
            file.mimetype(),
            as_attachment=True,
            attachment_filename=file.filename,
        )

    # display a password form
    return render_template('download.html', file=file)

Ideas for improvements

There are a lot of things you might do to improve this little app. Here are just a few ideas I had:

  1. Put the site behind SSL if you haven't!
  2. Support putting files into folders -- S3 handles this transparently
  3. Investigate other tools for encryption, such as the python gpg wrapper
  4. If memory becomes a problem, look into encrypting and decrypting on disk
  5. Look into AWS' versioning support

Thanks so much for reading! Feel free to post any questions or comments below.


Comments (6)

Mitch Garnaat | sep 2012, at 08:22am

Nice article! You can actually set the ACL for the S3 object at the time you upload it rather than using the separate call to key.set_acl(). Just add a "policy='public-read'" keyword param to the set_contents_from_file() call.

Andreas Porevopoulos | sep 2012, at 02:26am

Hello and thank you for your excellent article. I think I have found a typo. At the last download method urljoin is imported from urlparse but url_join is used.

Best regards, Andreas

Andreas Porevopoulos | sep 2012, at 08:14am

I have made a github repository at https://github.com/sv1jsb/encrypted-flask-aws of your article. I have added folder and search capability. Best regards, Andreas

Charles Leifer | sep 2012, at 06:32pm

Thank you for the comments! Andreas -- that is awesome, thanks for putting it into a GH repo - the added functionality is really nice. I've added it to the list of resources in the body of the post.

Aaron | sep 2012, at 03:39pm

Any thoughts about glacier? (Amazon Glacier)

Charles Leifer | sep 2012, at 06:29pm

I'm definitely interested in trying out glacier -- now that it's part of boto I'll probably be much more inclined to use it. The biggest thing is that "several hours" retrieval time. However, for storing a large amount of data, its price seems great.


Commenting has been closed, but please feel free to contact me