The other day I noticed I had a couple of thumb drives kicking around with various versions of my "absolutely do not lose" files: stuff like my private keys, tax documents, zips of papers I wrote in college, etc. These USB drives were all over the house, and many contained duplicate versions of the same files. I thought it would be neat to write a little app to give me a web-based interface to store and manage these files securely. In this post I'll talk about how I built a web-based file storage app using Flask, pycrypto, and Amazon S3.
The goal was simple: replace my USB drives with a secure way of storing my files in the cloud. I wanted a web app so I could access my files anywhere. Furthermore, if I used a service like S3, I could feel safe knowing my files were going to be there tomorrow (Amazon advertises 99.999999999% durability). Amazon S3 also provides versioning and server-side encryption as extra layers if I want to use them.
The server-side encryption provided by AWS ensures that the data written to disk in their data centers is encrypted using a 256-bit AES block cipher. All encryption key management is taken care of by Amazon transparently. I did a little test using boto to see how this worked, because I was unsure what would happen if I made an encrypted S3 key (object) publicly readable. What I wondered was: if I store some data using server-side encryption and then access it via a public URL (e.g. with curl), would the data still be encrypted? The answer was "no":
>>> import boto
>>> s3 = boto.connect_s3(<my access key>, <my secret key>)
>>> bucket = s3.get_bucket('media.charlesleifer.com')
>>> key = bucket.new_key('testing')
>>> # store a string and encrypt it
>>> key.set_contents_from_string('testing', encrypt_key=True)
7
>>> # make key public readable
>>> key.set_acl('public-read')
Then in another terminal I did:
$ curl http://media.charlesleifer.com/testing
testing
As you can see, the encryption only applies to the data as it resides on Amazon's hardware. When it comes back it is unencrypted. This is great for ensuring my data is safe while it sits on Amazon's hardware, but it does nothing to ensure that only I can decrypt my files when I need them.
If you're using the AWS Java SDK there is support for client-side encryption using "envelope encryption", which involves a combination of one-time-use keys and a private encryption key you generate. Full details are here. This is what I was looking for, but since the Python library (boto) did not support it, I ended up needing to look elsewhere for a library.
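To make the idea a bit more concrete, here is a rough sketch of envelope encryption in Python 3. It is not how the AWS SDK implements it: the names `envelope_encrypt`, `envelope_decrypt` and `_keystream_xor` are made up for this illustration, and the SHA-256 keystream is a toy stand-in for a real cipher such as AES, so please don't protect real data with it. The point is only the shape: each file gets a fresh one-time data key, and that data key is itself encrypted ("wrapped") with the long-lived master key before being stored next to the ciphertext.

```python
import hashlib
import os


def _keystream_xor(key, nonce, data):
    """Toy stream cipher: XOR data with SHA-256(key + nonce + counter) blocks.

    ASSUMPTION: this is only a placeholder for a real cipher such as AES.
    Applying it twice with the same key and nonce returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, 'big')
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


def envelope_encrypt(master_key, plaintext):
    """Encrypt plaintext with a one-time data key, then wrap that key."""
    data_key = os.urandom(32)    # one-time key, used for this file only
    data_nonce = os.urandom(16)
    key_nonce = os.urandom(16)
    return {
        'ciphertext': _keystream_xor(data_key, data_nonce, plaintext),
        'data_nonce': data_nonce,
        # the data key is stored only in encrypted ("wrapped") form
        'wrapped_key': _keystream_xor(master_key, key_nonce, data_key),
        'key_nonce': key_nonce,
    }


def envelope_decrypt(master_key, envelope):
    """Unwrap the data key with the master key, then decrypt the payload."""
    data_key = _keystream_xor(
        master_key, envelope['key_nonce'], envelope['wrapped_key'])
    return _keystream_xor(
        data_key, envelope['data_nonce'], envelope['ciphertext'])
```

Only the master key has to be kept secret on the client; everything in the returned dictionary could be stored alongside the file.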
If you want to replicate the setup I will describe below, I suggest creating a virtualenv and installing:
In the interests of this post not being a "wall of code" I'm only going to include the relevant parts and will assume that you can fill-in-the-gaps to include templating and such.
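Let's start with the web app. I really like using Flask for things like this -- it's lightweight, stays out of the way, and has great documentation. For simplicity, there will only be three views: an index that lists the uploaded files, an add view that handles uploads, and a download view that looks a file up by ID.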
In order to keep track of what files have been uploaded, I used peewee to create
a lightweight file model. The File model might look something like this:
import datetime
import mimetypes

from peewee import *

class File(Model):
    filename = CharField()
    created_date = DateTimeField(default=datetime.datetime.now)
    encrypted = BooleanField(default=False)

    def mimetype(self):
        # guess the content type from the file extension (may be None)
        return mimetypes.guess_type(self.filename)[0]
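One thing to be aware of: `mimetypes.guess_type` returns a `(type, encoding)` tuple, and the type is `None` when the extension is not recognized. Here is a small sketch of a fallback, assuming a generic binary type is an acceptable default; `guess_content_type` is a hypothetical helper, not part of the app above.

```python
import mimetypes


def guess_content_type(filename, default='application/octet-stream'):
    """Return a Content-Type for filename, falling back to a default."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # guess_type yields None for unknown extensions, so substitute the default
    return content_type or default


print(guess_content_type('paper.pdf'))           # application/pdf
print(guess_content_type('keys.unknownext123'))  # application/octet-stream
```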
The index view is very simple and just displays a list of File instances:
from flask import render_template

@app.route('/')
def index():
    files = File.select().order_by(File.filename)
    return render_template('index.html', files=files)
Adding a file is also easy. The add view calls a little "upload_handler", which for now just stashes the file on disk; in a little bit we'll replace it with code that uploads to S3.
from flask import redirect, render_template, request, url_for
from werkzeug.utils import secure_filename

@app.route('/add/', methods=['GET', 'POST'])
def add():
    if request.method == 'POST':
        file_obj = request.files['file']
        # sanitize the user-supplied name before using it as a filename
        instance = File(filename=secure_filename(file_obj.filename))
        upload_handler(instance, file_obj)
        instance.save()
        return redirect(url_for('index'))
    return render_template('add.html')
Here is how we might implement a very simple file-based upload handler. This is really just for reference as in the next section we'll be replacing it and using boto to upload to S3:
import os

def upload_handler(instance, file_obj):
    dest_filename = os.path.join('/var/www/media/', instance.filename)
    with open(dest_filename, 'wb') as fh:
        fh.write(file_obj.read())
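The handler above reads the whole upload into memory with a single `read()` call. For larger files, one option is to copy in fixed-size chunks instead. This is just a sketch using the standard library's `shutil.copyfileobj`; the helper name `save_stream` and the 64 KB chunk size are my own choices for illustration.

```python
import shutil


def save_stream(file_obj, dest_filename, chunk_size=64 * 1024):
    """Copy a file-like object to dest_filename in fixed-size chunks."""
    with open(dest_filename, 'wb') as fh:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # whole upload never has to sit in memory at once
        shutil.copyfileobj(file_obj, fh, chunk_size)
```

Inside the upload handler this would be called as `save_stream(file_obj, dest_filename)` in place of the `open`/`write` pair.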
Lastly, here is a glimpse at a minimal download view. It simply determines the filename by looking the file up by id, then redirects to a static media server. Later we will add logic for handling decryption:
import os
from flask import abort, redirect

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)
    return redirect(os.path.join('/media/', file.filename))
Hopefully the workings of these three views are clear. The index displays a list of File objects, the add view handles uploads, and the download view looks up the file based on ID and redirects to the appropriate location.
Before covering the encryption/decryption logic, let's look into uploading the files to S3. If this is your first time reading about S3 I recommend looking at Amazon's S3 guide.
First let's write a little helper function to retrieve our S3 bucket. This is where all of our files will be stored. The credentials can be found on your AWS "Security Credentials" page.
import boto

def get_bucket(access_key_id, secret_access_key, bucket_name):
    conn = boto.connect_s3(access_key_id, secret_access_key)
    return conn.get_bucket(bucket_name)
The first function we will modify is the "upload_handler". Now, instead of writing the uploaded file to the local filesystem we will put it in our S3 bucket:
def upload_handler(instance, file_obj):
    # access our S3 bucket and create a new key to store the file data
    bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
    key = bucket.new_key(instance.filename)
    key.set_metadata('Content-Type', instance.mimetype())
    # seek to the beginning of the file and read it into the key
    file_obj.seek(0)
    key.set_contents_from_file(file_obj)
    # make the key publicly available
    key.set_acl('public-read')
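boto's `set_contents_from_file` also accepts a `policy` keyword, so the ACL can be applied in the same request rather than with a separate `set_acl()` call. A minimal sketch follows; `upload_public` is a hypothetical helper that expects an already-created boto `Key` to be passed in.

```python
def upload_public(key, file_obj):
    """Upload file_obj to an existing boto Key, public-readable in one call.

    ASSUMPTION: `key` is a boto S3 Key, e.g. from bucket.new_key(name).
    """
    # rewind in case the file object has already been read from
    file_obj.seek(0)
    # policy is a canned ACL sent along with the upload itself
    key.set_contents_from_file(file_obj, policy='public-read')
```

This saves one round trip to S3 per upload.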
The next change will be to the download view. Since the file is going to be hosted on S3, we will need to redirect to the URL of the file inside the bucket.
from flask import abort, redirect
from urlparse import urljoin

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)
    # redirect to the url of the file hosted on S3
    return redirect(urljoin(<bucket url>, file.filename))
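A small gotcha with `urljoin`: whether the base URL ends in a slash changes the result. The snippet below shows the behavior with a made-up base URL (`example.com` stands in for the real bucket URL); the import is written so that it runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in this post

# trailing slash on the base: the filename is appended to the path
with_slash = urljoin('http://example.com/media/', 'taxes.zip')
# no trailing slash: the last path segment ("media") is replaced
without_slash = urljoin('http://example.com/media', 'taxes.zip')
# a leading slash on the filename resets the path entirely
absolute = urljoin('http://example.com/media/', '/taxes.zip')

print(with_slash)     # http://example.com/media/taxes.zip
print(without_slash)  # http://example.com/taxes.zip
print(absolute)       # http://example.com/taxes.zip
```

So if the bucket URL includes a path prefix, it needs to end with a slash.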
In order to perform the encryption on the client side, I chose to use the pycrypto library. It is written in C, so if you install it you will need to build the Python extension. The API is fairly low-level, so I ended up writing a small wrapper library on top called "beefish". beefish provides functions for encrypting and decrypting file-like objects using the Blowfish cipher.
I only care about encrypting some of my files, so I added a "password" field to the upload form. When a password is present, the file will be encrypted using that password. The "add file" form looks something like this:
Once again, let's modify the upload handler. This time we'll check to see if the
submitted data contains a password, and if so encrypt the uploaded file before sending
it off to S3. I'm using StringIO to store the encrypted contents of the file
in memory:
from beefish import encrypt
from flask import request
from StringIO import StringIO

def upload_handler(instance, file_obj):
    bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
    key = bucket.new_key(instance.filename)
    key.set_metadata('Content-Type', instance.mimetype())
    # new code here:
    if request.form.get('password'):
        # we received a password, so will need to encrypt the file data
        # before sending it off to S3
        password = request.form['password']
        instance.encrypted = True
        output_buffer = StringIO()
        encrypt(file_obj, output_buffer, password)
        file_obj = output_buffer
    else:
        instance.encrypted = False
    # end of new code
    file_obj.seek(0)
    key.set_contents_from_file(file_obj)
    key.set_acl('public-read')
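The `seek(0)` before the upload matters: after the encrypted data has been written into the buffer, the buffer's position is at the end, and reading from there yields nothing. Here is a tiny demonstration, using Python 3's `io.BytesIO` as the counterpart of the `StringIO` used above.

```python
import io

buf = io.BytesIO()
buf.write(b'encrypted bytes')

# the position is now at the end of the buffer, so there is nothing to read
at_end = buf.read()

# rewind to the beginning before handing the buffer to a reader
buf.seek(0)
from_start = buf.read()

print(at_end)      # b''
print(from_start)  # b'encrypted bytes'
```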
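Since some files are now stored encrypted, our download view will need to become a little smarter. The logic will change so that unencrypted files still redirect straight to S3, while for encrypted files a GET request displays a password form and a POST with a password fetches the file from S3, decrypts it in memory, and sends it back as an attachment.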
Here is how my download form looks:
Here is the modified download view:
from StringIO import StringIO
from urlparse import urljoin

from beefish import decrypt
from flask import abort, redirect, render_template, request, send_file

@app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
def download(file_id):
    try:
        file = File.get(id=file_id)
    except File.DoesNotExist:
        abort(404)
    if not file.encrypted:
        return redirect(urljoin(<bucket url>, file.filename))
    # new logic:
    if request.method == 'POST' and request.form.get('password'):
        # fetch the encrypted file contents from S3
        bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
        key_obj = bucket.get_key(file.filename)
        # read the contents of the key into an in-memory file
        enc_buffer = StringIO()
        key_obj.get_contents_to_file(enc_buffer)
        enc_buffer.seek(0)
        # decrypt contents and store in dec_buffer
        dec_buffer = StringIO()
        decrypt(enc_buffer, dec_buffer, request.form['password'])
        dec_buffer.seek(0)
        # send the decrypted file as an attachment
        return send_file(
            dec_buffer,
            file.mimetype(),
            as_attachment=True,
            attachment_filename=file.filename,
        )
    # display a password form
    return render_template('download.html', file=file)
There are a lot of things you might do to improve this little app. Here are just a few ideas I had:
Thanks so much for reading! Feel free to post any questions or comments below.
Nice article! You can actually set the ACL for the S3 object at the time you upload it rather than using the separate call to key.set_acl(). Just add a "policy='public-read'" keyword param to the set_contents_from_file() call.
Hello and thank you for your excellent article. I think I have found a typo. At the last download method urljoin is imported from urlparse but url_join is used.
Best regards, Andreas
I have made a github repository at https://github.com/sv1jsb/encrypted-flask-aws of your article. I have added folder and search capability. Best regards, Andreas
Thank you for the comments! Andreas -- that is awesome, thanks for putting it into a GH repo - the added functionality is really nice. I've added it to the list of resources in the body of the post.
Any thoughts about glacier? (Amazon Glacier)
I'm definitely interested in trying out Glacier -- now that it's part of boto I'll probably be much more inclined to use it. The biggest drawback is the "several hours" retrieval time. However, for storing a large amount of data, its price seems great.