Web-based encrypted file storage using Flask and AWS
The other day I noticed I had a couple of thumb drives kicking around with various versions of my "absolutely do not lose" files: stuff like my private keys, tax documents, zips of papers I wrote in college, etc. These USB drives were all over the house, and many contained duplicate versions of the same files. I thought it would be neat to write a little app to give me a web-based interface to store and manage these files securely. In this post I'll talk about how I built a web-based file storage app using Flask, pycrypto, and Amazon S3.
The goal was simple: replace my USB drives with a secure way of storing my files in the cloud. I wanted a web app so I could access my files anywhere. Furthermore, if I used a service like S3, I could feel safe knowing my files would be there tomorrow (S3 is designed for 99.999999999% durability). Amazon S3 also provides an extra layer of versioning and server-side encryption if I want to use it.
A quick note on Amazon's encryption
The server-side encryption provided by AWS ensures that the data written to disk in their data centers is encrypted using a 256-bit AES block cipher. Amazon handles all encryption key management transparently. I did a little test using boto because I was unsure what would happen if I made an encrypted key (S3's name for a stored object) publicly readable. If I store some data using server-side encryption and then access it via a public URL (e.g. with curl), would the data still be encrypted? The answer was "no":
    >>> import boto
    >>> s3 = boto.connect_s3(<my access key>, <my secret key>)
    >>> bucket = s3.get_bucket('media.charlesleifer.com')
    >>> key = bucket.new_key('testing')
    >>> # store a string and encrypt it
    >>> key.set_contents_from_string('testing', encrypt_key=True)
    7
    >>> # make key public readable
    >>> key.set_acl('public-read')
Then in another terminal I did:
    $ curl http://media.charlesleifer.com/testing
    testing
As you can see, the encryption only applies to the data as it resides on Amazon's hardware. When it comes back it is unencrypted. This is great for ensuring my data is safe while it sits on Amazon's hardware, but it doesn't help me ensure that only I can decrypt my files when I need them.
If you're using the AWS Java SDK there is support for client-side encryption using "envelope encryption", which involves a combination of one-time-use keys and a private encryption key you generate. Full details are here. This is what I was looking for, but since the Python API did not have support I ended up needing to look elsewhere for a library.
Setting up a virtualenv
If you want to replicate the setup I will describe below, I suggest creating a virtualenv and installing:
- beefish (for client-side encryption)
- boto (for AWS)
- flask (for web frontend)
- peewee (lightweight persistence for our files)
- pycrypto (needed by beefish)
In the interest of this post not being a "wall of code", I'm only going to include the relevant parts and will assume that you can fill in the gaps, such as the templates.
Start with the web frontend and file model
Let's start with the web app. I really like using Flask for things like this -- it's lightweight, stays out of the way, and has great documentation. For simplicity, there will only be three views:
- index: a list of files and links to download them
- add: a form to upload a file and optionally specify an encryption key
- download: if the file is encrypted, ask for a key to decrypt
In order to keep track of what files have been uploaded, I used peewee to create a lightweight file model. The File model might look something like this:
    import datetime
    import mimetypes

    from peewee import *

    class File(Model):
        filename = CharField()
        created_date = DateTimeField(default=datetime.datetime.now)
        encrypted = BooleanField(default=False)

        def mimetype(self):
            # guess_type() returns a (type, encoding) tuple; keep only the type
            return mimetypes.guess_type(self.filename)[0]
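One detail worth checking in the model above: `mimetypes.guess_type` returns a `(type, encoding)` tuple rather than a plain string, and the type is `None` when the extension is unknown. Here is a minimal, stdlib-only sketch of a helper that always hands back a usable content type. The name `guess_content_type` and the `application/octet-stream` fallback are my own assumptions for illustration, not part of the app.

```python
import mimetypes

# Hypothetical fallback for files whose type cannot be guessed.
DEFAULT_CONTENT_TYPE = 'application/octet-stream'


def guess_content_type(filename):
    """Return just the MIME type string for a filename."""
    # guess_type() returns a (type, encoding) tuple, e.g.
    # ('application/x-tar', 'gzip') for 'backup.tar.gz'.
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or DEFAULT_CONTENT_TYPE
```

A fallback like this may be handy when the value is passed straight into a Content-Type header, where `None` would not be valid.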
Displaying a list of files
The index view is very simple and just displays a list of files:
    from flask import render_template

    @app.route('/')
    def index():
        files = File.select().order_by(File.filename)
        return render_template('index.html', files=files)
A simple upload view and handler
Adding a file is also easy. The add view calls a little "upload_handler", which for now just stashes the file on disk. In a little bit we'll replace it with the code to upload to S3.
    from flask import redirect, render_template, request, url_for
    from werkzeug.utils import secure_filename

    @app.route('/add/', methods=['GET', 'POST'])
    def add():
        if request.method == 'POST':
            file_obj = request.files['file']
            # secure_filename strips path separators and other unsafe characters
            instance = File(filename=secure_filename(file_obj.filename))
            upload_handler(instance, file_obj)
            instance.save()
            return redirect(url_for('index'))
        return render_template('add.html')
Here is how we might implement a very simple file-based upload handler. This is really just for reference as in the next section we'll be replacing it and using boto to upload to S3:
    import os

    def upload_handler(instance, file_obj):
        dest_filename = os.path.join('/var/www/media/', instance.filename)
        with open(dest_filename, 'wb') as fh:
            fh.write(file_obj.read())
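The handler above reads the whole upload into memory with a single `read()` call. For larger files, a chunked copy might be gentler. The following is a rough sketch using only the standard library; the function name `save_to_disk` and its `media_dir` parameter are hypothetical and simply stand in for the hard-coded path above.

```python
import os
import shutil


def save_to_disk(file_obj, media_dir, filename, chunk_size=64 * 1024):
    """Copy a file-like object into media_dir in fixed-size chunks."""
    dest_filename = os.path.join(media_dir, filename)
    with open(dest_filename, 'wb') as fh:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # whole upload never has to sit in memory at once.
        shutil.copyfileobj(file_obj, fh, chunk_size)
    return dest_filename
```

Something like `save_to_disk(file_obj, '/var/www/media/', instance.filename)` would then replace the body of the handler.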
Lastly, here is a glimpse at a minimal download view. It simply determines the filename by looking the file up by id, then redirects to a static media server. Later we will add logic for handling decryption:
    import os
    from flask import abort, redirect

    @app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
    def download(file_id):
        try:
            file = File.get(id=file_id)
        except File.DoesNotExist:
            abort(404)
        return redirect(os.path.join('/media/', file.filename))
Hopefully the workings of these three views are clear. The index displays a list of File objects, the add view handles uploads, and the download view looks up the file based on ID and redirects to the appropriate location.
Uploading to S3
Before covering the encryption/decryption logic, let's look into uploading the files to S3. If this is your first time reading about S3 I recommend looking at Amazon's S3 guide.
First let's write a little helper function to retrieve our S3 bucket. This is where all of our files will be stored. The credentials can be found on your AWS "Security Credentials" page.
    import boto

    def get_bucket(access_key_id, secret_access_key, bucket_name):
        conn = boto.connect_s3(access_key_id, secret_access_key)
        return conn.get_bucket(bucket_name)
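Rather than pasting the credentials into the source, one option is to read them from the environment. This is only a sketch: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names, while `S3_BUCKET_NAME` and the helper `load_s3_settings` are names I made up for this example.

```python
import os

# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the conventional AWS names;
# S3_BUCKET_NAME is a made-up name for this sketch.
REQUIRED_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'S3_BUCKET_NAME')


def load_s3_settings(environ=None):
    """Return (access_key_id, secret_access_key, bucket_name)."""
    if environ is None:
        environ = os.environ
    # Fail early with a readable message if anything is unset or empty.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('missing settings: ' + ', '.join(missing))
    return tuple(environ[name] for name in REQUIRED_VARS)
```

The result lines up with the helper's arguments, so a call could look like `get_bucket(*load_s3_settings())`.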
Modifying the upload handler to store data in S3
The first function we will modify is the "upload_handler". Now, instead of writing the uploaded file to the local filesystem we will put it in our S3 bucket:
    def upload_handler(instance, file_obj):
        # access our S3 bucket and create a new key to store the file data
        bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
        key = bucket.new_key(instance.filename)
        key.set_metadata('Content-Type', instance.mimetype())

        # seek to the beginning of the file and read it into the key
        file_obj.seek(0)
        key.set_contents_from_file(file_obj)

        # make the key publicly available
        key.set_acl('public-read')
Modifying the download view to redirect to the file on S3
The next change will be to the download view. Since the file is going to be hosted on S3, we will need to redirect to the URL of the file inside the bucket.
    from flask import abort, redirect
    from urlparse import urljoin

    @app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
    def download(file_id):
        try:
            file = File.get(id=file_id)
        except File.DoesNotExist:
            abort(404)

        # redirect to the url of the file hosted on S3
        return redirect(urljoin(<bucket url>, file.filename))
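One thing to watch with `urljoin` is that it is sensitive to a trailing slash on the base URL. Below is a small sketch of a URL builder that normalizes the base first. It is written for Python 3, where `urljoin` lives in `urllib.parse` (the view above imports it from Python 2's `urlparse`). The helper name `file_url` and the `files.example.com` host are placeholders of mine.

```python
from urllib.parse import quote, urljoin


def file_url(bucket_url, filename):
    """Build the public URL for a file stored under bucket_url."""
    # Without a trailing slash, urljoin replaces the last path segment of
    # the base instead of appending to it, so normalize the base first.
    if not bucket_url.endswith('/'):
        bucket_url += '/'
    # Percent-encode characters such as spaces that are not URL-safe.
    return urljoin(bucket_url, quote(filename))


# Base without a trailing slash is still handled as a directory.
example = file_url('http://files.example.com/media', 'taxes.pdf')
```

Filenames that went through `secure_filename` should already be safe, so the `quote` call is mostly a precaution.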
The fun part, encrypting and decrypting the file
In order to perform the encryption on the client side, I chose to use the pycrypto library. It is written in C, so if you install it you will need to build the Python extension. The API is fairly low-level, so I ended up writing a small wrapper library on top called "beefish". beefish provides functions for encrypting and decrypting file-like objects using the Blowfish cipher.
Adding a password to the upload form
I only care about encrypting some of my files, so I added a "password" field to the upload form. When a password is present, the file will be encrypted using that password. The "add file" form looks something like this:
Once again, let's modify the upload handler. This time we'll check to see if the submitted data contains a password, and if so encrypt the uploaded file before sending it off to S3. I'm using StringIO to store the encrypted contents of the file in memory:
    from StringIO import StringIO

    from beefish import encrypt
    from flask import request

    def upload_handler(instance, file_obj):
        bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
        key = bucket.new_key(instance.filename)
        key.set_metadata('Content-Type', instance.mimetype())

        # new code here:
        if request.form.get('password'):
            # we received a password, so will need to encrypt the file data
            # before sending it off to S3
            password = request.form['password']
            instance.encrypted = True
            output_buffer = StringIO()
            encrypt(file_obj, output_buffer, password)
            file_obj = output_buffer
        else:
            instance.encrypted = False
        # end of new code

        file_obj.seek(0)
        key.set_contents_from_file(file_obj)
        key.set_acl('public-read')
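The `seek(0)` near the end of the handler matters: after the encrypted bytes are written, the buffer's position sits at the end, so anything reading from it would get nothing back. Here is a tiny illustration using `io.BytesIO`, the Python 3 counterpart for binary data. Note that `passthrough` is a stand-in I wrote so the example runs without any crypto library; it only copies bytes and does not encrypt anything.

```python
import io


def passthrough(in_file, out_file):
    """Stand-in for an encrypt(in_file, out_file, password) call.

    It only copies bytes and performs NO encryption; it exists so the
    buffer handling below can run without any crypto library.
    """
    out_file.write(in_file.read())


upload = io.BytesIO(b'my tax documents')
output_buffer = io.BytesIO()
passthrough(upload, output_buffer)

# After writing, the position sits at the end of the buffer, so a
# read() from here returns nothing.
unread = output_buffer.read()

# Rewind so the next reader (for example an S3 upload) sees every byte.
output_buffer.seek(0)
contents = output_buffer.read()
```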
Decrypting the file in the download view
Since the file is encrypted on the server, our download view will now need to become a little smarter. The logic will change now so that:
- if the file is not encrypted, simply redirect to S3 -- this logic stays the same
- if the file is encrypted, display a password form and send the decrypted contents down as an attachment
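The two cases above, plus the step where the password form is shown for the first time, can be sketched as a small pure function. This is only an illustration of the branching; the name `download_action` and the returned strings are hypothetical and do not appear in the app.

```python
def download_action(encrypted, method, password):
    """Return which branch a download request should take."""
    if not encrypted:
        return 'redirect'       # plain file: send the browser to S3
    if method == 'POST' and password:
        return 'decrypt'        # password submitted: decrypt and send
    return 'show_form'          # otherwise ask for the password
```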
Here is how my download form looks:
Here is the modified download view:
    from StringIO import StringIO
    from urlparse import urljoin

    from beefish import decrypt
    from flask import abort, redirect, render_template, request, send_file

    @app.route('/download/<int:file_id>/', methods=['GET', 'POST'])
    def download(file_id):
        try:
            file = File.get(id=file_id)
        except File.DoesNotExist:
            abort(404)

        if not file.encrypted:
            return redirect(urljoin(<bucket url>, file.filename))

        # new logic:
        if request.method == 'POST' and request.form.get('password'):
            # fetch the encrypted file contents from S3
            bucket = get_bucket(<access key>, <secret access key>, <bucket name>)
            key_obj = bucket.get_key(file.filename)

            # read the contents of the key into an in-memory file
            enc_buffer = StringIO()
            key_obj.get_contents_to_file(enc_buffer)
            enc_buffer.seek(0)

            # decrypt contents and store in dec_buffer
            dec_buffer = StringIO()
            decrypt(enc_buffer, dec_buffer, request.form['password'])
            dec_buffer.seek(0)

            # efficiently send the decrypted file as an attachment
            return send_file(
                dec_buffer,
                file.mimetype(),
                as_attachment=True,
                attachment_filename=file.filename,
            )

        # display a password form
        return render_template('download.html', file=file)
Ideas for improvements
There are a lot of things you might do to improve this little app. Here are just a few ideas I had:
- Put the site behind SSL if you haven't!
- Support putting files into folders -- S3 handles this transparently
- Investigate other tools for encryption, such as the python gpg wrapper
- If memory becomes a problem, look into encrypting and decrypting on disk
- Look into AWS' versioning support
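For the memory idea in the list above, one possible starting point is the standard library's `tempfile.SpooledTemporaryFile`, which keeps data in memory up to a size limit and then moves it to a temporary file on disk. It is file-like, so it could in principle stand in for the StringIO buffers. This is a sketch only: the helper `make_buffer` and the 5 MB threshold are assumptions of mine, and I have not tried it against boto or beefish.

```python
import tempfile

# Hypothetical threshold: buffers larger than this move to a temp file.
MAX_IN_MEMORY = 5 * 1024 * 1024


def make_buffer(max_size=MAX_IN_MEMORY):
    """Return a file-like buffer that spills to disk once it exceeds max_size."""
    return tempfile.SpooledTemporaryFile(max_size=max_size, mode='w+b')


# Tiny threshold here just to exercise the spill-to-disk path.
buf = make_buffer(max_size=16)
buf.write(b'0123456789' * 10)   # 100 bytes, more than max_size
buf.seek(0)
data = buf.read()
buf.close()
```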
Thanks so much for reading! Feel free to post any questions or comments below.
Links of interest
- GitHub repo of a working app based on this post, thanks to Andreas Porevopoulos
- EncFS -- an encrypted filesystem that runs in user-space
- Dropbox -- automatic syncing of your files to the cloud
- Dropbox with EncFS
- IDrive with API
If you're interested in more projects like this, check out the saturday-morning hack posts.