Nginx: a caching, thumbnailing, reverse proxying image server?

A month or two ago, I decided to remove Varnish from my site and replace it with Nginx's built-in caching system. I was already using Nginx to proxy to my Python sites, so getting rid of Varnish meant one less thing to fiddle with. I spent a few days reading up on how to configure Nginx's cache and overhauling the various config files for my Python sites (so much for saving time). In the course of my reading I bookmarked a number of interesting Nginx modules to return to, among them the Image Filter module.

I thought it would be neat to combine Nginx's reverse proxying, caching, and image filtering to create a thumbnailing server for my images hosted on S3. If you look closely at the <img> tag below (and throughout this site), you can see Nginx in action.

[image: photos/nginx-logo.png]

In this post I'll describe how I configured Nginx to efficiently and securely serve thumbnails for images hosted on S3. As a bonus, I'll also show how I'm using the Secure Links module to prevent people from maliciously generating thumbnails.

Getting started

In order for all the pieces to work, your Nginx needs to be compiled with the image filter, proxy, and secure link modules. You can check which modules you have by running nginx -V. If you're using Ubuntu, an easy fix is to install the nginx-extras package.

Once those modules are in place, we can start on the configuration.

Configuration

The first thing we'll want to declare is our proxy cache. This declaration goes in the http section of the nginx.conf file and describes the file-based cache that will store our generated thumbnails. Because a cache miss means fetching the full image from S3 and then resizing it, we want to configure the cache to be large enough to hold most of our thumbnails. For my sites I just estimated that 200 MB would be sufficient.

To define your cache, add this line somewhere in the http section of the nginx config:

# Nginx will create a cache capable of storing 16MB of keys and 200MB of data.
proxy_cache_path /tmp/nginx-thumbnails levels=1:2 keys_zone=thumbnail_cache:16M inactive=60d max_size=200M;

Now we need to create two server definitions: a caching server and a resizing server. The resizing server will act as a reverse proxy to S3, generating and serving the resized images. The caching server will sit in front of the resizing server, caching and serving the resized images. I initially didn't think two servers would be necessary, but when my cache wasn't being populated, a bit of googling turned up several posts indicating that this split is required.

The caching server

The caching server will be the one that is exposed to the public (I put mine at m.charlesleifer.com). Because the sole responsibility of this server is to cache responses from the resizing server, the configuration is pretty minimal. Here is how I've set mine up:

server {
  listen 80;
  server_name m.charlesleifer.com;

  location / {
    proxy_pass http://localhost:10199;
    proxy_cache thumbnail_cache;
    proxy_cache_key "$host$document_uri$is_args$arg_key";
    proxy_cache_lock on;
    proxy_cache_valid 30d;  # Cache valid thumbnails for 30 days.
    proxy_cache_valid any 15s;  # Everything else gets 15s.
    proxy_cache_use_stale error timeout invalid_header updating;
    proxy_http_version 1.1;
    expires 30d;
  }
}

Whenever a request comes in to the caching server, the "thumbnail_cache" is checked first. If no match is found, we proxy back to the resizing server, which is running on localhost. Valid responses from the resizing server are cached for 30 days, while anything else is cached for 15 seconds.
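One addition worth considering (it's not part of my config above, but the variable is built into the proxy module) is exposing the cache status as a response header, which makes it easy to watch hits and misses while you're testing:

```nginx
location / {
    # ... the proxy_cache settings shown above ...

    # $upstream_cache_status reports HIT, MISS, EXPIRED, STALE, etc.,
    # so `curl -I` against the caching server shows whether a request
    # was served from the thumbnail cache.
    add_header X-Cache-Status $upstream_cache_status;
}
```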

The resizing server

All the interesting stuff lives in the resizing server. The job of the resizing server is to fetch images from S3 and resize them on-the-fly, based on dimensions specified in the URL. Additionally, the resizing server checks that a security key is present with each request to prevent people from generating arbitrary thumbnails.

There are a couple distinct sections of the server config block, so let's start with the ones we've seen already: proxying.

server {
  listen 10199;
  server_name localhost;

  set $backend 'your.s3.bucket_name.s3.amazonaws.com';

  resolver 8.8.8.8;  # Use Google for DNS.
  resolver_timeout 5s;

  proxy_buffering off;
  proxy_http_version 1.1;
  proxy_pass_request_body off;  # Not needed by AWS.
  proxy_pass_request_headers off;

  # Clean up the headers going to and from S3.
  proxy_hide_header "x-amz-id-2";
  proxy_hide_header "x-amz-request-id";
  proxy_hide_header "x-amz-storage-class";
  proxy_hide_header "Set-Cookie";
  proxy_ignore_headers "Set-Cookie";
  proxy_set_header Host $backend;
  proxy_method GET;
}

There's really not too much going on here besides telling our server how to talk to S3. One detail worth noting: because the upstream host is stored in the $backend variable, Nginx resolves it at request time, which is why the resolver directives are required. The next thing we'll want to configure is the Nginx image filter module. It only takes a couple of directives, some of which we will define at the server level.

Below the proxy_ settings, add the following image_filter settings:

server {
  # ...

  image_filter_jpeg_quality 85;  # Adjust to your preferences.
  image_filter_buffer 12M;
  image_filter_interlace on;
}

Lastly, we'll define a location block that will:

  1. Look for well-formed URLs.
  2. Validate the request signature.
  3. Extract the dimensions from the URL.
  4. Fetch the image from S3 and load it into the image_filter_buffer.
  5. Resize and respond.

Item number 2 is particularly interesting. The author of a similar blog post used Lua to validate a request signature, but that seemed like a lot of work. The Nginx secure_link module was surprisingly easy to get working.

The secure_link module works by validating a hash computed over the concatenation of the requested image's URL and a secret string known only to your app. Because of hash length extension attacks, we append our key to the URL rather than prepending it. Since you know the secret, you can generate valid hashes whenever you wish to display a thumbnail in your application.

Here is the final piece of configuration:

server {
  # ...
  error_page 404 =404 /empty.gif;

  location ~ ^/t/([\d-]+)x([\d-]+)/(.*) {
    secure_link $arg_key;  # The hash is stored in the `key` querystring arg.
    secure_link_md5 "$uri my-secret-key";
    if ($secure_link = "") {
      # The security check failed, invalid key!
      return 404;
    }
    set $image_path $3;  # Save the capture so it can't be clobbered later.
    image_filter resize $1 $2;

    proxy_pass http://$backend/$image_path;
  }
}
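To make the regex captures concrete, here's how a hypothetical request (the filename is just an example) breaks down:

```nginx
# GET /t/300x-/photos/nginx-logo.png?key=<hash>
#
#   $1       -> 300                    (desired width)
#   $2       -> -                      ("-" tells image_filter to leave that dimension unconstrained)
#   $3       -> photos/nginx-logo.png  (path of the original image in the S3 bucket)
#   $arg_key -> <hash>                 (validated by secure_link)
```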

And that's all there is to it!

Generating hashes

If you are using Python, here is the code I wrote to generate the hash, given a particular thumbnail URI:

import base64
import hashlib

def thumbnail_url(filename, width, height='-'):
    uri = '/t/%sx%s/%s' % (width, height, filename)
    # Hash the URI with the secret appended, matching secure_link_md5.
    md5_digest = hashlib.md5(('%s my-secret-key' % uri).encode('utf-8')).digest()
    # Make the key look like Nginx expects: URL-safe base64 ('-' and '_'
    # instead of '+' and '/') with the trailing '=' padding stripped.
    key = base64.urlsafe_b64encode(md5_digest).decode('ascii').rstrip('=')

    return 'http://m.charlesleifer.com%s?key=%s' % (uri, key)
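As a quick sanity check, here's a usage example. The filename is hypothetical, and the function is repeated (in Python 3 form) so the snippet runs standalone:

```python
import base64
import hashlib

def thumbnail_url(filename, width, height='-'):
    # Same scheme as above: md5 of "<uri> <secret>", URL-safe base64,
    # padding stripped.
    uri = '/t/%sx%s/%s' % (width, height, filename)
    digest = hashlib.md5(('%s my-secret-key' % uri).encode('utf-8')).digest()
    key = base64.urlsafe_b64encode(digest).decode('ascii').rstrip('=')
    return 'http://m.charlesleifer.com%s?key=%s' % (uri, key)

url = thumbnail_url('photos/nginx-logo.png', 300)
print(url)

# The path encodes the dimensions ("300x-" = width 300, height
# unconstrained) and the key rides along as a querystring argument.
assert url.startswith('http://m.charlesleifer.com/t/300x-/photos/nginx-logo.png?key=')

# An md5 digest is 16 bytes, so the stripped base64 key is always 22
# characters and never contains '+', '/', or '='.
key = url.split('key=', 1)[1]
assert len(key) == 22 and not set('+/=') & set(key)
```

The printed URL is exactly what you'd drop into an <img> tag; the resizing server recomputes the same md5 over "$uri my-secret-key" and compares it against the key argument.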

Thanks for reading

Thanks for taking the time to read this post, I hope you found it interesting. Feel free to leave a comment if you have any questions and I'll do my best to answer. If you notice that I'm doing something wrong in the above configuration, please let me know as well, and I'll update the post.

Comments (9)

Henrique Vicente | feb 19 2016, at 07:57am

Thumbor is really good and powerful for image processing. A large Brazilian media company is behind it and uses it on several of its sites. https://github.com/thumbor/thumbor

I myself have used it on a project that didn't go through (a web market for cars). You can see it working in an open setting at vehikel.trazqueeupago.com/henvic

A few months ago I started studying Go and wrote this small image-processing microservice. Maybe you want to check it out. https://github.com/henvic/picel

The idea was to make it something like a dedicated gateway to Imagick and that's all. Everything else, like caching, would be handled by another layer. The downside I see to this approach is that I can't, say, resize to several different sizes or crop to several different views at once without having to do a full reload.

non | feb 19 2016, at 04:12am

What is the use of secure link hashing like that without including an expiration?

Tim | feb 19 2016, at 04:06am

Nice post and nice idea. Thanks!

Charles | feb 18 2016, at 07:32pm

Hi Peter, thanks for your help. I've updated the post to append the key rather than prepending it.

booi | feb 18 2016, at 04:32pm

@peter, I think it's more of a trade off because HMAC isn't available in nginx without a recompile. While the knowledge of secret isn't required, knowledge of the hash is. As long as the hash isn't used somewhere in the URL, it'd probably be fine for an initial deployment.

Peter | feb 18 2016, at 03:19pm

Awesome guide, will definitely give this a try.

Also, as someone on HN mentioned, the regular expression on the caching server's location directive seems unnecessary, especially since the captured match isn't used in the block.

Bruno souza | feb 18 2016, at 03:10pm

Nice post! But unfortunately ngx_http_image_filter_module has so few features :( As an alternative I'm using Thumbor (https://github.com/thumbor/thumbor). Definitely recommend for anyone that needs something more full featured than ngx_http_image_filter_module.

peter | feb 18 2016, at 03:07pm

Hi,

md5_digest = hashlib.md5('my awesome secret ' + uri).digest()

This kind of code tends to be vulnerable to the length extension attack (https://en.wikipedia.org/wiki/Length_extension_attack).

It can be demonstrated with e.g. HashPump:

# You
hashlib.md5('my awesome secret http://example.org/?x=1').hexdigest()
'e3ffa54b2391b9604c190a2ac77beaee'

# Attacker, with the knowledge of the hash e3ffa54b2391b9604c190a2ac77beaee:
hashpumpy.hashpump('e3ffa54b2391b9604c190a2ac77beaee', 'http://example.org/?x=1', '&x=2', 18)
('1493655e959a801bc0c1ba31ab95f8ab', 'http://example.org/?x=1\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00H\x01\x00\x00\x00\x00\x00\x00&x=2')

# Attacker uses the new string and hash returned by hashpump

# You
hashlib.md5('my awesome secret http://example.org/?x=1\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00H\x01\x00\x00\x00\x00\x00\x00&x=2').hexdigest()
'1493655e959a801bc0c1ba31ab95f8ab'

i.e. an attacker can extend the URL with an arbitrary string (up to padding constraints) and produce a valid hash without knowledge of the secret (the length of the secret needs to be known/guessed, though). In this case, the query string variable x was overridden with 2.

The secret should never be prepended to the string being signed; better yet, use HMAC instead of a simple hash function.

Tom A | feb 18 2016, at 04:40am

This is an excellent approach to serving images. I was looking at expensive hosted services to achieve something similar. Thanks!


Commenting has been closed.