Caching trick for Python web applications

I'd like to share a simple trick I use to reduce round-trips when pulling data from a cache server (like Redis or Kyoto Tycoon). Both Redis and Kyoto Tycoon support efficient bulk-get operations, so it makes sense to read as many keys from the cache as we can when performing an operation that may need to access multiple cached values. This is especially true in web applications, as a typical web page may read multiple chunks of data and rendered HTML from a cache (fragment-caching) to build the final page that is sent as a response.

If we know ahead of time which cache-keys we need to fetch, we could just grab the cached data in one Redis/KT request and hold onto it in memory for the duration of the request. The problem is, in the data-generation code itself, how do we differentiate between "pull from the Redis cache" and "pull from an in-memory cache"? For example:

class Note(Model):
    ...

    @cache.cached(timeout=60, key_fn=lambda args, kwargs: args[0].id)
    def note_markup(self):
        # Assumes the note's Markdown source is stored in self.content.
        return markdown.markdown(self.content)

The above function uses a common idiom in many Python cache libraries. The decorator will transparently attempt to retrieve the data from the cache. If the data is not available, then it is computed, stored in the cache, and returned to the caller.
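
To make the idiom concrete, here is a minimal sketch of how such a decorator could work. It is not the code of any particular library: `DictCache`, its `get`/`set` methods, the `name:key` key format and `slow_square` are all hypothetical, and an in-process dict stands in for the cache server.

```python
import functools
import time


class DictCache:
    """Hypothetical in-process stand-in for a Redis/KT-backed cache."""

    def __init__(self):
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[0] < time.time():
            return None  # Missing or expired.
        return item[1]

    def set(self, key, value, timeout):
        self._data[key] = (time.time() + timeout, value)

    def cached(self, timeout, key_fn):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                # Assumed key format: function name plus the key_fn result.
                key = '%s:%s' % (fn.__name__, key_fn(args, kwargs))
                value = self.get(key)
                if value is None:
                    # Cache miss: compute, store, then return.
                    value = fn(*args, **kwargs)
                    self.set(key, value, timeout)
                return value
            return wrapper
        return decorator


cache = DictCache()
calls = []  # Records every time the wrapped function really runs.


@cache.cached(timeout=60, key_fn=lambda args, kwargs: args[0])
def slow_square(n):
    calls.append(n)
    return n * n


first = slow_square(4)   # Computed and stored.
second = slow_square(4)  # Served from the cache; slow_square is not re-run.
```

Note that this sketch treats a stored `None` as a miss, so functions that return `None` would be recomputed on every call.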

Django and Jinja template fragment caching work the same way. The HTML is retrieved from the cache. If it's not available, it is re-generated and stored in the cache automatically:

{{ note.note_markup() }}

{% cache 60, "note_comments", note.id %}
  {% for comment in note.comments %}
    <div class="comment">
      <p>{{ comment.user.username }}</p>
      <p>{{ comment.comment|markdown }}</p>
    </div>
  {% endfor %}
{% endcache %}

The way I solve this problem in ucache is to use Python's context-managers to create a simple "scope" within the cache itself. It looks like this:

def note_detail_view(request, note_id):
    note = get_object_or_404(Note, id=note_id)
    keys = (
        'note_markup:%s' % note.id,  # Key for markdown output of note content.
        'note_comments:%s' % note.id,  # Rendered comments template fragment.
    )

    # Bulk-load both cache-keys when the context-manager is entered.
    with cache.preload(keys):
        # Now accessing either cache-key through the cache will pull from the
        # in-memory scope (if the data was available), saving round-trips.
        return render_template('note_detail.html', note=note)

The gain in this small example is negligible, since only two keys are involved, but the approach provides a significant speed-up when there are lots of cache-keys (or a few cache-keys that are accessed multiple times).
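
For readers curious how a scope like this might work, below is a rough sketch under stated assumptions. It is not ucache's actual implementation: `FakeBackend`, `Cache`, `get_many` and the round-trip counter are hypothetical names made up for illustration, with a dict standing in for Redis/KT.

```python
from contextlib import contextmanager


class FakeBackend:
    """Hypothetical stand-in for a Redis/KT client that counts round-trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_many(self, keys):
        self.round_trips += 1  # One request, however many keys.
        return {k: self.data[k] for k in keys if k in self.data}


class Cache:
    def __init__(self, backend):
        self.backend = backend
        self._scopes = []  # Stack of preloaded dicts, innermost last.

    def get(self, key):
        # Check preloaded scopes first, innermost to outermost.
        for scope in reversed(self._scopes):
            if key in scope:
                return scope[key]
        return self.backend.get(key)  # Fall back to the cache server.

    @contextmanager
    def preload(self, keys):
        # Bulk-load on entry and keep the results in memory.
        self._scopes.append(self.backend.get_many(keys))
        try:
            yield
        finally:
            self._scopes.pop()  # Discard the scope, even on error.


backend = FakeBackend({'note_markup:1': '<p>hi</p>', 'note_comments:1': '<div/>'})
cache = Cache(backend)

with cache.preload(['note_markup:1', 'note_comments:1']):
    markup = cache.get('note_markup:1')
    comments = cache.get('note_comments:1')
    again = cache.get('note_markup:1')  # Repeated access, still in memory.

trips_inside = backend.round_trips  # Only the bulk-load touched the backend.
cache.get('note_markup:1')          # Outside the scope: hits the backend.
trips_after = backend.round_trips
```

The calling code uses `cache.get()` the same way inside and outside the `with` block, which is the point of the trick: the data-generation code never needs to know where the value came from. A scope stack stored on the instance like this is not thread-safe, so a real implementation would probably want thread-local storage.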

One thing I do to make this more manageable is provide a method on my models that generates cache-keys. I then use this method to refer to the cache-keys wherever possible. For example:

class Note(Model):
    ...

    def cache_key(self, name):
        return 'note_%s:%s' % (name, self.id)

...

def note_detail_view(request, note_id):
    note = get_object_or_404(Note, id=note_id)

    # Get the cache-keys used for this note's rendered content and comments.
    keys = (note.cache_key('markup'), note.cache_key('comments'))

    # Bulk-load both cache-keys when the context-manager is entered.
    with cache.preload(keys):
        # Now accessing either cache-key through the cache will pull from the
        # in-memory scope (if the data was available), saving round-trips.
        return render_template('note_detail.html', note=note)

If you're interested in a very simple implementation, check out the code for ucache - specifically the preload() method. Hope you found this helpful!
