ucache, a lightweight caching library for python

January 15, 2019 15:32 / kyototycoon python redis sqlite / 0 comments

I recently wrote about Kyoto Tycoon (KT), a fast key/value database server. KT databases may be ordered (B-Tree / red-black tree) or unordered (hash table), and persistent or stored completely in-memory. Among other things, I'm using KT's hash database as a cache for things like HTML fragments, RSS feed data, etc. KT supports automatic, time-based expiration, so using it as a cache is a natural fit.

Besides using KT as a cache, in the past I have also used Redis and Sqlite. So I've released a small library I'm calling ucache, which can be used with these storage backends and has a couple of nice features. I will likely flesh it out and add support for additional storage backends as I find time to work on it.

Quick overview

The usage is dead-simple. ucache supports the following operations:

get(key) and get_many(keys)
set(key, value, timeout) and set_many(data_dict, timeout)
delete(key) and delete_many(keys)
flush() (delete everything)
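
To make the semantics of these operations concrete, here is a minimal sketch of an in-memory stand-in with the same method names. The MemoryCache class is hypothetical and is not part of ucache; details such as get_many() returning a dict of only the keys that were found are assumptions made for illustration.

```python
import time


class MemoryCache:
    """Hypothetical dict-backed stand-in, NOT ucache's implementation."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and expires_at <= time.time():
            del self._data[key]  # Expired entries behave like a miss.
            return None
        return value

    def get_many(self, keys):
        # Assumption: only keys that are present and unexpired are returned.
        found = {}
        for key in keys:
            value = self.get(key)
            if value is not None:
                found[key] = value
        return found

    def set(self, key, value, timeout=None):
        expires_at = (time.time() + timeout) if timeout is not None else None
        self._data[key] = (value, expires_at)

    def set_many(self, data_dict, timeout=None):
        for key, value in data_dict.items():
            self.set(key, value, timeout)

    def delete(self, key):
        self._data.pop(key, None)

    def delete_many(self, keys):
        for key in keys:
            self.delete(key)

    def flush(self):
        self._data.clear()


cache = MemoryCache()
cache.set('k1', 'v1', 60)
cache.set_many({'k2': 'v2', 'k3': 'v3'}, 60)
cache.set('old', 'stale', -1)  # Negative timeout: already expired.
many = cache.get_many(['k1', 'k2', 'missing'])
```

A real backend would hand expiration and bulk reads to the storage server instead of looping in Python, which is where the efficiency comes from.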

For convenience, there is also a general-purpose decorator for caching functions or methods. This is a nice pattern, because the bulk of your code can be written as if there were no cache:

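from ucache import KTCache

# AtomFeed, request, logger and the helper functions used below are
# assumed to be defined elsewhere in the application.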

cache = KTCache(host='127.0.0.1', port=1979)

@cache.cached(timeout=300)
def rss_feed(feed_name):
    # Generate the RSS feed as you would normally.
    feed = AtomFeed('%s feed' % feed_name, feed_url=request.url)
    for item in get_item_list():
        feed.add(item.title, item.html, ...)
    return feed.get_response()


@cache.cached(timeout=86400)
def validate_akismet_key():
    if not akismet_key_is_still_valid():
        logger.error('Akismet key is not valid!')  # At most logs once/day.
        return False
    else:
        return True

In our code, we can call these functions without concerning ourselves with the existence of the cache. If cached data is available, it will be returned. If not, the data will be generated and stored in the cache for future calls.

You may have noticed that the rss_feed() function accepts a single argument, while validate_akismet_key() takes no arguments. ucache will, by default, generate a cache-key that is unique to the arguments specified by the caller. So each unique feed_name would result in its own distinct cache key.
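
One way such argument-based keys could be derived is sketched below. This is not ucache's actual key function: make_key() and this simplified cached() (which takes a plain dict as its store and ignores timeouts) are hypothetical names used only to illustrate the idea of hashing the call arguments.

```python
import functools
import hashlib
import pickle


def make_key(fn, args, kwargs):
    # Hash the pickled arguments so each distinct call gets its own key.
    # Sorting kwargs makes f(a=1, b=2) and f(b=2, a=1) share a key.
    payload = pickle.dumps((args, sorted(kwargs.items())))
    return '%s:%s' % (fn.__name__, hashlib.sha256(payload).hexdigest())


def cached(store):
    """Hypothetical decorator; `store` is any dict-like object."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = make_key(fn, args, kwargs)
            if key in store:
                return store[key]  # Cache hit: skip the function body.
            result = store[key] = fn(*args, **kwargs)
            return result
        return wrapper
    return decorator


store = {}
calls = []


@cached(store)
def rss_feed(feed_name):
    calls.append(feed_name)  # Record each time the body really runs.
    return '<feed>%s</feed>' % feed_name


first = rss_feed('python')
second = rss_feed('python')  # Served from the store.
other = rss_feed('redis')    # Different argument, different key.
```

A function with no arguments, like validate_akismet_key(), would simply always map to the same key under this scheme.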

Bulk-loading cached data

KT, Redis and Sqlite all support efficient bulk-get operations, so it makes sense to read as many keys from the cache as possible when performing an operation that may need to access multiple cached values (like rendering a template). As part of developing ucache, I added a transparent second layer of in-memory caching that allows one to take advantage of efficient bulk-gets, while continuing to use the single-get API.

For example, on many of the sites I maintain, including this blog, there are small but expensive chunks of HTML to render. My blog posts are written in markdown, and also make use of oembed for converting simple URLs into rich content objects (using my Python oembed client micawber). In order to avoid regenerating this HTML every time someone views my blog, I store the HTML output of the rendered markdown in a cache. Additionally, I cache other things, like the list of tags, the list of comments, etc., so a single page may need to make multiple trips to the cache.

Here's an over-simplified example, which shows how two commonly-used values (the tag list and the rendered HTML) might be stored and retrieved from the cache:

import ucache

cache = ucache.KTCache()  # or RedisCache, or SqliteCache, etc.

class BlogEntry(Model):
    # ... ordinary model definition stuff ...
    def cache_key(self, key):
        return 'entry.%s.%s' % (self.id, key)

    def get_tag_list(self):
        tag_list = cache.get(self.cache_key('tag_list'))
        if tag_list is None:
            tag_list = [tag.tag for tag in self.tag_query()]
            cache.set(self.cache_key('tag_list'), tag_list, 60)
        return tag_list

    def html(self):
        html = cache.get(self.cache_key('html'))
        if html is None:
            html = markdown(oembed(self.content))
            cache.set(self.cache_key('html'), html, 600)
        return html

In reality, I would use a custom Jinja2 tag (Django has one built-in) and do the caching in the template itself:

<div class="entry-header">
  {% cache entry.cache_key('tag_list'), 60 %}
    {% for tag in entry.tag_query() %}
      <a href="...">{{ tag.tag }}</a>
    {% endfor %}
  {% endcache %}
</div>

<div class="entry-body">
  {% cache entry.cache_key('html'), 600 %}
    {{ entry.content|oembed|markdown }}
  {% endcache %}
</div>

This is where pre-loading comes in. Since I know which cache-keys I will need to access when rendering the page, I can preload them all in a single operation. The pre-load feature of ucache allows me to use the same API for reading from the cache (cache.get) -- the only difference is that the value is pre-loaded and calling cache.get() will return the pre-fetched value rather than making a request to the cache server itself.

It is implemented as a context manager. The best part is that none of the above code or templates would need to change. I can simply wrap the template rendering step with a context-manager that pre-loads the needed cache keys:

@app.route('/blog/<entry_id>/')
def entry_detail(entry_id):
    # Load the entry from the database.
    try:
        entry = Entry.public().where(Entry.id == entry_id).get()
    except Entry.DoesNotExist:
        abort(404)

    with cache.preload([entry.cache_key('tag_list'), entry.cache_key('html')]):
        return render_template('entry_detail.html', entry=entry)

When the context-manager is entered, the given keys are loaded in a single operation. Then, while the template is being rendered, any time one of our preloaded keys is requested from the cache, the pre-loaded value is returned. The improvement in responsiveness is quite noticeable on list-views, which may have tens or hundreds of such cache reads.
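
The mechanism can be sketched in a few lines. The code below is an illustration under assumptions, not ucache's implementation: CountingBackend and PreloadingCache are hypothetical classes, and the backend counts round-trips only so the effect is visible.

```python
from contextlib import contextmanager


class CountingBackend:
    """Hypothetical backend that counts round-trips, for illustration."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_many(self, keys):
        self.round_trips += 1  # One trip regardless of how many keys.
        return {k: self.data[k] for k in keys if k in self.data}


class PreloadingCache:
    def __init__(self, backend):
        self.backend = backend
        self._preloaded = {}

    def get(self, key):
        # Serve preloaded values locally; otherwise ask the backend.
        if key in self._preloaded:
            return self._preloaded[key]
        return self.backend.get(key)

    @contextmanager
    def preload(self, keys):
        previous = self._preloaded
        self._preloaded = dict(previous)
        self._preloaded.update(self.backend.get_many(keys))
        try:
            yield
        finally:
            self._preloaded = previous  # Drop preloaded values on exit.


backend = CountingBackend({
    'entry.1.html': '<p>hi</p>',
    'entry.1.tag_list': ['python'],
})
cache = PreloadingCache(backend)

with cache.preload(['entry.1.html', 'entry.1.tag_list']):
    html = cache.get('entry.1.html')
    tags = cache.get('entry.1.tag_list')
trips_inside = backend.round_trips  # Only the bulk-get hit the backend.
```

In this sketch, a key that was requested for preload but missing from the backend still falls through to a normal get(), and the preloaded dict is not thread-safe; a production version would likely keep it in thread-local storage.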

Additional helpers

In order to reduce the amount of data I'm pushing to and from the cache server (over a socket, typically), ucache also supports transparent compression of the cached data using zlib. I found that compressing the cached data, particularly HTML fragments, improved performance by about 2x.

When instantiating a cache object, you can specify whether to use compression, and the minimum content-length for considering content to be compressible.

Example:

from ucache import KTCache

# Configure cache so values over 256 bytes in length will be
# compressed automatically.
cache = KTCache(compression=True, compression_len=256)

Cached data is transparently serialized into a binary representation using the serializer of your choice. By default, ucache will use pickle, but you can also use msgpack, or disable serialization altogether. When compression is enabled, the compression occurs after serialization (of course).
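
The serialize-then-compress order can be illustrated with the standard library alone. This is a sketch, not ucache's wire format: encode(), decode() and the one-byte flag that records whether a payload was compressed are assumptions made for the example.

```python
import pickle
import zlib

# Assumption: a one-byte prefix records whether the payload is compressed.
FLAG_RAW = b'\x00'
FLAG_ZLIB = b'\x01'


def encode(value, compression=True, compression_len=256):
    # Serialize first, then compress only if the result is large enough.
    data = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    if compression and len(data) > compression_len:
        return FLAG_ZLIB + zlib.compress(data)
    return FLAG_RAW + data


def decode(blob):
    # Reverse the steps: decompress if flagged, then deserialize.
    flag, data = blob[:1], blob[1:]
    if flag == FLAG_ZLIB:
        data = zlib.decompress(data)
    return pickle.loads(data)


small = encode('hi')                # Below the threshold: stored as-is.
fragment = '<p>hello</p>' * 200     # Repetitive HTML compresses well.
big = encode(fragment)
```

Small values skip compression because the zlib header and CPU cost can outweigh any savings, which is the reason for having a minimum-length setting at all.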

Trying it out

The code for ucache can be found on GitHub or installed using pip install ucache. The implementation is super short, a couple hundred lines, so feel free to take it and adapt it to your needs.

Thanks for taking the time to read. I hope this post was helpful.
