Walrus: Lightweight Python utilities for working with Redis

January 11, 2015 19:49 / nosql python redis walrus

A couple weekends ago I got it into my head that I would build a thin Python wrapper for working with Redis. Andy McCurdy's redis-py is a fantastic low-level client library with built-in support for connection-pooling and pipelining, but it does little more than provide an interface to Redis' built-in commands (and rightly so). I decided to build a project on top of redis-py that exposed pythonic containers for the Redis data-types. I went on to add a few extras, including a cache and a declarative model layer. The result is walrus.

Installation

If you'd like to try it out, you can install walrus using pip. Note that you will also need to install redis-py and have a Redis server running.

$ pip install walrus

At the time of writing, the current version of walrus is 0.1.9.

Containers

Redis supports five data-types, and each of these types supports a number of special-purpose commands. To make working with these types easier, I wrote container objects that look like their built-in analogues. For instance, walrus hashes look like dict objects, have familiar methods like keys(), items(), update(), and support item access using square brackets. Walrus sets behave like Python sets, and so on.

walrus comes with support for the five data-types, as well as an additional Array type implemented using Lua scripts (as opposed to Redis' built-in list type, which is implemented as a linked list).

Working with containers is easy, as they can be instantiated by calling the corresponding method on the walrus database instance. Let's see how it works:

>>> from walrus import *
>>> db = Database(host='localhost', db=0)
>>> huey = db.Hash('huey')
>>> huey.update(color='white', temperament='ornery', type='kitty')
<Hash "huey": {'color': 'white', 'type': 'kitty', 'temperament': 'ornery'}>

>>> huey.keys()
['color', 'type', 'temperament']
>>> 'color' in huey
True
>>> huey['color']
'white'

There are similar APIs for the other data-types, which you can read about in the documentation.
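
To give a feel for how a dict-like container can sit on top of a client, here is a minimal sketch. It is not how walrus is actually implemented. `FakeRedis` is a hypothetical in-memory stand-in that mimics only a few of redis-py's hash methods (`hset`, `hget`, `hexists`, `hkeys`) so the example runs without a server, and `HashSketch` is likewise a made-up name.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (hash commands only)."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hexists(self, name, key):
        return key in self._hashes.get(name, {})

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))


class HashSketch:
    """Dict-like view over a single hash key (illustrative only)."""

    def __init__(self, client, key):
        self.client = client
        self.key = key

    def __getitem__(self, field):
        value = self.client.hget(self.key, field)
        if value is None:
            raise KeyError(field)
        return value

    def __setitem__(self, field, value):
        self.client.hset(self.key, field, value)

    def __contains__(self, field):
        return self.client.hexists(self.key, field)

    def keys(self):
        return self.client.hkeys(self.key)

    def update(self, **fields):
        # Each keyword argument becomes one field in the hash.
        for field, value in fields.items():
            self[field] = value


huey = HashSketch(FakeRedis(), 'huey')
huey.update(color='white', type='kitty')
print(huey['color'], 'color' in huey, sorted(huey.keys()))
```

Every Python-level operation maps onto one hash command, so the wrapper itself holds no state beyond the client and the key name.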

Originally these containers were all I had planned on implementing, but I had such a good time working on this project that I just kept going.

Models

I thought it would be cool to add a lightweight structured data modelling API, something declarative in the style of Django or peewee. To that end, walrus supports declarative model classes and a number of field types for things like text, dates, integers, floats, and more.

Here is how I modeled a twitter-like app (which you can find in the examples):

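import datetime
from walrus import *

# BaseModel is assumed to be a walrus Model subclass bound to a Database
# instance; its definition lives in the example app.
class User(BaseModel):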
    username = TextField(primary_key=True)
    password = TextField(index=True)
    email = TextField()

    followers = ZSetField()
    following = ZSetField()


class Message(BaseModel):
    username = TextField(index=True)
    content = TextField(fts=True)
    timestamp = DateTimeField(default=datetime.datetime.now)

    def get_user(self):
        return User.load(self.username)

There are already a number of projects that do this, some of them quite well, such as Stdnet. Redisco, Rom, and limpyd are also similar projects. Stdnet looks to be the most sophisticated, but it relies on a ton of Lua scripts. Redisco, Rom, and limpyd (wtf is a limpyd?) all seem to offer only very basic column filters. The goal for walrus models was to support flexible, composable filtering using a combination of secondary indexes and set operations.

Walrus's model layer is built on top of the Redis hash, but all the interesting stuff happens in the indexes. Each field can have a number of secondary indexes which provide different ways to filter/query. For instance, the default index type is simply a big set of all values, and can be used to perform equality/inequality tests. For scalar values, the index is a sorted set, which can be sliced by value to perform greater-than and less-than queries. By combining filter options with set operations, walrus is able to support arbitrarily complex queries.
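
The idea of combining index lookups with set operations can be sketched in plain Python. This is only an in-memory analogy, not the walrus implementation: a dict of sets stands in for the equality index, a sorted list of (score, id) pairs stands in for a Redis sorted set, and the `likes` field and `likes_between` helper are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical rows keyed by primary key; stand-ins for model hashes.
rows = {
    1: {'username': 'huey', 'likes': 3},
    2: {'username': 'mickey', 'likes': 10},
    3: {'username': 'huey', 'likes': 7},
}

# Equality index: one set of ids per distinct value (like a Redis set).
by_username = defaultdict(set)
# Range index: (score, id) pairs kept sorted (like a Redis sorted set).
by_likes = []
for pk, row in rows.items():
    by_username[row['username']].add(pk)
    by_likes.append((row['likes'], pk))
by_likes.sort()


def likes_between(low, high):
    """Ids whose likes fall in [low, high], found by slicing the sorted index."""
    start = bisect_left(by_likes, (low, float('-inf')))
    end = bisect_right(by_likes, (high, float('inf')))
    return {pk for _, pk in by_likes[start:end]}


# username == 'huey' AND likes >= 5: set intersection.
huey_popular = by_username['huey'] & likes_between(5, float('inf'))
# username == 'mickey' OR likes <= 3: set union.
mickey_or_quiet = by_username['mickey'] | likes_between(float('-inf'), 3)
print(huey_popular, mickey_or_quiet)
```

Because every filter yields a set of ids, filters of different kinds compose freely with intersection and union.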

Full-text search

I'd like to take a quick detour to discuss the full-text search feature, since I think it's kind of neat. The full-text index is a basic inverted index where tokens correspond to sets of matching document IDs. The full-text search index implements the Porter stemming algorithm and also supports the double-metaphone algorithm and automatic stop-word removal.
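
As a rough illustration of what an inverted index looks like, here is a small in-memory sketch. It is an assumption-laden toy rather than walrus's code: the stop-word list is a tiny made-up sample, `tokenize` and `build_index` are hypothetical helpers, and stemming is left out entirely.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {'a', 'an', 'and', 'is', 'the', 'this', 'with'}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stop words.

    Stemming is omitted here; a real index would also reduce each
    token to its stem before storing it.
    """
    words = re.findall(r'[a-z0-9]+', text.lower())
    return [word for word in words if word not in STOP_WORDS]


def build_index(documents):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


documents = {
    1: 'Walrus is a Python library',
    2: 'This is a message about Redis',
    3: 'Python with Redis',
}
index = build_index(documents)
print(sorted(index['python']), sorted(index['redis']))
```

In Redis, each token's set of ids would live under its own key, which is what makes the set operations in a search cheap to express.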

The cool part is that I built a very simple search query parser that executes boolean expressions against the full-text index. This makes it possible to write things like:

expr = Message.content.search('python AND (walrus OR redis)')
messages = Message.query(expr)

This translates into the following sequence (roughly) of Redis commands being executed:

"ZINTERSTORE" "temp.629a" "1" "message:content.fts.python"
"ZINTERSTORE" "temp.ebe2" "1" "message:content.fts.walru"
"ZINTERSTORE" "temp.bb37" "1" "message:content.fts.redi"
"ZUNIONSTORE" "temp.72a8" "2" "temp.ebe2" "temp.bb37"
"ZINTERSTORE" "temp.7fc3" "2" "temp.629a" "temp.72a8"
"ZREVRANGE" "temp.7fc3" "0" "-1"

This sequence of operations means:

Store all the document IDs matching "python" in key 629a
Store all the document IDs matching "walrus" in key ebe2
Store all the document IDs matching "redis" in key bb37
Take the union of the walrus and redis sets and store it in key 72a8
Take the intersection of the python set and the redis/walrus set and return all the matching document IDs.

All of this is handled transparently by the backend!
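
For the curious, a boolean query evaluator along these lines can be sketched as a small recursive-descent parser. This is my own simplified take for illustration, not the parser that ships with walrus: it works on in-memory Python sets instead of Redis keys, skips stemming, and assumes AND binds more tightly than OR. The `search` function and the sample `index` are hypothetical.

```python
import re


def search(index, query):
    """Evaluate a query such as 'a AND (b OR c)' against token -> id sets."""
    tokens = re.findall(r'\(|\)|[^\s()]+', query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == 'OR':
            advance()
            result = result | parse_and()  # union, like ZUNIONSTORE
        return result

    def parse_and():
        result = parse_atom()
        while peek() == 'AND':
            advance()
            result = result & parse_atom()  # intersection, like ZINTERSTORE
        return result

    def parse_atom():
        if peek() is None:
            raise ValueError('unexpected end of query')
        token = advance()
        if token == '(':
            result = parse_or()
            if peek() != ')':
                raise ValueError('expected closing parenthesis')
            advance()
            return result
        if token in (')', 'AND', 'OR'):
            raise ValueError('unexpected token: %r' % token)
        # A plain word: look up the ids stored for that token.
        return set(index.get(token.lower(), ()))

    result = parse_or()
    if peek() is not None:
        raise ValueError('unexpected token: %r' % peek())
    return result


# Hypothetical index contents: token -> ids of matching documents.
index = {'python': {1, 2, 3}, 'walrus': {2}, 'redis': {3, 4}}
print(sorted(search(index, 'python AND (walrus OR redis)')))
```

Each `|` and `&` in the sketch corresponds to one store-the-result command in the Redis version, with the temporary keys playing the role of the intermediate Python sets.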

Creating objects and performing queries

The model layer is hopefully easy to work with and understand. Walrus makes use of operator overloads to create the query tree, which is then translated into a series of Redis commands and set operations.

Message.create(content='this is a message', username='huey')
msg = Message(content='this is another message', username='mickey')
msg.save()

# Get messages by "huey".
messages = Message.query(
    Message.username == 'huey',
    order_by=Message.timestamp.desc())

# Get messages by huey or mickey.
messages = Message.query(
    (Message.username == 'huey') | (Message.username == 'mickey'),
    order_by=Message.timestamp.desc())

# Find messages by huey matching a search query.
search_expr = Message.content.search('python AND (peewee OR huey OR walrus)')
messages = Message.query(
    search_expr & (Message.username == 'huey'),
    order_by=Message.timestamp)
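
If you're wondering how expressions like these turn into a query tree, here is a stripped-down sketch of the operator-overloading trick. The `Field`, `Expression`, and `evaluate` names are hypothetical and the evaluation scans an in-memory dict rather than using Redis indexes, so treat it as an illustration of the technique only.

```python
class Expression:
    """Node in a query tree: (lhs, op, rhs)."""

    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def __and__(self, other):
        return Expression(self, 'AND', other)

    def __or__(self, other):
        return Expression(self, 'OR', other)

    def __repr__(self):
        return '(%r %s %r)' % (self.lhs, self.op, self.rhs)


class Field:
    """Stand-in for a model field; comparing it builds a tree node."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Return a node describing the comparison instead of a boolean.
        return Expression(self, '==', value)

    def __repr__(self):
        return self.name


def evaluate(node, rows):
    """Return the set of row ids matching the tree, using set operations."""
    if node.op == '==':
        return {pk for pk, row in rows.items()
                if row[node.lhs.name] == node.rhs}
    left, right = evaluate(node.lhs, rows), evaluate(node.rhs, rows)
    return left & right if node.op == 'AND' else left | right


# Hypothetical data: primary key -> row.
rows = {
    1: {'username': 'huey'},
    2: {'username': 'mickey'},
    3: {'username': 'zaizee'},
}
username = Field('username')
query = (username == 'huey') | (username == 'mickey')
print(query, sorted(evaluate(query, rows)))
```

The parentheses around each comparison matter for the same reason they do in the queries above: `|` and `&` bind more tightly than `==` in Python.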

If you'd like to see more examples, check out the model documentation, the example twitter app, or the example diary app.

Caching

The final component of walrus is a caching API. The cache implements the standard get and set operations, and also provides a decorator which can be used to wrap expensive / cache-friendly functions or methods.

Here is how you might use the cache:

cache = db.cache(default_timeout=600)

@cache.cached()
def get_recommendations(person):
    # Perform some expensive calculation that can be cached.
    return RecommendationEngine(person).get_recommendations()
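
To show the general shape of a caching decorator like this, here is a self-contained sketch that keeps values in a Python dict with an expiry time. It is not the walrus cache: `CacheSketch` and `slow_square` are hypothetical names, nothing is stored in Redis, and the cache key is simply built from the function name and its (hashable) arguments.

```python
import functools
import time


class CacheSketch:
    """Hypothetical in-memory cache with per-key expiry."""

    def __init__(self, default_timeout=600):
        self.default_timeout = default_timeout
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, timeout=None):
        ttl = self.default_timeout if timeout is None else timeout
        self._data[key] = (time.monotonic() + ttl, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None or entry[0] <= time.monotonic():
            # Missing or expired: drop any stale entry and report a miss.
            self._data.pop(key, None)
            return default
        return entry[1]

    def cached(self, timeout=None):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                key = (fn.__name__, args, tuple(sorted(kwargs.items())))
                missing = object()  # sentinel so cached None values still count
                value = self.get(key, missing)
                if value is missing:
                    value = fn(*args, **kwargs)
                    self.set(key, value, timeout)
                return value
            return wrapper
        return decorator


cache = CacheSketch(default_timeout=600)
calls = []


@cache.cached()
def slow_square(n):
    calls.append(n)  # record each real invocation
    return n * n


print(slow_square(4), slow_square(4), len(calls))
```

The second call with the same argument is served from the cache, so the wrapped function body runs only once per distinct set of arguments until the entry expires.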