December 30, 2010 18:08 / 2 comments / autocomplete django python redis solr

One of the nicest UIs for dealing with a large dataset is a good autocomplete. Facebook's search is a great example, as is Netflix's, and Google recently launched "Google Instant", which returns search results as you type. Autocomplete complements hierarchical drill-down search nicely: drill-down is useful for discovery, while the goal of autocomplete is to help users find something they already know about with a minimum of effort.

django-completion

The goal of django-completion is to make adding autocompletion to your django project super easy. It follows the by-now-standard pluggable backend/registry approach, so it's up to you to write "providers" for the models you want to enable autocomplete on. Luckily, there are only a handful of methods you need to implement.

Supposing you wanted to allow autocomplete on users, here's what a "User" provider might look like:

from django.contrib.auth.models import User

from completion.sites import AutocompleteProvider, site


class UserProvider(AutocompleteProvider):
    def get_title(self, obj):
        """This is the phrase that will be 'autocompleted'"""
        return obj.username

    def get_pub_date(self, obj):
        return obj.last_login

    def get_data(self, obj):
        """
        Any arbitrary data you want stored in the index, which will
        be returned when a search is performed
        """
        return {
            'username': obj.username,
            'email': obj.email,
            'avatar_url': obj.profile.avatar,
            'profile_url': obj.get_absolute_url()
        }

    def get_queryset(self):
        return self.model._default_manager.filter(is_active=True)

site.register(User, UserProvider)

The important thing to note is that any data returned by the get_data() method is stored in the index and returned as the search results. If you wanted, you could even render a template and store the rendered HTML with this method. Storing the username, email, avatar and url should allow us to display useful information on the frontend while avoiding a database hit for each record. Granted, if you're using a SQL database as the autocomplete backend this is a moot point, but if you choose to use either the Solr or the Redis backend this can save some overhead on each search. Which brings me to the next point...
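For instance, a provider could pre-render an HTML snippet in get_data() so the frontend can display a result with no extra work. A minimal sketch of the idea (using plain string formatting in place of Django's template loader for self-containment; the 'html' key and FakeUser class are hypothetical, not part of django-completion):

```python
from html import escape

def get_data(obj):
    """What a provider's get_data() might return: the raw fields plus a
    pre-rendered HTML snippet, all of which gets stored in the index."""
    return {
        'username': obj.username,
        'url': obj.get_absolute_url(),
        'html': '<a href="%s">%s</a>' % (
            escape(obj.get_absolute_url()), escape(obj.username)),
    }

class FakeUser:
    # stand-in for a django User so the sketch runs on its own
    username = 'alice'
    def get_absolute_url(self):
        return '/users/alice/'

data = get_data(FakeUser())
```

In a real provider you would reach for render_to_string() instead, but the principle is the same: do the rendering once at index time, not on every search.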

Choosing a Backend

Autocomplete can be expensive, as you are performing a partial string match across a potentially large number of records. Each backend is written to leverage the strengths of the database it targets. The following backends are supported:

- Solr
- Redis
- SQL (via the Django ORM)

My strong recommendation would be to use the Solr backend, as it handles so much of the complexity for you automatically. If you are already using haystack for search, adding autocomplete should be no sweat. Check this gist if you're interested in adding autocomplete to your haystack setup -- but keep in mind that you'll need to be using the Solr backend and you will need to reindex anything you want to provide autocompletion on. If you've never set Solr up before, I've written a short post on installing Solr on ubuntu.

On the other hand, if you're looking for something simple, the SQL backend will work fine for smaller datasets. It works by breaking up the input into chunks and performing a LIKE query. And of course, if you're already using Redis and want to experiment, check out the Redis backend. It is based on the ideas from these two posts:
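The core trick those posts describe is to index every prefix of each title, so a lookup becomes an exact key match instead of a scan. Here is a rough, Redis-free sketch of that indexing scheme using a plain dict (in Redis the prefixes would key sorted sets):

```python
from collections import defaultdict

index = defaultdict(set)  # prefix -> set of matching titles

def store(title):
    # index every prefix of every word in the title
    for word in title.lower().split():
        for i in range(1, len(word) + 1):
            index[word[:i]].add(title)

def suggest(partial, limit):
    # an exact dict lookup -- no scanning over all records
    return sorted(index.get(partial.lower(), set()))[:limit]

for t in ('python testing', 'pythonic idioms', 'perl tips'):
    store(t)

suggest('pytho', 10)  # -> ['python testing', 'pythonic idioms']
```

The trade-off is storage: you write O(length) index entries per word in exchange for cheap reads, which suits Redis well.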

Doing the basics

Assuming you want to provide autocomplete on the django User model, the provider described earlier in this post will work fine. This section will illustrate some common tasks you might encounter in your app.

Getting suggestions

The application-wide registry of AutocompleteProviders (like the UserProvider) has a suggest() method, which takes a partial match and a limit:

>>> from completion import site
>>> site.suggest('pytho', 4)
[
    {'title': 'pythonic idioms', 'url': '/posts/235/'},
    {'title': 'python testing', 'url': '/posts/482/'},
    {'title': 'python tips and tricks', 'url': '/posts/523/'},
    {'title': 'web testing with python', 'url': '/posts/789/'}
]

Storing objects in the index

In order to get results, you need to make sure your objects are indexed. The registry has a store_providers() method which iterates over every registered provider and stores all the objects returned by the get_queryset() method:

>>> from completion import site
>>> site.store_providers()

If you want to store only a single object, use the registry's store_object() method:

>>> e = Entry.objects.get(slug='some-entry')
>>> site.store_object(e)

Note: if you want to add objects to the index one at a time (for example, adding a blog post to the autocomplete index when it is saved), set up a post-save handler:

from django.db.models.signals import post_save
from blog.models import Entry, ENTRY_STATUS_PUBLISHED
from completion import site

def store_blog_entry(sender, instance, created, **kwargs):
    # make sure only to store published blog entries
    if instance.status == ENTRY_STATUS_PUBLISHED:
        site.store_object(instance)

post_save.connect(store_blog_entry, sender=Entry)

Removing objects from the index

As in the above example, you can remove a single object by calling the registry's remove_object() method, or set up a pre-delete signal handler.
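A pre-delete handler mirrors the post-save one above. In this sketch, ENTRY_STATUS_DRAFT and the FakeSite class are stand-ins so the snippet runs on its own; in a project you would use completion.site and your real model constants:

```python
ENTRY_STATUS_PUBLISHED = 1  # stand-in for blog.models.ENTRY_STATUS_PUBLISHED
ENTRY_STATUS_DRAFT = 0      # hypothetical draft status

class FakeSite:
    """Stand-in for completion.site, recording what gets removed."""
    def __init__(self):
        self.removed = []
    def remove_object(self, obj):
        self.removed.append(obj)

site = FakeSite()

def remove_blog_entry(sender, instance, **kwargs):
    # mirror the save handler: only published entries were indexed,
    # so only they need removing
    if instance.status == ENTRY_STATUS_PUBLISHED:
        site.remove_object(instance)

# in a real project, wire the handler to the pre-delete signal:
# from django.db.models.signals import pre_delete
# pre_delete.connect(remove_blog_entry, sender=Entry)
```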

You may be wondering why this stuff isn't handled automatically. Because Solr can sometimes take a long time to add or remove an object from the index, it may be a bad idea to bog down your request/response cycle with this overhead. Instead you may want to push updates to an out-of-band worker. At some point down the road I may look into providing some settings toggles, but for now it's up to the user to handle this part.
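To make the out-of-band idea concrete, here is a bare-bones sketch: signal handlers just enqueue objects, and a background thread drains the queue and does the slow indexing. In production this role would likely be played by something like Celery or RQ (neither is part of django-completion); the queue, worker, and sentinel here are purely illustrative:

```python
import queue
import threading

updates = queue.Queue()

def index_worker(store_fn):
    # drain the queue in a background thread, indexing as we go
    while True:
        obj = updates.get()
        if obj is None:  # sentinel telling the worker to stop
            break
        store_fn(obj)   # e.g. site.store_object(obj) in a real setup
        updates.task_done()

indexed = []
worker = threading.Thread(target=index_worker, args=(indexed.append,))
worker.start()

# a signal handler would just enqueue instead of hitting Solr directly
updates.put('entry-1')
updates.put('entry-2')
updates.put(None)
worker.join()
# indexed == ['entry-1', 'entry-2']
```

The request/response cycle only pays for a queue put; the Solr round-trip happens elsewhere.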

Storing arbitrary data

To store arbitrary data alongside each indexed object, override the get_data() method of an AutocompleteProvider:

class BlogProvider(AutocompleteProvider):
    ...
    def get_data(self, obj):
        return {
            'title': obj.title,
            'author_name': obj.author.username,
            'url': obj.get_absolute_url(),
        }

Writing your own backend

Writing a backend is easy -- looking at the completion.backends.base module, there are only 4 methods that need to be implemented:

class BaseBackend(object):
    """
    Specify the interface for autocomplete backends
    """
    def flush(self):
        raise NotImplementedError # remove all objects from the index

    def store_object(self, obj, data):
        raise NotImplementedError

    def remove_object(self, obj, data):
        raise NotImplementedError

    def suggest(self, phrase, limit):
        raise NotImplementedError
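As an illustration, a toy in-memory backend satisfying that interface might look like the following. This is a sketch, not one of the shipped backends, and it assumes the data dict passed in by the registry carries a 'title' key:

```python
class DictBackend(object):
    """Toy backend implementing the four-method interface above with a
    plain dict and a naive prefix scan. For illustration only; the real
    backends live in completion.backends."""
    def __init__(self):
        self._items = {}

    def flush(self):
        self._items = {}

    def store_object(self, obj, data):
        # assumption: the data dict carries a 'title' key
        self._items[data['title']] = data

    def remove_object(self, obj, data):
        self._items.pop(data['title'], None)

    def suggest(self, phrase, limit):
        phrase = phrase.lower()
        return [d for t, d in sorted(self._items.items())
                if t.lower().startswith(phrase)][:limit]

backend = DictBackend()
backend.store_object(None, {'title': 'python testing', 'url': '/posts/482/'})
backend.store_object(None, {'title': 'pythonic idioms', 'url': '/posts/235/'})
```

Scanning every stored title on each call is exactly the expense the real backends are designed to avoid, but it shows the shape a backend has to take.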

Reading more

If you're interested in reading more, check the README on github. The tests are also a good place to look.
