Saturday morning hacks: Adding full-text search to the flask note-taking app

Saturday morning hacks

Code compatibility with newer versions of Peewee

SQLite's full-text search extension does not support fast binary-search lookups on any key other than the docid. The code in this post, however, specifies a different column for the primary key. Newer versions of peewee's integration with the SQLite full-text search extension (2.7.1 and up) will raise an error if you use the code as it is presented in this post.

In preparation for the fourth and final installment in the "Flask Note-taking app" series, I found it necessary to improve the search feature of the note-taking app. In this post we will use SQLite's full-text search extension to improve the search feature.

To recap, the note-taking app provides a lightweight interface for storing markdown-formatted notes. Because I frequently find myself wanting to take notes on the spur of the moment, the note-taking app needed to be very mobile-friendly. By using twitter bootstrap and a hefty dose of JavaScript, we made an app that matches our spec and manages to look good doing it!

In part 2, we added email reminders and check-able task lists to the note-taking app. We also converted the backend to use flask-peewee's REST API extension, which made it easy to add pagination and search. And that is how I've left it for the last three months or so.

Below is a screenshot of the latest version of the notes app. The UI is much cleaner thanks to a stylesheet from bootswatch. The bootswatch stylesheet works as a drop-in replacement for the default bootstrap CSS file.

photos/p1412692599.23.png

All together, the note-taking app has the following features:

You can browse or download the finished code from part 2 in this gist. If you're in a hurry, you can find all the code from this post in this gist.

In case you were curious, I've been using the notes app for things like:

In this installment, we'll be using SQLite's full-text search (FTS) extension to dramatically improve the search result quality. Besides giving us better results, SQLite's FTS extension supports boolean-type search expressions as well as custom constructs like NEAR. A full list of supported queries can be found in the SQLite documentation. Most importantly, by using full-text search we are now able to sort our search results by match quality. For a detailed write-up on using SQLite's FTS with Python, check out this post.

To use the FTS extension, we need to create a special table called a virtual table. Virtual tables are meta-tables that present a high-level interface for working with special extension data. The FTS virtual table allows us to store search content, and behind-the-scenes several hidden tables will be created to store the actual search data-structures. The peewee playhouse Sqlite extension provides a model sub-class designed for working with virtual tables, and more specifically, a virtual model subclass for the FTS extension.

So without further ado, let's get started!

Sqlite extension database

The first step is to replace our peewee database class, swapping out the default SqliteDatabase for the playhouse's SqliteExtDatabase, which provides support for virtual tables.

Open app.py and add the following import to the top of the module below the peewee import:

from playhouse.sqlite_ext import SqliteExtDatabase

Below you will find the database declaration. We will replace the original declaration with the following:

db = SqliteExtDatabase(app.config['DATABASE'], threadlocals=True)

Now that we have updated our database class, we can start using the FTS extension in our peewee code.

In order to search our notes, we need to create a separate model to store the search content we wish to index. The playhouse.sqlite_ext module provides the FTSModel class which we will subclass to create the FTSNote model. This model will store, along with the content for the search index, a pointer back to the note it references. Open models.py and add the following model definition (the import goes at the top of the module):

# Place these imports at the top of the module.
import re
from playhouse.sqlite_ext import FTSModel


# The FTSNote class should go just below the Note class.
class FTSNote(FTSModel):
    # Simple regex for stripping out any HTML tags.
    HTML_RE = re.compile('<.+?>')

    note = ForeignKeyField(Note, primary_key=True)
    content = TextField()

    class Meta:
        database = db

Now that we have a place to store our note content, we need to determine exactly what content to store. As is hinted by the HTML_RE attribute, I opted to strip out HTML from the note -- which also would seem to indicate that I'm processing any markdown or oEmbed content as well.

Here is the method I came up with that extracts the useful search content from a note. It belongs as a staticmethod on the FTSNote class. This method calls the note's html() method, which will convert any oEmbed links into rich media objects, and process any markdown. The HTML tags are then stripped out since they will mostly be noise, and finally, the same process is done for each task item attached to the note (if any exist):

@staticmethod
def get_search_content(note):
    content = [FTSNote.HTML_RE.sub('', note.html())]
    for task in note.get_tasks():
        content.append(FTSNote.HTML_RE.sub('', task.html()))
    return '\n'.join(line for line in content if line)

The last method we will add to the FTSNote class will be responsible for the actual storing of the note's search content in the special virtual table:

@classmethod
def store_note(cls, note):
    try:
        fts_note = FTSNote.get(FTSNote.note == note)
    except FTSNote.DoesNotExist:
        fts_note = FTSNote(note=note)
        force_insert = True
    else:
        force_insert = False
    fts_note.content = FTSNote.get_search_content(note)
    fts_note.save(force_insert=force_insert)

The final step is to ensure that whenever a new note is added to the database we simultaneously add it to the search index. To do this, add a single line to the Note.save() method calling the FTSNote.store_note() method:

def save(self, *args, **kwargs):
    # ... note.save() method body ...

    # Right before we return from this method, and after our call
    # to the super-class save() method, we will store the note in
    # the search index.

    # Store the content for full-text search.
    FTSNote.store_note(self)

    return ret

Performing searches

In order to make use of our search index, we will write a new Note.search method. SQLite uses a special operator MATCH to perform a full-text search query, and this operator is exposed, for convenience, as a method on the playhouse FTSModel class. The only other interesting part is to ensure the results are ordered by relevance. SQLite does not provide an out-of-the-box search ranking algorithm, but the playhouse.FTSModel provides two algorithms you can use (see documentation for more details).

Our search method will be responsible for sanitizing any junk user input, then selecting the notes that match the search term and returning them ordered by rank:

@classmethod
def search(cls, search_term):
    words = [word.strip() for word in search_term.split() if word]
    if not words:
        # Return an intentionally empty query.
        return Note.select().where(Note.id == 0)
    else:
        # Recombine the cleaned search phrase.
        search = ' '.join(words)

    return (Note
            .select(Note, FTSNote, FTSNote.rank().alias('score'))
            .join(FTSNote)
            .where(
                (Note.status == Note.STATUS_VISIBLE) &
                (FTSNote.match(search)))
            .order_by(SQL('score').desc(), Note.timestamp.desc()))

Preparing the database

At this point you're probably ready to give the new search function a try! Before we can do that, we need to create a table in our database for the FTSNote model, and then populate it with the content from our existing notes. Using either a one-off script or the interactive interpreter, run the following code:

>>> from models import db, FTSNote, Note
>>> with db.transaction():
...     FTSNote.create_table()
...     for note in Note.select():
...         FTSNote.store_note(note)

And that's it! You can now try searching the notes database:

>>> [note.content for note in Note.search('lua')]
['Learn Lua in 15 minutes: http://tylerneylon.com/a/learn-lua/',
 'Programming in Lua : 9.1 -- http://www.lua.org/pil/9.1.html',
 'Programming in Lua (first edition), free eBook at http://www.lua.org/pil/contents.html']

>>> [note.content for note in Note.search('learn NEAR/3 lua')]
['Learn Lua in 15 minutes: http://tylerneylon.com/a/learn-lua/']

Wiring up the front-end

When we implement search in part 2 of the series, we simply passed an additional filtering parameter to the API endpoint (content__ilike=search terms). In order to use our new search method, we'll need to create a custom search endpoint. Luckily, this is quite easy -- we just add a new URL route to our NoteResource class and call it from the front-end.

Open api.py and make the following changes:

# Add this import to the top of the module.
from flask import request

# Add the following methods to the NoteResource class.
class NoteResource(RestResouce):
    def get_urls(self):
        return (
            ('/search/', self.search),
        ) + super(NoteResource, self).get_urls()

    def search(self):
        query = request.args.get('query')
        notes = Note.search(request.args.get('query') or '')
        notes = self.process_query(notes)  # Apply any filters, etc.
        return self.paginated_object_list(notes)

Finally we will modify the notes.js script to use the search endpoint when performing a search.

The Editor.getList() method controls how we retrieve notes from the API. When a search term is present, we need to use the custom endpoint. Make the following changes to Editor.getList():

Editor.prototype.getList = function(page, search) {
  var requestData = {};
  var self = this;
  var url = search ? '/api/note/search/' : '/api/note/'; // New.

  this.container.empty();

  if (page) requestData['page'] = page;
  if (search) requestData['query'] = search;  // New: include the search query.

  // New: this call changed from a hard-coded URL to the new url variable.
  this.makeRequest(url, 'GET', requestData, function(data) {
    data.objects.reverse();
    $.each(data.objects, function(idx, note) {
      self.addNoteToList(note.rendered);
    });
    imagesLoaded(self.container, function() {
      self.container.masonry('layout');
    });
    self.updatePagination(data);
  });
}

Wrapping up

That's it! Our notes app now has very robust full-text searching capabilities. Thanks for following along, I hope you enjoyed this post. In the next installment we will be building a command-line client for working with the notes app. This client will make it very easy to work with the notes app from the command-line. Since we won't be able to easily browse lists of notes, having a quality search interface will be very helpful for finding notes.

Note-taking app code

You can download or browse the source code for the entire note-taking app up to this point using this Gist:

https://gist.github.com/coleifer/d93d6c43e59698d149c0

This gist also contains some additional changes which were not covered in this post for reasons of brevity, but which may be useful to you.

  • Added capability to edit notes after they've been saved.
  • Re-wrote the reminders task to be more robust.
  • Simplified the Note model, and included a migration script.
  • Added an optional days parameter to the search function to limit our search to a certain number of days in the past.

You can browse a diff of the changes here.

Here are the other posts in the series:

Or simply look at all of the saturday-morning hack posts.

Comments (0)


Commenting has been closed, but please feel free to contact me