Saturday morning hacks: Adding full-text search to the flask note-taking app
In this post we will use SQLite's full-text search extension to improve the search feature.
To recap, the note-taking app provides a lightweight interface for storing markdown-formatted notes. Because I frequently find myself wanting to take notes on the spur of the moment, the note-taking app needed to be very mobile-friendly. By using twitter bootstrap and a hefty dose of JavaScript, we made an app that matches our spec and manages to look good doing it!
In part 2, we added email reminders and check-able task lists to the note-taking app. We also converted the backend to use flask-peewee's REST API extension, which made it easy to add pagination and search. And that is how I've left it for the last three months or so.
Below is a screenshot of the latest version of the notes app. The UI is much cleaner thanks to a stylesheet from bootswatch. The bootswatch stylesheet works as a drop-in replacement for the default bootstrap CSS file.
All together, the note-taking app has the following features:
- Flexible pinterest-style tiled layout that looks great on a variety of screen sizes.
- Easy to create notes and reminders from the phone.
- Notes support markdown and there is also a simple WYSIWYM markdown editing toolbar.
- Links are converted to rich media objects where possible (e.g. a YouTube URL becomes an embedded player).
- To-do lists (or task lists) can be embedded in notes.
- Email reminders can be scheduled for a given note.
- Simple full-text search.
- Pagination.
You can browse or download the finished code from part 2 in this gist. If you're in a hurry, you can find all the code from this post in this gist.
In case you were curious, I've been using the notes app for things like:
- Bookmarking interesting sites to read later.
- Creating short to-do lists or writing down particular items to get from the store, etc.
- Writing down interesting dreams or ideas I get in the middle of the night.
- Appointment reminders, reminders to call people, etc.
- Saving funny cat pics.
- Writing down ideas for programming projects.
- Saving code snippets or useful commands.
Adding full-text search
In this installment, we'll be using SQLite's full-text search (FTS) extension to dramatically improve the search result quality. Besides giving us better results, SQLite's FTS extension supports boolean-type search expressions as well as custom constructs like NEAR. A full list of supported queries can be found in the SQLite documentation. Most importantly, by using full-text search we are now able to sort our search results by match quality. For a detailed write-up on using SQLite's FTS with Python, check out this post.
To use the FTS extension, we need to create a special table called a virtual table. Virtual tables are meta-tables that present a high-level interface for working with special extension data. The FTS virtual table allows us to store search content, and behind-the-scenes several hidden tables will be created to store the actual search data-structures. The peewee playhouse Sqlite extension provides a model sub-class designed for working with virtual tables, and more specifically, a virtual model subclass for the FTS extension.
So without further ado, let's get started!
Sqlite extension database
The first step is to replace our peewee database class, swapping out the default SqliteDatabase
for the playhouse's SqliteExtDatabase
, which provides support for virtual tables.
Open app.py
and add the following import to the top of the module below the peewee
import:
from playhouse.sqlite_ext import SqliteExtDatabase
Below you will find the database declaration. We will replace the original declaration with the following:
db = SqliteExtDatabase(app.config['DATABASE'], pragmas=[('journal_mode', 'wal')])
Now that we have updated our database class, we can start using the FTS extension in our peewee code.
Indexing notes for full-text search
In order to search our notes, we need to create a separate model to store the search content we wish to index. The playhouse.sqlite_ext module provides the FTSModel class which we will subclass to create the FTSNote
model. This model will store, along with the content for the search index, a pointer back to the note it references. Open models.py
and add the following model definition (the import goes at the top of the module):
# Place these imports at the top of the module.
import re
from playhouse.sqlite_ext import FTSModel
# The FTSNote class should go just below the Note class.
class FTSNote(FTSModel):
# Simple regex for stripping out any HTML tags.
HTML_RE = re.compile('<.+?>')
content = TextField()
class Meta:
database = db
Now that we have a place to store our note content, we need to determine exactly what content to store. As is hinted by the HTML_RE
attribute, I opted to strip out HTML from the note -- which also would seem to indicate that I'm processing any markdown or oEmbed content as well.
Here is the method I came up with that extracts the useful search content from a note. It belongs as a staticmethod on the FTSNote class. This method calls the note's html() method, which will convert any oEmbed links into rich media objects, and process any markdown. The HTML tags are then stripped out since they will mostly be noise, and finally, the same process is done for each task item attached to the note (if any exist):
@staticmethod
def get_search_content(note):
content = [FTSNote.HTML_RE.sub('', note.html())]
for task in note.get_tasks():
content.append(FTSNote.HTML_RE.sub('', task.html()))
return '\n'.join(line for line in content if line)
The last method we will add to the FTSNote class will be responsible for the actual storing of the note's search content in the special virtual table:
@classmethod
def store_note(cls, note):
content = FTSNote.get_search_content(note)
try:
FTSNote.get(FTSNote.docid == note.id)
except FTSNote.DoesNotExist:
FTSNote.create(docid=note.id, content=content)
else:
(FTSNote
.update(content=content)
.where(FTSNote.docid == note.id)
.execute())
The final step is to ensure that whenever a new note is added to the database we simultaneously add it to the search index. To do this, add a single line to the Note.save() method calling the FTSNote.store_note() method:
def save(self, *args, **kwargs):
# ... note.save() method body ...
# Right before we return from this method, and after our call
# to the super-class save() method, we will store the note in
# the search index.
# Store the content for full-text search.
FTSNote.store_note(self)
return ret
Performing searches
In order to make use of our search index, we will write a new Note.search method. SQLite uses a special operator MATCH to perform a full-text search query, and this operator is exposed, for convenience, as a method on the playhouse FTSModel class. The only other interesting part is to ensure the results are ordered by relevance. SQLite does not provide an out-of-the-box search ranking algorithm, but the playhouse.FTSModel provides two algorithms you can use (see documentation for more details).
Our search method will be responsible for sanitizing any junk user input, then selecting the notes that match the search term and returning them ordered by rank:
@classmethod
def search(cls, search_term):
words = [word.strip() for word in search_term.split() if word]
if not words:
# Return an intentionally empty query.
return Note.select().where(Note.id == 0)
else:
# Recombine the cleaned search phrase.
search = ' '.join(words)
return (Note
.select(Note, FTSNote.rank().alias('score'))
.join(FTSNote, on=(Note.id == FTSNote.docid))
.where(
(Note.status == Note.STATUS_VISIBLE) &
(FTSNote.match(search)))
.order_by(FTSNote.rank())
Preparing the database
At this point you're probably ready to give the new search function a try! Before we can do that, we need to create a table in our database for the FTSNote model, and then populate it with the content from our existing notes. Using either a one-off script or the interactive interpreter, run the following code:
>>> from models import db, FTSNote, Note
>>> with db.atomic():
... FTSNote.create_table()
... for note in Note.select():
... FTSNote.store_note(note)
And that's it! You can now try searching the notes database:
>>> [note.content for note in Note.search('lua')]
['Learn Lua in 15 minutes: http://tylerneylon.com/a/learn-lua/',
'Programming in Lua : 9.1 -- http://www.lua.org/pil/9.1.html',
'Programming in Lua (first edition), free eBook at http://www.lua.org/pil/contents.html']
>>> [note.content for note in Note.search('learn NEAR/3 lua')]
['Learn Lua in 15 minutes: http://tylerneylon.com/a/learn-lua/']
Wiring up the front-end
When we implement search in part 2 of the series, we simply passed an additional filtering parameter to the API endpoint (content__ilike=search terms). In order to use our new search method, we'll need to create a custom search endpoint. Luckily, this is quite easy -- we just add a new URL route to our NoteResource class and call it from the front-end.
Open api.py
and make the following changes:
# Add this import to the top of the module.
from flask import request
# Add the following methods to the NoteResource class.
class NoteResource(RestResouce):
def get_urls(self):
return (
('/search/', self.search),
) + super(NoteResource, self).get_urls()
def search(self):
query = request.args.get('query')
notes = Note.search(request.args.get('query') or '')
notes = self.process_query(notes) # Apply any filters, etc.
return self.paginated_object_list(notes)
Finally we will modify the notes.js script to use the search endpoint when performing a search.
The Editor.getList()
method controls how we retrieve notes from the API. When a search term is present, we need to use the custom endpoint. Make the following changes to Editor.getList()
:
Editor.prototype.getList = function(page, search) {
var requestData = {};
var self = this;
var url = search ? '/api/note/search/' : '/api/note/'; // New.
this.container.empty();
if (page) requestData['page'] = page;
if (search) requestData['query'] = search; // New: include the search query.
// New: this call changed from a hard-coded URL to the new url variable.
this.makeRequest(url, 'GET', requestData, function(data) {
data.objects.reverse();
$.each(data.objects, function(idx, note) {
self.addNoteToList(note.rendered);
});
imagesLoaded(self.container, function() {
self.container.masonry('layout');
});
self.updatePagination(data);
});
}
Wrapping up
That's it! Our notes app now has very robust full-text searching capabilities. Thanks for following along, I hope you enjoyed this post. In the next installment we will be building a command-line client for working with the notes app. This client will make it very easy to work with the notes app from the command-line. Since we won't be able to easily browse lists of notes, having a quality search interface will be very helpful for finding notes.
Note-taking app code
You can download or browse the source code for the entire note-taking app up to this point using this Gist:
https://gist.github.com/coleifer/d93d6c43e59698d149c0
This gist also contains some additional changes which were not covered in this post for reasons of brevity, but which may be useful to you.
- Added capability to edit notes after they've been saved.
- Re-wrote the reminders task to be more robust.
- Simplified the Note model, and included a migration script.
- Added an optional days parameter to the search function to limit our search to a certain number of days in the past.
You can browse a diff of the changes here.
Links
Here are the other posts in the series:
- Part 1: Building a little note-taking app with Flask and Peewee
- Part 2: Revisiting the note-taking app (adding todo lists, reminders, search, and a REST API)
Or simply look at all of the saturday-morning hack posts.
Comments (0)
Commenting has been closed.