Advanced filtering and SQLite FTS5 with Scout search engine
I've been working on some new features for Scout and thought they might be worth a short blog post. The super-short version is that Scout now supports complex filtering on metadata, adding another layer of filtering on top of the full-text search. Additionally, I've added support for SQLite FTS5, which Scout uses by default when it's available, falling back to FTS4 otherwise.
Filtering on metadata
Scout's purpose is to allow users to perform full-text searches over collections of documents. In addition to these documents, which are just blobs of text content, Scout supports storing lightly-structured metadata alongside each document. Metadata, in Scout parlance, is just arbitrary key/value pairs associated with an indexed document. For example, on this blog, I might store as metadata the blog entry's timestamp, URL, title and tease – basically everything necessary to quickly render a nice HTML search result without having to hit the database.
A while back I thought it would be handy to be able to do additional server-side filtering using the metadata. This would allow me to pack some more logic into the search queries and add a bit of implied structure to my data. Until recently, however, the only query operation possible on metadata was a test for equality.
With the latest release, Scout now supports a variety of filtering operations. This has greatly increased Scout's utility in my opinion, as I'm already using the new filtering to cut out application code and build new features which would have been inconvenient to implement without the filtering logic.
So now, in addition to equality, Scout supports:
- Not equals (duh)
- Less than / less than or equal
- Greater than / greater than or equal
- In, i.e. the metadata value is one out of a list of options
- Contains, i.e. substring search
- Starts-with and ends-with for prefix/suffix searching
To use these metadata filters, just do the Django thing and separate the key and operation with a double-underscore. So where before I might only be able to test for exact equality, like foo=bar, I can now do foo__contains=bar (substring search).
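To make the key/operation split concrete, here is a minimal sketch in plain Python. It is not Scout's actual implementation (Scout applies the filters on the server). The names OPERATIONS, parse_filter and matches are made up for illustration, and only the suffixes contains, ge, lt and ne appear in this post; the rest, along with the comma-separated format for "in", are assumptions.

```python
import operator

# Suffix -> comparison function. Only "contains", "ge", "lt" and "ne" are
# taken from the post's examples; the other suffixes are assumed names.
OPERATIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    # Assumption: the list of options arrives as a comma-separated string.
    "in": lambda value, options: value in options.split(","),
    "contains": lambda value, needle: needle in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
    "endswith": lambda value, suffix: value.endswith(suffix),
}


def parse_filter(param):
    """Split 'key__op' into (key, op); a bare 'key' means equality."""
    key, sep, op = param.rpartition("__")
    if sep and op in OPERATIONS:
        return key, op
    return param, "eq"


def matches(metadata, filters):
    """Return True if the metadata dict satisfies every key__op=value filter."""
    for param, expected in filters.items():
        key, op = parse_filter(param)
        if key not in metadata or not OPERATIONS[op](metadata[key], expected):
            return False
    return True
```

Because metadata values are treated as strings here, date comparisons only behave sensibly when the dates are stored in ISO format (YYYY-MM-DD), which sorts the same way lexically and chronologically.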
Here are some example URLs and a description of the results they represent:
/entries/search/?q=python+OR+cython&tags__contains=nosql – full-text search for posts about Python or Cython that are tagged with "nosql".
/entries/search/?q=sqlite&pub_date__ge=2015-01-01 – full-text search for posts about SQLite that were published on or after January 1, 2015.
/news/search/?q=ufos&pub_date__ge=2015-11-01&pub_date__lt=2015-12-01&category__ne=hoax – full-text search for news articles related to UFOs, published in November of 2015, whose category is not "hoax".
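If you are building these URLs from code rather than by hand, the standard library handles the escaping. The helper below is a small sketch; search_url is a hypothetical name, not part of Scout, and the paths are just the ones from the examples above.

```python
from urllib.parse import urlencode


def search_url(path, query, **filters):
    """Build a search URL from a full-text query plus key__op=value filters."""
    params = {"q": query}
    params.update(filters)
    # urlencode escapes spaces as "+", matching the example URLs.
    return path + "?" + urlencode(params)


entries_url = search_url("/entries/search/", "python OR cython",
                         tags__contains="nosql")
news_url = search_url("/news/search/", "ufos",
                      pub_date__ge="2015-11-01",
                      pub_date__lt="2015-12-01",
                      category__ne="hoax")
```

Passing the filters as keyword arguments works because the double-underscore names are valid Python identifiers.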
As you can see, this feature opens up some new possibilities for the types of work you can give to Scout. For more information, check out the docs.
SQLite FTS5
SQLite 3.8.11.1 was released in late July this year, and among many notable new features and enhancements, the one that really caught my eye was FTS5. FTS5 is the latest version of the SQLite full-text search extension. The enhancements most meaningful for Scout users are the powerful mini-language used to query the search engine and the built-in BM25 ranking. Scout already implements BM25 for FTS4, so older installations are not left out, but the FTS5 implementation is more performant.
A full description of the FTS5 search language would be longer than this post already is, so I will just point you to the FTS5 documentation on the search query language and say "Have fun!".
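For a quick taste, here is a minimal sketch that talks to FTS5 directly through Python's sqlite3 module rather than through Scout. The table name, sample rows and the fts5_search helper are made up for illustration, and the code assumes your SQLite build includes FTS5 (if it doesn't, creating the table raises sqlite3.OperationalError).

```python
import sqlite3

SAMPLE_DOCS = [
    ("Intro to Python", "python is a friendly language"),
    ("Speeding up with Cython", "cython compiles python-like code to c"),
    ("SQLite notes", "sqlite is a small embedded database"),
]


def fts5_search(query):
    """Run an FTS5 query against a throwaway in-memory index.

    Returns matching titles, best match first.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
        conn.executemany(
            "INSERT INTO docs (title, body) VALUES (?, ?)", SAMPLE_DOCS)
        # bm25() returns lower values for better matches, so an ascending
        # sort puts the most relevant rows first.
        rows = conn.execute(
            "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
            (query,))
        return [title for (title,) in rows]
    finally:
        conn.close()
```

Queries such as "python OR cython", "python NOT cython", or the prefix search "sqlit*" all go through the same MATCH clause; the query string is where the mini-language lives.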
Scout will automatically use the latest version of the full-text search extension that's available. If you prefer to use an older version, though, you can configure Scout to use a different version.
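One way to picture the "use the newest version available" behavior is to probe an in-memory database and see which module SQLite accepts. This is only a sketch of the idea, not necessarily how Scout or Peewee detects FTS5, and best_fts_version is a hypothetical helper name.

```python
import sqlite3


def best_fts_version(preferred=("fts5", "fts4")):
    """Return the first full-text module in `preferred` that this SQLite
    build supports, or None if none of them can be used."""
    conn = sqlite3.connect(":memory:")
    try:
        for module in preferred:
            try:
                # Creating a virtual table fails with OperationalError
                # ("no such module") when the extension is not available.
                conn.execute(
                    "CREATE VIRTUAL TABLE probe USING %s(content)" % module)
            except sqlite3.OperationalError:
                continue
            return module
        return None
    finally:
        conn.close()
```

Calling best_fts_version(("fts4",)) restricts the probe to the older extension, which mirrors the idea of pinning an older version by configuration.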
Scout has always been and always will be a simple, lightweight search server. The two new additions are exciting, but don't really change much in terms of the data being stored. The new filtering is just an obvious enhancement to an existing feature, and the FTS5 support was mostly just adding new classes to Peewee ORM. That said, the filtering feature really got me thinking about new ways I could use Scout, and the search querying capabilities of FTS5 did the same. All in all, I'm looking forward to continuing to hack on this project and hope you are too.
Thanks for taking the time to read.
The following links may be helpful:
- Scout documentation and source code repository
- Earlier blog post announcing Scout, contains good examples and basic background information
- SQLite FTS5 documentation. You might also want to check out the FTS3/4 documentation as many people will not have the latest version of SQLite with FTS5 yet.
- Other posts I've written about SQLite