Advanced filtering and SQLite FTS5 with Scout search engine


I've been working on some new features for Scout and thought they might be worth a short blog post. The super-short version is that Scout now supports complex filtering on metadata, adding another layer of filtering on top of the full-text search. I've also added support for SQLite FTS5, which Scout uses by default when it's available, falling back to FTS4 otherwise.

Filtering on metadata

Scout's purpose is to allow users to perform full-text searches over collections of documents. In addition to these documents, which are just blobs of text content, Scout supports storing lightly-structured metadata alongside each document. Metadata, in Scout parlance, is just arbitrary key/value pairs associated with an indexed document. For example, on this blog, I might store as metadata the blog entry's timestamp, URL, title and tease – basically everything necessary to quickly render a nice HTML search result without having to hit the database.
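To make the idea concrete, here is a minimal sketch of how documents and their key/value metadata could be stored side by side in SQLite, along with the equality-only lookup that used to be the only option. The table names (`document`, `metadata`), the helper functions and the sample URL are hypothetical illustrations, not Scout's actual schema or API.

```python
import sqlite3

# Hypothetical schema for illustration only; this is not Scout's real schema.
SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL
);
CREATE TABLE metadata (
    document_id INTEGER NOT NULL REFERENCES document (id),
    key TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (document_id, key)
);
"""


def store_document(conn, content, **metadata):
    """Insert a document plus arbitrary key/value metadata; return its id."""
    doc_id = conn.execute(
        "INSERT INTO document (content) VALUES (?)", (content,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO metadata (document_id, key, value) VALUES (?, ?, ?)",
        [(doc_id, key, str(value)) for key, value in metadata.items()],
    )
    return doc_id


def get_metadata(conn, doc_id):
    """Return a document's metadata as a plain dict."""
    rows = conn.execute(
        "SELECT key, value FROM metadata WHERE document_id = ?", (doc_id,)
    )
    return dict(rows.fetchall())


def filter_equal(conn, key, value):
    """Equality-only metadata filter: ids of documents where key == value."""
    rows = conn.execute(
        "SELECT document_id FROM metadata WHERE key = ? AND value = ? "
        "ORDER BY document_id",
        (key, value),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
entry_id = store_document(
    conn,
    "Scout now supports complex filtering on metadata...",
    title="Advanced filtering and SQLite FTS5",
    url="/blog/example-entry/",  # hypothetical URL
)
print(get_metadata(conn, entry_id))
print(filter_equal(conn, "title", "Advanced filtering and SQLite FTS5"))
```

Keeping metadata in its own key/value table means documents can carry different keys without any schema changes, which fits the "arbitrary key/value pairs" description above.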

A while back, I thought it would be handy to do additional server-side filtering using the metadata. This would let me pack more logic into my search queries and add a bit of implied structure to my data. Until recently, however, the only operation supported on metadata was a test for equality.

With the latest release, Scout supports a variety of filtering operations. In my opinion this has greatly increased Scout's usefulness: I'm already using the new filters to cut out application code and to build features that would have been inconvenient to implement without them.

So now, in addition to equality, Scout supports:

To use these metadata filters, just do the Django thing and separate the key and the operation with a double underscore. Where before I could only test for exact equality, as in foo=bar, I can now write foo__contains=bar to perform a substring search.
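The double-underscore convention can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions: the operation names in `OPERATIONS` are a hypothetical subset chosen for the example (Scout's real set may differ), metadata values are treated as strings, and `parse_filter` and `matches` are illustrative helpers rather than Scout's API. Scout itself performs this filtering server-side.

```python
import operator

# Hypothetical operation table for illustration; Scout's actual set may differ.
OPERATIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "contains": lambda value, arg: arg in value,  # substring search
    "startswith": lambda value, arg: value.startswith(arg),
    "endswith": lambda value, arg: value.endswith(arg),
    "in": lambda value, arg: value in arg.split(","),  # comma-separated choices
}


def parse_filter(name):
    """Split 'key__op' into (key, op); a plain 'key' means equality."""
    key, sep, op = name.rpartition("__")
    if sep and op in OPERATIONS:
        return key, op
    # No recognised operation suffix: treat the whole name as the key.
    return name, "eq"


def matches(metadata, filters):
    """Return True if a metadata dict satisfies every filter (AND semantics)."""
    for name, arg in filters.items():
        key, op = parse_filter(name)
        if key not in metadata or not OPERATIONS[op](metadata[key], arg):
            return False
    return True


entry = {"title": "Advanced filtering and SQLite FTS5", "published": "2015-08-01"}
print(parse_filter("foo__contains"))
print(matches(entry, {"title__contains": "FTS5"}))
print(matches(entry, {"published__gte": "2015-09-01"}))
```

Because everything is a string here, `lt`/`gt` compare lexicographically. That happens to work for ISO-8601 dates like the one above but not for numbers such as "10" versus "9", so a real implementation has to decide how to coerce types.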

Here are some example URLs and a description of the results they represent:

As you can see, this feature opens up new possibilities for the kinds of work you can hand off to Scout. For more information, check out the Scout documentation.

FTS5

SQLite 3.8.11.1 was released in late July this year, and among its many notable new features and enhancements, the one that really caught my eye was FTS5, the latest version of SQLite's full-text search extension. The enhancements most meaningful for Scout users are the powerful mini-language used to query the search index and the built-in BM25 ranking function. Scout already implements BM25 for older FTS versions such as FTS4, but the FTS5 implementation is faster.
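For a feel of FTS5's built-in ranking, the following sketch uses Python's standard `sqlite3` module directly (not Scout) and assumes the linked SQLite library was compiled with FTS5, which is true of most current builds but not guaranteed. The `docs` table and its rows are made up for the example. FTS5's `bm25()` auxiliary function returns more negative values for better matches, so sorting in ascending order puts the most relevant rows first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumes an SQLite build with FTS5 enabled; otherwise this raises
# sqlite3.OperationalError ("no such module: fts5").
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("sqlite full text search with the fts5 extension",),
        ("sqlite tips",),
        ("python web framework",),
        ("django template tricks",),
        ("flask blueprints explained",),
    ],
)


def search(conn, query):
    """Return (rowid, body, score) for each match, best match first."""
    # bm25() is more negative for better matches, so sort ascending.
    return conn.execute(
        "SELECT rowid, body, bm25(docs) AS score FROM docs "
        "WHERE docs MATCH ? ORDER BY score",
        (query,),
    ).fetchall()


for rowid, body, score in search(conn, "sqlite"):
    print(rowid, body, round(score, 3))
```

Both matching rows contain "sqlite" once, so BM25's document-length normalisation ranks the shorter "sqlite tips" ahead of the longer entry. FTS5 also exposes a hidden `rank` column that defaults to `bm25()`, so `ORDER BY rank` is an equivalent shorthand.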

A full description of the FTS5 query language would be longer than this entire post, so I'll just point you to the SQLite FTS5 documentation, which covers the query syntax in detail, and say "Have fun!".
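A few representative FTS5 queries, again run with the standard `sqlite3` module against a throwaway table rather than through Scout, and again assuming an FTS5-enabled SQLite build. The table and rows are invented for the example; the syntax itself (phrases, prefix queries, `AND`/`OR`/`NOT` and `NEAR`) is part of FTS5's documented query language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs (rowid, body) VALUES (?, ?)",
    [
        (1, "sqlite full text search"),
        (2, "full search of text"),
        (3, "peewee orm for sqlite"),
        (4, "scout search server"),
    ],
)


def match(query):
    """Return the rowids matching an FTS5 query, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


queries = [
    '"full text"',          # phrase: tokens must be adjacent and in order
    "sea*",                 # prefix: any token beginning with "sea"
    "sqlite NOT search",    # boolean operators must be upper case
    "scout OR peewee",
    "NEAR(full text, 2)",   # either order, at most 2 tokens between them
]
for query in queries:
    print(query, match(query))
```

Only row 1 contains the exact phrase "full text", while the looser `NEAR` query also picks up row 2, where two tokens separate the terms.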

Scout automatically uses the newest version of the full-text search extension that is available. If you would rather use an older version, you can configure Scout to do so.
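One way to implement "use the newest available version, with an optional override" is to try creating a throwaway virtual table with each module in turn. The sketch below does that with the standard `sqlite3` module; the function names and the `preferred` parameter are hypothetical and are not Scout's configuration interface.

```python
import sqlite3

KNOWN_VERSIONS = ("fts5", "fts4", "fts3")  # newest first


def fts_available(version):
    """Return True if this SQLite build can create a table using `version`."""
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute(f"CREATE VIRTUAL TABLE probe USING {version}(body)")
        return True
    except sqlite3.OperationalError:
        # e.g. "no such module: fts5" when the extension isn't compiled in
        return False
    finally:
        probe.close()


def choose_fts_version(preferred=None):
    """Pick `preferred` if available, else the newest available version."""
    candidates = KNOWN_VERSIONS
    if preferred is not None:
        # Whitelist the name, since it is interpolated into SQL above.
        if preferred not in KNOWN_VERSIONS:
            raise ValueError(f"unknown full-text search version: {preferred}")
        candidates = (preferred,) + tuple(
            v for v in KNOWN_VERSIONS if v != preferred
        )
    for version in candidates:
        if fts_available(version):
            return version
    raise RuntimeError("no supported full-text search extension is available")


print(choose_fts_version())
print(choose_fts_version(preferred="fts4"))
```

In this sketch an unavailable preferred version silently falls back to the newest one that works; raising an error instead would be an equally reasonable design choice.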

Conclusion

Scout has always been, and always will be, a simple, lightweight search server. The two new additions are exciting, but they don't change much about the data being stored. The new filtering is an obvious enhancement to an existing feature, and FTS5 support mostly involved adding new classes to the Peewee ORM. That said, the filtering feature really got me thinking about new ways to use Scout, and FTS5's query capabilities did the same. All in all, I'm looking forward to continuing to hack on this project, and I hope you are too.

Thanks for taking the time to read.

The following links may be helpful:
