Building the SQLite FTS5 Search Extension

photos/p1438286638.84.png

Note: for a more up to date post on this top5ic, chech out my post on using the SQLite json1 and fts5 extensions with Python.

SQLite 3.8.11.1 contains a new, experimental version of the full-text search extension named FTS5. Reviewing the documentation for FTS5, I saw that it includes a couple cool enhancements, namely a more sophisticated query language, and built-in BM25 result ranking.

I decided to give it a try and thought I'd share my notes on compiling the extension in case anyone else is curious.

Here is a little command-line session that will fetch the 3.8.11.1 source code and compile the SQLite executable and static library:

$ cd ~/tmp/
$ export SVER="3081101"  # Current version as of 7/30/2015
$ wget https://www.sqlite.org/2015/sqlite-src-$SVER.zip
$ unzip sqlite-src-$SVER.zip
$ cd sqlite-src-$SVER
$ export CFLAGS="$CFLAGS -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_COLUMN_METADATA=1 -DSQLITE_ENABLE_UNLOCK_NOTIFY -DSQLITE_SECURE_DELETE"
$ ./configure --enable-shared --enable-static --enable-tempstore=yes
$ make

Now we can build the FTS5 extension. From the same directory as above, run the following:

$ make fts5.c
$ gcc -g -fPIC -shared -lm fts5.c -o fts5.so

Let's test it out to make sure it's working. Execute ./sqlite3 and run the following queries to load the fts5 extension and create a fts5 virtual table. The porter stemmer will normalize the tokens based on the root word, so write, writes, writing all reduce to a common base.

sqlite> .load fts5
sqlite> CREATE VIRTUAL TABLE quotes USING fts5(content, tokenize='porter unicode61');

Let's add a few quotes to the table:

sqlite> INSERT INTO quotes (content) VALUES ('A faith is a necessity to a man. Woe to him who believes in nothing.');
sqlite> INSERT INTO quotes (content) VALUES ('All who call on God in true faith, earnestly from the heart, will certainly be heard, and will receive what they have asked and desired.');
sqlite> INSERT INTO quotes (content) VALUES ('Be faithful in small things because it is in them that your strength lies.');
sqlite> INSERT INTO quotes (content) VALUES ('Faith consists in believing when it is beyond the power of reason to believe.');
sqlite> INSERT INTO quotes (content) VALUES ('Faith has to do with things that are not seen and hope with things that are not at hand.');

And finally, let's do some searches. The special hidden column rank will calculate the BM25 score of the match, allowing us to sort by relevance:

sqlite> SELECT * FROM quotes WHERE quotes MATCH 'believe' ORDER BY rank;
Faith consists in believing when it is beyond the power of reason to believe.
A faith is a necessity to a man. Woe to him who believes in nothing.

sqlite> SELECT content, rank FROM quotes WHERE quotes MATCH 'believe' ORDER BY rank;
Faith consists in believing when it is beyond the power of reason to believe.|-0.48955367608742
A faith is a necessity to a man. Woe to him who believes in nothing.|-0.356593499940754

sqlite> SELECT content FROM quotes WHERE quotes MATCH 'content : NEAR("faith" "believe", 5)';
Faith consists in believing when it is beyond the power of reason to believe.

FTS5 supports a variety of queries:

For a complete overview of the query language, check out the documentation.

Note for Python readers

If you're using Python, the Peewee ORM comes with a SQLite extension that contains helpers for working with FTS3 and FTS4. Included in those is a Python implementation of the BM25 ranking algorithm.

Here are links to get you started:

Thanks for reading

I hope you found this information helpful. I'm looking forward to doing more experimentation with FTS5 and possibly adding support to the Peewee SQLite extension. FTS5 also supports user-defined ranking and text processing functions, so it might be cool to write Python bindings to expose those APIs. We'll see!

Here are some links you may find helpful:

Comments (0)


Commenting has been closed, but please feel free to contact me