LSM Key/Value Storage in SQLite3


Several months ago I was delighted to see significant progress on a relatively new extension in the SQLite source tree. The lsm1 extension is based on the LSM key/value database developed as an experimental storage engine for the now-defunct SQLite4 project. Since development has stopped on SQLite4 for the forseeable future, I was happy to see this technology being folded into SQLite3 and was curious to see what the SQLite developers had in mind for this library.

The SQLite4 LSM captured my interest several years ago as it seemed like a viable alternative to some of the other embedded key/value databases floating around (LevelDB, BerkeleyDB, etc), and I went so far as to write a set of Python bindings for the library. As a storage engine, it seems to offer stable performance, with fast reads of key ranges and fast-ish writes, though random reads may be slower than the usual SQLite3 btree. Like SQLite3, the LSM database supports a single-writer/multiple-reader transactional concurrency model, as well as nested transaction support.

The LSM implementation in SQLite3 is essentially the same as that in SQLite4, plus some additional bugfixes and performance improvements. Crucially, the SQLite3 implementation comes with a standalone extension that exposes the storage engine as a virtual table. The rest of this post will deal with the virtual table, its implementation, and how to use it.

Just a heads-up: since this extension has not been officially announced or documented, I believe the developers may not consider it finished. The APIs and functionality described here may change with future releases of SQLite.

Virtual Tables

SQLite provides an API that allows one to programmatically define a complete table, which can then be queried just like any ordinary database table. For example, it's possible that you could expose a Redis database as a virtual table and then operate on your Redis database completely from SQLite. More commonly, perhaps you want to work with some CSV data but do not want to write a script to parse it and load it into a database -- thanks to the CSV virtual table extension, querying a CSV table can be as simple as running a CREATE VIRTUAL TABLE query and then working directly with the auto-generated table.

To put it simply, a virtual table implements several methods, and ultimately returns rows of tabular data. The SQLite virtual machine then executes your queries against the data returned by your application. Hints are provided by SQLite so you can avoid returning the full data-set when only a small portion is needed, allowing for efficient implementations. SQLite may even propose several query plans, and based on the cost your application returns, will choose the most efficient.

LSM Virtual Table

The source code for the LSM1 virtual table is concise, readable, and well-commented so I definitely recommend checking it out. In a nutshell, the virtual table exposes a single key/value database as a regular table. The keys are sorted lexicographically and must be either text, blob or unsigned int (all keys in a given database are required to be the same data-type). The value is a collection of one or more ordinary SQLite columns, which the virtual table packs and encodes into the value field associated with the given key.

Given this organization, queries on values are not going to be efficient, since there are no secondary indexes. But querying ranges of keys is going to be very fast! This makes the LSM a good fit for any ordered data-set where lookups will commonly be linear ranges -- things like time-series, event logs, analytics, etc. The LSM virtual table's cost estimator is smart enough that it can recognize when key ranges are being requested, and will return only as much data is needed.

Data is written to the LSM using the standard INSERT or the SQLite-specific INSERT OR REPLACE. Until recently, UPDATE and DELETE were not supported, but testing the latest version (and reading the code) it appears that both are now working as expected! Due to the lack of secondary indexes, it probably is only wise to use UPDATE queries restricted by individual keys or ranges of consecutive keys.

Compiling the Extension

At the time of writing, the SQLite build scripts do not include any macros for automatically compiling the LSM1 extension. I'll update this post if that changes, but in the meantime, it's pretty straightforward.

First you'll want to get a copy of the latest source code. You can grab the source tarball or use fossil to clone the repository:

fossil clone sqlite.fossil
mkdir sqlite
cd sqlite
fossil open ../sqlite.fossil

To compile the extension, change directories into the ext/lsm1 sub-directory and run the following:

cd ext/lsm1
export CFLAGS="-fPIC -O2"
TCCX="gcc -g -fPIC -O2" make

You should now have a file named, which can be loaded at run-time as a SQLite extension. To test that the extension is loadable, try running the following:

sqlite3  # start up the sqlite3 shell

sqlite> .load ./lsm

Assuming no error is reported, you should be all set.

Working with LSM1

Let's kick the tires on the LSM virtual table. I'll assume you've got a SQLite shell opened and have loaded up the LSM extension.


-- create a simple table keyed by "name" (type TEXT) with address & phone
-- as the associated value.
CREATE VIRTUAL TABLE contacts USING lsm1 ('contacts.lsm', name, TEXT, address, phone);

-- store 4 key/value pairs in the LSM database.
INSERT INTO contacts (name, address, phone) VALUES
  ('charlie', '3000 Main St, Topeka KS', '785-555-1234'),
  ('huey', '1300 Walnut St, Topeka KS', '785-555-9999'),
  ('zaizee', '900 Maple Dr, Topeka KS', '913-555-1800'),
  ('mickey', '123 Chestnut Dr, Lawrence KS', '785-555-1000');

-- retrieve data back out of the LSM store.
SELECT * FROM contacts;

-- returns the following data (note that it is automatically sorted by key):
charlie|3000 Main St, Topeka KS|785-555-1234
huey|1300 Walnut St, Topeka KS|785-555-9999
mickey|123 Chestnut Dr, Lawrence KS|785-555-1000
zaizee|900 Maple Dr, Topeka KS|913-555-1800

-- update the phone number for "mickey". Note that we have to include *all*
-- column values when using "REPLACE".
REPLACE INTO contacts (name, address, phone) VALUES
  ('mickey', '123 Chestnut Dr, Lawrence KS', '785-555-5000');

-- check the row was updated.
SELECT phone FROM contacts WHERE name = 'mickey';

-- returns the following data:

-- we can also use the UPDATE statement.
UPDATE contacts SET phone = '785-555-8888' WHERE name = 'huey';

-- verify.
SELECT phone FROM contacts WHERE name = 'huey';

-- delete "zaizee".
DELETE FROM contacts WHERE name = 'zaizee';

-- verify row deleted.
SELECT COUNT(*) FROM contacts;

Since the LSM database is stored in it's own file (contacts.lsm per our virtual table declaration), we can use LSM-specific tools to interact with the database. python-lsm-db is one such tool. Let's use it to introspect the database. We can see how SQLite is storing our data, which is kind of neat:

>>> from lsm import LSM
>>> db = LSM('./contacts.lsm')
>>> list(db.keys())
['charlie', 'huey', 'mickey']

>>> db['huey']
'\x991300 Walnut St, Topeka KSK785-555-8888'

>>> db['mickey']
'\xab123 Chestnut Dr, Lawrence KSK785-555-5000'

Digging into the data, we can see the value is structured as a prefix plus address data, plus another prefix and the phone number. The field length, because the values we're dealing with are small, can be found by dividing the value by 6 (because the value modulo 6 is the data-type, which for our text fields comes out to 3). So for \x99 (153 in decimal), we end up with 25, which happens to be the length of huey's address. For K (75 in decimal) we get 11, which is the length of huey's phone number (including the two hyphens). For \xab (171) we get 28, which is the length of mickey's address. Pretty neat. If we were storing integers things are a bit more complicated, but hopefully this gives an impression of what SQLite is doing under-the-hood.

Peewee Support

Peewee 3.0 contains robust support for LSM1. LSM tables define one primary key column and an arbitrary number of additional value columns (which are serialized and stored in a single value field in the storage engine). The primary key must be all of the same type and use one of the following field types:

Since the LSM storage engine is a key/value store, primary keys (including integers) must be specified by the application. Example model declaration:

# Use the peewee with all sqlite extensions.
from peewee import *
from playhouse.sqlite_ext import *

db = SqliteExtDatabase('my_app.db')
db.load_extension('')  # Load shared library.

class KV(LSMTable):
    key = TextField(primary_key=True)
    value = TextField()

    class Meta:
        database = db
        filename = 'kv.ldb'


For tables consisting of a single value field, Peewee will return the value directly when getting a single item. You can also request slices of rows, in which case Peewee returns a corresponding Select query, which can be iterated over. Below are some examples:

>>> KV['k0'] = 'v0'
>>> print(KV['k0'])

>>> data = [{'key': 'k%d' % i, 'value': 'v%d' % i} for i in range(20)]
>>> KV.insert_many(data).execute()


>>> KV['k8']

>>> list(KV['k4.1':'k7.x']
[Row(key='k5', value='v5'),
 Row(key='k6', value='v6'),
 Row(key='k7', value='v7')]

>>> list(KV['k6xxx':])
[Row(key='k7', value='v7'),
 Row(key='k8', value='v8'),
 Row(key='k9', value='v9')]

You can also index the LSMTable using expressions:

>>> list(KV[KV.key > 'k6'])
[Row(key='k7', value='v7'),
 Row(key='k8', value='v8'),
 Row(key='k9', value='v9')]

>>> list(KV[(KV.key > 'k6') & (KV.value != 'v8')])
[Row(key='k7', value='v7'),
 Row(key='k9', value='v9')]

You can delete single rows using del or multiple rows using slices or expressions:

>>> del KV['k1']
>>> del KV['k3x':'k8']
>>> del KV[KV.key.between('k10', 'k18')]

>>> list(KV[:])
[Row(key='k0', value='v0'),
 Row(key='k19', value='v19'),
 Row(key='k2', value='v2'),
 Row(key='k3', value='v3'),
 Row(key='k9', value='v9')]

For more examples of using the lsm1 extension with Peewee, see the Peewee LSM1 documentation.


Hope you found this post interesting. The SQLite developers seem to always be coming out with really neat stuff like this (e.g. fts5 and json1). It's a fascinating project to watch!


Comments (0)

Commenting has been closed.