Using SQLite4's LSM Storage Engine as a Stand-alone NoSQL Database with Python

August 14, 2015 00:21 / cython lsm nosql python sqlite / 1 comments

SQLite and Key/Value databases are two of my favorite topics to blog about. Today I get to write about both, because in this post I will be demonstrating a Python wrapper for SQLite4's log-structured merge-tree (LSM) key/value store.

I don't actively follow SQLite's releases, but the recent release of SQLite 3.8.11 drew quite a bit of attention as the release notes described massive performance improvements over 3.8.0. While reading the release notes I happened to see a blurb about a new, experimental full-text search extension, and all this got me to wondering what was going on with SQLite4.

As I was reading about SQLite4, I saw that one of the design goals was to provide an interface for pluggable storage engines. At the time I'm writing this, SQLite4 has two built-in storage backends, one of which is an LSM key/value store. Over the past month or two I've been having fun with Cython, writing Python wrappers for the embedded key/value stores UnQLite and Vedis. I figured it would be cool to use Cython to write a Python interface for SQLite4's LSM storage engine.

After pulling down the SQLite4 source code and reading through the LSM header file (it's very small!), I started coding and the result is python-lsm-db (docs).

What is an LSM tree?

As I understand them, LSM trees consist of an in-memory tree that acts like a buffer, and one or more persistent trees stored on disk. The M in LSM stands for merge, an operation where the buffered writes are merged into the tree(s) on disk. By merging into the disk-backed tree, the cost of all those disk seeks can be greatly reduced, meaning one thing: fast writes. The counter-point is that random reads may be slower because there could be multiple trees to search, and because an LSM tree may be higher than a comparable B-Tree. I gather that another benefit of LSM trees is that data may be stored more contiguously (less fragmented), leading to faster reads of ranges of keys.

Again, this is my understanding, based on reading a couple blog posts and the SQLite4 documentation. If I've got it wrong or have omitted important details, please let me know!.

Features

The SQLite4 LSM implementation has some very nice features:

Embedded database, hosted by your application.
Keys support in-order traversal using cursors.
Transactional (including nested transactions).
Single writer/multiple reader MVCC based transactional concurrency model.
On-disk database stored in a single file.
Data is durable in the face of application or power failure.
Lots of tuning parameters so you can tailor it to your specific needs.

Building the Python library

Alright, let's try it out. To get started, go ahead and create a virtualenv and use pip to install Cython and lsm-db:

$ virtualenv test_lsm
$ cd test_lsm
$ source bin/activate
(test_lsm) $ pip install Cython lsm-db

You can verify the install worked by running the following:

(test_lsm) $ python -c "import lsm, tempfile; lsm.LSM(tempfile.mktemp())"

The above command should produce no output if everything is installed and working correctly. If you encounter problems, be aware that I've really only tested this on Linux using Python 2.7, so if you're running Python 3.4 on Windows (ew, why) you may have to do a little debugging.

Kicking the tires

What follows is a sample interactive console session designed to show some of the basic features and functionality of the lsm-db library. The API documentation contains a complete list of classes, methods, and descriptions of their parameters and return values.

To begin, fire up a Python interpreter in your virtualenv and instantiate an LSM object, specifying a path to a database file:

>>> from lsm import LSM
>>> db = LSM('test.ldb')

In addition to the filename, the LSM class provides a number of tuning parameters you can tweak to adjust things like the block size, page size, and more (doc link).

Key/Value features

The SQLite4 LSM engine is a key/value store, making it roughly analogous to Python's dict object. Rather than follow the C conventions defined in the header file, we will use dictionary APIs to interact with the LSM database.

>>> db['foo'] = 'bar'
>>> print db['foo']
bar

>>> for i in range(4):
...     db['k%s' % i] = str(i)
...

>>> 'k3' in db
True
>>> 'k4' in db
False

>>> del db['k3']
>>> db['k3']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lsm.pyx", line 973, in lsm.LSM.__getitem__ (lsm.c:7142)
  File "lsm.pyx", line 777, in lsm.LSM.fetch (lsm.c:5756)
  File "lsm.pyx", line 778, in lsm.LSM.fetch (lsm.c:5679)
  File "lsm.pyx", line 1289, in lsm.Cursor.seek (lsm.c:12122)
  File "lsm.pyx", line 1311, in lsm.Cursor.seek (lsm.c:12008)
KeyError: 'k3'

Note that when we attempted to access a key we just deleted, a KeyError was raised. By default when you attempt to look up a key, lsm-db will search for an exact match. Interestingly, the SQLite4 LSM allows you to also search for the closest key lexicographically, if the specific key you are searching for does not exist. In addition to matching keys by equality, there are two seek methods: SEEK_LE and SEEK_GE, which will return the next-closest key. Failing to find an exact match, SEEK_LE will return the highest key whose value does not exceed the search key. Similarly, SEEK_GE will return the lowest key whose value exceeds the search key. For example, we know k1.5 does not exist, so let's see how the two seek methods handle this:

>>> from lsm import SEEK_LE, SEEK_GE

>>> # Here we will match "k1", the highest key that is less than k1.5
>>> db['k1.5', SEEK_LE]
'1'

>>> # Here we will match "k2", the lowest key that is greater than k1.5
>>> db['k1.5', SEEK_GE]
'2'

In addition to the methods shown above, LSM also supports other common dictionary methods such as keys(), values() and update().

Slices and Iteration

The SQLite4 LSM database can be iterated through directly, or sliced to return a subset of keys. An interesting feature is that when you request a range of keys, the start and end keys do not need to exist. If either key is missing, the database will rely on the appropriate SEEK_ method to find the next-closest key.

>>> [item for item in db]
[('foo', 'bar'), ('k0', '0'), ('k1', '1'), ('k2', '2')]

>>> db['k0':'k99']
<generator object at 0x7f2ae93072f8>

>>> list(db['k0':'k99'])
[('k0', '0'), ('k1', '1'), ('k2', '2')]

You can use open-ended slices to return all keys in a given direction.

>>> list(db['k0':])
[('k0', '0'), ('k1', '1'), ('k2', '2')]

>>> list(db[:'k1'])
[('foo', 'bar'), ('k0', '0'), ('k1', '1')]

If the lower- or upper-bound is outside the range of keys an empty list is returned.

>>> list(db[:'aaa'])
[]
>>> list(db['z':])
[]

To retrieve keys in reverse order, simply use a higher key as the first parameter of your slice. If you are retrieving an open-ended slice, you can specify True as the step parameter of the slice.

>>> list(db['k1':'aaa'])  # Since 'k1' > 'aaa', keys are retrieved in reverse:
[('k1', '1'), ('k0', '0'), ('foo', 'bar')]

>>> list(db['k1'::True])  # Open-ended slices specify True for step:
[('k1', '1'), ('k0', '0'), ('foo', 'bar')]

You can also delete slices of keys, but note that the delete will not include the keys themselves:

>>> del db['k0':'k99']

>>> list(db)  # Note that 'k0' still exists.
[('foo', 'bar'), ('k0', '0')]

For more examples and a detailed description of how the SEEK_ methods work, check out the LSM.fetch_range() documentation.

Cursors

While slicing may cover most use-cases, for finer-grained control you can use cursors to search and traverse records.

>>> with db.cursor() as cursor:
...     for key, value in cursor:
...         print key, '=>', value
...
foo => bar
k0 => 0

>>> db.update({'k1': '1', 'k2': '2', 'k3': '3'})

>>> with db.cursor() as cursor:
...     cursor.first()
...     print cursor.key()
...     cursor.last()
...     print cursor.key()
...     cursor.previous()
...     print cursor.key()
...
foo
k3
k2

>>> with db.cursor() as cursor:
...     cursor.seek('k0', SEEK_GE)
...     print list(cursor.fetch_until('k99'))
...
[('k0', '0'), ('k1', '1'), ('k2', '2'), ('k3', '3')]

It is very important to close a cursor when you are through using it. For this reason, it is recommended you use the LSM.cursor() context-manager, which ensures the cursor is closed properly.

Transactions

The SQLite4 LSM database supports nested transactions. The simplest way to use transactions is with the LSM.transaction() method, which doubles as a context-manager or decorator.

>>> with db.transaction() as txn:
...     db['k1'] = '1-mod'
...     with db.transaction() as txn2:
...         db['k2'] = '2-mod'
...         txn2.rollback()
...
True
>>> print db['k1'], db['k2']
1-mod 2

You can commit or roll-back transactions part-way through a wrapped block, and a new transaction will begin in the old one's place:

>>> with db.transaction() as txn:
...    db['k1'] = 'outer txn'
...    txn.commit()  # The write is preserved.
...
...    db['k1'] = 'outer txn-2'
...    with db.transaction() as txn2:
...        db['k1'] = 'inner-txn'  # This is commited after the block ends.
...    print db['k1']  # Prints "inner-txn".
...    txn.rollback()  # Rolls back both the changes from txn2 and the preceding write.
...    print db['k1']
...
1              <- Return value from call to commit().
inner-txn      <- Printed after end of txn2.
True           <- Return value of call to rollback().
outer txn      <- Printed after rollback.

If you like, you can also explicitly call LSM.begin(), LSM.commit(), and LSM.rollback().

>>> db.begin()
>>> db['foo'] = 'baze'
>>> print db['foo']
baze
>>> db.rollback()
True
>>> print db['foo']
bar

Performance

I hate bogus benchmarks, but I was pretty curious about the LSM database's performance. I couldn't resist and put together a little benchmark comparing the SQLite4 LSM to LevelDB, BerkeleyDB and KyotoCabinet. These aren't really apples-to-apples comparisons, as KyotoCabinet and BerkeleyDB are built on B-trees, and KyotoCabinet and LevelDB do not support multiple processes accessing the database. I am also not quite clear on whether LevelDB supports transactions. Additionally, the benchmark does not use the database libraries directly, but goes through whatever Python driver was available. Inefficiencies or quirks in the Python libraries could skew results. All that said, let's take a look at the results.

Here are the results (smaller is better):

Testing with N = 100000
------------------------------------

BDBBTree
~~~~~~~~
Writes:        0.469
Reads:         0.479
Range (10%):   0.212
Range (20%):   0.192
Range (40%):   0.185
Range (80%):   0.186

KyotoBTree
~~~~~~~~~~
Writes:        0.208
Reads:         0.203
Range (10%):   0.219
Range (20%):   0.188
Range (40%):   0.188
Range (80%):   0.187

LevelDB
~~~~~~~
Writes:        0.227
Reads:         0.225
Range (10%):   0.031
Range (20%):   0.027
Range (40%):   0.028
Range (80%):   0.027

LSM
~~~
Writes:        0.282
Reads:         0.239
Range (10%):   0.059
Range (20%):   0.052
Range (40%):   0.052
Range (80%):   0.052

My interpretation of these numbers is that BerkeleyDB and KyotoCabinet perform about as one would expect when fetching ranges of keys, that is, similar to how they perform when reading at random. LevelDB and LSM, though, were significantly faster reading ranges of keys, and quite fast at writing as well.

LevelDB outperformed the SQLite4 LSM, but SQLite was still a good deal faster than the B-Trees when it came to reading ranges. I had to do a little debugging with the LSM benchmark because the LSM reads were coming in 4 times slower than the writes! I initially thought something was just slow about reading, until I looked more closely and realized that creating a Python cursor wrapper for each fetch() was the issue. And indeed, it was, because by replacing the Python code with a couple calls directly to the C APIs I was able to get much faster reads. If you've chosen to try out the LSM Python bindings, make sure you're on at least version 0.1.4, as previous versions have the slow fetch() implementation.

A note on SQLite4

If you'd like to build SQLite4 for yourself, you can clone the fossil repository and compile it. You can also build your own copies of the source amalgamation (single source file for easy embedding). To do this:

$ fossil clone http://www.sqlite.org/src4/ sqlite4.fossil
$ mkdir sqlite4-build
$ cd sqlite4-build
$ fossil open ../sqlite4.fossil
$ ln -s Makefile.linux-gcc Makefile
$ export CFLAGS="$CFLAGS -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_COLUMN_METADATA=1 -DSQLITE_ENABLE_UNLOCK_NOTIFY -DSQLITE_SECURE_DELETE
$ make

You will now find sqlite4 (binary), libsqlite4.a, and sqlite4.h.

If you want to create an amalgamation, you can run make sqlite4.c.

I should also mention that the current status of SQLite4 is...well...unknown. Dr. Hipp has indicated (or I seem to remember him say so) that he plans to continue supporting SQLite3. Since that is where his users are, I don't blame him. So, if I were an end-user, I'd look at SQLite4 as an experiment. One that may possibly be revived, but not necessarily so, and even then, possibly not in its current form.

Want more?

If you'd like to learn more, the following links may be helpful:

Comments (1)

Paddy3118 | aug 15 2015, at 05:25am

Thanks Charles. You taught me something and I'll keep a link to explore further :-)

Commenting has been closed.