Announcing sophy: fast Python bindings for Sophia Database

December 19, 2015 15:58 / cython kv nosql python / 0 comments

Sophia is a powerful key/value database with loads of features packed into a simple C API. I wanted to use it in some upcoming projects, so I wrote some Python bindings, and the result is sophy. In this post, I'll describe the features of the Sophia database and then show example code using sophy, the Python wrapper.

Here is an overview of the features of the Sophia database:

Append-only MVCC database
ACID transactions
Consistent cursors
Compression
Ordered key/value store
Range searches
Multi-part keys
Prefix searches

The architecture is unique and definitely worthy of a skim if you plan to use Sophia.

Installing sophy

The Sophia sources are bundled with the sophy source code, so the only thing you need to install is Cython. You can install from GitHub or from PyPI.

Pip instructions:

$ pip install Cython sophy

Git instructions:

$ pip install Cython
$ git clone https://github.com/coleifer/sophy
$ cd sophy
$ python setup.py build
$ python setup.py install

Usage

sophy is very simple to use. It acts primarily like a Python dict object, but in addition to the normal dictionary operations, you can read slices of data, which are returned efficiently using cursors. Similarly, bulk writes with update() use an efficient, atomic batch operation.

To begin, instantiate and open a Sophia database. If the database and path do not exist, they will be created automatically. Multiple databases can exist under the same path:

>>> from sophy import Sophia
>>> env = Sophia('/tmp/sophia-env', [('test-db', 'string')])
>>> env.open()
True
>>> db = env['test-db']  # Environments can have many dbs.
>>> db2 = env.create_database('another-db', 'u64')  # A database keyed by 64-bit unsigned ints.

We can set values individually or in groups:

>>> db['k1'] = 'v1'
>>> db.update(k2='v2', k3='v3', k4='v4')  # Efficient, atomic.
>>> with db.transaction() as txn:  # Same as .update()
...     txn['k1'] = 'v1-e'
...     txn['k5'] = 'v5'
...

We can read values individually or in groups. When requesting a slice, pass a truthy third parameter (step) to indicate that the results should be returned in reverse. Alternatively, if the first key is higher than the second key in a slice, sophy will interpret that as a request for reverse order.

>>> db['k1']
'v1-e'
>>> [item for item in db['k2': 'k444']]
[('k2', 'v2'), ('k3', 'v3'), ('k4', 'v4')]

>>> list(db['k3':'k1'])  # Results are returned in reverse.
[('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1-e')]

>>> list(db[:'k3'])  # All items from lowest key up to 'k3'.
[('k1', 'v1-e'), ('k2', 'v2'), ('k3', 'v3')]

>>> list(db[:'k3':True])  # Same as above, but ordered in reverse.
[('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1-e')]

>>> list(db['k3':])  # All items from k3 to highest key.
[('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]
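
To make the slicing rules concrete, here is a rough pure-Python sketch that reproduces the results above using a plain dict and the bisect module. It is not how sophy is implemented (sophy uses Sophia cursors), and slice_items() is a hypothetical helper invented for this illustration. It assumes both endpoints are inclusive, which is what the example output suggests.

```python
import bisect


def slice_items(store, start=None, stop=None, reverse=False):
    """Hypothetical helper: return (key, value) pairs from start to stop, inclusive."""
    keys = sorted(store)
    # A start key that sorts after the stop key is read as "walk backwards".
    if start is not None and stop is not None and start > stop:
        start, stop = stop, start
        reverse = True
    lo = 0 if start is None else bisect.bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect.bisect_right(keys, stop)
    selected = keys[lo:hi]
    if reverse:
        selected.reverse()
    return [(key, store[key]) for key in selected]


data = {'k1': 'v1-e', 'k2': 'v2', 'k3': 'v3', 'k4': 'v4', 'k5': 'v5'}
print(slice_items(data, 'k2', 'k444'))             # Like db['k2':'k444']
print(slice_items(data, 'k3', 'k1'))               # Like db['k3':'k1']
print(slice_items(data, stop='k3', reverse=True))  # Like db[:'k3':True]
```

Note that the end key does not have to exist: 'k444' simply marks where the scan stops.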

Values can also be deleted singly or in groups. To delete multiple items atomically use the transaction() method.

>>> del db['k3']
>>> db['k3']  # 'k3' no longer exists.
Traceback (most recent call last):
  ...
KeyError: 'k3'

>>> with db.transaction() as wb:
...     del wb['k2']
...     del wb['k5']
...     del wb['kxx']  # No error raised.
...
>>> list(db)
[('k1', 'v1-e'), ('k4', 'v4')]

Transactions

As shown above, transactions can be used to effect multiple changes atomically.

>>> with db.transaction() as txn:
...     txn['k2'] = 'v2-e'
...     txn['k1'] = 'v1-e2'
...
>>> list(db[::True])
[('k4', 'v4'), ('k2', 'v2-e'), ('k1', 'v1-e2')]

You can call commit() or rollback() inside the transaction block itself:

>>> with db.transaction() as txn:
...     txn['k1'] = 'whoops'
...     txn.rollback()
...     txn['k2'] = 'v2-e2'
...
>>> list(db.items())
[('k1', 'v1-e2'), ('k2', 'v2-e2'), ('k4', 'v4')]

If an exception occurs in the wrapped block, the transaction will automatically be rolled back.
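
If the commit and rollback behavior is unfamiliar, the following stand-in may help. It is a minimal sketch of a buffered transaction over a plain dict, not sophy's actual implementation; BufferedTransaction and its attributes are hypothetical names made up for this example. Writes are held in a buffer, applied when the block exits cleanly, and discarded when an exception escapes the block.

```python
class BufferedTransaction(object):
    """Hypothetical stand-in: buffer writes and apply them only on success."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def __setitem__(self, key, value):
        self.pending[key] = value

    def rollback(self):
        # Throw away everything written so far in this block.
        self.pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.store.update(self.pending)  # Commit the buffered writes.
        self.pending = {}
        return False  # Let any exception propagate to the caller.


store = {'k1': 'v1-e2'}

# An exception inside the block means nothing is written.
try:
    with BufferedTransaction(store) as txn:
        txn['k1'] = 'whoops'
        raise ValueError('something went wrong')
except ValueError:
    pass
after_error = dict(store)

# An explicit rollback discards earlier writes; later writes still commit.
with BufferedTransaction(store) as txn:
    txn['k1'] = 'whoops'
    txn.rollback()
    txn['k2'] = 'v2-e2'

print(after_error)
print(sorted(store.items()))
```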

Cursors

Sophia is an ordered key/value store, so cursors will by default iterate through the keyspace in ascending order. To iterate in descending order, you can specify this using the slicing technique described above. For finer-grained control, however, you can use the cursor() method.

The Database.cursor() method supports a number of interesting parameters:

order: one of '>=' (ascending, the default), '<=' (reverse), '>' (ascending, not including the endpoint), or '<' (reverse, not including the endpoint).
key: seek to this key before beginning to iterate.
prefix: perform a prefix search.
keys: include the database key while iterating (default True).
values: include the database value while iterating (default True).
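
Here is how I would expect the four order flags to behave when combined with a key, sketched over a plain dict. This is my reading of the parameter list above rather than sophy's code, and scan() is a hypothetical function written just for this illustration.

```python
import bisect


def scan(store, order='>=', key=None):
    """Hypothetical helper: yield (key, value) pairs for a given order flag."""
    keys = sorted(store)
    if order in ('>=', '>'):
        if key is None:
            start = 0
        elif order == '>=':
            start = bisect.bisect_left(keys, key)   # Include the key itself.
        else:
            start = bisect.bisect_right(keys, key)  # Skip past the key.
        selected = keys[start:]
    elif order in ('<=', '<'):
        if key is None:
            stop = len(keys)
        elif order == '<=':
            stop = bisect.bisect_right(keys, key)   # Include the key itself.
        else:
            stop = bisect.bisect_left(keys, key)    # Stop before the key.
        selected = keys[:stop][::-1]                # Walk in descending order.
    else:
        raise ValueError('unknown order: %r' % (order,))
    for k in selected:
        yield k, store[k]


data = {'k1': 'a', 'k2': 'b', 'k3': 'c'}
for order in ('>=', '>', '<=', '<'):
    print(order, list(scan(data, order, 'k2')))
```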

To perform a prefix search, for example, you might do something like:

>>> db.update(aa='foo', abc='bar', az='baze', baz='nugget')
>>> for item in db.cursor(prefix='a'):
...     print(item)
...
('aa', 'foo')
('abc', 'bar')
('az', 'baze')
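
The reason a prefix search is cheap in an ordered store is that all keys sharing a prefix sit next to each other. The sketch below shows the idea with a sorted list and bisect. It is only an illustration of the technique, not what Sophia does internally, and prefix_scan() is a hypothetical helper.

```python
import bisect


def prefix_scan(store, prefix):
    """Hypothetical helper: return all (key, value) pairs whose key starts with prefix."""
    keys = sorted(store)
    # Jump straight to the first key that is >= the prefix.
    start = bisect.bisect_left(keys, prefix)
    results = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # Keys are ordered, so no later key can match.
        results.append((key, store[key]))
    return results


data = {'aa': 'foo', 'abc': 'bar', 'az': 'baze', 'baz': 'nugget'}
for item in prefix_scan(data, 'a'):
    print(item)
```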

Configuration

Sophia supports a huge number of configuration options, most of which are exposed as simple properties on the Sophia database object. For example, to configure Sophia with a set memory limit and number of worker threads:

>>> env.close()  # We may need to close the env to set some options.
True
>>> env.scheduler_threads = 4
>>> env.memory_limit = 1024 * 1024 * 16
>>> env.open()  # Now using 4 threads and a 16 MB memory limit.
True

You can also force checkpointing, garbage-collection, and other things using simple methods:

>>> env.checkpoint()
>>> env.gc()

Some properties are read-only:

>>> db.index_count
10
>>> len(db)
10
>>> db.status
'online'
>>> db.memory_used
69

Take a look at the configuration docs for more details.

Not mentioned, but cool

Sophia supports multiple key types as well as multi-part keys (up to 8 parts). So while all these examples have used a string key, we can mix and match strings, 32-bit and 64-bit unsigned ints.
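
To give a feel for why multi-part keys are useful, here is a small sketch that uses Python tuples as two-part keys (a string and an integer). It assumes multi-part keys are ordered part by part, the same way tuples compare in Python. It does not use the sophy API, and scan_first_part() and the sample data are hypothetical.

```python
import bisect

# Two-part keys: (username, event number).
events = {
    ('alice', 1): 'login',
    ('alice', 2): 'upload',
    ('bob', 1): 'login',
    ('bob', 7): 'logout',
    ('carol', 3): 'login',
}


def scan_first_part(store, first):
    """Hypothetical helper: return all items whose first key part equals `first`."""
    keys = sorted(store)
    # The one-element tuple (first,) sorts before every (first, n) key.
    start = bisect.bisect_left(keys, (first,))
    results = []
    for key in keys[start:]:
        if key[0] != first:
            break
        results.append((key, store[key]))
    return results


print(scan_first_part(events, 'bob'))
```

Because the keys are ordered by their first part and then their second, all of one user's events come back together and in event order.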

Sophia also supports read-only views which reflect the state of the database at a given moment. Views act much like the regular database interface (dictionary-lookup and slicing, cursor support).
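
Conceptually, a view falls out of the append-only MVCC design: if old versions are never overwritten, a view only needs to remember a point in the write history. The toy below illustrates that idea in pure Python. It is a deliberately naive sketch, not Sophia's storage format or sophy's view API, and VersionedStore and View are hypothetical names.

```python
class VersionedStore(object):
    """Hypothetical append-only store: every write gets a new sequence number."""

    def __init__(self):
        self.log = []  # (sequence, key, value) records, never rewritten.

    def set(self, key, value):
        self.log.append((len(self.log) + 1, key, value))

    def get(self, key, as_of=None):
        # Walk backwards so the newest visible version wins.
        for sequence, log_key, value in reversed(self.log):
            if log_key == key and (as_of is None or sequence <= as_of):
                return value
        raise KeyError(key)

    def view(self):
        return View(self, len(self.log))


class View(object):
    """Read-only window onto the store as of a fixed sequence number."""

    def __init__(self, store, sequence):
        self.store = store
        self.sequence = sequence

    def __getitem__(self, key):
        return self.store.get(key, as_of=self.sequence)


store = VersionedStore()
store.set('k1', 'v1')
snapshot = store.view()
store.set('k1', 'v1-e')  # Written after the view was taken.
store.set('k2', 'v2')

print(snapshot['k1'])    # The view still sees the old value.
print(store.get('k1'))   # The live store sees the new one.
try:
    snapshot['k2']
    k2_visible = True
except KeyError:
    k2_visible = False   # 'k2' did not exist when the view was created.
```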

Similarity to other projects

The sophy bindings are quite similar to the SQLite4 lsm-db bindings I wrote a couple of months ago. Both databases are ordered key/value stores, and both sets of Python bindings implement a dictionary-like API with the addition of efficient slicing to read multiple key/value pairs. I've also written bindings to UnQLite (unqlite-python) and Vedis (vedis-python), both of which are embedded databases: the former is a JSON document store, and the latter is a Redis-like embedded data-structure store. Lastly, I've got another project kicking around called kvkit, which provides a high-level Python API around a number of ordered key/value stores (kyotocabinet, leveldb, rocksdb, etc).

If you'd like to experiment with embedded databases from Python, hopefully one of the above projects will meet your needs! And if not, give SQLite a try!

Thanks for reading

Thanks for taking the time to read this post. I hope you found it interesting! As always, please feel free to write a comment. Happy hacking!
