Announcing sophy: fast Python bindings for Sophia Database
Sophia is a powerful key/value database with loads of features packed into a simple C API. In order to use this database in some upcoming projects I've got planned, I decided to write some Python bindings, and the result is sophy. In this post, I'll describe the features of Sophia database, and then show example code using sophy, the Python wrapper.
Here is an overview of the features of the Sophia database:
- Append-only MVCC database
- ACID transactions
- Consistent cursors
- Ordered key/value store
- Range searches
- Multi-part keys
- Prefix searches
The architecture is unique and definitely worthy of a skim if you plan to use Sophia.
Install sophy with pip:
$ pip install Cython sophy
Or build it from a git checkout:
$ pip install Cython
$ git clone https://github.com/coleifer/sophy
$ cd sophy
$ python setup.py build
$ python setup.py install
sophy is very simple to use. It acts primarily like a Python dict object, but in addition to normal dictionary operations, you can read slices of data that are returned efficiently using cursors. Similarly, bulk writes using update() use an efficient, atomic batch operation.
To begin, instantiate and open a Sophia database. If the database and path do not exist, they will be created automatically. Multiple databases can exist under the same path:
>>> from sophy import Sophia
>>> env = Sophia('/tmp/sophia-env', [('test-db', 'string')])
>>> env.open()
True
>>> db = env['test-db']  # Environments can have many dbs.
>>> db2 = env.create_database('another-db', 'u64')  # A database keyed by 64-bit unsigned ints.
We can set values individually or in groups:
>>> db['k1'] = 'v1'
>>> db.update(k2='v2', k3='v3', k4='v4')  # Efficient, atomic.
>>> with db.transaction() as txn:  # Same as .update()
...     txn['k1'] = 'v1-e'
...     txn['k5'] = 'v5'
...
We can read values individually or in groups. When requesting a slice, the third parameter (step) is used to indicate the results should be returned in reverse. Alternatively, if the first key is higher than the second key in a slice, sophy will interpret that as ordered in reverse.
>>> db['k1']
'v1-e'
>>> [item for item in db['k2': 'k444']]
[('k2', 'v2'), ('k3', 'v3'), ('k4', 'v4')]
>>> list(db['k3':'k1'])  # Results are returned in reverse.
[('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1-e')]
>>> list(db[:'k3'])  # All items from lowest key up to 'k3'.
[('k1', 'v1-e'), ('k2', 'v2'), ('k3', 'v3')]
>>> list(db[:'k3':True])  # Same as above, but ordered in reverse.
[('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1-e')]
>>> list(db['k3':])  # All items from k3 to highest key.
[('k3', 'v3'), ('k4', 'v4'), ('k5', 'v5')]
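To make the slice rules concrete, here is a tiny pure-Python model of them. It is only a sketch: OrderedStore is a hypothetical stand-in backed by a dict and the bisect module, not sophy's implementation (which uses Sophia cursors), and it assumes both slice endpoints are inclusive, as the examples above suggest.

```python
import bisect


class OrderedStore:
    """Toy in-memory ordered key/value store (a stand-in, not sophy itself)."""

    def __init__(self, items=None):
        self._data = dict(items or {})

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A truthy step means "return the results in reverse".
            return self._range(key.start, key.stop, bool(key.step))
        return self._data[key]

    def _range(self, start, stop, reverse):
        # A start key higher than the stop key is also read as "reverse".
        if start is not None and stop is not None and start > stop:
            start, stop, reverse = stop, start, True
        keys = sorted(self._data)  # re-sorted on every call: fine for a toy
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(keys) if stop is None else bisect.bisect_right(keys, stop)
        selected = keys[lo:hi]  # assumption: both endpoints are inclusive
        if reverse:
            selected.reverse()
        return [(k, self._data[k]) for k in selected]


store = OrderedStore({'k1': 'v1-e', 'k2': 'v2', 'k3': 'v3', 'k4': 'v4', 'k5': 'v5'})
forward = store['k2':'k444']   # [('k2', 'v2'), ('k3', 'v3'), ('k4', 'v4')]
backward = store['k3':'k1']    # [('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1-e')]
```

The end key does not have to exist: 'k444' sorts between 'k4' and 'k5', so the range simply stops at 'k4'.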
Values can also be deleted singly or in groups. To delete multiple items atomically, use a transaction:
>>> del db['k3']
>>> db['k3']  # 'k3' no longer exists.
Traceback (most recent call last):
  ...
KeyError: 'k3'
>>> with db.transaction() as wb:
...     del wb['k2']
...     del wb['k5']
...     del wb['kxx']  # No error raised.
...
>>> list(db)
[('k1', 'v1-e'), ('k4', 'v4')]
As shown above, transactions can be used to effect multiple changes atomically.
>>> with db.transaction() as txn:
...     txn['k2'] = 'v2-e'
...     txn['k1'] = 'v1-e2'
...
>>> list(db[::True])
[('k4', 'v4'), ('k2', 'v2-e'), ('k1', 'v1-e2')]
You can call rollback() inside the transaction block itself:
>>> with db.transaction() as txn:
...     txn['k1'] = 'whoops'
...     txn.rollback()
...     txn['k2'] = 'v2-e2'
...
>>> list(db.items())
[('k1', 'v1-e2'), ('k2', 'v2-e2'), ('k4', 'v4')]
If an exception occurs in the wrapped block, the transaction will automatically be rolled back.
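The commit and rollback behaviour can be modelled with a small context manager. This is a sketch under stated assumptions rather than how sophy works internally: Transaction here is a hypothetical class that buffers writes in memory over a plain dict, and it assumes the exception is re-raised after the rollback, which the post does not spell out.

```python
class Transaction:
    """Toy write buffer: changes reach the store only on commit."""

    _DELETED = object()  # sentinel marking a pending delete

    def __init__(self, store):
        self._store = store  # a plain dict stands in for the database
        self._pending = {}

    def __setitem__(self, key, value):
        self._pending[key] = value

    def __delitem__(self, key):
        self._pending[key] = self._DELETED

    def rollback(self):
        # Discard everything buffered so far; later writes start fresh.
        self._pending.clear()

    def commit(self):
        for key, value in self._pending.items():
            if value is self._DELETED:
                self._store.pop(key, None)  # deleting a missing key is not an error
            else:
                self._store[key] = value
        self._pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()
        else:
            self.rollback()
        return False  # assumption: let the exception propagate to the caller


store = {'k1': 'v1-e2', 'k2': 'v2-e', 'k4': 'v4'}

with Transaction(store) as txn:
    txn['k1'] = 'whoops'
    txn.rollback()        # drops the 'whoops' write
    txn['k2'] = 'v2-e2'   # still committed when the block exits

try:
    with Transaction(store) as txn:
        txn['k4'] = 'never written'
        raise ValueError('simulated failure')
except ValueError:
    pass  # the buffered write to 'k4' was discarded
```

After both blocks run, the dict holds 'v1-e2', 'v2-e2' and 'v4', matching the rollback() example above. A real database also has to make the commit step atomic and durable, which this single-threaded model does not attempt.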
Sophia is an ordered key/value store, so cursors will by default iterate through the keyspace in ascending order. To iterate in descending order, you can specify this using the slicing technique described above. For finer-grained control, however, you can use the Database.cursor() method, which supports a number of interesting parameters:
- the iteration order, including '>' (ascending, not including endpoint) and '<' (reverse, not including endpoint).
- key: seek to this key before beginning to iterate.
- prefix: perform a prefix search.
- keys: include the database key while iterating (default True).
- values: include the database value while iterating (default True).
To perform a prefix search, for example, you might do something like:
>>> db.update(aa='foo', abc='bar', az='baze', baz='nugget')
>>> for item in db.cursor(prefix='a'):
...     print(item)
...
('aa', 'foo')
('abc', 'bar')
('az', 'baze')
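In an ordered store, a prefix search boils down to "seek to the prefix, then read until the keys stop matching". The sketch below shows that idea in pure Python; prefix_scan is a hypothetical helper over a dict, not the code path sophy or Sophia actually uses.

```python
import bisect


def prefix_scan(items, prefix):
    """Yield (key, value) pairs whose key starts with prefix, in key order."""
    keys = sorted(items)
    # Seek to the first key >= prefix, then walk forward until keys stop matching.
    for key in keys[bisect.bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        yield key, items[key]


data = {'aa': 'foo', 'abc': 'bar', 'az': 'baze', 'baz': 'nugget', 'k1': 'v1-e2'}
matches = list(prefix_scan(data, 'a'))
# matches == [('aa', 'foo'), ('abc', 'bar'), ('az', 'baze')]
```

Because the keys are sorted, everything sharing a prefix sits in one contiguous run, so the scan can stop at the first key that does not match.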
Sophia supports a huge number of configuration options, most of which are exposed as simple properties on the Sophia database object. For example, to configure Sophia with a set memory limit and number of worker threads:
>>> env.close()  # We may need to close the env to set some options.
True
>>> env.scheduler_threads = 4
>>> env.memory_limit = 1024 * 1000 * 16
>>> env.open()  # Now using 4 threads and 16mb mem limit.
True
You can also force checkpointing, garbage-collection, and other things using simple methods:
>>> env.checkpoint()
>>> env.gc()
Some properties are read-only:
>>> db.index_count
10
>>> len(db)
10
>>> db.status
'online'
>>> db.memory_used
69
Take a look at the configuration docs for more details.
Not mentioned, but cool
Sophia supports multiple key types as well as multi-part keys (up to 8 parts). So while all these examples have used a string key, we can mix and match strings, 32-bit and 64-bit unsigned ints.
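To get a feel for why multi-part keys are handy in an ordered store, here is a pure-Python illustration using tuples. It deliberately avoids the sophy API: the events data and the scan_first_part helper are made up for this example, and it assumes key parts are compared in order, the way Python compares tuples.

```python
import bisect

# Hypothetical two-part keys: (user name, event number).
events = {
    ('alice', 2): 'logout',
    ('bob', 1): 'login',
    ('alice', 1): 'login',
    ('carol', 7): 'purchase',
}

keys = sorted(events)  # ordered by the first part, then by the second


def scan_first_part(first):
    """Return every (key, value) pair whose first key part equals `first`."""
    # The one-element tuple (first,) sorts just before any (first, n) key.
    lo = bisect.bisect_left(keys, (first,))
    result = []
    for key in keys[lo:]:
        if key[0] != first:
            break
        result.append((key, events[key]))
    return result


alice_events = scan_first_part('alice')
# alice_events == [(('alice', 1), 'login'), (('alice', 2), 'logout')]
```

All keys sharing a first part end up next to each other, so a single ordered scan retrieves them as a group.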
Sophia also supports read-only views which reflect the state of the database at a given moment. Views act much like the regular database interface (dictionary-lookup and slicing, cursor support).
Similarity to other projects
The sophy bindings are quite similar to the SQLite4 lsm-db bindings I wrote a couple months ago. Both databases are ordered key/value stores, and the Python bindings both implement a dictionary-like API with the addition of efficient slicing to read multiple key/value pairs. I've also written bindings to UnQLite (unqlite-python) and Vedis (vedis-python), both of which are embedded databases: the former is a JSON document store, and the latter is a Redis-like embedded data-structure store. Lastly, I've got another project kicking around called kvkit, which provides a high-level Python API around a number of ordered key/value stores (kyotocabinet, leveldb, rocksdb, etc).
If you'd like to experiment with embedded databases from Python, hopefully one of the above projects will meet your needs! And if not, give SQLite a try!
Thanks for reading
Thanks for taking the time to read this post. I hope you found it interesting! As always, please feel free to write a comment. Happy hacking!