Creating a standalone Python driver for BerkeleyDB's SQLite front-end

photos/sqlite-bdb.png

In this post I'll provide instructions for building a standalone Python sqlite3-compatible module which is powered by BerkeleyDB. This is possible because BerkeleyDB provides a SQL frontend, which essentially is the SQLite we all know and love with the b-tree code ripped out and replaced with BerkeleyDB.

Why?

Why would anyone use BerkeleyDB's SQLite in the first place? There are a couple of advantages, but the one most likely to be relevant is that BerkeleyDB uses page-level locking while SQLite locks the database during writes. As a result, BerkeleyDB has much higher transaction throughput when multiple threads or processes are involved. To read more about BerkeleyDB and SQLite, check out this document, which compares the behaviors of each database. You can also learn more by reading the BerkeleyDB SQLite API doc.

How?

There are a number of ways you could get Python talking to BerkeleyDB SQLite (lib_dbsql), but I think the cleanest and best way is to create a completely standalone Python driver by statically linking against lib_dbsql.

The latest version of BerkeleyDB at the time of writing is 18.1.25, and it can be downloaded for free after signing up on Oracle's website. I recommend using this release, as the version of SQLite bundled with it is relatively new (3.18.0) and includes things like JSON support and the new FTS5 full-text search extension.

Patching a bug

Once you've got the BerkeleyDB source downloaded and extracted somewhere, you're almost ready to go. I ran into an unfortunate bug in src/repmgr/repmgr_net.c when attempting to compile on a newer Ubuntu. The fix is really simple, though -- here's a diff:

--- orig_repmgr_net.c   2018-07-17 18:38:14.071591439 -0500
+++ repmgr_net.c    2018-07-17 18:36:40.302562092 -0500
@@ -2894,6 +2894,7 @@
    if (ssl_ctx)
        SSL_CTX_free(ssl_ctx);

+#if OPENSSL_VERSION_NUMBER < 0x10100000L
    if (repmgr_ssl_mutex_arr != NULL) {
        for (i = 0; i < CRYPTO_num_locks(); i++) {
            if (repmgr_ssl_mutex_arr[i] != NULL)
@@ -2902,6 +2903,7 @@

        repmgr_ssl_mutex_arr = NULL;
    }
+#endif

    return (1);
 }

We're just adding guards around an if statement so that it is only compiled when the system OpenSSL version is old. I'm guessing the BerkeleyDB devs test this shit on older machines?

Compiling

To compile BerkeleyDB with SQLite support, we'll specify the define macros we want and do configure -> make -> make install into a directory of your choosing. In the commands that follow, we'll use ~/code/berkeleydb as the destination for our static library and Python driver.

We'll begin by declaring a couple environment variables, setting the path for our code and the compilation flags so our SQLite is compiled with all the nice bells-and-whistles:

export BVER='18.1.25'
export BDB="$HOME/code/berkeleydb"
export CFLAGS="-DSQLITE_ALLOW_COVERING_INDEX_SCAN=1 \
-DSQLITE_DEFAULT_CACHE_SIZE=10000 \
-DSQLITE_DEFAULT_FOREIGN_KEYS=1 \
-DSQLITE_ENABLE_COLUMN_METADATA=1 \
-DSQLITE_ENABLE_FTS3 \
-DSQLITE_ENABLE_FTS3_PARENTHESIS \
-DSQLITE_ENABLE_FTS4 \
-DSQLITE_ENABLE_FTS5 \
-DSQLITE_ENABLE_JSON1 \
-DSQLITE_ENABLE_STAT4 \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
-DSQLITE_HAS_CODEC \
-DSQLITE_LIKE_DOESNT_MATCH_BLOBS \
-DSQLITE_MAX_EXPR_DEPTH=0 \
-DSQLITE_SOUNDEX \
-DSQLITE_TEMP_STORE=3 \
-DSQLITE_USE_URI \
-DHAVE_ISNAN \
-DHAVE_USLEEP \
-O2 \
-fPIC"
export LDFLAGS="-lm"

Now, change directories to the source code and we'll compile BerkeleyDB and SQLite. As an added bonus, we'll compile BerkeleyDB with cryptography, so our databases can be encrypted.

mkdir -p $BDB  # Create destination directory for library and headers.
cd /path/to/db-18.2.5/  # Change directories to the berkeleydb source code.
cd build_unix/  # Change to the "build_unix" subdirectory.
../dist/configure \
  --enable-static \
  --enable-sql \
  --prefix="$BDB" \
  --with-cryptography=yes
make -j8
make prefix="$BDB/" install

Assuming everything went successfully, we're now ready to build our standalone Python SQLite driver. We'll use pysqlite3, which is essentially the standard library sqlite3 module packaged as a standalone library.

cd $BDB
git clone git@github.com:coleifer/pysqlite3
cd pysqlite3
echo -e "[build_ext]\nlibraries=ssl\ninclude_dirs=$BDB/include\nlibrary_dirs=$BDB/lib" > setup.cfg
sed -i "s|Extension(|Extension(extra_objects=['$BDB/lib/libdb_sql.a'],|g" setup.py
find src -name "*.h" -exec sed -i "s|sqlite3.h|dbsql.h|g" {} \;

# Provide a custom name for our library: "pybdblite"
find . -name "*.py" -exec sed -i "s|pysqlite3|pybdblite|g" {} \;
python setup.py build

You can test that the build succeeded by running the following:

cd build/lib.linux*/
python -c "from pybdblite import dbapi2 as sqlite3;\
  conn = sqlite3.connect(':memory:');\
  print(conn.execute('pragma compile_options;').fetchall())"

Which should print something like:

[('BERKELEY_DB',),
 ('COMPILER=gcc-7.3.0',),
 ('DEFAULT_SYNCHRONOUS=2',),
 ('DEFAULT_WAL_SYNCHRONOUS=2',),
 ('ENABLE_COLUMN_METADATA',),
 ('ENABLE_FTS3',),
 ('ENABLE_FTS3_PARENTHESIS',),
 ('ENABLE_FTS4',),
 ('ENABLE_FTS5',),
 ('ENABLE_JSON1',),
 ('ENABLE_STAT4',),
 ('ENABLE_UPDATE_DELETE_LIMIT',),
 ('HAS_CODEC',),
 ('HAVE_ISNAN',),
 ('LIKE_DOESNT_MATCH_BLOBS',),
 ('SOUNDEX',),
 ('SYSTEM_MALLOC',),
 ('TEMP_STORE=3',),
 ('THREADSAFE=1',)]

At the very top you can see that BERKELEY_DB is one of the compilation options. Success! Our custom pysqlite3 module (pybdblite) is using the BerkeleyDB version of SQLite.

From here, you can either install pybdblite using python setup.py install or symlink it into your PYTHONPATH. Feel free to blow away all the BerkeleyDB source code, libraries, include dirs, etc.

Thanks for reading

Thanks for reading this post, I hope you found these instructions helpful! For more detailed instructions on compiling SQLite, check out my post Compiling SQLite for use with Python applications.

An older version of this post is available to read (though not listed on the site). If you're interested it can be found here.

Comments (0)


Commenting has been closed.