Five reasons you should use SQLite in 2016

January 06, 2016 20:49 / berkeleydb python sqlite / 17 comments

If you haven't heard, SQLite is an amazing database capable of doing real work in real production environments. In this post, I'll outline 5 reasons why I think you should use SQLite in 2016.

PS: the site you're reading this post on? Yep, SQLite.

Simple to manage

Have you ever managed a Postgres database? There are quite a few things you need to understand in order to ensure the database server is properly configured (shared buffers, effective cache size, work mem, maintenance work mem, wal buffers...). Upgrading can be a scary process as well, and you may need to take your database offline, run a program against it to upgrade it, and hope that it works when you turn things back on. And furthermore, do you know where exactly your postgres database is stored? Can you point somewhere and say, "that's my database"?

(Now, we all know that there are many situations where nothing but Postgres (or MySQL, Oracle, SQL Server, etc) will do for the requirements of the application. My purpose here is not to say that it is otherwise. Rather I just want to highlight the difference between managing a SQLite db compared to a typical database server.)

SQLite is easy to manage -- it's a single file (or at some times a file + transaction log). The file format is stable across major versions, so if I had a SQLite database file from version 3.0.0 (back in 2004), I could read it using the latest SQLite 3.10.0. If I want to take the database file with me on a thumb drive, I just copy the file, or better yet keep it in my dropbox folder. If I want to back things up every night, I just sync the database file to S3. If I want to share some data analysis I'm doing with a coworker, I can just send them a copy of the database file and they are ready to go. Having the database in a single file with a stable format is a feature.

What's more, SQLite is very easy to configure. SQLite features are managed in two ways: compilation flags and PRAGMA statements (run-time configuration). There are no configuration files to speak of, you just build the library with the features you want, then configure the run-time options when you create a database connection.

Constantly improving yet rock-solid stability

SQLite is actively developed by some truly amazing software engineers. New features of high quality are added at an impressive rate. Just recently SQLite added support for JSON data via the json1 extension. SQLite also released an improved version of the full-text search extension, which includes result ranking using the BM25 algorithm.

In addition to adding new features, SQLite developers also are working to make the library more performant. In version 3.8.11, the release notes contain this little blurb:

SQLite now runs twice as fast as version 3.8.0 and three times as fast as version 3.3.9

Despite all these changes and enhancements, SQLite rarely introduces bugs. The SQLite test suite is widely regarded to be one of the best in the industry and the How SQLite is tested page frequently appears on HackerNews as developer after developer discovers this impressive document.

Extensible and Hackable

My personal favorite feature of SQLite is its extensibility. Because SQLite is embedded by your application, it runs in the same address space and can execute application code on your behalf. Both the Python standard library SQLite driver, pysqlite, and the alternative driver apsw provide APIs for defining custom SQL functions, aggregate functions, and collations. apsw goes one step further and provides APIs for defining virtual tables and virtual filesystems!

As a practical example, suppose you have a column in a table that is storing URLs and you wish to determine which are the most common hostnames -- if you were using a different database you would be forced to write a complicated regex, a combination of string-manipulation functions, or pull the data into your app and do the calcuations in code. With SQLite, you can define a hostname function in Python, and then use it to create a simple COUNT query:

from urlparse import urlparse

def hostname(url):
    return urlparse(url).netloc

conn = sqlite3.connect('my-database.db')
conn.create_function('hostname', 1, hostname)  # name, num_params, func

SELECT hostname(mytable.url), COUNT(mytable.id) AS ct
FROM mytable
GROUP BY hostname(mytable.url)
ORDER BY ct DESC;

You can also create aggregate functions, which accept 0..n values, and produce a single output value. Examples might be calculating the standard deviation, generating a string by processing the values in some way, doing some type of classification, etc.

Virtual tables, which are currently only supported by apsw, allow you to define a table in code and then query it as if it were a normal SQL table, even though the backing data might be completely dynamic. For example, I wrote a simple virtual table that allows you to query Redis as if it were a SQL table.

You can also write eponymous functions, which are functions that return 0..n rows of results. An example might be a regular-expression search function, which processes input and yields rows of matching tokens. Peewee provides convenient tooling for writing table-functions or you can use a standalone library, sqlite-vtfunc.

Virtually every aspect of SQLite can be manipulated by your application.

Lightning fast

SQLite is very fast. It runs on the same machine, so there is no networking overhead when executing queries or reading results. It runs in the same address space, so there is no wire protocol, serialization or need to communicate via unix sockets. SQLite runs on mobile devices where resources are scarce and efficiency is crucial. SQLite also supports a huge number of compilation flags that allow you to remove features you don't plan to use.

SQLite's speed makes up for one of it's (perceived) greatest shortcomings, which is database-file locking for writes. By writing data incredibly quickly, database locking only becomes an issue when there is a great deal of concurrent writers.

WAL Mode

The 3.7.0 release of SQLite added a new journaling method that utilizes a write-ahead log. By itself this isn't really exciting news, but what it means for web application developers (or anyone dealing with concurrency) is that readers no longer block writers, and vice-versa. Or to put it another way, reading and writing can both occur simultaenously. Without WAL mode, in order to write to the database, the writer would acquire exclusive access to the database and no reading could occur until the writer was finished.

Here's an example illustrating the difference between the two. Let's say we have two processes, a writer and a reader. The writer opens an exclusive transaction (indicating the intent to write). Next, reader opens a transaction. The reader then attempts to issue a SELECT statement:

Journal mode = "delete" (the default):

Writer: BEGIN EXCLUSIVE
Reader: BEGIN
Reader: SELECT * FROM foo; Error: database is locked

Journal mode = "wal":

Writer: BEGIN EXCLUSIVE
Reader: BEGIN
Reader: SELECT * FROM foo; Returns table contents

It's worth noting, however, that even if you don't enable WAL mode, writes typically occur in milliseconds. This is such a small amount of time that you will only notice problems if you have very high concurrency or very long write transactions.

Bonus reason: BerkeleyDB

BerkeleyDB's SQLite integration can give application developers needing concurrent database access even better performance, because rather than locking the entire database, BerkeleyDB only needs to lock the individual page(s). This allows BerkeleyDB to scale more efficiently under concurrent database load, provided the transactions aren't contending for the same page of data. BerkeleyDB also supports multi-version concurrency control (MVCC), which allows read operations to continue to occur on a page of data being operated on by a write transaction.

An additional benefit of BerkeleyDB is increased efficiency. In other words, BerkeleyDB may use less system resources and executes fewer system calls. You can find more details in this whitepaper and this brief technical overview.

BerkeleyDB's SQL interface is a drop-in replacement for SQLite, and supports the same APIs and features. BerkeleyDB provides a few additional features like replication (SQLite has a backup utility, but my understanding is that it is not as powerful as BDB's), encryption, and of course all the features of BerkeleyDB itself.

One major downside of using BerkeleyDB is that it is very sensitive to configuration values, and getting the right page size, cache size and other settings requires deep knowledge. The other downside is the licensing - to read more about the BerkeleyDB license, check out Oracle's licensing page.

For instructions on compiling a standalone Python SQLite driver for use with BerkeleyDB, check out this post.

The end

I hope you'll give SQLite a try. Don't believe the FUD about it not being production-worthy, or not being suitable for use in web-applications.

If you'd like to read more, SQLite itself has a nice document describing when to use SQLite, which also includes a list of situations where another RDBMS would work better. I've also written a shorter post in the same vein called SQLite: Small. Fast. Reliable. Choose any three which you might also enjoy. I've written a python-focused post Going fast with Sqlite and Python and Peewee provides a wide range of sqlite extensions. Lastly, you can check out my posts tagged "sqlite" for more topical posts.

Thanks so much for taking the time to read, I hope you'll leave a comment if you have any questions, comments or suggestions!

Comments (17)

Fabiano Gomes | jan 13 2016, at 05:08pm

We use SQLITE to persist information in a non-OS system, called starterwere. Nice post.

Ruel | jan 10 2016, at 08:47am

Looks nice, but I would be guessing this is not suitable for horizontally scalable applications. Or prove me wrong?

Olivier | jan 09 2016, at 02:38pm

Another compelling reason to use SQLite is SQLCipher.

Ramiro | jan 09 2016, at 01:54pm

I wasn't aware of how easy you can extend SQLite from within Python, that is really neat.

I mainly use SQLite for local prototyping, but also in a few production systems. So far I've not experienced performance issues. That said, is there an upper limit in DB size, where you'd always say no to SQLite and choose something like Postgres instead? I know there are a lot of variables in play, but let's assume I have a table with ten million records, that are only read, is that a number SQLite can handle well?

Great post btw. I didn't need to be persuaded, but I learned something new.

Rene Kriest | jan 08 2016, at 01:26am

Bonus reason: Android uses it as their standard storing mechanism for apps - tested on billion devices. ;)

Charles | jan 07 2016, at 07:37pm

Stephen, I think you must have selectively read the article looking for Postgres-bait. If you look again you'll see:

Now, we all know that there are many situations where nothing but Postgres (or MySQL, Oracle, SQL Server, etc) will do for the requirements of the application. My purpose here is not to say that it is otherwise.

We all know that concurrent writes are SQLite's Kryptonite. If you're building a web-app and expect lots of concurrent writes, obviously you shouldn't be using SQLite. It's as simple as that. There's no need to extrapolate from my post that I'm saying Postgres is inferior or that you should only ever use SQLite for web apps.

Stephen A. Goss | jan 07 2016, at 07:31pm

"Don't believe the FUD about it not being production-worthy, or not being suitable for use in web-applications." Really, now. I've seen it put in production for web applications and it has always ended in tears. SQLite is a single file, there's no server, so concurrent access is managed via system level file locking. What a performance nightmare. If you get more than a couple users of your website, you'll see all sorts of bad performance and long query times and locking issues and exceptions thrown in your code. The default Postgres configuration handles many connections and has no trouble with concurrent access, concurrent writes, etc. With default configuration, Postgres will just completely annihilate SQLite performance for web applications.

For single threaded database use, SQLite is great, like a client-side database. It's not for the server.

Ryan Stuart | jan 07 2016, at 06:02pm

It's also worth checking out SQLightning which replaces the SQLite b-tree engine with LMDB. It's faster again!

Charles | jan 07 2016, at 04:49pm

Esa -- the Peewee migrations module supports all standard DDL operations on SQLite databases. The way it works is by creating a temporary table with the updated schema, copying the data over, dropping the original and finally renaming the temp table. All this happens automatically for you. Similarly, if you're using SQLite-web, the web-based SQLite database manager I wrote, you can do the same.

Esa Varemo | jan 07 2016, at 04:29pm

As a web application developer I'd love to use SQLite but the limited ALTER TABLE support means development using some popular ORMs is really tedious. For example there is no ALTER COLUMN. Automatic generation of bases for migrations fails. It's just faster to setup a local mysql/postgresql instance than manually start fixing the tables. Since those are the dbs the apps are tested on, those are also the dbs we run in production.

Thom | jan 07 2016, at 01:59pm

I can't imagine anyone attempting to claim sqlite isn't production-worthy. It's running on literally 10s of millions of devices around the world.

Rene Kroon | jan 07 2016, at 01:52pm

I have used SQLite on embedded production systems since 2012. Its reliability is actually a pro there.

Evert Albers | jan 07 2016, at 01:48pm

As a designer, I'm not really familiar with the technical lingo myself, but I came across a link this article on the chat of the developers of the Bolt CMS who are enthousiastic about SQlite indeed. (Bolt offers SQlite, MySQL and Postgres as database options)

I can confirm that SQlite has served me very well with some fairly large sites, and that moving a site from test to production doesn't come any easier than this.

Bob | jan 07 2016, at 01:48pm

We support SQLite in Bolt CMS as well, and I really like it. People often dismiss it as a 'toy database', but it really is quite capable.

This article about how SQLite is tested is also very worthwhile to read: https://www.sqlite.org/testing.html

Christopher Browne | jan 07 2016, at 01:32pm

I'd rather that the reasons surrounded "why is it better than way worse options?" Competing with multi-user databases is not the fight you should fight; at some point, the need to share data means that you forcibly need to upgrade to something that supports multiple concurrent users, and I'd think you'd rather remain on good terms with "bigger brothers" for that. The argumentation I'd go with is instead to point at why SQLite is a better option than fopen()/fread()/fwrite(), in that it gives you well structured data at little cost in the code, and makes it vastly less likely that you'll be suffering from race conditions in your applications. This is why Android and IOS devices operate mobile apps via wee hives of SQLite databases. Those are environments that benefit from the strengths of SQLite, and are injured not at all by its limitations.

Lynn Fredricks | jan 07 2016, at 11:46am

We've also integrated SQLite into our Valentina Server, a database and reports server for Windows, Mac OS X and Linux. The other db in it is our own columnar database system. We hope you'll check it out.

Don Holloway | jan 07 2016, at 08:46am

You convinced me to give it a good look.

Commenting has been closed.