SQLite: Small. Fast. Reliable. Choose any three.
SQLite is a fantastic database and in this post I'd like to explain why I think that, for many scenarios, SQLite is actually a great choice. I hope to also clear up some common misconceptions about SQLite.
As the maintainer of an ORM, I get asked a lot of questions about databases. One of the more common questions is:
Can I use SQLite in my web app? I've heard it doesn't do concurrency.
SQLite does support multiple concurrent connections, and therefore it can be used with a multi-threaded or multi-process application. The catch is that when SQLite opens a write transaction, it will lock all the tables.
Locking the entire database may sound scary, but typically writes occur very quickly, and there are ways to make them faster (discussed below). So, even though the database is locked while you are writing, the writes happen so quickly that it's only likely to be an issue if you are doing a lot of concurrent writes.
Additionally, the Python SQLite driver will automatically retry queries that fail due to the
database being locked. You can configure this behavior using the
parameter, which has a default value of 5 seconds. Since most writes occur
in milliseconds, the 5 second timeout is a relative eternity.
If you'd like to learn more about SQLite's locking behavior, I'd suggest reading the SQLite locking documentation.
Faster concurrent operations
By using write-ahead logging (WAL), available since version 3.7.0, you can gain additional speed when performing concurrent operations. WAL allows multiple readers to operate on the database, even while a write is occuring. To enable WAL, execute the following query after opening a new connection to the database:
Another way to achieve higher concurrency is to compile BerkeleyDB with a SQLite-compatible API. Then, by compiling the python SQLite driver against the BerkeleyDB SQLite library, you can achieve some pretty awesome performance. The image below shows how BerkeleyDB delivers many more transactions-per-second when using multiple threads.
If you're interested in trying out the BerkeleyDB SQLite library, this post
contains instructions for compiling BerkeleyDB and the Python SQLite driver.
Also, if you're using peewee ORM, you can use the
berkeley_build.sh shell script (contained
playhouse directory). This script will download BerkeleyDB from Oracle's site, compile it with support
for the SQL API, then compile
pysqlite against the BerkeleyDB SQLite driver.
SQLite is not a "real" database
I've occasionally heard people say that SQLite is not a real database. The reasons for this opinion vary, but here are some that come to mind:
- Column data-types are not enforced, e.g. you can store text in an integer column.
- After a table is created, columns cannot be dropped or renamed (though you can add new columns).
- Database is locked for writes.
On the other hand, look at all the features that SQLite does provide:
- Transactions are atomic, consistent, isolated and durable (ACID).
- Support for a wide variety of SQL queries.
- Indexes, constraints, foreign keys, views, and partial support for triggers.
- Savepoints (nested transactions).
- Multiple concurrent connections to the database.
- Databases can be up to 140TB in size.
- SQLite can recover from program crashes, operating system crashes, and even power failures.
- Write your own extensions, define your own aggregates, collations, and SQL functions.
- Full-text search and other neat extensions.
Every database I've used differs slightly in the subset of SQL it supports,
and even though the data-types and
ALTER TABLE behavior of SQLite are unusual,
I don't think either of them means that SQLite is not a real database.
Working around ALTER TABLE
The standard approach when altering or dropping columns from a SQLite database is to create a new table, copy the data you want, then replace the old table with the new. This can be quite cumbersome, especially when your table has a large number of columns.
To make this easier, I wrote a lightweight schema migrations extension for the peewee ORM. I'm not positive, but I imagine something similar exists for SQLAlchemy users. This extension will automatically handle creating the temporary table and copying over the data, so all you need to do is tell peewee which columns to drop (or rename, etc):
from playhouse.migrate import * migrator = SqliteMigrator('mydata.db') migrate( migrator.drop_column('some_table', 'some_column'), migrator.drop_column('some_table', 'another_column'), migrator.rename_column('some_table', 'old_colname', 'new_colname'), )
Even if you aren't using an ORM you can still use the migrator library to help with schema changes.
When would SQLite not be a good choice?
Now that I've sung the praises of SQLite and described some ways to work around its limitations, I'd like to describe some scenarios where SQLite would be a bad choice.
- Multiple servers communicating with the database over the network, or situations in which you plan to run your database on a separate server.
- Highly concurrent applications, including websites that receive a very large amount of traffic.
- Very large data-sets.
The main consideration there, I think, is when you have multiple web-servers and need to connect to your database over the network. While you can use SQLite over an NFS mount, for instance, the SQLite documentation seems to indicate that this may be buggy.
Both SQLite and Postgres have their place in the developer's toolkit. Postgres is a phenomenal database, and I could go on and on about how awesome it is. But this post is about clearing up some misconceptions about SQLite, and hopefully convincing you that it is a great choice for many applications, including web development.
Donald Knuth said,
Premature optimization is the root of all evil.
In that vein, don't be afraid to give SQLite a try -- I'd be willing to bet it would work very well for you!
Oh, and in case you were wondering, yes -- this site uses SQLite!
I was asked on a Reddit discussion to quantify how much traffic a site should receive before it is "too much". This obviously depends on a lot of factors (ratio of reads to writes, length of time it takes to perform a write, disk speed, etc). I will share a quote from the SQLite docs, though:
SQLite usually will work great as the database engine for low to medium traffic websites (which is to say, 99.9% of all websites). The amount of web traffic that SQLite can handle depends, of course, on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
Thanks for taking the time to read this post, if you have any questions or comments, please feel free to leave a comment below.
If you'd like to learn more about the topics mentioned in this post, here are some links which you might find useful:
- SQLite concurrency and locking and write-ahead logging.
- BerkeleyDB SQLite API and comparison of performance.
- Instructions on compiling pysqlite for use with BerkeleyDB.
- Peewee schema migrations.
Commenting has been closed, but please feel free to contact me