Search: "peewee AND sqlite"
For the past month or so I've been working on writing my own ORM in Python. The project grew out of a need for a lightweight persistence layer for use in Flask web apps. As I've grown so familiar with the Django ORM over the past year, many of the ideas in Peewee are analagous to the concepts in Django. My goal from the beginning has been to keep the implementation simple without sacrificing functionality, and to ultimately create something hackable that others might be able to read and contribute to.
In this post I'd like to share some techniques for querying the top item by group using the Peewee ORM. For example,
- List the most recent tweet by each of my followers.
- List the highest severity open bug for each of my open source projects.
- List the latest story in each section of a news site.
This is a common task, but one that can be a little tricky to implement in a single SQL query. To add a twist, we won't use window functions or other special SQL constructs, since they aren't supported by SQLite. If you're interested in finding the top N items per group, check out this follow-up post.
This post is a follow-up to my post about querying the top related item by group. In this post we'll go over ways to retrieve the top N related objects by group using the Peewee ORM. I've also presented the SQL and the underlying ideas behind the queries, so you can translate them to whatever ORM / query layer you are using.
Retrieving the top N per group is a pretty common task, for example:
- Display my followers and their 10 most recent tweets.
- In each of my inboxes, list the 5 most recent unread messages.
- List the sections of the news site and the three latest stories in each.
- List the five best sales in each department.
In this post we'll discuss the following types of solutions:
- Solutions involving
- Solutions involving
- Window functions
- Postgresql lateral joins
For the past week or two I've been spending some of my spare time working on a web-based SQLite database browser. I thought this would be a useful project, because I've switched all my personal projects over to SQLite and foresee using it for pretty much everything. It also dovetailed with some work I'd been doing lately on peewee regarding reflection and code generation. So it seemed like some pretty good bang/buck, especially given my perception that there weren't many SQLite browsers out there (it turns out there are quite a few, however). I'm sharing it in the hopes that other devs (and non-devs?) find it useful.
SQLite is a fantastic database and in this post I'd like to explain why I think that, for many scenarios, SQLite is actually a great choice. I hope to also clear up some common misconceptions about SQLite.
Small. Fast. Reliable. Choose any three.
I made the decision this week to migrate my personal sites and several other sites I host onto SQLite. Previously almost everything I hosted had been using Postgresql. The move was motivated by a couple factors:
- SQLite is awesome!
- Self-contained: does not require a separate server process
- Data is stored in a single file, simplifying backups
- Excellent Python (and peewee) support
- Full-text search
At times it has seemed to me that there is a tacit agreement within the Flask / Django communities that if you're using SQL you should be using Postgresql. Postgresql is an amazing piece of engineering. I have spent the last five years of my career working exclusively with it, and I am continually impressed by its performance and the constant stream of great new features.
So why change things?
Well, as my list indicates, there are a handful of reasons. But the primary reason was that I wanted something lightweight. I'm running a fairly low-traffic, read-heavy site, so Postgresql was definitely overkill. My blog is deployed on a VPS with very limited resources, so every MB of RAM counts. Additionally, I wasn't using any special Postgresql features so there was nothing holding me back.
SQLite is an embedded database, which means that instead of running as a separate server process, the actual database engine resides within the application. This makes it possible for the database to call directly into the application when it would be beneficial to add some low-level, application-specific functionality. SQLite provides numerous hooks for inserting user code and callbacks, and, through virtual tables, it is even possible to construct a completely user-defined table. By extending the SQL language with Python, it is often possible to express things more elegantly than if we were to perform calculations after the fact.
In this post I'll describe how to extend SQLite with Python, adding functions and aggregates that will be callable directly from any SQL queries you execute. We'll wrap up by looking at SQLite's virtual table mechanism and seeing how to expose a SQL interface over external data sources.
SQLCipher, created by Zetetic, is an open-source library that provides transparent 256-bit AES encryption for your SQLite databases. SQLCipher is used by a large number of organizations, including Nasa, SalesForce, Xerox and more. The project is open-source and BSD licensed. Best of all, there are open-source python bindings.
In this post, I'll show how to compile SQLCipher and the pysqlcipher bindings, then use peewee ORM to work with an encrypted SQLite database.
In this post I will show how to use SQLite full-text search with Python (and a lot of help from peewee ORM). We will see how to index content for searching, and how to order search results using two ranking algorithms.
Last week I migrated my site from Postgresql to SQLite. I had been using Redis to power my site's search, but since SQLite has an awesome full-text search extension, I decided to give it a try. I am really pleased with the results, and being able to specify boolean search queries is an added plus. Here is a brief overview of the types of search queries SQLite supports:
- Simple phrase: peewee would return all docs containing the word peewee.
- Prefix queries: py* would return docs containing Python, pypi, etc.
- Quoted phrases: "sqlite extension"
NEAR: peewee NEAR sqlite would return docs containing the words peewee and sqlite with no more than
10intervening words. You can also specify the max number of intervening words, e.g. peewee NEAR/3 sqlite.
NOT: sqlite OR postgresql AND NOT mysql would return docs about high-quality databases (just trollin).
Check out the full post for details on adding full-text search to your project.
SQLite's write lock and pysqlite's clunky transaction state-machine are a toxic combination for multi-threaded applications. Unless you are very diligent about keeping your write transactions as short as possible, you can easily wind up with one thread accidentally holding a write transaction open for an unnecessarily long time. Threads that are waiting to write will then have a much greater likelihood of timing out while waiting for the lock, giving the illusion of poor performance.
In this post I'd like to share a very effective technique for performing writes to a SQLite database from multiple threads.