Recently I've been receiving a lot of questions about how to manage database connections with the peewee ORM. I thought I'd write a short post explaining the various ways peewee can help you manage your database connections, and outline some general best practices.
In this post I will describe how to build a simple naive bayes classifier with Python and the Kyoto Cabinet key/value database. I'll begin with a short description of how a probabilistic classifier works, then we will implement a simple classifier and put it to use by writing a spam detector. The training and test data will come from the Enron spam/ham corpora, which contains several thousand emails that have been pre-categorized as spam or ham.
A couple weekends ago I got it into my head that I would build a thin Python wrapper for working with Redis. Andy McCurdy's redis-py is a fantastic low-level client library with built-in support for connection-pooling and pipelining, but it does little more than provide an interface to Redis' built-in commands (and rightly so). I decided to build a project on top of redis-py that exposed pythonic containers for the Redis data-types. I went on to add a few extras, including a cache and a declarative model layer. The result is walrus.
I'm interested in learning to use ElasticSearch, so I thought I'd document how I set it up on my EC2 instance. Because I wanted to write code on my laptop, I needed to expose ElasticSearch over the public internet, which added a bit of extra complexity. Here is a rough outline of the process:
After writing about building a nicer regex API using operator overloading, I thought it would be fun to flesh out the code and release it as a library. I'm calling the result pollywog and you can find the code on GitHub.
To install pollywog, you can use
pip or clone the GitHub repository:
$ pip install pollywog
In this post I'll show some examples of using pollywog.
SQLite is an embedded database, which means that instead of running as a separate server process, the actual database engine resides within the application. This makes it possible for the database to call directly into the application when it would be beneficial to add some low-level, application-specific functionality. SQLite provides numerous hooks for inserting user code and callbacks, and, through virtual tables, it is even possible to construct a completely user-defined table. By extending the SQL language with Python, it is often possible to express things more elegantly than if we were to perform calculations after the fact.
In this post I'll describe how to extend SQLite with Python, adding functions and aggregates that will be callable directly from any SQL queries you execute. We'll wrap up by looking at SQLite's virtual table mechanism and seeing how to expose a SQL interface over external data sources.
I recently read a good write-up on tree structures in PostgreSQL. Hierarchical data is notoriously tricky to model in a relational database, and a variety of techniques have grown out of developers' attempts to optimize for certain types of queries.
In his post, Graeme describes several approaches to modeling trees, including:
In the comments, some users pointed out that the ltree extension could also be used to efficiently store and query materialized paths. LTrees support two powerful query languages (lquery and ltxtquery) for pattern-matching LTree labels and performing full-text searches on labels.
One technique that was not discussed in Graeme's post was the use of closure tables. A closure table is a many-to-many junction table storing all relationships between nodes in a tree. It is related to the adjacency model, in that each database row still stores a reference to its parent row. The closure table gets its name from the additional table, which stores each combination of ancestor/child nodes.
For the past week or two I've been spending some of my spare time working on a web-based SQLite database browser. I thought this would be a useful project, because I've switched all my personal projects over to SQLite and foresee using it for pretty much everything. It also dovetailed with some work I'd been doing lately on peewee regarding reflection and code generation. So it seemed like some pretty good bang/buck, especially given my perception that there weren't many SQLite browsers out there (it turns out there are quite a few, however). I'm sharing it in the hopes that other devs (and non-devs?) find it useful.
In my last post, I wrote about how to work with encrypted SQLite databases with Python. As an example application of these libraries, I showed some code fragments for a fictional diary program. Because I was thinking the examples directory of the peewee repo was looking a little thin, I decided to flesh out the diary program and include it as an example.
In this post, I'll go over the diary code in the hopes that you may find it interesting or useful. The code shows how to use the peewee SQLCipher extension. I've also implemented a simple command-line menu loop. All told, the code is less than 100 lines!
SQLCipher, created by Zetetic, is an open-source library that provides transparent 256-bit AES encryption for your SQLite databases. SQLCipher is used by a large number of organizations, including Nasa, SalesForce, Xerox and more. The project is open-source and BSD licensed. Best of all, there are open-source python bindings.
In this post, I'll show how to compile SQLCipher and the pysqlcipher bindings, then use peewee ORM to work with an encrypted SQLite database.