Entries tagged with sql
In this post I'll describe how to implement tagging with a relational database. What I mean by tagging are those little labels you see at the top of this blog post, which indicate how I've chosen to categorize the content. There are many ways to solve this problem, and I'll try to describe some of the more popular methods, as well as one unconventional approach using bitmaps. In each section I'll describe the database schema, try to list the benefits and drawbacks, and present example queries. I will use Peewee ORM for the example code, but hopefully these examples will easily translate to your tool-of-choice.
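As a rough illustration of the most conventional approach, here is a minimal sketch of a many-to-many tagging schema. It uses Python's built-in sqlite3 module and plain SQL rather than Peewee, and the table names (entry, tag, entry_tag), the sample rows, and the entries_tagged() helper are all made up for the example.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example,
# not the schema used by this blog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Junction table: one row per (entry, tag) pairing.
    CREATE TABLE entry_tag (
        entry_id INTEGER NOT NULL REFERENCES entry (id),
        tag_id INTEGER NOT NULL REFERENCES tag (id),
        PRIMARY KEY (entry_id, tag_id)
    );
    INSERT INTO entry (id, title) VALUES
        (1, 'Tagging'), (2, 'Top N per group'), (3, 'Misc');
    INSERT INTO tag (id, name) VALUES (1, 'sql'), (2, 'peewee'), (3, 'python');
    INSERT INTO entry_tag (entry_id, tag_id) VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")


def entries_tagged(conn, tag_name):
    """Return the titles of all entries carrying the given tag."""
    rows = conn.execute(
        """
        SELECT entry.title
        FROM entry
        JOIN entry_tag ON entry_tag.entry_id = entry.id
        JOIN tag ON tag.id = entry_tag.tag_id
        WHERE tag.name = ?
        ORDER BY entry.id
        """,
        (tag_name,),
    )
    return [title for (title,) in rows]


print(entries_tagged(conn, "sql"))
```

The composite primary key on entry_tag prevents the same tag from being attached to an entry twice. The cost is two joins for every tag lookup.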
This post is a follow-up to my post about querying the top related item by group. In this post we'll go over ways to retrieve the top N related objects by group using the Peewee ORM. I've also presented the SQL and the underlying ideas behind the queries, so you can translate them to whatever ORM / query layer you are using.
Retrieving the top N per group is a pretty common task, for example:
- Display my followers and their 10 most recent tweets.
- In each of my inboxes, list the 5 most recent unread messages.
- List the sections of the news site and the three latest stories in each.
- List the five best sales in each department.
In this post we'll discuss the following types of solutions:
- Solutions involving
- Solutions involving
- Window functions
- Postgresql lateral joins
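To make the window function option concrete, here is a small sketch using Python's sqlite3 module and plain SQL. It assumes SQLite 3.25 or newer (the first release with window functions), and the story table, its columns, and the top_n_per_section() helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: stories grouped by news section.
    CREATE TABLE story (
        id INTEGER PRIMARY KEY,
        section TEXT NOT NULL,
        title TEXT NOT NULL,
        published INTEGER NOT NULL
    );
    INSERT INTO story (section, title, published) VALUES
        ('tech', 't1', 1), ('tech', 't2', 2), ('tech', 't3', 3), ('tech', 't4', 4),
        ('sports', 's1', 1), ('sports', 's2', 5);
""")


def top_n_per_section(conn, n):
    """Return (section, title) for the n newest stories in each section."""
    rows = conn.execute(
        """
        SELECT section, title
        FROM (
            SELECT section, title, published,
                   -- Number the rows within each section, newest first.
                   ROW_NUMBER() OVER (
                       PARTITION BY section ORDER BY published DESC
                   ) AS rn
            FROM story
        ) AS ranked
        WHERE rn <= ?
        ORDER BY section, published DESC
        """,
        (n,),
    )
    return rows.fetchall()


print(top_n_per_section(conn, 3))
```

The ranking has to happen in a subquery because a window function's result cannot be referenced in the WHERE clause of the same SELECT.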
In this post I'd like to share some techniques for querying the top item by group using the Peewee ORM. For example,
- List the most recent tweet by each of my followers.
- List the highest severity open bug for each of my open source projects.
- List the latest story in each section of a news site.
This is a common task, but one that can be a little tricky to implement in a single SQL query. To add a twist, we won't use window functions or other special SQL constructs, since they aren't supported by SQLite. If you're interested in finding the top N items per group, check out this follow-up post.
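One way to do this without window functions is to join the table against a subquery that computes the maximum value per group. The sketch below shows that idea with Python's sqlite3 module and plain SQL. It is not necessarily the approach the full post settles on, and the tweet table and latest_tweet_per_user() helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table of tweets with an integer timestamp.
    CREATE TABLE tweet (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        content TEXT NOT NULL,
        created INTEGER NOT NULL
    );
    INSERT INTO tweet (username, content, created) VALUES
        ('alice', 'a1', 1), ('alice', 'a2', 5),
        ('bob', 'b1', 3), ('bob', 'b2', 2);
""")


def latest_tweet_per_user(conn):
    """Return (username, content) for each user's most recent tweet."""
    rows = conn.execute(
        """
        SELECT t.username, t.content
        FROM tweet AS t
        JOIN (
            -- One row per user holding that user's newest timestamp.
            SELECT username, MAX(created) AS max_created
            FROM tweet
            GROUP BY username
        ) AS latest
          ON latest.username = t.username
         AND latest.max_created = t.created
        ORDER BY t.username
        """
    )
    return rows.fetchall()


print(latest_tweet_per_user(conn))
```

If two tweets by the same user share the maximum timestamp, this query returns both of them, so a tie-breaker may be needed in practice.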
In case you've missed the last few releases, I've been busy adding some fun new features to peewee. While the changelog and the docs explain the new features and describe their usage, I thought I'd write a blog post to provide a bit more context.
Most of these features were requested by peewee users. I depend heavily on users like you to help me improve peewee, so thank you very much! Not only have your feature requests helped make peewee a better library, they've helped me become a better programmer.
So what's new in peewee? Here is something of an overview:
- Window functions.
- CASE statements.
- Savepoints for nested transactions.
- Array and JSON fields with Postgresql.
- Union, Intersect and Except compound queries.
Hopefully some of those things sound interesting. In this post I will not be discussing everything, but will hit some of the highlights.
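For readers who haven't used savepoints before, here is a minimal sketch of the underlying SQL mechanism. It uses Python's sqlite3 module directly and does not show peewee's own API. The note table and the savepoint name are arbitrary.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / COMMIT statements below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE note (body TEXT)")

conn.execute("BEGIN")                           # outer transaction
conn.execute("INSERT INTO note VALUES ('outer')")

conn.execute("SAVEPOINT inner_sp")              # start of the nested unit
conn.execute("INSERT INTO note VALUES ('inner')")
conn.execute("ROLLBACK TO SAVEPOINT inner_sp")  # undo only the nested work
conn.execute("RELEASE SAVEPOINT inner_sp")      # discard the savepoint

conn.execute("COMMIT")                          # the outer insert survives

bodies = [body for (body,) in conn.execute("SELECT body FROM note")]
print(bodies)
```

Rolling back to the savepoint undoes the nested insert while leaving the outer transaction open, which is what makes nested transactions possible.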
I sat down and started working on a new library shortly after posting about Django's missing API for generating SQL. The result provides a simple translate() function that will recursively translate a Django model graph into a set of "peewee equivalents". The peewee versions can then be used to construct queries which can be passed back into Django as a "raw query".
Here are a few scenarios where this might be useful:
- Joining on fields that are not related by foreign key (for example UUID fields).
- Performing filters on calculated values.
- Performing aggregate queries on calculated values.
- Using SQL statements that Django does not support such as
- Utilizing SQL functions that Django does not support, such as
- Replacing nearly-identical SQL queries with reusable, composable data-structures.
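To illustrate the first and third scenarios, here is the kind of SQL involved: a join on a UUID column with no foreign key, combined with an aggregate over a calculated value. This sketch uses Python's sqlite3 module rather than Django or peewee, and the account and line_item tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The two tables share a UUID value but declare no foreign key.
    CREATE TABLE account (id INTEGER PRIMARY KEY, uuid TEXT, name TEXT);
    CREATE TABLE line_item (
        id INTEGER PRIMARY KEY,
        account_uuid TEXT,
        quantity INTEGER,
        unit_price INTEGER
    );
    INSERT INTO account (uuid, name) VALUES ('aaa', 'Acme'), ('bbb', 'Globex');
    INSERT INTO line_item (account_uuid, quantity, unit_price) VALUES
        ('aaa', 2, 10), ('aaa', 1, 5), ('bbb', 3, 7);
""")

totals = conn.execute(
    """
    SELECT a.name,
           -- Aggregate over a calculated value, not a single column.
           SUM(li.quantity * li.unit_price) AS total
    FROM account AS a
    JOIN line_item AS li ON li.account_uuid = a.uuid
    GROUP BY a.name
    ORDER BY a.name
    """
).fetchall()

print(totals)
```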
I've included this module in peewee's playhouse, which is bundled with peewee.
I had the opportunity this week to write some fairly interesting SQL queries. I don't write "raw" SQL too often, so it was fun to use that part of my brain (by the way, does it bother anyone else when people call SQL "raw"?). At Counsyl we use Django for pretty much everything so naturally we also use the ORM. Every place I've worked there's a strong bias against using SQL when you've got an ORM on board, which makes sense -- if you choose a tool you should standardize on it if for no other reason than it makes maintenance easier.
So as I was saying, I had some pretty interesting queries to write and I struggled to think how to shoehorn them into Django's ORM. I've already written about some of the shortcomings of Django's ORM so I won't rehash those points. I'll just say that Django fell short and I found myself writing SQL. The queries I was working on joined models from very disparate parts of our codebase. The joins were on values that weren't necessarily foreign keys (think UUIDs) and this is something that Django just doesn't cope with. Additionally I was interested in aggregates on calculated values, and it seems like Django can only do aggregates on a single column.
As I was prototyping, I found several mistakes in my queries and decided to run them in the postgres shell before translating them into my code. I started to think that some of these errors could have been avoided if I could find an abstraction that sat between the ORM and a string of SQL. By leveraging the Python interpreter, the obvious syntax errors could have been caught at module import time. By using composable data structures, methods I wrote that used similar table structures could have been more DRY. When I write less code, I think I generally write fewer bugs as well.
That got me started on my search for the "missing link" between SQL (represented as a string) and Django's ORM.
I think it would be great if more sites allowed users (or consumers of their APIs) to produce and execute ad-hoc queries against their data. In this post I'll talk a little bit about some ways sites are currently doing this, some of the challenges involved, my experience trying to build something "reusable", and finally invite you to share your thoughts.