Entries from 2015

Querying the top N objects per group with Peewee ORM

This post is a follow-up to my post about querying the top related item by group. In this post we'll go over ways to retrieve the top N related objects by group using the Peewee ORM. I'll also present the SQL and the underlying ideas behind the queries, so you can translate them to whatever ORM or query layer you are using.

Retrieving the top N per group is a pretty common task, for example:

  • Display my followers and their 10 most recent tweets.
  • In each of my inboxes, list the 5 most recent unread messages.
  • List the sections of the news site and the three latest stories in each.
  • List the five best sales in each department.

In this post we'll discuss the following types of solutions:

  • Solutions involving COUNT()
  • Solutions involving LIMIT
  • Window functions
  • PostgreSQL lateral joins
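
To give a flavor of the COUNT() approach, here is a minimal sketch in plain SQL, run through Python's built-in sqlite3 module rather than Peewee. The tweet table, its columns, and the top_n_per_user helper are all hypothetical names made up for this illustration, and it assumes timestamps are unique within each user. A row is kept when fewer than N rows from the same user are newer than it.

```python
import sqlite3

# Hypothetical schema, invented for this sketch only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        content TEXT NOT NULL,
        created INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO tweet (username, content, created) VALUES (?, ?, ?)",
    [
        ("alice", "a1", 1), ("alice", "a2", 2), ("alice", "a3", 3),
        ("bob", "b1", 1), ("bob", "b2", 2),
        ("carol", "c1", 5),
    ],
)

# For each tweet, count the tweets by the same user that are newer.
# If that count is below N, the tweet is among the user's N most recent.
TOP_N_SQL = """
    SELECT t1.username, t1.content
    FROM tweet AS t1
    WHERE (
        SELECT COUNT(*)
        FROM tweet AS t2
        WHERE t2.username = t1.username
          AND t2.created > t1.created
    ) < ?
    ORDER BY t1.username, t1.created DESC
"""


def top_n_per_user(conn, n):
    """Return (username, content) pairs for each user's n newest tweets."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()


top_two = top_n_per_user(conn, 2)
for username, content in top_two:
    print(username, content)
```

The correlated subquery runs once per row, so on a large table this probably wants an index on (username, created).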

Read more...

Querying the top item by group with peewee ORM

In this post I'd like to share some techniques for querying the top item by group using the Peewee ORM. For example,

  • List the most recent tweet by each of my followers.
  • List the highest severity open bug for each of my open source projects.
  • List the latest story in each section of a news site.

This is a common task, but one that can be a little tricky to implement in a single SQL query. To add a twist, we won't use window functions or other special SQL constructs, since SQLite versions before 3.25 don't support them. If you're interested in finding the top N items per group, check out this follow-up post.
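
As a rough illustration of what a single-query, window-function-free solution can look like, here is one common pattern: join the table against a subquery that computes MAX() per group. This is a sketch in plain SQL using Python's built-in sqlite3 module, not necessarily the technique the full post settles on, and the story table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical schema, invented for this sketch only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE story (
        id INTEGER PRIMARY KEY,
        section TEXT NOT NULL,
        title TEXT NOT NULL,
        published INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO story (section, title, published) VALUES (?, ?, ?)",
    [
        ("sports", "Opening day", 1),
        ("sports", "Playoffs", 3),
        ("tech", "New phone", 2),
        ("tech", "Old laptop", 1),
        ("world", "Summit", 4),
    ],
)

# The subquery finds the newest timestamp in each section; joining back
# on (section, published) recovers the full row for that timestamp.
LATEST_SQL = """
    SELECT s.section, s.title
    FROM story AS s
    INNER JOIN (
        SELECT section, MAX(published) AS max_published
        FROM story
        GROUP BY section
    ) AS latest
        ON s.section = latest.section
       AND s.published = latest.max_published
    ORDER BY s.section
"""

latest_stories = conn.execute(LATEST_SQL).fetchall()
for section, title in latest_stories:
    print(section, title)
```

One caveat with this pattern: if two stories in a section share the same maximum timestamp, both rows come back.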

Read more...

Naive Bayes Classifier using Python and Kyoto Cabinet

In this post I will describe how to build a simple naive Bayes classifier with Python and the Kyoto Cabinet key/value database. I'll begin with a short description of how a probabilistic classifier works, then we will implement a simple classifier and put it to use by writing a spam detector. The training and test data will come from the Enron spam/ham corpora, which contain several thousand emails that have been pre-categorized as spam or ham.
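
To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier. It is not the implementation from the full post: plain in-memory dictionaries stand in for Kyoto Cabinet, the NaiveBayes class and its method names are hypothetical, the tokenizer is deliberately crude, and the four training messages are toy data rather than the Enron corpora. Scores are summed log-probabilities with add-one (Laplace) smoothing so unseen words don't zero out a category.

```python
import math
import re
from collections import defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes; dicts stand in for a key/value store."""

    def __init__(self):
        self.word_counts = defaultdict(int)  # (category, word) -> count
        self.total_words = defaultdict(int)  # category -> words seen
        self.doc_counts = defaultdict(int)   # category -> documents seen
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, category, text):
        self.doc_counts[category] += 1
        for word in self.tokenize(text):
            self.word_counts[(category, word)] += 1
            self.total_words[category] += 1
            self.vocabulary.add(word)

    def score(self, category, text):
        # log P(category) + sum of log P(word | category), add-one smoothed.
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[category] / total_docs)
        denominator = self.total_words[category] + len(self.vocabulary)
        for word in self.tokenize(text):
            count = self.word_counts.get((category, word), 0)
            log_prob += math.log((count + 1) / denominator)
        return log_prob

    def classify(self, text):
        # Pick the category with the highest log-probability score.
        return max(self.doc_counts, key=lambda c: self.score(c, text))


classifier = NaiveBayes()
classifier.train("spam", "Win cash now")
classifier.train("spam", "Cheap pills, win prizes")
classifier.train("ham", "Meeting agenda for Monday")
classifier.train("ham", "Lunch on Monday")

print(classifier.classify("win cheap cash"))
print(classifier.classify("monday meeting"))
```

Swapping the dictionaries for a persistent key/value store should mostly be a matter of changing how the counts are read and written.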

Read more...

Walrus: Lightweight Python utilities for working with Redis

A couple of weekends ago I got it into my head that I would build a thin Python wrapper for working with Redis. Andy McCurdy's redis-py is a fantastic low-level client library with built-in support for connection pooling and pipelining, but it does little more than provide an interface to Redis' built-in commands (and rightly so). I decided to build a project on top of redis-py that exposed pythonic containers for the Redis data types. I went on to add a few extras, including a cache and a declarative model layer. The result is walrus.

Read more...