Entries from 2012 « 2011 / all / by tag / popular / 2013 »

Shortcomings in the Django ORM and a look at Peewee, a lightweight alternative

In this post I'd like to talk about some of the shortcomings of the Django ORM, the ways peewee approaches things differently, and how this resulted in peewee having an API that is both more consistent and more expressive.

Read more...

Using python and k-means to find the dominant colors in images

I'm working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to cluster the pixels in groups based on their color. The center of those resulting clusters are then the "dominant" colors. k-means is a great fit for this problem because it is (usually) fast.

Here's an example

Akira motorcycles

The results:

                                

Read more...

Experimenting with an analytics web-service using python and cassandra

The other day I was poking around my google analytics account and thought it would be a fun project to see if I could collect "analytics"-type data myself. I recalled that the Apache Cassandra project was supposed to use a data model similar to Google's BigTable so I decided to use it for this project. The BigTable data model turned out to be a good fit for this project once I got over some of the intricacies of dealing with time-series data in Cassandra. In this post I'll talk about how I went about modelling, collecting, and finally analyzing basic page-view data I collected from this very blog.

Read more...

Powerful autocomplete with Redis in under 200 lines of Python

In this post I'll present how I built a (reasonably) powerful autocomplete engine with Redis and python. For those who are not familiar with Redis, it is a fast, in-memory, single-threaded database that is capable of storing structured data (lists, hashes, sets, sorted sets). I chose Redis for this particular project because its sorted set data type, which is a good fit for autocomplete. The engine I'll describe relies heavily on Redis' sorted sets and its set operations, but can easily be translated to a pure-python solution (links at bottom of post).

Update

Redis-completion is now deprecated. The autocomplete functionality, along with a number of other features, have been integrated into a new project walrus. Check out the walrus blog announcement for more details.

Read more...

Working around Django's ORM to do interesting things with GFKs

In this post I want to discuss how to work around some of the shortcomings of djangos ORM when dealing with Generic Foreign Keys (GFKs).

At the end of the post I'll show how to work around django's lack of correctly CAST-ing when the generic foreign key is of a different column type than the objects it may point to.

Read more...

Micawber, a python library for extracting rich content from URLs

photos/micawber-logo-0.png

OEmbed is a simple, open API standard for embedding rich content and retrieving content metadata. The way OEmbed works is actually kind of ingenious, because the only things a consumer of the API needs to know are the location of the OEmbed endpoint, and the URL to the piece of content they want to embed.

YouTube, for example, maintains an OEmbed endpoint at youtube.com/oembed. Using the OEmbed endpoint, we can very easily retrieve the HTML for an embedded video player along with metadata about the clip:

GET https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=nda_OSWeyn8

Response:

{
  "provider_url": "https://www.youtube.com/", 
  "title": "Leprechaun in Mobile, Alabama", 
  "type": "video", 
  "html": "<iframe width=\"459\" height=\"344\" src=\"https://www.youtube.com/embed/nda_OSWeyn8?feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>", 
  "thumbnail_width": 480, 
  "height": 344, 
  "width": 459, 
  "version": "1.0", 
  "author_name": "botmib", 
  "thumbnail_height": 360, 
  "thumbnail_url": "https://i.ytimg.com/vi/nda_OSWeyn8/hqdefault.jpg", 
  "provider_name": "YouTube", 
  "author_url": "https://www.youtube.com/user/botmib"
}

The oembed spec defines four types of content along with a number of required attributes for each content type. This makes it a snap for consumers to use a single interface for handling things like:

  • youtube videos
  • flickr photos
  • hulu videos
  • slideshare decks
  • and many more

Read more...

Building a bookmarking service with python and phantomjs

Using python and phantomjs, a headless webkit browser, it is a snap to build a self-hosted bookmarking service that can capture images of entire pages. Combine this with a simple javascript bookmarklet and you end up with a really convenient way of storing bookmarks. The purpose of this post will be to walk through the steps to getting a simple bookmarking service up and running.

Bookmark app in playground

Read more...

Huey, a lightweight task queue for python

At my job we've been doing a quarterly hackday for almost a year now. My coworkers have made some amazing stuff, and its nice to have an entire day dedicated to hacking on ... well, whatever you want. Tomorrow marks the 4th hackday and I need to scrounge up a good project, but in the meantime I thought I'd write a post about what I did last time around -- a lightweight python task queue that has an API similar to celery.

I've called it huey (which also turns out to be the name of my kitten).

Design goals

The goal of the project was to keep it simple while not skimping on features. At the moment the project does the following:

Backend storages implement a simple API, currently the only implementation uses Redis but adding one that uses the database would be a snap.

The other main goal of the project was to have it work easily for any python application (I've been into using flask lately), but come with baked-in support for django. Because of django's centralized configuration and conventions for loading modules, the django API is simpler than the python one, but hopefully both are reasonably straightforward.

Read more...

Building a markov-chain IRC bot with python and Redis

As an IRC bot enthusiast and tinkerer, I would like to describe the most enduring and popular bot I've written, a markov-chain bot. Markov chains can be used to generate realistic text, and so are great fodder for IRC bots. The bot I am writing of has been hanging out in my town's channel for the past year or so and has amassed a pretty awesome corpus from which it generates messages. Here are few of his greatest hits:

Read more...