Blog Entries all / by tag / by year / popular

The search for the missing link: what lies between SQL and Django's ORM?

I had the opportunity this week to write some fairly interesting SQL queries. I don't write "raw" SQL too often, so it was fun to use that part of my brain (by the way, does it bother anyone else when people call SQL "raw"?). At Counsyl we use Django for pretty much everything so naturally we also use the ORM. Every place I've worked there's a strong bias against using SQL when you've got an ORM on board, which makes sense -- if you choose a tool you should standardize on it if for no other reason than it makes maintenance easier.

So as I was saying, I had some pretty interesting queries to write and I struggled to think how to shoehorn them into Django's ORM. I've already written about some of the shortcomings of Django's ORM so I won't rehash those points. I'll just say that Django fell short and I found myself writing SQL. The queries I was working on joined models from very disparate parts of our codebase. The joins were on values that weren't necessarily foreign keys (think UUIDs) and this is something that Django just doesn't cope with. Additionally I was interested in aggregates on calculated values, and it seems like Django can only do aggregates on a single column.

As I was prototyping, I found several mistakes in my queries and decided to run them in the postgres shell before translating them into my code. I started to think that some of these errors could have been avoided if I could find an abstraction that sat between the ORM and a string of SQL. By leveraging the python interpreter, the obvious syntax errors could have been caught at module import time. By using composable data structures, methods I wrote that used similar table structures could have been more DRY. When I write less code, I think I generally write less bugs as well.

That got me started on my search for the "missing link" between SQL (represented as a string) and Django's ORM.

Read more...

Using python to generate awesome linux desktop themes

I remember spending hours when I was younger cycling through the various awesome color themes on my 386, in the glory days of windows 3.1. Remember hotdog stand?

Hotdog Stand

Well, I haven't changed much. I still enjoy making tweaks to the colors and appearance of my desktop. In this post I'll talk about a script I wrote that makes it easy for me to modify all the various colors and configuration files which control the appearance of my desktop.

Read more...

Structuring flask apps, a how-to for those coming from Django

The other day a friend of mine was trying out flask-peewee and he had some questions about the best way to structure his app to avoid triggering circular imports. For someone new to flask, this can be a bit of a puzzler, especially if you're coming from django which automatically imports your modules. In this post I'll walk through how I like to structure my flask apps to avoid circular imports. In my examples I'll be showing how to use "flask-peewee", but the same technique should be applicable for other flask plugins.

I'll walk through the modules I commonly use in my apps, then show how to tie them all together and provide a single entrypoint into your app.

Read more...

Creating a personal password manager

My password "system" used to be that I had three different passwords, all of which were variations on the same theme. I maintained a list of sites I had accounts on and for each site gave a hint which of the three passwords I used. What a terrible scheme.

A couple weeks ago I decided to do something about it. I wanted, above all, to only have to remember a single password. Being lately security-conscious, I also recognized the need for a unique password on every site.

In this post I'll show how I used python to create a password management system that allows me to use a single "master" password to generate unique passwords for all the sites and services I use.

Read more...

Shortcomings in the Django ORM and a look at Peewee, a lightweight alternative

In this post I'd like to talk about some of the shortcomings of the Django ORM, the ways peewee approaches things differently, and how this resulted in peewee having an API that is both more consistent and more expressive.

Read more...

Using python and k-means to find the dominant colors in images

I'm working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to cluster the pixels in groups based on their color. The center of those resulting clusters are then the "dominant" colors. k-means is a great fit for this problem because it is (usually) fast.

Here's an example

Akira motorcycles

The results:

                                

Read more...

Experimenting with an analytics web-service using python and cassandra

The other day I was poking around my google analytics account and thought it would be a fun project to see if I could collect "analytics"-type data myself. I recalled that the Apache Cassandra project was supposed to use a data model similar to Google's BigTable so I decided to use it for this project. The BigTable data model turned out to be a good fit for this project once I got over some of the intricacies of dealing with time-series data in Cassandra. In this post I'll talk about how I went about modelling, collecting, and finally analyzing basic page-view data I collected from this very blog.

Read more...

Powerful autocomplete with Redis in under 200 lines of Python

In this post I'll present how I built a (reasonably) powerful autocomplete engine with Redis and python. For those who are not familiar with Redis, it is a fast, in-memory, single-threaded database that is capable of storing structured data (lists, hashes, sets, sorted sets). I chose Redis for this particular project because its sorted set data type, which is a good fit for autocomplete. The engine I'll describe relies heavily on Redis' sorted sets and its set operations, but can easily be translated to a pure-python solution (links at bottom of post).

Update

Redis-completion is now deprecated. The autocomplete functionality, along with a number of other features, have been integrated into a new project walrus. Check out the walrus blog announcement for more details.

Read more...

Working around Django's ORM to do interesting things with GFKs

In this post I want to discuss how to work around some of the shortcomings of djangos ORM when dealing with Generic Foreign Keys (GFKs).

At the end of the post I'll show how to work around django's lack of correctly CAST-ing when the generic foreign key is of a different column type than the objects it may point to.

Read more...

Micawber, a python library for extracting rich content from URLs

photos/micawber-logo-0.png

OEmbed is a simple, open API standard for embedding rich content and retrieving content metadata. The way OEmbed works is actually kind of ingenious, because the only things a consumer of the API needs to know are the location of the OEmbed endpoint, and the URL to the piece of content they want to embed.

YouTube, for example, maintains an OEmbed endpoint at youtube.com/oembed. Using the OEmbed endpoint, we can very easily retrieve the HTML for an embedded video player along with metadata about the clip:

GET https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=nda_OSWeyn8

Response:

{
  "provider_url": "https://www.youtube.com/", 
  "title": "Leprechaun in Mobile, Alabama", 
  "type": "video", 
  "html": "<iframe width=\"459\" height=\"344\" src=\"https://www.youtube.com/embed/nda_OSWeyn8?feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>", 
  "thumbnail_width": 480, 
  "height": 344, 
  "width": 459, 
  "version": "1.0", 
  "author_name": "botmib", 
  "thumbnail_height": 360, 
  "thumbnail_url": "https://i.ytimg.com/vi/nda_OSWeyn8/hqdefault.jpg", 
  "provider_name": "YouTube", 
  "author_url": "https://www.youtube.com/user/botmib"
}

The oembed spec defines four types of content along with a number of required attributes for each content type. This makes it a snap for consumers to use a single interface for handling things like:

  • youtube videos
  • flickr photos
  • hulu videos
  • slideshare decks
  • and many more

Read more...