I sat down and started working on a new library shortly after posting about
Django's missing API for generating SQL.
is the result, and provides a simple
translate() function that will recursively
translate a Django model graph into a set of "peewee equivalents". The peewee
versions can then be used to construct queries which can be passed back into
Django as a "raw query".
Here are a couple scenarios when this might be useful:
I've included this module in peewee's playhouse, which is bundled with peewee.
I had the opportunity this week to write some fairly interesting SQL queries. I don't write "raw" SQL too often, so it was fun to use that part of my brain (by the way, does it bother anyone else when people call SQL "raw"?). At Counsyl we use Django for pretty much everything so naturally we also use the ORM. Every place I've worked there's a strong bias against using SQL when you've got an ORM on board, which makes sense -- if you choose a tool you should standardize on it if for no other reason than it makes maintenance easier.
So as I was saying, I had some pretty interesting queries to write and I struggled to think how to shoehorn them into Django's ORM. I've already written about some of the shortcomings of Django's ORM so I won't rehash those points. I'll just say that Django fell short and I found myself writing SQL. The queries I was working on joined models from very disparate parts of our codebase. The joins were on values that weren't necessarily foreign keys (think UUIDs) and this is something that Django just doesn't cope with. Additionally I was interested in aggregates on calculated values, and it seems like Django can only do aggregates on a single column.
As I was prototyping, I found several mistakes in my queries and decided to run them in the postgres shell before translating them into my code. I started to think that some of these errors could have been avoided if I could find an abstraction that sat between the ORM and a string of SQL. By leveraging the python interpreter, the obvious syntax errors could have been caught at module import time. By using composable data structures, methods I wrote that used similar table structures could have been more DRY. When I write less code, I think I generally write less bugs as well.
That got me started on my search for the "missing link" between SQL (represented as a string) and Django's ORM.
I recently heard a talk from a coworker wherein one of the things he discussed was automatically converting CSV data for use with a SQLite database. I thought this would be a great thing to add to peewee, especially as lately I've found myself on several occasions working with CSV and battling with it in a spreadsheet. It would be much easier to load it into a database and then query it using a tool I'm familiar with.
Which brings me to
playhouse.csv_loader, a new module I've added to the
playhouse package of extras. It's hopefully really easy to use. Here is an
example of how you might use it:
>>> from playhouse.csv_loader import * >>> db = SqliteDatabase(':memory:') # Create an in-memory sqlite database # Load the CSV file into the in-memory database and return a Model suitable # for querying the data. >>> ZipToTZ = load_csv(db, 'zipcode_to_timezone.csv') # Get the timezone for a zipcode. >>> ZipToTZ.get(ZipToTZ.zip == 66047).timezone 'US/Central' # Get all the zipcodes for my town. >>> [row.zip for row in ZipToTZ.select().where( ... (ZipToTZ.city == 'Lawrence') && (ZipToTZ.state == 'KS'))] [66044, 66045, 66046, 66047, 66049]
I saw an interesting post on reddit yesterday showing how user "iFargle" had customized the start page of Google Chrome to display his most commonly-used links. I had to have it! After spending some time customizing, here is what I came up with:
Counsyl is the first job I've worked that has a formal code review process. At first it was intimidating (it still is sometimes), but I really have been impressed how review leads to better code. I still make mistakes in my own code, and sometimes I miss bugs in other's code. Bugs are going to happen, though, so I won't spend time talking about how to write bug-free code. Instead I'll write about some things I've noticed that make the review process go more smoothly. I've seen that a lot of productivity and good will can be gained by how you approach the person whose code you're reviewing, and the person reviewing your code.
The past year or so has been a nice departure from my usual programming routine. Instead of churning out little sites and services, or hacking on my flask/django open-source libraries, I've been building lots of system/productivity tools. This has been extremely rewarding and I've acquired a new mindset about programming that I am very happy about.
As a programmer, it is second nature to think "if I need something, I will write it." Coding something from scratch always gives my ego a boost, and it can also be a great way to learn. The problem is that it can also be inefficient. Knowing where to draw the line, at what point my code stops and an existing tool begins is something I've been learning through experience.
I wasn't even ready to admit this desire to have it be "all mine" until I started thinking about writing a post on tool-building. And while that desire is probably always going to be there, by focusing so much on writing tools and things to enhance productivity, I've been forced to acknowledge the software of others and work to integrate with it. This had the unexpected result of changing the way I think about software (and led to some fun projects, too).
I remember spending hours when I was younger cycling through the various awesome color themes on my 386, in the glory days of windows 3.1. Remember hotdog stand?
Well, I haven't changed much. I still enjoy making tweaks to the colors and appearance of my desktop. In this post I'll talk about a script I wrote that makes it easy for me to modify all the various colors and configuration files which control the appearance of my desktop.
It's been roughly four years since my introduction to the Django framework and I thought I'd write a little post to commemorate this. In my mind nothing had as big an impact on my career as my decision to work at the Journal World. When I started there I knew basically nothing about software engineering or open-source, and it is entirely thanks to my excellent (and patient) coworkers there that I was able to learn about these things.
For a while I've been itching to rewrite Huey, and just last week released 0.4 which is an almost total rewrite. I initially started Huey for performing tasks like checking comments for spam, sending emails, generating thumbnails, and basically anything that would slow down the pagespeed on my sites. This is still what I see as the primary use-case for huey -- performing small tasks outside the request/response cycle and running jobs on a schedule (I have a site that scrapes the county sheriff's site and keeps a log of arrests in my town). The goal for the rewrite was not to change the purpose of Huey, rather it was to change the API.
The other day a friend of mine was trying out flask-peewee and he had some questions about the best way to structure his app to avoid triggering circular imports. For someone new to flask, this can be a bit of a puzzler, especially if you're coming from django which automatically imports your modules. In this post I'll walk through how I like to structure my flask apps to avoid circular imports. In my examples I'll be showing how to use "flask-peewee", but the same technique should be applicable for other flask plugins.
I'll walk through the modules I commonly use in my apps, then show how to tie them all together and provide a single entrypoint into your app.
I had fun writing about my "cd" helper, so I thought I'd share another productivity helper I wrote for setting my wallpaper. It's a little silly, but I insist on my wallpaper being used for my lockscreen and my login window as well -- that way the entire time I'm on my computer the background is "seamless". Before I wrote this script it used to take me probably 3 or 4 minutes to change wallpapers!
My password "system" used to be that I had three different passwords, all of which were variations on the same theme. I maintained a list of sites I had accounts on and for each site gave a hint which of the three passwords I used. What a terrible scheme.
A couple weeks ago I decided to do something about it. I wanted, above all, to only have to remember a single password. Being lately security-conscious, I also recognized the need for a unique password on every site.
In this post I'll show how I used python to create a password management system that allows me to use a single "master" password to generate unique passwords for all the sites and services I use.
cd a lot, I'm no exception. Because I use virtualenvs for my
python projects, I'm often "cutting" through several layers of crap to get to
what I actually want to edit. This was a good opportunity for a helper script!
The two biggest annoyances I was trying to alleviate were:
cd ../../some-other-dir/foo/. It would be nice to just type the part that matters and not the whole thing.
The solution I came up with stores directories I use (the entire path), and then I can perform a search of that history using a partial path.
My Raspberry Pi got a new case this weekend:
In this post I'd like to talk about some of the shortcomings of the Django ORM, the ways peewee approaches things differently, and how this resulted in peewee having an API that is both more consistent and more expressive.
So, for linux folks out there, here is a little wrapper around scrot, a linux screenshot utility. It will allow you to capture the full screen, the current window, or free-select a region, then take the resulting image and put it in your dropbox folder or upload it to Imgur:
#!/usr/bin/env python import base64 import json import optparse import os import subprocess import sys import time import urllib import urllib2 BINARY = 'scrot' HOME = os.environ['HOME'] # Imgur API -- register your app and paste the client id and secret: CLIENT_ID = '' CLIENT_SECRET = '' # Location of your dropbox folder and your dropbox user id: DROPBOX_DIR = os.path.join(HOME, 'Dropbox/Public/screens/') DROPBOX_URL_TEMPLATE = 'http://dl.dropbox.com/u/%s/screens/%s' DROPBOX_UID = '' def upload_file(filename): with open(filename, 'rb') as fh: contents = fh.read() payload = urllib.urlencode(( ('image', base64.b64encode(contents)), ('key', CLIENT_SECRET), )) request = urllib2.Request('https://api.imgur.com/3/image', payload) request.add_header('Authorization', 'Client-ID ' + CLIENT_ID) try: resp = urllib2.urlopen(request) except urllib2.HTTPError, exc: return False, 'Returned status: %s' % exc.code except urllib2.URLError, exc: return False, exc.reason resp_data = resp.read() try: resp_json = json.loads(resp_data) except ValueError: return False, 'Error decoding response: %s' % resp_data if resp_json['success']: return True, resp_json['data']['link'] return False, 'Imgur failure: %s' % resp_data def get_parser(): parser = optparse.OptionParser('Screenshot helper') parser.add_option('-s', '--select', action='store_true', default=True, dest='select', help='Select region to capture') parser.add_option('-f', '--full', action='store_true', dest='full', help='Capture entire screen') parser.add_option('-c', '--current', action='store_true', dest='current', help='Capture currently selected window') parser.add_option('-d', '--delay', default=0, dest='delay', type='int', help='Seconds to wait before capture') parser.add_option('-p', '--public', action='store_true', dest='dropbox', help='Store in dropbox public folder') parser.add_option('-x', '--no-upload', action='store_false', default=True, dest='upload', help='Do not upload to imgur') parser.add_option('-k', '--keep-local', action='store_true', default=False, dest='keep', help='Keep local copy after upload') return parser def get_scrot_command(filename, options): args = [BINARY] if options.current: args.append('-u') elif not options.full: args.append('-s') if options.delay: args.append('-d %s' % options.delay) args.append(dest) return args if __name__ == '__main__': parser = get_parser() options, args = parser.parse_args() filename = 's%s.png' % time.time() if options.dropbox: dest = os.path.join(DROPBOX_DIR, filename) else: dest = os.path.join(HOME, 'tmp', filename) if not options.current and not options.full: print 'Select a region to capture...' scrot_args = get_scrot_command(dest, options) p = subprocess.Popen(scrot_args) p.wait() if options.dropbox: print DROPBOX_URL_TEMPLATE % (DROPBOX_UID, filename) if options.upload: success, res = upload_file(dest) if not success: print 'Error uploading image: %s' % res print 'Image stored in: %s' % dest sys.exit(1) else: if not options.keep: os.unlink(dest) print res else: print dest
I'm working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to cluster the pixels in groups based on their color. The center of those resulting clusters are then the "dominant" colors. k-means is a great fit for this problem because it is (usually) fast.
I have written a documentation page on upgrading which gives the rationale behind the rewrite and some examples of the new querying API. Please feel free to take a look but much of the information presented in this post is lifted directly from the docs.
The biggest changes between 1.0 and 2.0 are in the syntax used for constructing queries. The first iteration of peewee I threw up on github was about 600 lines. I was passing around strings and dictionaries and as time went on and I added features, those strings turned into tuples and objects. This meant, though, that I needed code to handle all the possible ways of expressing something. Look at the code for parse_select.
I learned a valuable lesson: keep data in datastructures until the absolute last second.
With the benefit of hindsight and experience, I decided to rewrite and unify the API a bit. The result is a tradeoff. The newer syntax may be a bit more verbose at times, but at least it will be consistent.
About two months ago I became the proud owner of a 2004 Yamaha R6! Previously I had been riding an '02 Honda Shadow (pic) and the change has been a revelation.
The other day I noticed I had a couple thumbdrives kicking around with various versions of my "absolutely do not lose" files...stuff like my private keys, tax documents, zips of papers I wrote in college, etc. These USB drives were all over the house, and many contained duplicate versions of the same files. I thought it would be neat to write a little app to give me a web-based interface to store and manage these files securely. In this post I'll talk about how I built a web-based file storage app using flask, pycrypto, and amazon S3.
I wrote a little python IRC bot library a while back. It comes with a few silly examples like a google search bot, an ASCII art bot, even an example botnet. Today I was lurking around in a channel with a bunch of other local developers and noticed that we often are pasting links of images to "contextualize" things other folks have said.
I decided to port my favorite vim theme, candycode, to gedit.
I think it would be great if more sites allowed users (or consumers of their APIs) to produce and execute ad-hoc queries against their data. In this post I'll talk a little bit about some ways sites are currently doing this, some of the challenges involved, my experience trying to build something "reusable", and finally invite you to share your thoughts.
The other day I was poking around my google analytics account and thought it would be a fun project to see if I could collect "analytics"-type data myself. I recalled that the Apache Cassandra project was supposed to use a data model similar to Google's BigTable so I decided to use it for this project. The BigTable data model turned out to be a good fit for this project once I got over some of the intricacies of dealing with time-series data in Cassandra. In this post I'll talk about how I went about modelling, collecting, and finally analyzing basic page-view data I collected from this very blog.
As of Oracle's 11gR2 release of BerkeleyDB, the library has included a SQL API which is fully compatible with SQLite3. BerkeleyDB can even be compiled with support for SQLite's full-text search and r-tree extensions. There's a good whitepaper that Oracle published detailing the performance of their implementation versus SQLite. To summarize, since Berkeley supports page-level locking as opposed to database-level locking, it can push quite a few more transactions per second, making it a better fit for write-heavy applications (the one area sqlite suffers, IMO). Additionally, Berkeley makes fewer syscalls due to the way it blocks as opposed to SQLite's use of busy-locks.
This post is partly for me so I remember how I did this, and partly for anyone else interested in trying out Berkeley's SQL support. plug: I enjoy using SQLite for small projects, if you're interested you might check out peewee, a lightweight ORM with some fun SQLite extensions.
In this post I'll present how I built a (reasonably) powerful autocomplete engine with Redis and python. For those who are not familiar with Redis, it is a fast, in-memory, single-threaded database that is capable of storing structured data (lists, hashes, sets, sorted sets). I chose Redis for this particular project because its sorted set data type, which is a good fit for autocomplete. The engine I'll describe relies heavily on Redis' sorted sets and its set operations, but can easily be translated to a pure-python solution (links at bottom of post).
I've developed an interest in some of the more advanced features of SQLite 3.7 after reading the O'Reilly title Using SQLite (Small. Fast. Reliable. Choose Any Three). For personal projects I like using SQLite for prototyping or for simple applications, and when I need something more powerful I turn to Postgresql. Because peewee supports both of these databases (as well as MySQL), it is limited to a lowest-common-denominator feature set. While this encompasses a broad range of features, each database engine has its own extensions and I've been interested in adding some pythonic support for the cooler extensions.
Currently, I've got support for:
This post will show the usage of the hstore and full-text search extensions. I will also show how I went about writing these extension modules so if you're interested in writing your own you will have a good foundation.
All of these extensions live in the
playhouse package, included with the current
master branch of peewee.
To follow along at home, feel free to install peewee:
pip install -e git+https://github.com/coleifer/peewee.git#egg=peewee
In this post I want to discuss how to work around some of the shortcomings of djangos ORM when dealing with Generic Foreign Keys (GFKs).
At the end of the post I'll show how to work around django's lack of correctly CAST-ing when the generic foreign key is of a different column type than the objects it may point to.
A while ago I wrote about an awesome API for retrieving metadata about URLs called oembed. I'm writing to announce a new project I've been working on called micawber, which is very similar but with a cleaner API and not restricted to django projects.
I recently rewrote my personal site using flask and peewee, breaking a good amount of stuff in the process. I was trying to track down the errors by tailing log files, but that didn't help alert me to new errors that someone visiting the site might stir up. I thought about setting up error emails a-la django, which is a tried and true method...but then I happened on a different approach. I won't say it's the most elegant solution, but it was a quick hack and the results have been awesome. I wrote a custom logging handler that pushes JSON-encoded log record data to a redis pub/sub channel. I then have an IRC bot that subscribes to this channel and when it receives a message generates a paste of the traceback and pings me with a link to the traceback.
Sometimes I want to push a file on my harddrive to S3 for safe keeping. I wrote a little script for nautilus which appears in the context menu to push files to a specific S3 bucket.
For fun I put together a small script that is capable of introspecting databases and generating peewee models. I borrowed the crucial bits from django's codebase, which has methods for introspecting column types and foreign key constraints.
The code is hopefully rather straightforward - it simply grabs the list of tables, column type information which is then mapped to peewee field types, then finally resolves foreign keys. The generated models are then dumped to standard out, along with a database declaration.
After two years of maintaining djangosnippets.org, I am pleased to announce that the guys from django-de are going to be taking over and you can expect to see some real improvements.
At my job we've been doing a quarterly hackday for almost a year now. My coworkers have made some amazing stuff, and its nice to have an entire day dedicated to hacking on ... well, whatever you want. Tomorrow marks the 4th hackday and I need to scrounge up a good project, but in the meantime I thought I'd write a post about what I did last time around -- a lightweight python task queue that has an API similar to celery.
The goal of the project was to keep it simple while not skimping on features. At the moment the project does the following:
Backend storages implement a simple API, currently the only implementation uses Redis but adding one that uses the database would be a snap.
The other main goal of the project was to have it work easily for any python application (I've been into using flask lately), but come with baked-in support for django. Because of django's centralized configuration and conventions for loading modules, the django API is simpler than the python one, but hopefully both are reasonably straightforward.
As an IRC bot enthusiast and tinkerer, I would like to describe the most enduring and popular bot I've written, a markov-chain bot. Markov chains can be used to generate realistic text, and so are great fodder for IRC bots. The bot I am writing of has been hanging out in my town's channel for the past year or so and has amassed a pretty awesome corpus from which it generates messages. Here are few of his greatest hits:
For a change, I've been doing all of my new app development using flask, a python web framework built atop the werkzeug WSGI toolkit. Having used django for the last two years it's been fun to do something different, but at the same time stick with python.
In this post I'd like to show a couple of the small projects I've written using flask over the past few weeks.
Recently I stumbled across the twitter bootstrap project, which is a set of cross-browser compliant stylesheets and scripts. I liked them so much that I've ported the admin templates to use bootstrap. Here's a little screenshot of the design refresh taken from the example app:
I hope this will make the admin easier to work with in the long-run!
I'd like to write a post about a project I've been working on for the past month or so. I've had a great time working on it and am excited to start putting it to use. The project is called flask-peewee -- it is a set of utilities that bridges the python microframework flask and the lightweight ORM peewee. It is packaged as a flask extension and comes with the following batteries included:
Over the past month I've been working on adding support for both MySQL and PostgreSQL to peewee. I'm happy to say that after a couple weekend hack sessions all tests are now passing.
One of the problems mentioned by a couple people when I asked for suggestions on improving djangosnippets.org was the proliferation of tags. This is a well-known problem on sites that allow users to enter their own tags, where misspellings are frequent and its sometimes unclear whether a tag should be plural or singular.
It's been a while since I first wrote about setting up Solr on Ubuntu. Since then I've opted for a different approach that is both simpler and lighter-weight. This post describes briefly the steps to setting up Solr on Ubuntu.
I also added some new reference documentation which describes succinctly how to do basic configuration and querying with peewee.
As of this week we instituted a regular "hackday" at my office -- anything goes, you can work on whatever you like, so at 11:30 the night before the hackday started I decided on writing a simple IRC-powered botnet.
I'm writing this post to introduce a new project I've released, django-generic-m2m, which as its name would indicate is a generic ManyToMany implementation for django models. The goal of this project was to provide a uniform API for both creating and querying generically-related content in a flexible manner. One use-case for this project would be creating semantic "tags" between diverse objects in the database.
One of the nicest UI's around when dealing with a large dataset is a good autocomplete. Facebook's search is a great example, same for Netflix, and recently Google launched "Google Instant", which returns search results as you type. Autocomplete can really complement hierarchical drill-down search (which is useful for discovery), as the goal of autocomplete is more for helping users find something they already know about with a minimum of effort.
For the past month or so I've been working on writing my own ORM in Python. The project grew out of a need for a lightweight persistence layer for use in Flask web apps. As I've grown so familiar with the Django ORM over the past year, many of the ideas in Peewee are analagous to the concepts in Django. My goal from the beginning has been to keep the implementation simple without sacrificing functionality, and to ultimately create something hackable that others might be able to read and contribute to.