A co-worker of mine mentioned that he missed Ruby's syntactic sugar for regular expressions. I haven't used Ruby's regular expressions, but I'm familiar enough with Python's to know that the API is a bit wanting in syntactic sweetness.
In this post I'll show how you might use python's magic methods to make a nicer API for working with regular expressions.
SQLite is a fantastic database and in this post I'd like to explain why I think that, for many scenarios, SQLite is actually a great choice. I hope to also clear up some common misconceptions about SQLite.
I was reorganizing some folders on my laptop and ran across some really old code I'd written. I knew the code was there, but I hadn't looked at it in years and thought it would be fun to take a peek, so I created a WindowsXP virtual machine and fiddled around trying to get the various programs to run.
I've spent some time over the past couple weeks playing with the embedded NoSQL databases Vedis and UnQLite. Vedis, as its name might indicate, is an embedded data-structure database modeled after Redis. UnQLite is a JSON document store (like MongoDB, I guess??). Beneath the higher-level APIs, both Vedis and UnQLite are key/value stores, which puts them in the same category as BerkeleyDB, KyotoCabinet and LevelDB. The Python standard library also includes some dbm-style databases, including gdbm.
For fun, I thought I would put together a completely un-scientific benchmark showing the relative speeds of these various databases for storing and retrieving simple keys and values.
Here are the databases and drivers that I used for the test:
I'm running these tests with:
For the test, I simply recorded the time it took to store 100K simple key/value pairs (no collisions). Then I recorded the time it took to read back all these values. The results are in seconds elapsed:
I'm happy to write that I've just released some python bindings for UnQLite, an embedded NoSQL database and JSON document store. UnQLite might be characterized as the SQLite of NoSQL databases, though it's JSON document-store and Jx9 scripting language make it a pretty unique offering. UnQLite is created by Symisc Systems, who are also responsible for Vedis, an embedded Redis-like database (I also wrote some python bindings for vedis). Here is a quick overview of some of UnQLite's features, as described on the project homepage:
In the rest of this post I will show some basic usage of the unqlite-python library. If you'd like to follow along, you can use
pip to install
pip install unqlite
Read on for the details!
Over the past week I've been writing some python bindings to the embedded NoSQL database Vedis, a transactional data-store modeled after Redis. Like Redis, Vedis could be characterized as an advanced key-value store that supports hash, set and list data-structures. Vedis has over 70 available commands for working with the various data types. Unlike Redis, which is run as a separate server process, Vedis is embedded in the host process like SQLite. Vedis works with either in-memory databases or on-disk databases. Vedis is transactional (ACID) and also thread-safe. If you'd like more information, check out the Vedis FAQ.
Vedis-python allows you to use Vedis in your Python apps. Vedis-python supports all the Vedis data-types, and also allows you extend Vedis by writing your own commands in Python. As I mentioned, this project is very new so while I have written pretty extensive unit tests, the library has certainly not been battle-tested yet.
If you'd like to give it a try, you can use
pip to install
vedis-python. At the time
of writing the current version is 0.1.5.
$ pip install vedis-python
Just a word of caution, I've tested the installation on various flavors of Linux (including on my raspberry pi), and Mac OSX, but have not tested on Windows.
Read the rest of the post for the details.
I'd like to share a little command-line utility I wrote for managing multi-file and multi-directory private gists on GitHub. If you're not familiar with GitHub Gist, it's basically a git-backed pastebin. One of the benefits of Gist is that it supports private gists for free, allowing you to create private repos for your code snippets. To prevent abuse, GitHub does not allow you to create gists containing subdirectories.
Then I had a lightbulb moment -- why not write a script to do all this automatically?
Read the rest of the post for the details.
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
If you're a software engineer and you have not yet read The Mythical Man-Month, go get it now.
My post from last month, Saturday Morning Hack, a Little Note-Taking App with Flask, was pretty well-received. Since I've made a number of improvements to the app, I thought I would write one more post to share some of the updates I've made to this project, in the hopes that they may be of interest to you.
A live demo is up and running on Python Anywhere, so feel free to check that out before continuing on with the post: http://beetlejuicer.pythonanywhere.com/
And this is how it looks now!
So what's new? Well, I've made a couple changes under-the-hood, and added some entirely new features to the UI.
Notemodel. Thanks to flask-peewee everything comes "for free".
Read the rest of the post for the details.
Coroutines are an interesting feature of the Python language that I have, in practice, found very little occasion to use. Then one day I was stuck working on a problem and I realized suddenly that coroutines were exactly the thing I needed. In this post I'll describe how coroutines helped me to solve a tricky problem.
I use Flask and peewee for all my personal projects, and wanted an easy way to automatically open an IPython shell with all my project's models in the namespace. If you've used the excellent django-extensions project, you may be familiar with the shell_plus command, which does the same thing. If your project has multiple models spread across several modules, this kind of hack can save you a lot of keystrokes.
In this post I will show how to use SQLite full-text search with Python (and a lot of help from peewee ORM). We will see how to index content for searching, and how to order search results using two ranking algorithms.
Last week I migrated my site from Postgresql to SQLite. I had been using Redis to power my site's search, but since SQLite has an awesome full-text search extension, I decided to give it a try. I am really pleased with the results, and being able to specify boolean search queries is an added plus. Here is a brief overview of the types of search queries SQLite supports:
NEAR: peewee NEAR sqlite would return docs containing the words peewee and sqlite with no more than
10intervening words. You can also specify the max number of intervening words, e.g. peewee NEAR/3 sqlite.
NOT: sqlite OR postgresql AND NOT mysql would return docs about high-quality databases (just trollin).
Check out the full post for details on adding full-text search to your project.
Because I had so much fun writing my last Saturday morning hack, I thought I would share another little hack. I was thinking that I really enjoy my subscription to Python weekly and wouldn't it be great if I had a personal email digest containing just the types of things that interest me? I regularly cruise reddit and
hacker hater news but in my opinion there's a pretty low signal-to-noise ratio. Occasionally I stumble on fascinating content and that's what keeps me coming back.
I wanted to write an app that would search the sites I read and automatically create an email digest based on search terms that I specified. I recently swapped my blog over to SQLite and I love that the SQLite full-text search extension lets you specify boolean queries. With that in mind, I decided that I would have a curated list of boolean search queries which would be used to filter content from the various sites I read. Any articles that match my search would then be emailed to me.
Here are some of my search terms, which I am viewing in the flask-peewee admin interface:
If you're interested in learning how to build your own version of this project, check out the rest of the post.
I made the decision this week to migrate my personal sites and several other sites I host onto SQLite. Previously almost everything I hosted had been using Postgresql. The move was motivated by a couple factors:
At times it has seemed to me that there is a tacit agreement within the Flask / Django communities that if you're using SQL you should be using Postgresql. Postgresql is an amazing piece of engineering. I have spent the last five years of my career working exclusively with it, and I am continually impressed by its performance and the constant stream of great new features.
So why change things?
Well, as my list indicates, there are a handful of reasons. But the primary reason was that I wanted something lightweight. I'm running a fairly low-traffic, read-heavy site, so Postgresql was definitely overkill. My blog is deployed on a VPS with very limited resources, so every MB of RAM counts. Additionally, I wasn't using any special Postgresql features so there was nothing holding me back.
If you follow my blog you've probably seen a few posts at regular intervals containing screenshots of my desktop. I enjoy customizing my desktop for usability and visual appeal, and since I spend 8+ hours a day on my laptop it's important that things be just right.
Recently I've gotten into ricing my phone homescreen as well. I've got a Moto G handset, a great cheap phone which I like so much better than my previous galaxy nexus. Unlike the majority of phones these days it's not so gigantic that I can hardly hold it in one hand.
Anyways, after rooting my phone and installing SlimKat, a custom KitKat ROM, I spent a bit of time tweaking my homescreen. Here are the results!
I was cruising through some old bookmarked YouTube videos this evening and was sad to see that a few of my favorites were no longer available. I was able to search and find them again, but I decided it might not be a bad idea to save my favorites to my media server. I installed youtube-dl and was impressed by how well it just worked. Since I wanted to store these videos on my media server and watch them on Plex I then
scp-ed the video files to the media server. Then it hit me -- why not just write a tiny web frontend for youtube-dl? Once I had a web app, then I could write a chrome extension to communicate with it...
With Flask the whole process took about 20 minutes, so I thought I'd share in case anyone else would find this useful.
Read on for the details.
A couple Saturdays ago I spent the morning hacking together a note-taking app. I'm really pleased with the result, so I thought I'd share the code in case anyone else might find it useful.
The note-taking project idea came about out of necessity -- I wanted something that worked well from my phone. While I have a personal wiki site I've used for things like software installation notes or salsa recipes, I've also noticed that because it's so cumbersome to use from my phone, I often end up emailing things to myself. Plus a wiki implies a kind of permanence to the content, making it not a great fit for these impromptu notes. I also like to use markdown to format notes, but markdown isn't too easy on a phone because of the special characters or the need to indent blocks of text. With these considerations in mind, I set out to build a note-taking app that would be easy to use from my phone.
Here is how the app appears on a narrow screen like my phone:
And here it is on my laptop:
Because markdown is a bit difficult to use when you're not in a nice text editor like vim, I've added some simple toolbar buttons to the editor:
Read the full post for all the details!
I'm pleased to announce that peewee now has more robust support for schema migrations. Basic migration support for Postgres has existed for some time, but peewee now supports SQLite and MySQL as well. In addition, there were a couple of issues with the original migrations API. Schema migrations have been one of the most-requested features, so I'm hopeful that with the addition of this feature, peewee users will have one more reason to enjoy using the library!
Check out the rest of the post for more details.
yield from syntax, introduced in PEP 380, is getting a lot of attention lately due to its important role in the new
asyncio package. I did not immediately understand what this syntax provides, but I have a handy way of thinking about it which I thought I'd share on my blog.
Imagine you have an arbitrarily nested list structure like so:
lists = [ 1, 2, 3, [4, 5, [6, 7], 8], [[[9, 10], 11]], [], 12, ]
You can flatten this data-structure by writing a recursive generator thanks to the new
yield from syntax:
def flatten(items): for item in items: if isinstance(item, (list, tuple)): yield from flatten(item) else: yield item
The output would then be:
>>> [item for item in flatten(lists)] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
To achieve this using Python 2.x, which does not have
yield from, you would instead write the recursive call like this:
if isinstance(item, (list, tuple)): for subitem in flatten(item): yield subitem
Two new themes.
I am proud to live in Lawrence, KS, a college town of about 100,000 which has been my home for the majority of my life. Perhaps the most striking feature about my home is the amazing sky here -- nowhere else I've lived comes close:
Being in the tech industry, I'm often asked if I have plans to move away to a place with more jobs. I always answer simply and somewhat apologetically that I intend to stay in Kansas. Answering that way is so much less embarassing than explaining why I love Kansas. My home is very much a part of me, though, and I'd like to write just once about why I am so happy to live here.
When I first wrote peewee I set out to accomplish a simple task: make it easy to execute queries in my Flask apps. I was a bit familiar with SQLAlchemy, but wanted something lightweight and thought it would be a quick project. While the first version only took a couple days to write, over the past two or three years peewee has been my favorite project to work on. I've been very surprised to see that it's user base has grown, and would like to ask anyone who is using peewee:
I'd like to add a "testimonials" section to the documentation that describes the interesting projects people have written using peewee. If you don't mind sharing, I'd love to hear about your project.
In case you've missed the last few releases, I've been busy adding some fun new features to peewee. While the changelog and the docs explain the new features and describe their usage, I thought I'd write a blog post to provide a bit more context.
Most of these features were requested by peewee users. I depend heavily on users like you to help me improve peewee, so thank you very much! Not only have your feature requests helped make peewee a better library, they've helped me become a better programmer.
So what's new in peewee? Here is something of an overview:
Hopefully some of those things sound interesting. In this post I will not be discussing everything, but will hit some of the highlights.
I've redone my desktop again and thought I'd post some screenshots. The image below is "curated" for maximum visual appeal, but usually I just work with the browser and a few terminals.
December marks my 9th month working remotely for Counsyl and I thought I would write about my experience working from home.
I sat down and started working on a new library shortly after posting about
Django's missing API for generating SQL.
is the result, and provides a simple
translate() function that will recursively
translate a Django model graph into a set of "peewee equivalents". The peewee
versions can then be used to construct queries which can be passed back into
Django as a "raw query".
Here are a couple scenarios when this might be useful:
I've included this module in peewee's playhouse, which is bundled with peewee.
I had the opportunity this week to write some fairly interesting SQL queries. I don't write "raw" SQL too often, so it was fun to use that part of my brain (by the way, does it bother anyone else when people call SQL "raw"?). At Counsyl we use Django for pretty much everything so naturally we also use the ORM. Every place I've worked there's a strong bias against using SQL when you've got an ORM on board, which makes sense -- if you choose a tool you should standardize on it if for no other reason than it makes maintenance easier.
So as I was saying, I had some pretty interesting queries to write and I struggled to think how to shoehorn them into Django's ORM. I've already written about some of the shortcomings of Django's ORM so I won't rehash those points. I'll just say that Django fell short and I found myself writing SQL. The queries I was working on joined models from very disparate parts of our codebase. The joins were on values that weren't necessarily foreign keys (think UUIDs) and this is something that Django just doesn't cope with. Additionally I was interested in aggregates on calculated values, and it seems like Django can only do aggregates on a single column.
As I was prototyping, I found several mistakes in my queries and decided to run them in the postgres shell before translating them into my code. I started to think that some of these errors could have been avoided if I could find an abstraction that sat between the ORM and a string of SQL. By leveraging the python interpreter, the obvious syntax errors could have been caught at module import time. By using composable data structures, methods I wrote that used similar table structures could have been more DRY. When I write less code, I think I generally write less bugs as well.
That got me started on my search for the "missing link" between SQL (represented as a string) and Django's ORM.
I recently heard a talk from a coworker wherein one of the things he discussed was automatically converting CSV data for use with a SQLite database. I thought this would be a great thing to add to peewee, especially as lately I've found myself on several occasions working with CSV and battling with it in a spreadsheet. It would be much easier to load it into a database and then query it using a tool I'm familiar with.
Which brings me to
playhouse.csv_loader, a new module I've added to the
playhouse package of extras. It's hopefully really easy to use. Here is an
example of how you might use it:
>>> from playhouse.csv_loader import * >>> db = SqliteDatabase(':memory:') # Create an in-memory sqlite database # Load the CSV file into the in-memory database and return a Model suitable # for querying the data. >>> ZipToTZ = load_csv(db, 'zipcode_to_timezone.csv') # Get the timezone for a zipcode. >>> ZipToTZ.get(ZipToTZ.zip == 66047).timezone 'US/Central' # Get all the zipcodes for my town. >>> [row.zip for row in ZipToTZ.select().where( ... (ZipToTZ.city == 'Lawrence') && (ZipToTZ.state == 'KS'))] [66044, 66045, 66046, 66047, 66049]
I saw an interesting post on reddit yesterday showing how user "iFargle" had customized the start page of Google Chrome to display his most commonly-used links. I had to have it! After spending some time customizing, here is what I came up with:
Counsyl is the first job I've worked that has a formal code review process. At first it was intimidating (it still is sometimes), but I really have been impressed how review leads to better code. I still make mistakes in my own code, and sometimes I miss bugs in other's code. Bugs are going to happen, though, so I won't spend time talking about how to write bug-free code. Instead I'll write about some things I've noticed that make the review process go more smoothly. I've seen that a lot of productivity and good will can be gained by how you approach the person whose code you're reviewing, and the person reviewing your code.
I remember spending hours when I was younger cycling through the various awesome color themes on my 386, in the glory days of windows 3.1. Remember hotdog stand?
Well, I haven't changed much. I still enjoy making tweaks to the colors and appearance of my desktop. In this post I'll talk about a script I wrote that makes it easy for me to modify all the various colors and configuration files which control the appearance of my desktop.
It's been roughly four years since my introduction to the Django framework and I thought I'd write a little post to commemorate this. In my mind nothing had as big an impact on my career as my decision to work at the Journal World. When I started there I knew basically nothing about software engineering or open-source, and it is entirely thanks to my excellent (and patient) coworkers there that I was able to learn about these things.
For a while I've been itching to rewrite Huey, and just last week released 0.4 which is an almost total rewrite. I initially started Huey for performing tasks like checking comments for spam, sending emails, generating thumbnails, and basically anything that would slow down the pagespeed on my sites. This is still what I see as the primary use-case for huey -- performing small tasks outside the request/response cycle and running jobs on a schedule (I have a site that scrapes the county sheriff's site and keeps a log of arrests in my town). The goal for the rewrite was not to change the purpose of Huey, rather it was to change the API.
The other day a friend of mine was trying out flask-peewee and he had some questions about the best way to structure his app to avoid triggering circular imports. For someone new to flask, this can be a bit of a puzzler, especially if you're coming from django which automatically imports your modules. In this post I'll walk through how I like to structure my flask apps to avoid circular imports. In my examples I'll be showing how to use "flask-peewee", but the same technique should be applicable for other flask plugins.
I'll walk through the modules I commonly use in my apps, then show how to tie them all together and provide a single entrypoint into your app.
I had fun writing about my "cd" helper, so I thought I'd share another productivity helper I wrote for setting my wallpaper. It's a little silly, but I insist on my wallpaper being used for my lockscreen and my login window as well -- that way the entire time I'm on my computer the background is "seamless". Before I wrote this script it used to take me probably 3 or 4 minutes to change wallpapers!
My password "system" used to be that I had three different passwords, all of which were variations on the same theme. I maintained a list of sites I had accounts on and for each site gave a hint which of the three passwords I used. What a terrible scheme.
A couple weeks ago I decided to do something about it. I wanted, above all, to only have to remember a single password. Being lately security-conscious, I also recognized the need for a unique password on every site.
In this post I'll show how I used python to create a password management system that allows me to use a single "master" password to generate unique passwords for all the sites and services I use.
cd a lot, I'm no exception. Because I use virtualenvs for my
python projects, I'm often "cutting" through several layers of crap to get to
what I actually want to edit. This was a good opportunity for a helper script!
The two biggest annoyances I was trying to alleviate were:
cd ../../some-other-dir/foo/. It would be nice to just type the part that matters and not the whole thing.
The solution I came up with stores directories I use (the entire path), and then I can perform a search of that history using a partial path.
My Raspberry Pi got a new case this weekend:
In this post I'd like to talk about some of the shortcomings of the Django ORM, the ways peewee approaches things differently, and how this resulted in peewee having an API that is both more consistent and more expressive.
So, for linux folks out there, here is a little wrapper around scrot, a linux screenshot utility. It will allow you to capture the full screen, the current window, or free-select a region, then take the resulting image and put it in your dropbox folder or upload it to Imgur:
#!/usr/bin/env python import base64 import json import optparse import os import subprocess import sys import time import urllib import urllib2 BINARY = 'scrot' HOME = os.environ['HOME'] # Imgur API -- register your app and paste the client id and secret: CLIENT_ID = '' CLIENT_SECRET = '' # Location of your dropbox folder and your dropbox user id: DROPBOX_DIR = os.path.join(HOME, 'Dropbox/Public/screens/') DROPBOX_URL_TEMPLATE = 'http://dl.dropbox.com/u/%s/screens/%s' DROPBOX_UID = '' def upload_file(filename): with open(filename, 'rb') as fh: contents = fh.read() payload = urllib.urlencode(( ('image', base64.b64encode(contents)), ('key', CLIENT_SECRET), )) request = urllib2.Request('https://api.imgur.com/3/image', payload) request.add_header('Authorization', 'Client-ID ' + CLIENT_ID) try: resp = urllib2.urlopen(request) except urllib2.HTTPError, exc: return False, 'Returned status: %s' % exc.code except urllib2.URLError, exc: return False, exc.reason resp_data = resp.read() try: resp_json = json.loads(resp_data) except ValueError: return False, 'Error decoding response: %s' % resp_data if resp_json['success']: return True, resp_json['data']['link'] return False, 'Imgur failure: %s' % resp_data def get_parser(): parser = optparse.OptionParser('Screenshot helper') parser.add_option('-s', '--select', action='store_true', default=True, dest='select', help='Select region to capture') parser.add_option('-f', '--full', action='store_true', dest='full', help='Capture entire screen') parser.add_option('-c', '--current', action='store_true', dest='current', help='Capture currently selected window') parser.add_option('-d', '--delay', default=0, dest='delay', type='int', help='Seconds to wait before capture') parser.add_option('-p', '--public', action='store_true', dest='dropbox', help='Store in dropbox public folder') parser.add_option('-x', '--no-upload', action='store_false', default=True, dest='upload', help='Do not upload to imgur') parser.add_option('-k', '--keep-local', action='store_true', default=False, dest='keep', help='Keep local copy after upload') return parser def get_scrot_command(filename, options): args = [BINARY] if options.current: args.append('-u') elif not options.full: args.append('-s') if options.delay: args.append('-d %s' % options.delay) args.append(dest) return args if __name__ == '__main__': parser = get_parser() options, args = parser.parse_args() filename = 's%s.png' % time.time() if options.dropbox: dest = os.path.join(DROPBOX_DIR, filename) else: dest = os.path.join(HOME, 'tmp', filename) if not options.current and not options.full: print 'Select a region to capture...' scrot_args = get_scrot_command(dest, options) p = subprocess.Popen(scrot_args) p.wait() if options.dropbox: print DROPBOX_URL_TEMPLATE % (DROPBOX_UID, filename) if options.upload: success, res = upload_file(dest) if not success: print 'Error uploading image: %s' % res print 'Image stored in: %s' % dest sys.exit(1) else: if not options.keep: os.unlink(dest) print res else: print dest
I'm working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to cluster the pixels in groups based on their color. The center of those resulting clusters are then the "dominant" colors. k-means is a great fit for this problem because it is (usually) fast.
I have written a documentation page on upgrading which gives the rationale behind the rewrite and some examples of the new querying API. Please feel free to take a look but much of the information presented in this post is lifted directly from the docs.
The biggest changes between 1.0 and 2.0 are in the syntax used for constructing queries. The first iteration of peewee I threw up on github was about 600 lines. I was passing around strings and dictionaries and as time went on and I added features, those strings turned into tuples and objects. This meant, though, that I needed code to handle all the possible ways of expressing something. Look at the code for parse_select.
I learned a valuable lesson: keep data in datastructures until the absolute last second.
With the benefit of hindsight and experience, I decided to rewrite and unify the API a bit. The result is a tradeoff. The newer syntax may be a bit more verbose at times, but at least it will be consistent.
About two months ago I became the proud owner of a 2004 Yamaha R6! Previously I had been riding an '02 Honda Shadow (pic) and the change has been a revelation.
The other day I noticed I had a couple thumbdrives kicking around with various versions of my "absolutely do not lose" files...stuff like my private keys, tax documents, zips of papers I wrote in college, etc. These USB drives were all over the house, and many contained duplicate versions of the same files. I thought it would be neat to write a little app to give me a web-based interface to store and manage these files securely. In this post I'll talk about how I built a web-based file storage app using flask, pycrypto, and amazon S3.
I wrote a little python IRC bot library a while back. It comes with a few silly examples like a google search bot, an ASCII art bot, even an example botnet. Today I was lurking around in a channel with a bunch of other local developers and noticed that we often are pasting links of images to "contextualize" things other folks have said.
I decided to port my favorite vim theme, candycode, to gedit.
I think it would be great if more sites allowed users (or consumers of their APIs) to produce and execute ad-hoc queries against their data. In this post I'll talk a little bit about some ways sites are currently doing this, some of the challenges involved, my experience trying to build something "reusable", and finally invite you to share your thoughts.
The other day I was poking around my google analytics account and thought it would be a fun project to see if I could collect "analytics"-type data myself. I recalled that the Apache Cassandra project was supposed to use a data model similar to Google's BigTable so I decided to use it for this project. The BigTable data model turned out to be a good fit for this project once I got over some of the intricacies of dealing with time-series data in Cassandra. In this post I'll talk about how I went about modelling, collecting, and finally analyzing basic page-view data I collected from this very blog.