Peewee was baroque, so I rewrote it

Today I merged in the "unstable/2.0" branch of peewee. I'm very excited about the changes and I hope you will be, too.

I have written a documentation page on upgrading, which gives the rationale behind the rewrite and some examples of the new querying API. Please feel free to take a look, though much of the information in this post is lifted directly from the docs.

Goals for the rewrite

What changed?

The biggest changes between 1.0 and 2.0 are in the syntax used for constructing queries. The first iteration of peewee I threw up on GitHub was about 600 lines. I was passing around strings and dictionaries, and as time went on and I added features, those strings turned into tuples and objects. This meant, though, that I needed code to handle all the possible ways of expressing something. Look at the code for parse_select.

I learned a valuable lesson: keep data in data structures until the absolute last second.
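To make that lesson concrete, here is a minimal sketch of the idea. It is not peewee's actual code; the class names (`Node`, `Field`, `Expr`) and the `to_sql` helper are made up for illustration. Operators build a small tree of objects, and nothing is turned into a SQL string until the very end.

```python
# Hypothetical names throughout; this is an illustration, not peewee's internals.

class Node:
    """Base class: operators return new tree nodes instead of strings."""
    def __eq__(self, other):
        return Expr(self, '=', other)

    def __gt__(self, other):
        return Expr(self, '>', other)

    def __and__(self, other):
        return Expr(self, 'AND', other)

    def __or__(self, other):
        return Expr(self, 'OR', other)


class Field(Node):
    def __init__(self, table, name):
        self.table = table
        self.name = name


class Expr(Node):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs


def to_sql(node, params):
    """Render the tree to SQL at the last moment, collecting bound parameters."""
    if isinstance(node, Expr):
        return '(%s %s %s)' % (
            to_sql(node.lhs, params), node.op, to_sql(node.rhs, params))
    if isinstance(node, Field):
        return '%s.%s' % (node.table, node.name)
    params.append(node)  # any plain Python value becomes a placeholder
    return '?'


username = Field('user', 'username')
expr = (username == 'charlie') | (username == 'peewee herman')

params = []
sql = to_sql(expr, params)
print(sql, params)
```

Because the expression stays a tree until `to_sql` runs, there is exactly one place that needs to know how to render it, instead of separate code paths for strings, tuples and dictionaries.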

With the benefit of hindsight and experience, I decided to rewrite and unify the API a bit. The result is a tradeoff. The newer syntax may be a bit more verbose at times, but at least it will be consistent.

I know some people don't care about lines of code, but I think it's a good metric for complexity. So for those who are interested, peewee went from almost 2400 SLOC to 1666 as a result of the rewrite.


Since seeing is believing, I will show some side-by-side comparisons. Let’s pretend we’re using the models from the docs, good ol’ user and tweet:

class User(Model):
    username = CharField()

class Tweet(Model):
    user = ForeignKeyField(User, related_name='tweets')
    message = TextField()
    created_date = DateTimeField(default=datetime.datetime.now)
    is_published = BooleanField(default=True)

Get me a list of all tweets by a user named “charlie”:

# 1.0
Tweet.select().join(User).where(username='charlie')

# 2.0
Tweet.select().join(User).where(User.username == 'charlie')

Get me a list of tweets ordered by the author's username, then newest to oldest:

# 1.0 -- this is one where there are like 10 ways to express it
Tweet.select().join(User).order_by('username', (Tweet, 'created_date', 'desc'))

# 2.0
Tweet.select().join(User).order_by(User.username, Tweet.created_date.desc())

Get me a list of tweets created by users named “charlie” or “peewee herman”, and which were created in the last week.

last_week = datetime.datetime.now() - datetime.timedelta(days=7)

# 1.0
Tweet.select().where(created_date__gt=last_week).join(User).where(
    Q(username='charlie') | Q(username='peewee herman')
)

# 2.0
Tweet.select().join(User).where(
    (Tweet.created_date > last_week) & (
        (User.username == 'charlie') | (User.username == 'peewee herman')
    )
)

Get me a list of users and when they last tweeted (if ever):

# 1.0
User.select({
    User: ['*'],
    Tweet: [Max('created_date', 'last_date')]
}).join(Tweet, 'LEFT OUTER').group_by(User)

# 2.0
User.select(
    User, fn.Max(Tweet.created_date).alias('last_date')
).join(Tweet, JOIN_LEFT_OUTER).group_by(User)

Let’s do an atomic update on a counter model (you’ll have to use your imagination):

# 1.0
Counter.update(count=F('count') + 1).where(url=request.url)

# 2.0
Counter.update(count=Counter.count + 1).where(Counter.url == request.url)
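For anyone whose imagination needs a nudge, here is a small runnable sketch using only the standard library's sqlite3 module. The `counter` table, its columns and the `increment` helper are assumptions made up for this example, and the SQL is only roughly the kind of statement an update like the one above boils down to, not necessarily what peewee emits. The point is that the increment happens inside the database rather than as a read-modify-write in Python.

```python
import sqlite3

# Hypothetical schema for the imaginary Counter model.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE counter ('
    'url TEXT PRIMARY KEY, count INTEGER NOT NULL DEFAULT 0)')
conn.execute(
    'INSERT INTO counter (url, count) VALUES (?, ?)', ('/about/', 41))


def increment(conn, url):
    """Bump the counter in a single UPDATE so the database does the arithmetic."""
    cur = conn.execute(
        'UPDATE counter SET count = count + 1 WHERE url = ?', (url,))
    conn.commit()
    return cur.rowcount  # number of rows that were updated


updated = increment(conn, '/about/')
new_count = conn.execute(
    'SELECT count FROM counter WHERE url = ?', ('/about/',)).fetchone()[0]
print(updated, new_count)
```

Since the new value is computed from the column itself, two requests incrementing at the same time cannot overwrite each other with a stale value read earlier in Python.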

Let’s find all the users whose username starts with ‘a’ or ‘A’:

# 1.0
User.select().where(R('LOWER(SUBSTR(username, 1, 1)) = %s', 'a'))

# 2.0
User.select().where(fn.Lower(fn.Substr(User.username, 1, 1)) == 'a')

I hope a couple of things jump out at you from these examples. What I see is that the 1.0 API is sometimes a bit less verbose, but it relies on strings in many places (which may be fields, aliases, selections, join types, functions, etc.). In the where clause things get crazy, as there are args being combined with bitwise operators ("Q" expressions) and also kwargs being used with django-style "double-underscore" lookups. The crazy thing is, there are so many different ways I could have expressed some of the above queries using peewee 1.0 that I had a hard time deciding which to even write.

The 2.0 API is hopefully more consistent. Selections, groupings, functions, joins and orderings all pretty much conform to the same API. Likewise, where and having clauses are handled the same way (in 1.0 the having clause is simply a raw string). The new fn object is actually a wrapper: whatever appears to the right of the dot (e.g. Lower in fn.Lower) is treated as a function that can take arbitrary parameters.
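A wrapper like that can be sketched in a few lines with `__getattr__`. This is an illustration of the general technique rather than peewee's implementation; `FunctionFactory`, `Func`, `Column` and the `sql` method are hypothetical names invented here.

```python
# Hypothetical names; a sketch of the technique, not peewee's own code.

class Column:
    def __init__(self, name):
        self.name = name


class Func:
    """A SQL function call held as data: a name plus its arguments."""
    def __init__(self, name, args):
        self.name = name
        self.args = args

    def sql(self, params):
        parts = []
        for arg in self.args:
            if isinstance(arg, Func):
                parts.append(arg.sql(params))  # nested function call
            elif isinstance(arg, Column):
                parts.append(arg.name)
            else:
                params.append(arg)  # plain value becomes a placeholder
                parts.append('?')
        return '%s(%s)' % (self.name, ', '.join(parts))


class FunctionFactory:
    def __getattr__(self, name):
        # Called for any attribute that does not exist, so fn.Anything
        # returns a builder for a SQL function named "Anything".
        def build(*args):
            return Func(name, args)
        return build


fn = FunctionFactory()

params = []
sql = fn.Lower(fn.Substr(Column('username'), 1, 1)).sql(params)
print(sql, params)
```

Nothing in the factory knows about `Lower` or `Substr` ahead of time, which is why any function the database supports can be called without the library listing it.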

Where to now?

If you’re feeling froggy and want to get coding, you might want to check out:

Plans for flask-peewee and wtf-peewee

Currently flask-peewee and wtf-peewee are broken. I will be working to update them as I have time. Until then, simply pin your requirements file to a version of peewee before 2.0:

# requirements.txt

Comments (7)

Mike | oct 22 2012, at 09:58am

Awesome looking forward to using it.

Charles Leifer | oct 15 2012, at 10:21am

Flask-peewee is now up-to-date and the tests are passing. The export is working again as well.

Charles Leifer | oct 10 2012, at 10:33am

flask-peewee is just about there. The test suite is passing and things are working for the most part, currently just the export is broken. I'll post an update as soon as everything's fixed.

Mike | oct 10 2012, at 09:26am

Hi Charles

Looks good, I saw you did some updates 16 hours ago on flask-peewee, is everything working again now with peewee 2.0?

Charles Leifer | oct 09 2012, at 11:57am

I have updated the benchmark results and also included results from 1.0 compared to 2.0.

Benchmark results on github

Charles Leifer | oct 09 2012, at 09:08am

I've been doing some profiling, it seems that it does a bit more work when iterating and building objects than 1.0. You can run the included benchmark suite yourself if you have django and sqlalchemy. On my machine peewee has been faster than Django and SQA at most tasks, and about the same when iterating and returning Model instances. I'll get the results posted, though!

Adam | oct 09 2012, at 04:19am

Looks good, what's performance like, any benchmarks?
