Peewee 3.0 released

February 01, 2018 15:36 / peewee python / 1 comment

On Monday of this week I merged the 3.0a branch of peewee, a lightweight Python ORM, marking the official 3.0.0 release of the project. As I write this, the project is already at 3.0.9, thanks to the many helpful people who have submitted issues and bug reports. Although this was pretty much a complete rewrite of the 2.x codebase, I have tried to maintain backwards-compatibility for the public APIs.

In this post I'll discuss the motivation for the rewrite and some changes to the overall design of the library. If you're thinking about upgrading, check out the changes document, and if you're wondering about any specific APIs, take a spin through the rewritten (and much more thorough) API documentation.

Why rewrite?

When I began the first rewrite that eventually turned into Peewee 2.0, I discovered a handy technique (recursive descent) for turning arbitrarily nested Python data structures into SQL. This gave Peewee an API that was both expressive and composable. Combined with operator overloading, it let you express complex queries in a way that is even validated by the Python interpreter itself:

import datetime
from peewee import DateField, Model, TextField, fn

class Person(Model):
    first = TextField()
    last = TextField()
    dob = DateField()

# Find adults whose last name starts with "A".
# (18 * 365 days ignores leap days, so this cutoff is approximate.)
eighteen_years_ago = datetime.date.today() - datetime.timedelta(days=18 * 365)
a_people = (Person
            .select()
            .where((fn.LOWER(fn.SUBSTR(Person.last, 1, 1)) == 'a') &
                   (Person.dob <= eighteen_years_ago)))
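To illustrate the technique, here is a minimal, self-contained sketch of how overloaded operators can build a tree of nodes that a recursive-descent function then turns into parameterized SQL. This is not Peewee's actual code: Node, Column, Func, Expr and to_sql are hypothetical names chosen for the example.

```python
# Hypothetical sketch (not Peewee's code): operators build a node tree,
# and to_sql() walks it recursively, collecting query parameters.


class Node:
    # Overloaded operators return new Expr nodes instead of booleans.
    def __eq__(self, other):
        return Expr(self, '=', other)

    def __le__(self, other):
        return Expr(self, '<=', other)

    def __and__(self, other):
        return Expr(self, 'AND', other)


class Column(Node):
    def __init__(self, table, name):
        self.table, self.name = table, name


class Func(Node):
    def __init__(self, name, *args):
        self.name, self.args = name, args


class Expr(Node):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs


def to_sql(node, params):
    """Recursively convert a node (or a plain Python value) into SQL."""
    if isinstance(node, Column):
        return '"%s"."%s"' % (node.table, node.name)
    if isinstance(node, Func):
        args = ', '.join(to_sql(arg, params) for arg in node.args)
        return '%s(%s)' % (node.name, args)
    if isinstance(node, Expr):
        # Without any notion of context, every expression gets parentheses.
        return '(%s %s %s)' % (to_sql(node.lhs, params), node.op,
                               to_sql(node.rhs, params))
    # Anything else is a literal value: emit a placeholder, keep the value.
    params.append(node)
    return '?'


last = Column('person', 'last')
dob = Column('person', 'dob')
where = (Func('LOWER', Func('SUBSTR', last, 1, 1)) == 'a') & (dob <= '2000-01-01')
params = []
print(to_sql(where, params))
# ((LOWER(SUBSTR("person"."last", ?, ?)) = ?) AND ("person"."dob" <= ?))
print(params)
# [1, 1, 'a', '2000-01-01']
```

Note the redundant outer parentheses: a compiler that knows nothing about where an expression ends up has no way to tell when they are unnecessary.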

Although I had wanted each "node" in the SQL AST to be able to describe its corresponding SQL in sufficient detail, database engines sometimes differ subtly in how they represent certain structures. Additionally, depending on the context, the same object may need to be converted into different SQL (e.g. a nested SELECT is wrapped in parentheses, while the outermost SELECT does not need them). Because, at the time, I did not think it feasible to make all these nodes in the SQL AST aware of the database and context they were being executed against, I ended up with nearly all of the SQL-generation logic in a single class. It was something like this:

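# This was hard to extend!!
class QueryCompiler(object):
    def parse_node(self, node, params):
        # Every node type needs its own branch here, so supporting a new
        # kind of node meant editing (or subclassing) the compiler.
        if isinstance(node, Entity):
            return self.parse_entity(node, params)
        elif isinstance(node, Expression):
            return self.parse_expression(node, params)
        # etc.

    def parse_entity(self, node, params):
        return '.'.join('"%s"' % part for part in node.parts), params

    def parse_expression(self, node, params):
        # Each parse_* method returns a (sql, params) tuple, so the
        # parameters have to be threaded through every recursive call.
        lhs, params = self.parse_node(node.lhs, params)
        rhs, params = self.parse_node(node.rhs, params)
        return '(%s %s %s)' % (lhs, node.op, rhs), params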

Having all this logic in one monolithic class made it difficult, and hacky, to extend the SQL-generation code with application-defined classes. In 3.0 I introduced the class that would change all this: a lightweight scope/context that allows each class to generate the SQL appropriate to the given context. Peewee 3.0 looks more like this:

class Entity(Node):
    def __init__(self, *path):
        self.path = path

    def __sql__(self, ctx):
        return ctx.literal('.'.join('"%s"' % part for part in self.path))

class Expression(Node):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def __sql__(self, ctx):
        # Tell context to wrap this in parentheses, if necessary.
        with ctx.parentheses():
            return (ctx
                    .sql(self.lhs)
                    .literal(' %s ' % self.op)
                    .sql(self.rhs))

If you want to extend Peewee with your own SQL abstractions, it's as simple as implementing the __sql__() method! The context also accumulates the parameters for parameterized queries, so there's no need to pass them around anymore.
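Here is a minimal, self-contained sketch of that protocol. It does not use Peewee: Context, Entity and the application-defined Between node are hypothetical stand-ins, and the Context methods (literal(), sql(), value(), query()) only loosely mirror the ones used in the snippet. Each node renders itself by calling back into the context, which collects both the SQL fragments and the parameters.

```python
# Hypothetical sketch (not Peewee's Context): nodes implement __sql__(ctx)
# and the context accumulates SQL text and parameters as they render.


class Context:
    def __init__(self):
        self._sql = []
        self._params = []

    def literal(self, text):
        # Append raw SQL text.
        self._sql.append(text)
        return self

    def value(self, value):
        # Append a placeholder and remember the parameter.
        self._sql.append('?')
        self._params.append(value)
        return self

    def sql(self, obj):
        # Nodes render themselves; plain Python values become parameters.
        if hasattr(obj, '__sql__'):
            obj.__sql__(self)
            return self
        return self.value(obj)

    def query(self):
        return ''.join(self._sql), self._params


class Entity:
    def __init__(self, *path):
        self.path = path

    def __sql__(self, ctx):
        return ctx.literal('.'.join('"%s"' % part for part in self.path))


class Between:
    # An application-defined node: the compiler needs no changes,
    # the class only has to implement __sql__().
    def __init__(self, node, low, high):
        self.node, self.low, self.high = node, low, high

    def __sql__(self, ctx):
        return (ctx
                .sql(self.node)
                .literal(' BETWEEN ')
                .sql(self.low)
                .literal(' AND ')
                .sql(self.high))


ctx = Context().sql(Between(Entity('person', 'dob'), '1990-01-01', '1999-12-31'))
print(ctx.query())
# ('"person"."dob" BETWEEN ? AND ?', ['1990-01-01', '1999-12-31'])
```

Because every method returns the context, node implementations can chain calls, and no function signature has to carry a params list.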

Adding these APIs and the scope/context made a lot of things possible that would have been extremely difficult in 2.x, such as common table expressions and proper support for compound queries (UNION, INTERSECT, etc.). It also made it possible to get rid of one of the grossest hacks in Peewee 2.x, the strip_parens() function (which turned stuff like "((foo))" into "(foo)").
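Context-dependent output is what makes a strip_parens()-style pass unnecessary. In this minimal stdlib-only sketch (Context, query_scope() and Select are hypothetical, not Peewee's API), the context tracks how deeply queries are nested, so a SELECT adds parentheses only when it is rendered as a subquery.

```python
# Hypothetical sketch: the context decides whether a query needs
# parentheses, based on whether it is nested inside another query.
from contextlib import contextmanager


class Context:
    def __init__(self):
        self.depth = 0      # how many queries we are currently inside
        self._sql = []
        self._params = []

    def literal(self, text):
        self._sql.append(text)
        return self

    def sql(self, obj):
        # Nodes render themselves; plain values become parameters.
        if hasattr(obj, '__sql__'):
            obj.__sql__(self)
        else:
            self._sql.append('?')
            self._params.append(obj)
        return self

    @contextmanager
    def query_scope(self):
        # Only a query nested inside another query gets parentheses.
        wrap = self.depth > 0
        if wrap:
            self.literal('(')
        self.depth += 1
        try:
            yield self
        finally:
            self.depth -= 1
            if wrap:
                self.literal(')')

    def query(self):
        return ''.join(self._sql), self._params


class Select:
    def __init__(self, columns, table, where=None):
        # where is an optional (column, operator, value) triple.
        self.columns, self.table, self.where = columns, table, where

    def __sql__(self, ctx):
        with ctx.query_scope():
            ctx.literal('SELECT %s FROM "%s"' % (', '.join(self.columns), self.table))
            if self.where is not None:
                column, op, value = self.where
                ctx.literal(' WHERE %s %s ' % (column, op)).sql(value)
        return ctx


inner = Select(['id'], 'person', ('last', 'LIKE', 'A%'))
outer = Select(['title'], 'note', ('person_id', 'IN', inner))
print(Context().sql(inner).query())
# ('SELECT id FROM "person" WHERE last LIKE ?', ['A%'])
print(Context().sql(outer).query())
# ('SELECT title FROM "note" WHERE person_id IN (SELECT id FROM "person" WHERE last LIKE ?)', ['A%'])
```

The same Select object renders with or without parentheses depending only on where it appears, so nothing has to be cleaned up afterwards.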

Additionally, using this approach, it became clear to me that Peewee could, and should, implement low-level APIs for dealing with tables and columns, which could then be extended and fleshed out to implement the more sophisticated Model and Field classes.
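That layering can be sketched in a few dozen lines. This is a hypothetical illustration, not Peewee's code: Column and Table form a low-level API that can build an INSERT on its own, while Field, DateField and Model reuse it, adding Python-side value conversion and declarative class definitions.

```python
# Hypothetical sketch of layering a Model/Field API on top of a
# low-level Table/Column API. Not Peewee's actual classes.
import datetime


class Column:
    """Low-level column: a name plus a hook for converting values."""

    def __init__(self, name=None):
        self.name = name

    def db_value(self, value):
        # Plain columns pass values through unchanged.
        return value


class Table:
    """Low-level table: knows its name and columns and can build SQL."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = columns

    def insert_sql(self, row):
        # row maps column names to Python values; unknown keys are ignored.
        cols = [c for c in self.columns if c.name in row]
        sql = 'INSERT INTO "%s" (%s) VALUES (%s)' % (
            self.name,
            ', '.join('"%s"' % c.name for c in cols),
            ', '.join('?' for _ in cols))
        return sql, [c.db_value(row[c.name]) for c in cols]


class Field(Column):
    """A column declared on a Model; its name comes from the attribute."""


class DateField(Field):
    def db_value(self, value):
        # Store dates as ISO-8601 strings.
        return value.isoformat()


class Model:
    """Declarative layer: each subclass is backed by a low-level Table."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        fields = []
        # Only fields declared directly on this class are collected here.
        for attr, value in vars(cls).items():
            if isinstance(value, Field):
                value.name = attr
                fields.append(value)
        cls._table = Table(cls.__name__.lower(), fields)

    def __init__(self, **values):
        self.__dict__.update(values)

    def insert_sql(self):
        return self._table.insert_sql(self.__dict__)


# Low-level usage, no models involved:
people = Table('person', [Column('first'), Column('last')])
print(people.insert_sql({'first': 'Mickey'}))
# ('INSERT INTO "person" ("first") VALUES (?)', ['Mickey'])


class Person(Model):
    first = Field()
    dob = DateField()


print(Person(first='Huey', dob=datetime.date(2010, 5, 1)).insert_sql())
# ('INSERT INTO "person" ("first", "dob") VALUES (?, ?)', ['Huey', '2010-05-01'])
```

With SQL generation living in the low-level classes, the model layer in this sketch only has to map Python attributes onto columns.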

Merge and Push

I was pretty nervous about merging and pushing the new code, as Peewee has grown to have a small but loyal following, and I didn't want to be inconsiderate of all the people who had placed their trust in me to maintain a quality library. So I didn't rush. Many of the major pieces of the 3.0 rewrite were completed months ago. I even did a small proof-of-concept using Cython, to see how the design worked out (you can find it on my GitHub).

Over the last few months I would implement a feature, work out some kinks in an API, add tests, etc., until I felt comfortable that I hadn't forgotten anything important. I switched all my personal code over to 3.0 to see whether any gaps showed up. After rewriting, revising and extending the documentation, I decided it was ready to merge.

Thank you!

Many people have submitted bug reports or questions to the issue tracker since 3.0.0 was released on Monday. Thanks to their willingness and patience, I've been able to track down and fix some issues that did not come up during my own testing. I and all the other users (whether they know it or not) are indebted to them for their help. Thank you!

If you have any questions or comments, feel free to leave a comment below or to contact me. If you have a bug to report or questions specifically about the project, feel free to let me know on the issue tracker.

Comments (1)

kinotix | feb 05 2018, at 06:12pm

eighteen_years_ago = datetime.date.today() - dateutil.relativedelta.relativedelta(years=18)

instead of:

eighteen_years_ago = datetime.date.today() - datetime.timedelta(days=18 * 365)

would give a more accurate result, taking leap years into consideration.
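The relativedelta suggestion relies on the third-party dateutil package. For comparison, here is a minimal stdlib-only sketch of a leap-year-aware calculation. years_ago is a hypothetical helper, and mapping Feb 29 to Feb 28 when the target year is not a leap year is a choice made for this example.

```python
import datetime


def years_ago(n, today=None):
    """Return the same calendar date n years before `today`."""
    if today is None:
        today = datetime.date.today()
    try:
        return today.replace(year=today.year - n)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year.
        return today.replace(year=today.year - n, day=28)


feb_1 = datetime.date(2018, 2, 1)
print(years_ago(18, feb_1))                        # 2000-02-01
print(feb_1 - datetime.timedelta(days=18 * 365))   # 2000-02-06 (5 leap days short)
print(years_ago(18, datetime.date(2016, 2, 29)))   # 1998-02-28
```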
