Peewee 4: Async, JSON, Eager-Loading and Types

She purrs like a kitten

If you listen real close you can hear the type whispers in there.

Peewee 4.x is a picture, in code, of me eating my hat. We got async and are shipping a significantly better type whispering stub based on typeshed's original. Beyond those, I've added a core JSONField which exposes a nice cross-backend API for working with JSON data, and a new declarative API for eager-loading relations.

Peewee 4 is broadly compatible with the 3.x releases, check the changelog for full details.

AsyncIO

To implement asyncio there are a couple of paths authors of synchronous libraries can take. Django chose the "just shit it into a threadpool" approach, and as I'm writing this still doesn't support transactions in its async endpoints, among other quirks like running everything serialized onto a dedicated thread via sync_to_async (with thread-sensitive set). redis-py uses the "copy/paste/adjust-to-taste" approach of making a parallel coroutine-based API, and they have thousands of lines of near-duplication. Other projects like psycopg do code-gen to generate sync code from async implementations. All this is to say that handling async and sync from one codebase often requires making choices which come with tradeoffs (thread-safety issues, maintenance burden, codegen stability, etc).

Luckily, Mike Bayer of SQLAlchemy came up with an approach that works well with libraries like Peewee. This approach uses the greenlet library (not gevent dammit!) to yield to the event loop whenever I/O occurs. Since Peewee I/O occurs through a single method (execute_sql), the implementation ended up being simple: Peewee takes the coroutine, switches control to the async layer and awaits it on the event loop, and when it's done the result gets passed back into Peewee transparently.

Example:

import asyncio, time
from playhouse.pwasyncio import AsyncPostgresqlDatabase

db = AsyncPostgresqlDatabase('peewee_test', pool_size=30)

async def slow_query():
    async with db:  # Acquire a connection for this task.
        await db.aexecute_sql('select pg_sleep(1)')
        print('finished sleeping')

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(slow_query() for _ in range(30)))
    print(round(time.perf_counter() - start, 2))
    await db.close_pool()

asyncio.run(main())

# Prints 1.05 on my machine.

Changing the pool size to 10 correctly prints ~3.05, for example.

In this example, each task checks out its own connection from the pool, hands pg_sleep(1) to asyncpg, and awaits it, so all thirty tasks run concurrently on one thread, with no monkeypatching or any nonsense. Connections in pwasyncio are task-local and maintain their own transactional state, so transactions are isolated and won't interleave across concurrently-running tasks. Since queries are awaited like any other coroutine, they also cancel cleanly.

The best part is that the core stays synchronous and, as every query execution runs through execute_sql, we only need to bridge to the loop in one spot. In short, Peewee uses greenlets to pass coroutines out of synchronous code, so they can be await-ed, at the cost of two lightweight context switches. For more detail check out my earlier post.

My end-goal with pwasyncio was to enable Peewee to work smoothly with async-first frameworks like FastAPI, Quart, etc. At the time of writing, the FastAPI docs don't really discuss using an async ORM. SQLModel in the tutorial uses synchronous endpoints throughout (the threadpool again). With pwasyncio and the new pydantic helpers, a fully async FastAPI demo looks like this:

# fastapi_example.py
from fastapi import Depends, FastAPI, HTTPException
from contextlib import asynccontextmanager
from peewee import *
from playhouse.pwasyncio import AsyncPostgresqlDatabase
from playhouse.pydantic_utils import to_pydantic


db = AsyncPostgresqlDatabase('peewee_test')

class User(db.Model):
    name = CharField(verbose_name='Full Name', help_text='Display name')
    email = CharField(unique=True)
    status = IntegerField(default=1, choices=(
        (1, 'Active'),
        (2, 'Inactive'),
        (3, 'Deleted')))

# Generate pydantic schemas suitable for create and responses.
# Schemas include metadata derived from verbose_name, help_text, choices, and
# default value.
UserCreate = to_pydantic(User, model_name='UserCreate')
UserResponse = to_pydantic(User, exclude_autofield=False, model_name='UserResponse')

async def get_db():
    # Hold a pooled connection open for the duration of the request.
    async with db:
        yield db

@asynccontextmanager
async def lifespan(app):
    # Create tables (if they don't exist) at application startup.
    async with db:
        await db.acreate_tables([User])

    yield
    await db.close_pool()  # Shut-down pool and exit.

app = FastAPI(lifespan=lifespan)

@app.get('/users', response_model=list[UserResponse])
async def list_users(db=Depends(get_db)):
    rows = await User.select().dicts().aexecute()
    return [UserResponse(**row) for row in rows]

@app.post('/users', response_model=UserResponse)
async def create_user(data: UserCreate, db=Depends(get_db)):
    user = await User.acreate(**data.model_dump())
    return UserResponse.model_validate(user)

@app.get('/users/{user_id}', response_model=UserResponse)
async def get_user(user_id: int, db=Depends(get_db)):
    try:
        user = await db.get(User.select().where(User.id == user_id))
    except User.DoesNotExist:
        raise HTTPException(status_code=404, detail='User not found')
    return UserResponse.model_validate(user)

To learn more, check out the async documentation. The framework integration doc shows other FastAPI patterns such as Dependency Injection, as well as how to use Peewee with other popular async frameworks like Starlette, Quart, etc.

JSON Field

Peewee 4 adds a core JSONField which provides a unified API for reading, writing, querying and modifying JSON data. Prior to this, the JSON fields were scattered across backend-specific playhouse extensions, and they each had subtly different APIs and behaviors. The biggest pain-point is differentiating between SQL types and JSON types, e.g.:

-- sqlite
WITH t(j) AS (VALUES ('{"k": "v"}'), ('{"k": 1}'))
SELECT j,
       j->'k' AS json_val,
       typeof(j->'k') AS json_type,
       j->>'k' AS sql_val,
       typeof(j->>'k') AS sql_type
FROM t;

╭────────────┬──────────┬───────────┬─────────┬──────────╮
     j       json_val  json_type  sql_val  sql_type 
╞════════════╪══════════╪═══════════╪═════════╪══════════╡
 {"k": "v"}  "v"       text       v        text     
 {"k": 1}    1         text             1  integer  
╰────────────┴──────────┴───────────┴─────────┴──────────╯

SQLite's ->> returns a native integer for the numeric key, while -> keeps it as JSON-typed text. Postgres is subtly different: -> always returns jsonb and ->> always returns text, regardless of the value.

-- postgres
WITH t(j) AS (VALUES ('{"k": "v"}'::jsonb), ('{"k": 1}'::jsonb))
SELECT j,
       j->'k' AS json_val,
       pg_typeof(j->'k') AS json_type,
       j->>'k' AS sql_val,
       pg_typeof(j->>'k') AS sql_type
FROM t;

     j       json_val  json_type  sql_val  sql_type
────────────┼──────────┼───────────┼─────────┼──────────
 {"k": "v"}  "v"       jsonb      v        text
 {"k": 1}    1         jsonb      1        text

MySQL 8 has a real JSON type too, so -> comes back as json. MariaDB has no JSON type at all (it's an alias for longtext), so everything you extract is plain text.

The core JSONField provides a consistent API that works across SQLite (3.38+), Postgres (jsonb), MySQL 8 and MariaDB (pass mariadb=True to MySQLDatabase). The complexities around SQL-vs-JSON are mostly avoided by treating everything as a JSON value (-> in SQLite/Postgres, as opposed to ->>). This isn't perfect, as it requires explicit casts in some places to force numeric ordering (as opposed to lexicographic), but on the whole it works well and is an improvement over the pre-existing playhouse implementations.

Here's the model and data I'll use in the examples:

from peewee import *

db = PostgresqlDatabase('peewee_test')  # or SqliteDatabase / MySQLDatabase.

class Pet(db.Model):
    name = TextField()
    data = JSONField()

with db.atomic():
    db.create_tables([Pet])

    Pet.create(name='Huey', data={
        'species': 'cat',
        'age': 14,
        'toys': ['hard-fur-mouse', 'ball'],
        'vitals': {'weight': 8, 'vet': 'Dr. Quinn'}})

    Pet.create(name='Mickey', data={
        'species': 'dog',
        'age': 10,
        'toys': ['bone', 'rope'],
        'vitals': {'weight': 40}})

    Pet.create(name='Zaizee', data={
        'species': 'cat',
        'age': 2,
        'toys': ['feather'],
        'vitals': {'weight': 15, 'vet': 'Dr. Quinn'}})

Peewee supports reading the data back, traversing paths, and doing comparisons:

Pet.get(Pet.name == 'Huey').data

# Poor Huey, he's getting to be an old boy. He goes kinda slow down
# the stairs these days.
{'age': 14,
 'toys': ['hard-fur-mouse', 'ball'],
 'vitals': {'vet': 'Dr. Quinn', 'weight': 8},
 'species': 'cat'}


# Each pet's best toy.
query = Pet.select(Pet.name, Pet.data['toys'][0])
list(query.tuples())

# Result:
[('Huey', 'hard-fur-mouse'), ('Mickey', 'bone'), ('Zaizee', 'feather')]


# How many of each species do we have? My Aunt and Uncle in Connecticut
# used to have like 30 pets, including a (real) prairie dog! Could I
# have a future in animal hoarding? At what point do they just start
# eating each other, I wonder?
query = (Pet
         .select(Pet.data['species'].alias('species'),
                 fn.COUNT(Pet.id).alias('count'))
         .group_by(Pet.data['species']))
[(p.species, p.count) for p in query]

# Result (amateur numbers):
[('cat', 2), ('dog', 1)]


# What cats do we have, again?
[p.name for p in Pet.select().where(Pet.data['species'] == 'cat')]

# Result:
['Huey', 'Zaizee']


# In order to avoid lexicographic comparison (SQLite and MariaDB), when
# comparing against numbers add `.as_int()` (or `.as_float()`):
[p.name for p in Pet.select().where(Pet.data['age'].as_int() > 5)]

# Result:
['Huey', 'Mickey']

There are also helpers for doing atomic mutations on JSON values:

# Give Huey some yarn, since he's getting old.
(Pet.update(data=Pet.data['toys'].append('yarn'))
 .where(Pet.name == 'Huey')
 .execute())

# Huey's new prescription food has more nutrition in it, I think.
(Pet.update(data=Pet.data['vitals']['weight'].set(9))
 .where(Pet.name == 'Huey')
 .execute())

Pet.get(Pet.name == 'Huey').data

# Result:
{'age': 14,
 'toys': ['hard-fur-mouse', 'ball', 'yarn'],
 'vitals': {'vet': 'Dr. Quinn', 'weight': 9},
 'species': 'cat'}

Finally, MySQL, MariaDB and Postgres offer containment operations, so you can search by key or partial document.

has_vet = Pet.select().where(Pet.data['vitals'].has_key('vet'))
[p.name for p in has_vet]

# Cats are, generally, sicklier than dogs:
['Huey', 'Zaizee']


# Partial-doc containment:
cats = Pet.select().where(Pet.data.contains({'species': 'cat'}))
[p.name for p in cats]

# Result:
['Huey', 'Zaizee']

Instead of learning one API per database, there's now one canonical JSON API. Unfortunately, though, there are still some little quirks that ended up being too awkward to paper over. The main one to be aware of is that Postgres ships a shallow update, while SQLite and MySQL offer full merge/patch nested updates. The backend-specific fields in playhouse aren't going anywhere, if you prefer to continue using those, however the new JSONField is more robust and consistent in how it differentiates between SQL and JSON-typed values, and should be preferred whenever possible.

For more details, see the JSONField docs.

Eager Loading

For as long as I can remember, Peewee has provided eager-loading of relations via JOINs across the "forward" direction of a foreign-key. For example, when displaying a list of Tweets, also display the author's username:

query = (Tweet
         .select(Tweet, User)
         .join(User))
for tweet in query:
    print(tweet.user.username, '->', tweet.content)

But the opposite direction has always been a bit trickier: for each user in a list, also list out their tweets. I added prefetch() many years ago to address this need. Behind-the-scenes, prefetch executes one query per relation being fetched, and then efficiently merges the related rows back together in Python. The implementation is reasonably solid, but it suffers from a few defects. First, prefetch is not declarative - it evaluates its parameters immediately and doesn't play very nicely with other ORM APIs. Second, a very common scenario is to only prefetch X number of rows per parent. This is particularly tricky to get right, as it requires using CTEs.

For Peewee 4, I've added a new declarative API for controlling eager-loading of relations. The with_related() method, together with Load(), supports nestable eager-loading using one of several strategies, and also supports a limit-per-parent on the child relation.

The examples below will use my old standbys:

class User(db.Model):
    username = TextField()

class Tweet(db.Model):
    user = ForeignKeyField(User, backref='tweets')
    content = TextField()
    timestamp = TimestampField()

class Favorite(db.Model):
    tweet = ForeignKeyField(Tweet, backref='favorites')
    user = ForeignKeyField(User, backref='favorites')

For the first example, let's fetch a list of users and their tweets. To do this, we'll specify that we want to Load() the backref User.tweets:

query = User.select().with_related(Load(User.tweets))

for user in query:
    print(user.username, [t.content for t in user.tweets])

# Huey ['meow', 'purr', 'hiss']
# Mickey ['woof', 'bark']
# Zaizee []

# We can also specify a query to filter the relation. Let's ignore
# Mickey and see everyone else along with what they're saying in 2026.
tweets_query = Tweet.select().where(Tweet.timestamp.year >= 2026)
query = (User
         .select()
         .where(User.username != 'Mickey')
         .with_related(Load(User.tweets, tweets_query)))

These examples run two queries no matter how many users are found. The query on the Tweets table is filtered by the Users from the outer query:

-- First example.
SELECT "id", "username" FROM "user";

SELECT * FROM "tweet" WHERE "user_id" IN (
    SELECT "id" FROM "user"
);

-- Second example.
SELECT "id", "username" FROM "user" WHERE "username" != 'Mickey';

SELECT * FROM "tweet" WHERE "timestamp" >= ... AND "user_id" IN (
    SELECT "id" FROM "user"
    WHERE "username" != 'Mickey'
);

The primary use-case is for eagerly-loading backrefs, but the Load helper can also fetch forward foreign-key references. The rows are loaded once, the first time the query is evaluated, regardless of whether you're iterating, calling get(), indexing, taking len(), etc., so it plays nicely with the rest of Peewee's APIs.

To go deeper than one level, relations can be nested with Load.then(). The example below efficiently fetches Users, their tweets, and the favorites on each tweet:

query = User.select().with_related(
    Load(User.tweets).then(
        Load(Tweet.favorites)))

for user in query:
    for tweet in user.tweets:
        print(user.username, tweet.content, len(tweet.favorites))

# Huey meow 1
# Huey purr 0
# Huey hiss 2
# Mickey woof 1
# Mickey bark 0

Each Load can also specify its own query for filtering, ordering and joining:

# Each tweet's favorites, with the favoriting user loaded in the same query:
favorites = Favorite.select(Favorite, User).join(User)
query = (Tweet
         .select()
         .with_related(Load(Tweet.favorites, favorites))
         .order_by(Tweet.content))

for tweet in query:
    print(tweet.content, [f.user.username for f in tweet.favorites])

# bark []
# hiss ['Mickey', 'Zaizee']
# meow ['Zaizee']
# purr []
# woof ['Huey']

Because the favorites query is a real query, accessing f.user.username is free since the favoriting users were joined and selected.

The piece I'm happiest with is per_parent. A plain LIMIT on an eager load applies to the relation as a whole, so limit(2) gives you two tweets total, not two per user. Getting the two most-recent tweets for each user is a top-N-per-group problem, and, as I mentioned, those are fiddly to write by hand. per_parent=n does this with a window function, still in a single query:

recent = Tweet.select().order_by(Tweet.timestamp.desc())
query = User.select().with_related(
    Load(User.tweets, recent, per_parent=2))

for user in query:
    print(user.username, [t.content for t in user.tweets])

# Huey ['hiss', 'purr']
# Mickey ['bark', 'woof']
# Zaizee []

The last piece is the loading strategy. The default strategy embeds the parent query as an IN (SELECT ...) subquery. PREFETCH_TYPE.JOIN joins against the parent query as a derived table instead, which you need when paginating the parent on MySQL/MariaDB (since neither will accept a LIMIT inside an IN subquery). PREFETCH_TYPE.MATERIALIZE skips the subquery entirely and sends the parent keys it already holds in memory as a list of values:

Load(User.tweets, strategy=PREFETCH_TYPE.MATERIALIZE)

# SELECT * FROM "tweet" WHERE "user_id" IN (1, 2, 3)

Materialize avoids re-running the parent query, at the cost of one bind parameter per parent, so it is suitable for expensive parent queries that return a manageable number of rows.

For more details, see the eager-loading docs.

Types

The final piece is also the one I deserve the most flak for not doing sooner, but yeah, type whispers have finally arrived in-tree... Well, kinda. It's a long story, but trying to get typing into Peewee is actually a pain-in-the-ass because we ship a py-module instead of a package, so there's no place to put a py.typed marker. I originally began by making some improvements to the typeshed stub, but my patch was just sitting for a while, so I started digging and figured out we could just vendor the stub in a top-level peewee-stubs/ package built as part of the wheel/sdist. mypy and pyright both pick it up and immediately start using it.

As of 4.1.1, Peewee will ship with this improved type stub, which (finally) properly handles differentiating between Model.field and instance.field:

class User(Model):
    username = CharField()
    age = IntegerField(null=True)  # Type stub respects null= option.

class Tweet(Model):
    user = ForeignKeyField(User)

reveal_type(User.username)  # CharField[str] (querying, etc)
reveal_type(User().username)  # str (the actual username value)

reveal_type(User.age)  # IntegerField[int | None]
reveal_type(User().age)  # int | None

reveal_type(Tweet.user)  # ForeignKeyField[User]
reveal_type(Tweet().user)  # User

Anecdotally, type-checking for ORMs like Peewee is kinda tricky. Django has an elaborate mypy plugin, and SQLAlchemy chose to redesign their surface to better accommodate checkers in 2.0. While the stub isn't perfect, my hope is that it hits the 80% use-case neatly, and provides a stable base for future improvement.

What's next

I plan to continue to improve the asyncio implementation if adoption grows and requests start showing up in the issue tracker. I'm still a little dubious whether it will actually see adoption, or if people will continue using the synchronous endpoint + threadpool pattern with frameworks like FastAPI.

Beyond that, my plan is to continue improving Peewee one piece at a time. Maybe migrations, although I don't know, that one's pretty fraught. At work we have schema changes every release, and it seems to always be the case that there are subtle wrinkles that prevent a naive approach: huge tables that require adding the column, default value, and constraint in steps, indexes that need to be added concurrently outside the main DDL transaction, business logic to migrate or update data in new or altered columns... We'll see, though, there is certainly room for some improvement.

I've been happy to make concessions to async and typing. For better or worse, they are mainstream Python, and I don't want to diminish Peewee's utility for an aesthetic pose. One fashion I will continue to reject, though, is the cottage industry that's grown up around importing Spring Data, Fowler, and Enterprise Java Patterns into Python. Repositories around sessions, services around repositories, DTO around models, generics at every layer, all to avoid User.select(). Nobody wants spaghetti code, but check out the lasagna and the baking-sheets covered in tiny dry raviolis. Peewee's philosophy from the start is that the ORM is the abstraction over the database, and business logic lives in a class or function wrapped with db.atomic(). That's one hat I will not eat, unless possibly it was made out of candy.

Comments (0)


Post a comment