Postgresql HStore, JSON data-type and Arrays with Peewee ORM

I've developed an interest in some of the more advanced features of SQLite after reading the O'Reilly title Using SQLite (Small. Fast. Reliable. Choose Any Three). For personal projects I like using SQLite, but when I need something more powerful I turn to Postgresql. Because peewee supports both of these databases (as well as MySQL), it is limited to a lowest-common-denominator feature set. While this encompasses a broad range of features, each database engine has its own extensions and I've been interested in adding some pythonic support for the cooler extensions.

Here are some of the fun things you can find in peewee's playhouse (collection of extensions):

This post will showcase the peewee support for HStore, JSON document store, and arrays. I've written elsewhere about SQLite's full-text search, so if you're a SQLite user you may want to check out that post.

To follow along at home, feel free to install peewee:

pip install peewee

Postgresql HStore

HStore is a Postgresql module that provides support for storing and querying arbitrary key/value pairs. As of version 2.3, the popular python driver psycopg2 has supported hstore as a part of its extras module. I've packaged up integration with the peewee orm in an extension module, playhouse.postgres_ext.

HStore is great for storing partially structured data. For example, if you've ever worked with real estate data, you probably share my feelings that modelling for real estate is a total pain in the ass. The sheer amount of features a house may have can lead to a very wide, sparse table if each feature is given its own column, so instead it can be nice to store things as key/value pairs. Let's say we're building a simple table for home data:

from playhouse.postgres_ext import *

# Note that we are using PostgresqlExtDatabase as opposed to
# the usual peewee.PostgresqlDatabase.
db = PostgresqlExtDatabase('home_data', user='postgres')

class House(Model):
    address = CharField()
    features = HStoreField()

    class Meta:
        database = db

Now that we have a House model, let's see what HStore can do. Below we're just storing and retrieving a python dictionary:

>>> h = House.create(
...     address='123 Main St',
...     features={'garage': '2 cars', 'bath': '2 bath'})

>>> h_from_db = House.get(House.id == h.id)
>>> h_from_db.features
{'bath': '2 bath', 'garage': '2 cars'}

So that's pretty cool, we can store arbitrary key/value pairs in the database and work with them using python dictionaries, but the real power comes from the ways we can query the stored data. Below are three examples showing search for a single key, multiple keys, and finally searching for a partial dictionary:

>>> features = House.features  # Create a reference to the HStore field.
>>> House.select().where(
...     features.contains('garage')) # All houses with a garage.

>>> House.select().where(
...     features.contains(['garage', 'bath']))

>>> House.select().where(
...     features.contains({'garage': '2 cars'})) # Houses with a 2-car garage.

You can perform atomic updates and deletes using a special function:

>>> query = House.update(features=House.features.update({
...     'bath': '2.5 bath',
...     'sqft': '2000'}))
>>> query.where(House.id == h.id).execute()
1

>>> h = House.get(House.id == h.id)
>>> h.features  # The features now contains our updates.
{'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '2000'}

>>> query = House.update(features=House.features.delete('bath'))
>>> query.where(id=h.id).execute()
1

>>> h = House.get(House.id == h.id)
>>> h.features
{'garage': '2 cars', 'sqft': '2000'}

The documentation provides a complete reference of the supported HStore functions along with examples, so please check them out for more details! Topics covered include:

JSON Documents

Being able to store and query JSON data along with your other relational data is one of Postgresql's coolest new features. Postgresql supports a JSON data type natively as of 9.2 (with full support in 9.3). In order to use peewee's JSON functionality, you must be using an up-to-date version of Postgres with psycopg2 version 2.5 or greater.

Let's build a simple peewee model for storing API responses. Our model will have a single classmethod request() which will be used to fetch a JSON response for storage in the database.

import json
import urllib2
from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_database')  # note

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

    @classmethod
    def request(cls, url):
        fh = urllib2.urlopen(url)
        return cls.create(url=url, response=json.loads(fh.read()))

Now here's the really cool part: we can query these JSON responses almost as if they were Python objects. First let's use DuckDuckGo's instant answer API to generate some data:

>>> APIResponse.create_table()
>>> categories = ['Lawrence,KS', 'Python', 'Postgresql', 'Kittens']
>>> for category in categories:
...     APIResponse.request('http://api.duckduckgo.com/?q=%s&format=json' % category)

The response data looks something like this:

{
   "Heading" : "Python",
   "AbstractURL" : "https://en.wikipedia.org/wiki/Python",
   "Definition" : "",
   "AbstractSource" : "Wikipedia",
   "Image" : "",
   "Results" : [],
   "RelatedTopics" : [
      {
         "Result" : "<a href=\"https://duckduckgo.com/Python_(programming_language)\">Python (programming language)</a> A widely used general-purpose, high-level programming language.",
         "Icon" : {"URL" : "", "Height" : "", "Width" : ""},
         "FirstURL" : "https://duckduckgo.com/Python_(programming_language)",
         "Text" : "Python (programming language) A widely used general-purpose, high-level programming language."
      },
      ...
    ]
}

Let's find all the responses that contain something in the Image field, and print it out:

>>> Image = APIResponse.response['Image']  # Alias the lookups.
>>> Heading = APIResponse.response['Heading']
>>> has_image = APIResponse.select(Heading, Image).where(Image != '')
>>> [data for data in has_image.tuples()]
[(u'Lawrence, Kansas', u'https://duckduckgo.com/i/ce07740b.jpg'),
 (u'PostgreSQL', u'https://duckduckgo.com/i/270e21a9.png')]

We can select individual fields from the JSON data, and also filter on these fields in the WHERE clause. Peewee exposes access to the fields using Python's normal dictionary access method (__getitem__).

Let's grab the text for the first related topic of each response. This example mixes access by key and access by index:

>>> [topic for topic in
...  APIResponse.select(APIResponse.response['RelatedTopics'][0]['Text']).tuples()]

[(u'Lawrence, Kansas Category',),
 (u'Python (programming language) A widely used general-purpose, high-level programming language.',),
 (u'PostgreSQL Category',),
 (u'KittenA juvenile domesticated cat. A feline litter usually consists of two to five kittens.',)]

These examples are trivial, but hopefully they show you some of the things that are possible with Postgresql's JSON data-type. In the next release of Peewee I plan on adding support for Postgresql 9.4's binary JSON data-type as well as the new operators it supports (contains, contains_any).

A note on JSONB

One of the limitations of HStore is that it does not support nested key/value pairs. The upside is that HStore supports more flexible (and powerful) querying operations than the JSON data-type. With the addition of JSONB, Postgresql will allow you to store nested key/value pairs as well as query them using special operations like contains and contains_any. It's a very cool feature and I'm looking forward to adding support for it soon.

Arrays

Postgresql arrays allow you to store lists of items alongside your relational data. While lists of items are commonly stored in a related table for normalization, for small lists that can sometimes be overkill. To show off peewee's support for Postgresql arrays, we'll make a simple Note model that supports an arbitrary number of tags.

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('notes')

class Note(Model):
    text = TextField()
    tags = ArrayField(CharField)

    class Meta:
        database = db

Storing arrays in the database is as easy as storing a normal Python list:

>>> Note.create_table()
>>> data = (
...     ('Write a blog post', ['todo']),
...     ('Check out https://duckduckgo.com/api', ['link', 'todo']),
...     ('"Programming bottom-up": http://paulgraham.com/progbot.html', ['link']),
...     ('Blah', ['foo', 'bar', 'baz']),
...     ('A note with no tags.', []),
... )
>>> for text, tags in data:
...     Note.create(text=text, tags=tags)

Now let's look at the various ways we can query these arrays of tags. First of all, when retrieving our notes from the database, the tags are returned as a normal list:

>>> Note.select().first().tags
['todo']

The interesting stuff happens when we begin searching and indexing our arrays:

>>> todo_notes = Note.select().where(Note.tags.contains('todo'))
>>> [note.text for note in todo_notes]
['Write a blog post',
 'Check out https://duckduckgo.com/api']

>>> todo_with_link = Note.select().where(
...     Note.tags.contains('todo', 'link'))
>>> [note.text for note in todo_with_link]
['Check out https://duckduckgo.com/api']

>>> todo_or_link = Note.select().where(
...     Note.tags.contains_any('todo', 'link'))
>>> [note.text for note in todo_or_link]
['Write a blog post',
 'Check out https://duckduckgo.com/api',
 '"Programming bottom-up": http://paulgraham.com/progbot.html']

>>> first_tag = Note.select(Note.tags[0])
>>> [tags for tags in first_tag.tuples()]
[('todo',), ('link',), ('link',), ('foo',), (None,)]

>>> first_two_tags = Note.select(Note.tags[:1])
>>> [tags for tags in first_two_tags.tuples()]
[(['todo'],),
 ([u'link', u'todo'],),
 ([u'link'],),
 ([u'foo', u'bar'],),
 ([],)]

Peewee's ArrayField can also store multi-dimensional data by specifying dimensions when instantiating the field:

the_matrix = ArrayField(IntegerField, dimensions=2)

If you're curious about the implementation, you can find the source code in postgres_ext.py, or browse on GitHub.

Other cool stuff

There is some other cool stuff in the playhouse modules, including:

If you would like to see support for your favorite extension or database feature, feel free to drop me a line:

Hope you enjoyed reading this!

Comments (2)

Charles | sep 17 2014, at 10:22am

Dru! Haha thanks bud, hope you're doing well!

Dru Sellers | sep 17 2014, at 10:18am

CHARLES!!!! Love it when I see your posts come up.


Commenting has been closed, but please feel free to contact me