<p>charlesleifer.com: Entries tagged with "django" (http://charlesleifer.com/blog/tags/django/rss/)</p>
<p><strong>The search for the missing link: what lies between SQL and Django's ORM?</strong><br/>
http://charlesleifer.com/blog/the-search-for-the-missing-link-what-lies-between-sql-and-django-s-orm-/<br/>
2013-11-12T11:54:34Z, charles leifer</p>
<p><a href="https://media.charlesleifer.com/blog/photos/fossil-fish.jpg" title="Missing Link"><img alt="Missing Link" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/fossil-fish.jpg?key=Ik8-RP5yJjcuPg7iqaR4Tw=="/></a></p>
<p><strong>Edit 11/13/2013</strong>: I added some Django integration to peewee to make it (hopefully) easier to build structured queries when you need to work around the ORM's limitations. <a href="http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#django-integration">Read the docs</a>. Any feedback would definitely be appreciated!</p>
<hr/>
<p>I had the opportunity this week to write some fairly interesting SQL queries.
I don't write "raw" SQL too often, so it was fun to use that part of my brain
(by the way, does it bother anyone else when people call SQL "raw"?). At <a href="https://www.counsyl.com/">Counsyl</a>
we use <a href="https://www.djangoproject.com">Django</a> for pretty much everything so
naturally we also use the ORM. Every place I've worked there's a strong bias
against using SQL when you've got an ORM on board, which makes sense -- if you
choose a tool you should standardize on it if for no other reason than it makes
maintenance easier.</p>
<p>So as I was saying, I had some pretty interesting queries to write and I struggled
to think how to shoehorn them into Django's ORM. I've already
written about some of the <a href="/blog/shortcomings-in-the-django-orm-and-a-look-at-peewee-a-lightweight-alternative/">shortcomings of Django's ORM</a>
so I won't rehash those points. I'll just say that Django fell short and I found
myself writing SQL. The queries I was working on joined models from very disparate
parts of our codebase. The joins were on values that weren't necessarily foreign
keys (think UUIDs) and this is something that Django just doesn't cope with. Additionally
I was interested in aggregates on calculated values, and it seems like Django can
only do aggregates on a single column.</p>
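<p>To make the shape of such a query concrete, here is a minimal sketch using the standard library's <code>sqlite3</code> module. The <code>sample</code> and <code>billing</code> tables, their columns, and the data are all hypothetical stand-ins (not the actual schema I was working with); the point is only the join on a shared UUID column that is not a foreign key, plus an aggregate over a calculated value.</p>

```python
import sqlite3
import uuid

# In-memory database with two hypothetical tables that share a UUID
# column but have no foreign-key relationship between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, tracking_uuid TEXT, label TEXT);
    CREATE TABLE billing (id INTEGER PRIMARY KEY, tracking_uuid TEXT,
                          quantity INTEGER, unit_price REAL);
""")

u1, u2 = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany(
    "INSERT INTO sample (tracking_uuid, label) VALUES (?, ?)",
    [(u1, "a"), (u2, "b")])
conn.executemany(
    "INSERT INTO billing (tracking_uuid, quantity, unit_price) VALUES (?, ?, ?)",
    [(u1, 2, 10.0), (u1, 1, 5.0), (u2, 3, 1.5)])

# Join on the UUID value and aggregate over a calculated expression
# (quantity * unit_price) rather than over a single column.
query = """
    SELECT s.label, SUM(b.quantity * b.unit_price) AS total
    FROM sample AS s
    JOIN billing AS b ON b.tracking_uuid = s.tracking_uuid
    GROUP BY s.label
    ORDER BY s.label
"""
rows = conn.execute(query).fetchall()
print(rows)
```

<p>Note that the whole query lives in a string, so a typo in it only surfaces when the database tries to execute it.</p>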
<p>As I was prototyping, I found several mistakes in my queries and decided to run
them in the postgres shell before translating them into my code. I started to think
that some of these errors could have been avoided if I could find an abstraction
that sat between the ORM and a string of SQL. By leveraging the python interpreter,
the obvious syntax errors could have been caught at module import time. By using
composable data structures, methods I wrote that used similar table structures
could have been more DRY. When I write less code, I think I generally write fewer
bugs as well.</p>
<p>That got me started on my search for the "missing link" between SQL (represented
as a string) and Django's ORM.</p>
<h3>Django query internals</h3>
<p>Have you gone digging into Django's <code>db.models.sql</code> modules? I hadn't spent
much time there, so I took a journey. There's a module named <code>aggregates</code> that
has a handful of classes for representing various aggregation functions over a
single column. There's a module named <code>where</code> for representing the query tree
that comprises the where clause, and also has the SQL generation code for
expressions like <code>status = true</code> and <code>IN</code>, <code>IS NULL</code>, etc. There's a
module named <code>query</code> and also one called <code>subqueries</code>. There are methods
with obscure messages in the docstrings, like <code>Query.add_filter</code>:</p>
<blockquote>
<p>If 'negate' is True, this is an exclude() filter. It's important to
note that this method does not negate anything in the where-clause
object when inserting the filter constraints. This is because negated
filters often require multiple calls to add_filter() and the negation
should only happen once. So the caller is responsible for this (the
caller will normally be add_q(), so that as an example).</p>
</blockquote>
<p>The whole thing is so baroque that I started to wonder if I'd ever find the
missing link. It seems that Django was built using what I see as a "top-down"
approach, where there are several very large classes which encapsulate and hide
the workings of smaller subclasses which are only used internally. As soon as you
get one or two levels in, you meet docstrings like the one above and at that point
you're sunk.</p>
<p>Even the module structure implies this top-down design. <code>django.db.models</code> exposes
familiar friends like fields and models. Another level down you have the <code>sql</code>
modules. In the <code>__init__</code> module of <code>django.db.models.sql</code>, there are only
four names that are exposed to the outside world:</p>
<div class="highlight"><pre><span></span><code><span class="n">__all__</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Query'</span><span class="p">,</span> <span class="s1">'AND'</span><span class="p">,</span> <span class="s1">'OR'</span><span class="p">,</span> <span class="s1">'EmptyResultSet'</span><span class="p">]</span>
</code></pre></div>
<p>But as I said, if you go looking around you'll find all sorts of other things.
The top-down approach encourages consumers of the API to treat these
classes and modules as black boxes, but if you go looking for the missing link...
well, good luck.</p>
<h3>Top-down makes hacking hard</h3>
<p>I came away from my code dive with the conclusion that top-down designs are going
to be hard to hack on, particularly when the part you're interested in is far below
the facade. Contrast the design of Django's ORM with the architecture of the
excellent library <a href="http://aosabook.org/en/sqlalchemy.html">SQLAlchemy</a> (link to
discussion of sqa's architecture). Above the database and database-specific dialect
is the SQL expression engine. <em>That</em> sounds like a missing link to me, and indeed
the SQL expression engine is a pythonic API for generating the <em>expressions</em> that,
when composed, create queries.</p>
<p><a href="http://docs.peewee-orm.com/">Peewee</a> is similarly built using a bottom-up
approach. Almost all the classes used to construct queries are derived from a
class named <code>Node</code>, which can be combined in predictable ways with the various
other subclasses such as fields, expressions, aggregations, etc.</p>
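<p>As a rough illustration of what "bottom-up" means here, the sketch below builds a tiny expression tree from a common <code>Node</code> base class. This is <em>not</em> peewee's actual implementation: the classes <code>Column</code>, <code>Param</code>, <code>Expression</code> and <code>Func</code> are hypothetical and heavily simplified. Every node knows how to render itself to a SQL fragment plus a list of parameters, and nodes combine through ordinary Python operators.</p>

```python
class Node:
    """Base class: every piece of a query can render itself to (sql, params)."""

    __hash__ = object.__hash__  # keep nodes hashable even though __eq__ is overridden

    def sql(self):
        raise NotImplementedError

    def _binary(self, op, other):
        # Plain Python values become bound parameters.
        rhs = other if isinstance(other, Node) else Param(other)
        return Expression(self, op, rhs)

    def __eq__(self, other): return self._binary("=", other)
    def __lt__(self, other): return self._binary("<", other)
    def __gt__(self, other): return self._binary(">", other)
    def __mul__(self, other): return self._binary("*", other)
    def __and__(self, other): return self._binary("AND", other)
    def __or__(self, other): return self._binary("OR", other)


class Column(Node):
    def __init__(self, table, name):
        self.table, self.name = table, name

    def sql(self):
        return f'"{self.table}"."{self.name}"', []


class Param(Node):
    def __init__(self, value):
        self.value = value

    def sql(self):
        return "?", [self.value]


class Expression(Node):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def sql(self):
        lsql, lparams = self.lhs.sql()
        rsql, rparams = self.rhs.sql()
        return f"({lsql} {self.op} {rsql})", lparams + rparams


class Func(Node):
    def __init__(self, name, arg):
        self.name, self.arg = name, arg

    def sql(self):
        asql, aparams = self.arg.sql()
        return f"{self.name}({asql})", aparams


quantity = Column("billing", "quantity")
unit_price = Column("billing", "unit_price")

# An aggregate over a calculated value, and a composed WHERE clause.
total = Func("SUM", quantity * unit_price)
where = (quantity > 1) | (unit_price < 2.5)

print(total.sql())
print(where.sql())
```

<p>Because the small pieces are ordinary objects, a misspelled operator or unbalanced parenthesis is a Python error rather than a runtime SQL error, and fragments like <code>total</code> can be reused across queries.</p>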
<p>Was the Django ORM's design intentional, or just an accident?</p>
<h3>Archaeology</h3>
<p>I pulled up the source code for Django 1.0, and was surprised to see that much of
the <code>sql</code> module looks the same as it does now, 5 years later. I went further
back to Django 0.96, 7 years in the past (a really long time!) and started to see
how the design of the ORM's API came about. The high-level methods we know like
<code>.filter()</code>, <code>.values()</code> were all there, but the SQL generation was pretty
disorganized. There didn't seem to be any real analog to 1.0's <code>sql</code> modules.</p>
<p>Looking at it this way, maybe when the ORM was rewritten, the top-down approach
made sense because the foremost concern of the Django devs was to maintain
backwards compatibility with the previous APIs. Django has a great reputation for
not breaking things between releases (something, as a developer, I am grateful for).
Maybe the rewritten ORM was such a massive improvement over the old one that
nobody was remotely critical of its design.</p>
<h3>Now what?</h3>
<p>I think it would be useful for Django to have a structured representation of SQL
that is accessible to developers. The ORM is great, and writing SQL by hand
is flexible, but having a pythonic layer between the two would definitely be
a good thing.</p>
<p>I wonder if there would be value in building a bridge between SQLAlchemy or
peewee's expression building, and Django's "raw" query facilities -- what do you
think?</p>
<h3>A long quote from Richard Feynman</h3>
<p>This quote was taken from Richard Feynman's <em>Personal observations on the reliability of the Shuttle</em>, written after the Challenger disaster:</p>
<blockquote>
<p>The usual way that such engines are designed (for military or
civilian aircraft) may be called the component system, or bottom-up
design. First it is necessary to thoroughly understand the properties
and limitations of the materials to be used (for turbine blades, for
example), and tests are begun in experimental rigs to determine
those. With this knowledge larger component parts (such as bearings)
are designed and tested individually. As deficiencies and design
errors are noted they are corrected and verified with further
testing. Since one tests only parts at a time these tests and
modifications are not overly expensive. Finally one works up to the
final design of the entire engine, to the necessary
specifications. There is a good chance, by this time that the engine
will generally succeed, or that any failures are easily isolated and
analyzed because the failure modes, limitations of materials, etc.,
are so well understood. There is a very good chance that the
modifications to the engine to get around the final difficulties are
not very hard to make, for most of the serious problems have already
been discovered and dealt with in the earlier, less expensive, stages
of the process.</p>
<p>The Space Shuttle Main Engine was handled in a different manner,
top down, we might say. The engine was designed and put together all
at once with relatively little detailed preliminary study of the
material and components. Then when troubles are found in the
bearings, turbine blades, coolant pipes, etc., it is more expensive
and difficult to discover the causes and make changes. For example,
cracks have been found in the turbine blades of the high pressure
oxygen turbopump. Are they caused by flaws in the material, the effect
of the oxygen atmosphere on the properties of the material, the
thermal stresses of startup or shutdown, the vibration and stresses of
steady running, or mainly at some resonance at certain speeds, etc.?
How long can we run from crack initiation to crack failure, and how
does this depend on power level? Using the completed engine as a test
bed to resolve such questions is extremely expensive. One does not
wish to lose an entire engine in order to find out where and how
failure occurs. Yet, an accurate knowledge of this information is
essential to acquire a confidence in the engine reliability in use.
Without detailed understanding, confidence can not be attained.</p>
<p>A further disadvantage of the top-down method is that, if an
understanding of a fault is obtained, a simple fix, such as a new
shape for the turbine housing, may be impossible to implement without
a redesign of the entire engine.</p>
</blockquote>
<h3>Reading more</h3>
<p>Thanks for taking the time to read this post. Feel free to <a href="#comments">leave a comment</a>
below!</p>
<ul>
<li><a href="http://aosabook.org/en/sqlalchemy.html">SQLAlchemy</a>, in <em>The Architecture of Open Source Applications</em>, by Mike Bayer.</li>
<li><a href="http://www.paulgraham.com/progbot.html">Programming Bottom-up</a>, by Paul Graham.</li>
<li><a href="http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-commission/Appendix-F.txt">Personal observations on the reliability of the Shuttle</a>, by Richard Feynman (see section titled "Liquid Fuel Engine (SSME)").</li>
<li><a href="http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#django-integration">Django integration for Peewee</a></li>
</ul>
<p><strong>Structuring flask apps, a how-to for those coming from Django</strong><br/>
http://charlesleifer.com/blog/structuring-flask-apps-a-how-to-for-those-coming-from-django/<br/>
2013-04-27T13:21:28Z, charles leifer</p>
<p>The other day a friend of mine was trying out <a href="http://flask-peewee.readthedocs.org/en/latest/">flask-peewee</a>
and he had some questions about the best way to structure his app to avoid
triggering circular imports. For someone new to flask, this can be a bit of a
puzzler, especially if you're coming from django which automatically imports
your modules. In this post I'll walk through how I like to structure my
flask apps to avoid circular imports. In my examples I'll be showing how to
use "flask-peewee", but the same technique should be applicable for other flask
plugins.</p>
<p>I'll walk through the modules I commonly use in my apps, then show how to tie
them all together and provide a single entrypoint into your app.</p>
<h3>Project layout</h3>
<p>I use a structure that may look familiar to users of the django framework:</p>
<ul>
<li>admin.py - Where you register models with the site admin interface</li>
<li>api.py - Where you register models to be exposed via a REST-ful API</li>
<li>app.py - Your "Flask" application, configuration, and database.</li>
<li>auth.py - The authentication system used to protect access to the admin.</li>
<li>main.py - <em>this is the secret sauce</em></li>
<li>models.py - Database models for use with your ORM, business logic, etc.</li>
<li>views.py - Views that handle requests.</li>
</ul>
<p>In a little bit I'll get to the reason "main.py" is the secret sauce; for now,
though, I'll focus on the other bits. Rather than go alphabetically, as I did
in the previous list, I will go through these modules in order of their precedence
in the "import chain" we'll be setting up:</p>
<ol>
<li>app.py</li>
<li>models.py</li>
<li>auth.py</li>
<li>admin.py / api.py</li>
<li>views.py</li>
<li>main.py</li>
</ol>
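<p>One way to convince yourself that this ordering has no loops is to write the import relationships down as a graph and sort it. The sketch below is only an illustration: it uses the standard library's <code>graphlib</code> (Python 3.9+), and the <code>imports</code> mapping is my own transcription of the "who imports whom" relationships described in the sections that follow.</p>

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a module; the value is the set of project modules it imports.
imports = {
    "app": set(),
    "models": {"app"},
    "auth": {"app", "models"},
    "admin": {"app", "auth", "models"},
    "api": {"app", "auth", "models"},
    "views": {"app", "auth", "models"},
    "main": {"app", "auth", "admin", "api", "models", "views"},
}


def import_order(graph):
    """Return one valid import order; raises CycleError on a circular import."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    try:
        import_order(graph)
    except CycleError:
        return True
    return False


order = import_order(imports)
print(order)

# If app.py were to import views, we would have created a loop.
broken = dict(imports, app={"views"})
print(has_cycle(broken))
```

<p>The sorted order always starts with <code>app</code> and ends with <code>main</code>; admin, api and views can come in any order relative to each other because none of them imports the others.</p>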
<h3>app.py</h3>
<p>Every flask application needs an "app.py", whether you call it that or not. It's
the place where your <code>Flask</code> instance lives. I like to keep my app.py very
thin, so that it only contains things that are central to the entire project.
I do this because, as you'll see, we will import <code>from app</code> pretty much everywhere.</p>
<p>In addition to my <code>Flask</code> object, I also set up a Database, a cache (if I'm
using one), logging handlers, and any global template filters in <code>app.py</code>.</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">I keep app.py very thin.</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="c1"># flask-peewee database, but could be SQLAlchemy instead.</span>
<span class="kn">from</span> <span class="nn">flask_peewee.db</span> <span class="kn">import</span> <span class="n">Database</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="n">app</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">from_object</span><span class="p">(</span><span class="s1">'config.Configuration'</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">Database</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
<span class="c1"># Here I would set up the cache, a task queue, etc.</span>
</code></pre></div>
<p>That's it! This may seem odd if you've seen the flask "hello world", which
contains URL routing, view functions, etc. As you'll see, we will be putting
those in a different module.</p>
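<p>The <code>from_object('config.Configuration')</code> call above expects a <code>config.py</code> module next to <code>app.py</code>. A minimal sketch might look like the following; the values are placeholders, and the layout of the <code>DATABASE</code> dictionary reflects my understanding of the flask-peewee convention, so check the flask-peewee docs for the version you are using.</p>

```python
"""
config.py -- a hypothetical settings module for the app.py shown above.
"""


class Configuration(object):
    # Assumed flask-peewee convention: the Database wrapper reads this
    # dict to decide which peewee database class to instantiate.
    DATABASE = {
        'name': 'example.db',
        'engine': 'peewee.SqliteDatabase',
    }
    DEBUG = True
    SECRET_KEY = 'change-me'  # placeholder; use a real secret in production
```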
<h3>models.py</h3>
<p>Now that we've got a Database object, we can create our model classes. This
will introduce the first "import dependency", because we need to import the
database we created in <code>app.py</code>. Here is what a small models file might look
like:</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">models imports app, but app does not import models so we haven't created</span>
<span class="sd">any loops.</span>
<span class="sd">"""</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">flask_peewee.auth</span> <span class="kn">import</span> <span class="n">BaseUser</span> <span class="c1"># provides password helpers..</span>
<span class="kn">from</span> <span class="nn">peewee</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">db</span>
<span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="n">db</span><span class="o">.</span><span class="n">Model</span><span class="p">,</span> <span class="n">BaseUser</span><span class="p">):</span>
    <span class="n">username</span> <span class="o">=</span> <span class="n">CharField</span><span class="p">()</span>
    <span class="n">password</span> <span class="o">=</span> <span class="n">CharField</span><span class="p">()</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">CharField</span><span class="p">()</span>
    <span class="n">join_date</span> <span class="o">=</span> <span class="n">DateTimeField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">)</span>
    <span class="n">active</span> <span class="o">=</span> <span class="n">BooleanField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">admin</span> <span class="o">=</span> <span class="n">BooleanField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">username</span>
</code></pre></div>
<h3>auth.py</h3>
<p>After creating a <code>User</code> model, we can set up authentication for the app.
Flask-peewee <code>Auth</code> takes an app, a database and a <code>User</code> model as its
parameters. It provides free login/logout functionality, a way to get the
logged-in user, and a decorator to mark a view as "login required".</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">auth imports app and models, but none of those import auth</span>
<span class="sd">so we're OK</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">flask_peewee.auth</span> <span class="kn">import</span> <span class="n">Auth</span> <span class="c1"># Login/logout views, etc.</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">app</span><span class="p">,</span> <span class="n">db</span>
<span class="kn">from</span> <span class="nn">models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="n">auth</span> <span class="o">=</span> <span class="n">Auth</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">db</span><span class="p">,</span> <span class="n">user_model</span><span class="o">=</span><span class="n">User</span><span class="p">)</span>
</code></pre></div>
<h3>admin.py / api.py</h3>
<p>Most of the sites I build will have an "Admin area", where dynamic content can be
created, edited and deleted. Less often I might add a REST-ful API to expose
models, but for completeness I'll show how they both work since they're pretty
similar.</p>
<p>Here is <code>admin.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">admin imports app, auth and models, but none of these import admin</span>
<span class="sd">so we're OK</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">flask_peewee.admin</span> <span class="kn">import</span> <span class="n">Admin</span><span class="p">,</span> <span class="n">ModelAdmin</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">app</span><span class="p">,</span> <span class="n">db</span>
<span class="kn">from</span> <span class="nn">auth</span> <span class="kn">import</span> <span class="n">auth</span>
<span class="kn">from</span> <span class="nn">models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="n">admin</span> <span class="o">=</span> <span class="n">Admin</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span>
<span class="n">auth</span><span class="o">.</span><span class="n">register_admin</span><span class="p">(</span><span class="n">admin</span><span class="p">)</span>
<span class="c1"># or you could admin.register(User, ModelAdmin) -- you would also register</span>
<span class="c1"># any other models here.</span>
</code></pre></div>
<p>And here is <code>api.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">api imports app, auth and models, but none of these import api.</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">flask_peewee.rest</span> <span class="kn">import</span> <span class="n">RestAPI</span><span class="p">,</span> <span class="n">RestResource</span><span class="p">,</span> <span class="n">UserAuthentication</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">app</span>
<span class="kn">from</span> <span class="nn">auth</span> <span class="kn">import</span> <span class="n">auth</span>
<span class="kn">from</span> <span class="nn">models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="n">user_auth</span> <span class="o">=</span> <span class="n">UserAuthentication</span><span class="p">(</span><span class="n">auth</span><span class="p">)</span>
<span class="c1"># instantiate our api wrapper and tell it to use HTTP basic auth using</span>
<span class="c1"># the same credentials as our auth system. If you prefer this could</span>
<span class="c1"># instead be a key-based auth, or god forbid some open auth protocol.</span>
<span class="n">api</span> <span class="o">=</span> <span class="n">RestAPI</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">default_auth</span><span class="o">=</span><span class="n">user_auth</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">UserResource</span><span class="p">(</span><span class="n">RestResource</span><span class="p">):</span>
    <span class="n">exclude</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'password'</span><span class="p">,</span> <span class="s1">'email'</span><span class="p">,)</span>
<span class="c1"># register our models so they are exposed via /api/<model>/</span>
<span class="n">api</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">UserResource</span><span class="p">,</span> <span class="n">auth</span><span class="o">=</span><span class="n">user_auth</span><span class="p">)</span>
</code></pre></div>
<h3>views.py</h3>
<p>Lastly, the views. The views are responsible for mapping urls to functions,
and so they generally need to reference the app, authentication and models.</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">views imports app, auth, and models, but none of these import views</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">render_template</span> <span class="c1"># ...etc , redirect, request, url_for</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">app</span>
<span class="kn">from</span> <span class="nn">auth</span> <span class="kn">import</span> <span class="n">auth</span>
<span class="kn">from</span> <span class="nn">models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">homepage</span><span class="p">():</span>
    <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="s1">'foo.html'</span><span class="p">)</span>

<span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/private/'</span><span class="p">)</span>
<span class="nd">@auth</span><span class="o">.</span><span class="n">login_required</span>
<span class="k">def</span> <span class="nf">private_view</span><span class="p">():</span>
    <span class="c1"># ...</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">auth</span><span class="o">.</span><span class="n">get_logged_in_user</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</code></pre></div>
<h2>Tying it all together with "main.py"</h2>
<p>These modules are all fairly self-contained, but we have no mechanism to ensure
that all are imported when we run our application. We need to import all the
modules, though, to capture all the great module-level side-effects. For
that purpose I tie everything together with a module named "main.py".</p>
<div class="highlight"><pre><span></span><code><span class="sd">"""</span>
<span class="sd">this is the "secret sauce" -- a single entry-point that resolves the</span>
<span class="sd">import dependencies. If you're using blueprints, you can import your</span>
<span class="sd">blueprints here too.</span>
<span class="sd">then when you want to run your app, you point to main.py or `main.app`</span>
<span class="sd">"""</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">app</span><span class="p">,</span> <span class="n">db</span>
<span class="kn">from</span> <span class="nn">auth</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">admin</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">api</span> <span class="kn">import</span> <span class="n">api</span>
<span class="kn">from</span> <span class="nn">models</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">views</span> <span class="kn">import</span> <span class="o">*</span>
<span class="n">admin</span><span class="o">.</span><span class="n">setup</span><span class="p">()</span>
<span class="n">api</span><span class="o">.</span><span class="n">setup</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">create_tables</span><span class="p">():</span>
    <span class="c1"># Create table for each model if it does not exist.</span>
    <span class="c1"># Use the underlying peewee database object instead of the</span>
    <span class="c1"># flask-peewee database wrapper:</span>
    <span class="n">db</span><span class="o">.</span><span class="n">database</span><span class="o">.</span><span class="n">create_tables</span><span class="p">([</span><span class="n">User</span><span class="p">],</span> <span class="n">safe</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">create_tables</span><span class="p">()</span>
    <span class="n">app</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre></div>
<p><code>main.py</code> should be treated as the entry-point into your application from
here on out. If you are running a WSGI server, you would therefore want to
point it at <code>main.app</code> as opposed to <code>app.app</code>, if that makes sense.</p>
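<p>To see why the imports matter, it helps to remember that decorators like <code>@app.route</code> do their work when the module is executed, i.e. at import time. The sketch below uses a hypothetical <code>MiniApp</code> class as a stand-in for Flask (it is not Flask's real routing code) to show that a route only exists once the code defining the view has actually run.</p>

```python
class MiniApp:
    """Hypothetical stand-in for Flask: maps url -> view function."""

    def __init__(self):
        self.routes = {}

    def route(self, url):
        def decorator(fn):
            # Registration happens here, as a side-effect of the decorator
            # running when the function definition is executed.
            self.routes[url] = fn
            return fn
        return decorator


app = MiniApp()

# Nothing has been "imported" yet, so the app knows about no URLs.
routes_before = dict(app.routes)


# Imagine the code below lives in views.py; importing views runs it.
@app.route('/')
def homepage():
    return 'home'


@app.route('/private/')
def private_view():
    return 'secret'


routes_after = sorted(app.routes)
print(routes_before, routes_after)
```

<p>If a server were pointed at <code>app.app</code> and nothing ever imported <code>views</code>, the app would be in the "before" state: a perfectly valid application with no routes.</p>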
<h3>Thanks</h3>
<p>Thanks for reading this post, I hope you found it useful. Please feel free to
leave any <a href="#comments">questions or comments</a>.</p>
<p>For a more complete example, check out <a href="https://github.com/coleifer/flask-peewee/tree/master/example">the flask-peewee example project</a> -- it's
a very small twitter clone.</p>
<p>I also wrote a series of posts about building a note-taking app with Flask. Here are links to the posts:</p>
<ul>
<li><a href="/blog/saturday-morning-hack-a-little-note-taking-app-with-flask/">Part 1: Building a little note-taking app with Flask and Peewee</a></li>
<li><a href="/blog/saturday-morning-hacks-revisiting-the-notes-app/">Part 2: Revisiting the note-taking app</a> (adding todo lists, reminders, search, and a REST API)</li>
<li><a href="/blog/saturday-morning-hacks-adding-full-text-search-to-the-flask-note-taking-app/">Part 3: Adding full-text search using SQLite's search engine extension</a></li>
</ul>
<p>Or simply look at <a href="/blog/tags/saturday-morning-hacks/">all of the saturday-morning hack posts</a>.</p>
<p><strong>Shortcomings in the Django ORM and a look at Peewee, a lightweight alternative</strong><br/>
http://charlesleifer.com/blog/shortcomings-in-the-django-orm-and-a-look-at-peewee-a-lightweight-alternative/<br/>
2012-12-15T14:27:47Z, charles leifer</p>
<p>In this post I'd like to talk about some of the shortcomings of the Django ORM,
the ways <a href="http://docs.peewee-orm.com">peewee</a> approaches things differently,
and how this resulted in peewee having an API that is both more consistent <em>and</em> more expressive.</p>
<p><a href="http://alexgaynor.net/">Alex Gaynor</a>, one of the more outspoken core developers on the Django project, gave a great talk at ChiPY titled <a href="http://www.youtube.com/watch?v=GxL9MnWlCwo">"Why I Hate the Django ORM"</a> (<a href="https://speakerdeck.com/alex/why-i-hate-the-django-orm">slides</a>). I think he did a great job identifying what I agree are the two biggest issues with Django's ORM:</p>
<ul>
<li>inconsistent API</li>
<li>lack of composability</li>
</ul>
<h2>The Django ORM has an inconsistent API</h2>
<p>Django <a href="https://docs.djangoproject.com/en/dev/misc/design-philosophies/#consistency">wants to expose consistent APIs</a>:</p>
<blockquote>
<p>The framework should be consistent at all levels. Consistency applies to everything from low-level (the Python coding style used) to high-level (the “experience” of using Django).</p>
</blockquote>
<p>On the whole I think Django does a good job with this - the big exception being the ORM. Alex gives 4 examples, which should be familiar to Django developers and which are indicative of the
underlying issues:</p>
<ul>
<li><code>filter(field_name=value)</code></li>
<li><code>Q(field_name=value)</code></li>
<li><code>F('field_name')</code></li>
<li><code>Aggregate('field_name')</code></li>
</ul>
<p><strong>filter()</strong> is the most basic and most common method. It is used to express a
SQL <strong>Where</strong> clause and accepts keyword arguments mapping
field names to values, <em>and</em> it can also take one or more <strong>Q</strong> objects as positional arguments. <strong>Q</strong> objects
also take keyword arguments and are used to allow "combining" one or more expressions
into a query tree (this is how you express a logical "OR").</p>
<p>This is the first inconsistency -- there are two ways of expressing the same
query; one is just more "specialized":</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">sq1</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">Blog</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="s1">'charlie'</span><span class="p">)</span><span class="o">.</span><span class="n">query</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sq2</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">Blog</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">Q</span><span class="p">(</span><span class="n">author</span><span class="o">=</span><span class="s1">'charlie'</span><span class="p">))</span><span class="o">.</span><span class="n">query</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sq1</span> <span class="o">==</span> <span class="n">sq2</span>
<span class="go">True</span>
</code></pre></div>
<p>What happens if you want to reference the value of another column in your call
to filter? Maybe there is an Employee model that stores each employee's current salary
and desired salary. For this you use the <strong>F</strong> class:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Employee</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">desired_salary__lt</span><span class="o">=</span><span class="n">F</span><span class="p">(</span><span class="s1">'current_salary'</span><span class="p">))</span>
</code></pre></div>
<p>Here we see two ways of identifying fields -- as keyword arguments passed to filter and Q,
and as strings when passed to F.</p>
<p>Another common operation is to take the aggregate of the values in a particular
column, like the SUM or COUNT, or to GROUP BY a particular column. There are
a <a href="http://stackoverflow.com/search?q=django+group+by+count&submit=search">ton of questions</a>
on Stack Overflow asking how to do this. The answer involves using three new
APIs:</p>
<ul>
<li><code>values()</code> - takes a list of string field names</li>
<li><code>annotate()</code> - takes an "aggregate" function</li>
<li><code>Count()</code>, <code>Sum()</code>, etc - take a string field name</li>
</ul>
<p>The fact that all these specialized functions are needed to express a fairly common query, and the fact that they all require a specialized API, is a sign of a lurking design problem. For anything more than a simple <strong>Where</strong> clause, Django quickly bogs down in its
own APIs.</p>
<h2>Lack of composability</h2>
<p>Django also <a href="https://docs.djangoproject.com/en/dev/misc/design-philosophies/#terse-powerful-syntax">wants to expose a powerful, expressive querying API</a></p>
<blockquote>
<p>The database API should allow rich, expressive statements in as little syntax as possible. It should not rely on importing other modules or helper objects.</p>
</blockquote>
<p>Django falls very short on the first part. Rich, expressive statements are simply
not possible in the ORM <em>unless a special API was designed for it</em> (like Q and F
objects). To create a rich API, it is necessary to allow composability -- small
pieces can be composed to create larger, more complex pieces.</p>
<p>Alex gives a good example:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="n">event_type</span><span class="p">,</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">total_time</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="n">event</span><span class="w"></span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">event_type</span><span class="w"></span>
</code></pre></div>
<p>Which he expresses using Django:</p>
<div class="highlight"><pre><span></span><code><span class="n">Event</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'event_type'</span><span class="p">)</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span>
<span class="n">total_time</span><span class="o">=</span><span class="n">Sum</span><span class="p">(</span><span class="n">F</span><span class="p">(</span><span class="s1">'end_time'</span><span class="p">)</span> <span class="o">-</span> <span class="n">F</span><span class="p">(</span><span class="s1">'start_time'</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div>
<p>This actually does not work: Django chokes because the <strong>Sum</strong> function does not
know how to handle an expression. Essentially, anything beyond the very straightforward
use-cases the existing APIs were <em>designed for</em> will probably not work, or will only
work because someone added special-case logic for it.</p>
<p>Django's ORM falls short on the second part of the design goal as well, requiring users to import
special functions for aggregation and use odd one-letter classes ("Q" and "F")
to express certain types of queries.</p>
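<p>To make the target of the "events" example concrete, here is a minimal sketch that runs the same GROUP BY query directly through Python's built-in <code>sqlite3</code> module. The table layout, the integer timestamps and the sample rows are assumptions made up for illustration, and the <code>ORDER BY</code> is added only so the output is deterministic.</p>

```python
import sqlite3

# In-memory database with a hypothetical "event" table; times are plain
# integers (say, minutes) so that end_time - start_time is simple arithmetic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (event_type TEXT, start_time INTEGER, end_time INTEGER)"
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        ("meeting", 0, 30),
        ("meeting", 100, 145),
        ("lunch", 200, 260),
    ],
)

# The aggregate wraps an expression (end_time - start_time), not a bare column.
rows = conn.execute(
    "SELECT event_type, SUM(end_time - start_time) AS total_time "
    "FROM event GROUP BY event_type ORDER BY event_type"
).fetchall()
print(rows)  # [('lunch', 60), ('meeting', 75)]
```

<p>The point is only that the database has no trouble aggregating over an expression; the difficulty described above lives entirely in the query-building API.</p>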
<h2>Learning from Django</h2>
<p>When I first wrote peewee I based a lot of my APIs on those I was familiar with.
Coming from Django this meant "kwargs"-style querying and double-underscore lookups.
As peewee grew and I added features, I mimicked Django and added APIs for expressing
logical "OR" (Q objects) and column-to-column comparison (F objects). Before long
the code was a mess and people were submitting issues when they tried to express
a query I <strong>hadn't planned for</strong>. I think this parallels how the Django ORM
has grown up.</p>
<p>I decided to rewrite. It was the best decision I could have made and I learned a ton in the process. Peewee doesn't have nearly as many users as Django, so this was an acceptable path for my project. Peewee is also <em>only</em> an ORM, so the argument for making it a <em>better</em> ORM outweighs some other concerns.</p>
<p>While I rewrote I decided to focus entirely on the mechanics of expressing rich
queries. To do this I took a look at all the atoms that comprise a SQL query,
how they interact, and how they are composed. I identified a few things:</p>
<ul>
<li>Clauses</li>
<li>Columns</li>
<li>Scalars and Parameters (e.g. <code>LIMIT 100</code> or <code>foo = "Bar"</code>)</li>
<li>Functions</li>
</ul>
<p>Clauses are things like <strong>SELECT</strong> and <strong>FROM</strong> which denote particular parts of
the query -- these I decided to expose as methods on a class. Columns, or field
instances, would be exposed as class attributes. Scalars and parameters were easy, just represent them using python's various types. Finally functions, since they
are so diverse and accept varying numbers of parameters, would be exposed dynamically.</p>
<p>This allows us to write the above "events" query:</p>
<div class="highlight"><pre><span></span><code><span class="n">Event</span><span class="o">.</span><span class="n">select</span><span class="p">(</span>
<span class="n">Event</span><span class="o">.</span><span class="n">event_type</span><span class="p">,</span>
<span class="n">fn</span><span class="o">.</span><span class="n">Sum</span><span class="p">(</span><span class="n">Event</span><span class="o">.</span><span class="n">end_time</span> <span class="o">-</span> <span class="n">Event</span><span class="o">.</span><span class="n">start_time</span><span class="p">)</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span><span class="s1">'total_time'</span><span class="p">)</span>
<span class="p">)</span><span class="o">.</span><span class="n">group_by</span><span class="p">(</span><span class="n">Event</span><span class="o">.</span><span class="n">event_type</span><span class="p">)</span>
</code></pre></div>
<p>The clauses accept as their arguments one or more columns, scalars, functions, or any
combination thereof. These components can be combined using logical OR and
AND to create query trees. They also support common operations like addition and
subtraction, allowing you to express atomic updates and things like the above
example where we subtract one column from another.</p>
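<p>The idea of small pieces composing into larger ones can be sketched in a few lines of plain Python. To be clear, this is a toy written for this post's argument and not peewee's actual implementation; the class names (<code>Node</code>, <code>Column</code>, <code>Expression</code>, <code>Func</code>) and the <code>to_sql</code> helper are all hypothetical. Every operator returns another node, so a function call can wrap an arithmetic expression just as easily as a bare column.</p>

```python
class Node:
    """Base class: each operator builds a bigger node out of smaller ones."""

    def __sub__(self, other):
        return Expression(self, "-", other)

    def __add__(self, other):
        return Expression(self, "+", other)

    def __eq__(self, other):
        return Expression(self, "=", other)

    def __or__(self, other):
        return Expression(self, "OR", other)

    def __and__(self, other):
        return Expression(self, "AND", other)


def to_sql(value, params):
    """Render a node, or bind a plain Python value as a query parameter."""
    if isinstance(value, Node):
        return value.sql(params)
    params.append(value)
    return "?"


class Column(Node):
    def __init__(self, table, name):
        self.table, self.name = table, name

    def sql(self, params):
        return f'"{self.table}"."{self.name}"'


class Expression(Node):
    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def sql(self, params):
        return f"({to_sql(self.lhs, params)} {self.op} {to_sql(self.rhs, params)})"


class Func(Node):
    def __init__(self, name, *args):
        self.name, self.args = name, args

    def sql(self, params):
        rendered = ", ".join(to_sql(arg, params) for arg in self.args)
        return f"{self.name}({rendered})"


end_time = Column("event", "end_time")
start_time = Column("event", "start_time")
event_type = Column("event", "event_type")

params = []
# A function wrapping an expression: the case that tripped up Sum(F() - F()).
total_sql = Func("SUM", end_time - start_time).sql(params)
# Logical OR uses the same building blocks; no separate Q-style class is needed.
where_sql = ((event_type == "meeting") | (event_type == "lunch")).sql(params)

print(total_sql)  # SUM(("event"."end_time" - "event"."start_time"))
print(where_sql, params)
```

<p>Nothing here was designed with the "events" query in mind, which is the whole point: the query falls out of the general rules instead of needing its own API.</p>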
<h2>Peewee is more consistent <em>and</em> expressive</h2>
<p>Peewee shed about a third of its code-base during the rewrite, going from 2400 SLOC
to around 1600! It also became much more expressive -- more than once I have
written a fairly complex query and been pleasantly surprised to see that it
<strong>just works</strong>. Rather than losing functionality, I have gained flexibility which
in turn produces functionality.</p>
<h2>"Fixing" Django</h2>
<p>I think that Django would benefit from a similar rewrite. The ORM is one of the
most complicated parts of Django and sees some pretty crazy bugs. I'm sure I'm
not the only person who thinks this should happen...the problem is:</p>
<p><em>How can the ORM change without breaking backwards compatibility?</em></p>
<p>From the beginning Django has been committed to backwards compatibility. This
may be one of the biggest contributors to Django's adoption.</p>
<p>I would suggest building out a new API that is similar to peewee's. Since
the existing APIs are a subset of the functionality possible in peewee, the existing
APIs could be rewritten to use the new APIs and marked for deprecation. When rewriting
peewee, I included a backwards-compatible method to allow the "django-style"
double-underscore querying that had been possible in the older version.</p>
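<p>As a rough illustration of why the old style can sit on top of a more general one, here is a sketch of a shim that translates double-underscore keyword arguments into a WHERE fragment plus parameters. The <code>LOOKUPS</code> table and the <code>translate</code> function are hypothetical names invented for this example; this is not how peewee or Django implement it, and it ignores things like relationship traversal and LIKE escaping.</p>

```python
# Hypothetical lookup table: Django-style suffix -> SQL operator.
LOOKUPS = {
    "exact": "=",
    "lt": "<",
    "gt": ">",
    "startswith": "LIKE",
}


def translate(**kwargs):
    """Turn name__startswith='a' style kwargs into a WHERE fragment and params."""
    clauses, params = [], []
    # Sort the keys so the generated SQL is deterministic.
    for key, value in sorted(kwargs.items()):
        field, _, lookup = key.partition("__")
        op = LOOKUPS[lookup or "exact"]  # a bare field name means equality
        clauses.append(f'"{field}" {op} ?')
        # startswith becomes a LIKE pattern with a trailing wildcard.
        params.append(f"{value}%" if lookup == "startswith" else value)
    return " AND ".join(clauses), params


sql, params = translate(name__startswith="a", rating__gt=3)
print(sql)     # "name" LIKE ? AND "rating" > ?
print(params)  # ['a%', 3]
```

<p>The keyword-argument style only ever produces a flat list of AND-ed comparisons, so it maps cleanly onto a subset of what a composable API can express.</p>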
<h2>What do you think?</h2>
<p>What is your take on the Django ORM? If you use something like SQLAlchemy or peewee,
how do you feel that it compares to Django's ORM? Are there things Django can do that Peewee or SQLAlchemy <em>cannot</em>? One example is that, while peewee supports "<a href="https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related">prefetch_related</a>", it is not quite as powerful as Django's implementation (Full disclosure, I think this feature is kind of gross).</p>
<p>Please feel free to <a href="#comments">leave a comment</a> below. I'd also invite
you to check out <a href="http://docs.peewee-orm.com/">the peewee documentation</a> if you'd
like to see more examples!</p>
<h2>Links</h2>
<ul>
<li><a href="http://www.youtube.com/watch?v=GxL9MnWlCwo">"Why I hate the Django ORM"</a>, talk by Alex Gaynor at ChiPY and <a href="https://speakerdeck.com/alex/why-i-hate-the-django-orm">slides</a></li>
<li><a href="https://docs.djangoproject.com/en/dev/misc/design-philosophies/">Django's design philosophy</a></li>
<li><a href="http://docs.peewee-orm.com/en/latest/peewee/upgrading.html#upgrading">Notes from the peewee rewrite</a></li>
<li><a href="https://github.com/coleifer/peewee">Peewee project page</a></li>
</ul>
<p><strong>Update</strong></p>
<p>Discussions on <a href="http://www.reddit.com/r/programming/comments/14x0pw/shortcomings_in_the_django_orm_and_a_look_at/">reddit</a> and <a href="http://news.ycombinator.com/item?id=4926627">hackernews</a></p>Working around Django's ORM to do interesting things with GFKshttp://charlesleifer.com/blog/working-around-django-s-orm-to-do-interesting-things-with-gfks/2012-05-03T00:05:01Z2012-05-03T00:05:01Zcharles leifer<p>In this post I want to discuss how to work around some of the shortcomings of
Django's ORM when dealing with Generic Foreign Keys (GFKs).</p>
<p>At the end of the post I'll show how to work around Django's failure to CAST
correctly when the generic foreign key column is of a different type than the
primary keys of the objects it may point to.</p>
<h2>A quick primer on content-types and GFKs</h2>
<p><em>If the Generic Foreign Key did not exist, it would be necessary to invent it</em></p>
<p>Thanks to the content-types framework, part of <code>django.contrib</code>, we do not have to
do any inventing, however. The content-types framework is an app that is responsible
for mapping python models to the database layer -- a dirty job, but it makes a number
of other things easier to implement. Content-types are used to provide granular
permissions via the auth/permissions framework and, notably, they have been used to
implement GFKs.</p>
<p>A GFK is simply a foreign key to a content-type and an additional column to store a related primary key. It is not really a foreign key at all in the sense of it being an actual database
constraint. Nor is it a foreign key in the same sense as django's <code>db.models.ForeignKey</code>,
because the ORM barely supports querying against them and offers no "on_delete"
support (which often results in orphaned rows or messy signal handlers).</p>
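<p>Stripped of the ORM, the idea looks roughly like the sketch below, which uses Python's built-in <code>sqlite3</code> module. The <code>comment</code> and <code>article</code> tables, the simplified <code>content_type</code> table and the <code>resolve</code> helper are all made up for illustration; Django's real content-type table stores an app label and model name rather than a table name.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content_type (id INTEGER PRIMARY KEY, table_name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        body TEXT,
        content_type_id INTEGER REFERENCES content_type (id),  -- a real FK
        object_id INTEGER  -- plain integer: no constraint on what it points to
    );
    INSERT INTO content_type VALUES (1, 'article');
    INSERT INTO article VALUES (1, 'Hello');
    INSERT INTO comment VALUES (1, 'First!', 1, 1);
    """
)


def resolve(conn, comment_id):
    """Follow a generic relation by hand: find the target table, then the row."""
    table_name, object_id = conn.execute(
        "SELECT ct.table_name, c.object_id "
        "FROM comment c JOIN content_type ct ON ct.id = c.content_type_id "
        "WHERE c.id = ?",
        (comment_id,),
    ).fetchone()
    # Table names cannot be bound as parameters; this one comes from our own
    # content_type table, never from user input.
    return conn.execute(
        f'SELECT * FROM "{table_name}" WHERE id = ?', (object_id,)
    ).fetchone()


target = resolve(conn, 1)
# Nothing stops the target from being deleted out from under the comment.
conn.execute("DELETE FROM article WHERE id = 1")
orphan = resolve(conn, 1)
print(target, orphan)  # (1, 'Hello') None
```

<p>The second lookup coming back empty is the orphaned-row problem in miniature: the database never knew the two rows were related.</p>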
<p>You cannot perform annotations or aggregations using the new querying
APIs (<a href="https://github.com/coleifer/django-generic-aggregation">actually, you can with a little work</a>), and
to traverse a reverse generic relation (akin to the <code>related_name</code> attribute of
a <code>ForeignKey</code>), you need to add a custom manager to your model -- not very DRY.</p>
<p>Here are some tidbits from the <a href="https://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/">django docs</a></p>
<ul>
<li><em>Due to the way GenericForeignKey is implemented, you cannot use such fields directly with filters (filter() and exclude(), for example) via the database API.</em></li>
<li><em>Django's database aggregation API doesn't work with a GenericRelation... For now, if you need aggregates on generic relations, you'll need to calculate them without using the aggregation API.</em></li>
<li><em>Unlike ForeignKey, GenericForeignKey does not accept an on_delete argument to customize this behavior</em></li>
</ul>
<h3>Why use GFKs at all?</h3>
<p>Whether you like it or not, the GFK can be a useful tool. It is often purported
to make apps "reusable" -- by not specifying a database-level constraint, you leave
implementers some freedom in choosing their modelling. Consider the <code>django.contrib.comments</code>
app -- it uses a GFK so that the same comments app can be used on any model, even
on itself. The comments are all stored in a single database table and are all
accessible using the same API and via the same admin interface. GFKs have also
been used for some <a href="http://charlesleifer.com/blog/connecting-anything-to-anything-with-django/">silly hacks</a>.</p>
<h2>Getting to the interesting stuff</h2>
<p>Coming around to the point of this post, I think we can all agree that GFKs are
at best a gross hack that makes some tasks easier. Yet as gross as they are,
they are still implemented at the database level, so even though the "stock" ORM
does not offer many niceties in the way of dealing with GFKs we can easily use some
of django's lesser-known features to build some nice APIs on top. The rest of the
post will be in a "problem/solution" format. I tested the example code against
django 1.2, 1.3.1 and current master (ddfc7c253019).</p>
<p>For reference, I'll pretend we're talking about a generic rating model that looks
something like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Rating</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">rating</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">()</span>
<span class="n">object_id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">()</span>
<span class="n">content_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">ContentType</span><span class="p">)</span>
<span class="n">content_object</span> <span class="o">=</span> <span class="n">GenericForeignKey</span><span class="p">(</span><span class="n">ct_field</span><span class="o">=</span><span class="s1">'content_type'</span><span class="p">,</span> <span class="n">fk_field</span><span class="o">=</span><span class="s1">'object_id'</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Food</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>
<span class="n">ratings</span> <span class="o">=</span> <span class="n">generic</span><span class="o">.</span><span class="n">GenericRelation</span><span class="p">(</span><span class="n">Rating</span><span class="p">)</span> <span class="c1"># reverse generic relation</span>
</code></pre></div>
<h2>Filtering ratings to a particular subset of rated items</h2>
<p>This is not as easy as it sounds at first. Let's say we want to get ratings on
foods that start with the letter "A". You might try the following:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Rating</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">content_object__name__startswith</span><span class="o">=</span><span class="s2">"a"</span><span class="p">)</span>
</code></pre></div>
<p>The problem is, maybe not all "content objects" have a "name" column, so how should
django know that you mean just foods? There is no API you can use to "hint" at this,
though you could try:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Rating</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">content_type</span><span class="o">=</span><span class="n">food_ctype</span><span class="p">,</span> <span class="n">content_object__name__startswith</span><span class="o">=</span><span class="s2">"a"</span><span class="p">)</span>
</code></pre></div>
<p>You'll still get the field error, of course.</p>
<p>The more experienced django programmer's reflex might be to use a subquery:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">a_foods</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">name__startswith</span><span class="o">=</span><span class="s1">'a'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">Rating</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">content_object__in</span><span class="o">=</span><span class="n">a_foods</span><span class="p">)</span>
</code></pre></div>
<p>And eventually end up at:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Rating</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="go"> content_type=food_ctype,</span>
<span class="go"> object_id__in=a_foods.values('id')</span>
<span class="go"> )</span>
</code></pre></div>
<p>Yes, this will work, provided Rating.object_id is of the same type as Food.id (both are
integers, so we're good). Note that we are no longer querying against the
fake <code>content_object</code> field. The SQL can end up looking pretty funky though,
since that subquery is not converted into a database-level <code>JOIN</code>. The generated
code looks something like:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"> </span><span class="n">U0</span><span class="w"></span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'a%'</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"></span>
<span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>To see how this performs, I populated the database with 30,000 foods, 10,000 each with
names starting with "a", "b" and "c", and a rating for each food.</p>
<p>The fact is, though, that we <em>can</em> express this using a JOIN:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'a%'</span><span class="w"></span>
</code></pre></div>
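<p>As a quick sanity check that the two formulations really return the same rows, here is a small sketch using Python's built-in <code>sqlite3</code> module. The table names and the content type id of 7 mirror the SQL above; the handful of sample rows (including a rating with a made-up content type 8) are assumptions for illustration, and SQLite stands in for postgres here, so this says nothing about the query plans.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_rating (
        id INTEGER PRIMARY KEY,
        rating INTEGER,
        object_id INTEGER,
        content_type_id INTEGER
    );
    """
)
foods = [(1, "apple"), (2, "avocado"), (3, "banana"), (4, "cherry")]
conn.executemany("INSERT INTO app_food VALUES (?, ?)", foods)
# One rating per food (content type 7), plus a rating on some other model
# (content type 8) whose object_id happens to collide with a food id.
ratings = [(1, 5, 1, 7), (2, 4, 2, 7), (3, 3, 3, 7), (4, 2, 4, 7), (5, 1, 1, 8)]
conn.executemany("INSERT INTO app_rating VALUES (?, ?, ?, ?)", ratings)

subquery = """
    SELECT r.id FROM app_rating r
    WHERE r.content_type_id = 7
      AND r.object_id IN (SELECT f.id FROM app_food f WHERE f.name LIKE 'a%')
    ORDER BY r.id
"""
join = """
    SELECT r.id FROM app_rating r
    INNER JOIN app_food f ON r.object_id = f.id
    WHERE r.content_type_id = 7 AND f.name LIKE 'a%'
    ORDER BY r.id
"""
sub_rows = conn.execute(subquery).fetchall()
join_rows = conn.execute(join).fetchall()
print(sub_rows, join_rows)  # [(1,), (2,)] [(1,), (2,)]
```

<p>The content type filter matters in both versions: without it, the rating that belongs to the other model would be matched purely because its <code>object_id</code> collides with a food's id.</p>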
<p>Somewhat to my surprise, the subquery can actually be more performant when using postgresql's
query planner, which can optimize the subquery to a <a href="http://en.wikipedia.org/wiki/Hash_join#Hash_semi-join">hash semi join</a>.
Conventional wisdom states that joins are better than subqueries, because the query
planner can make optimizations if it knows what it is joining on, yet it seems at least
with postgres, that either way can yield good results.</p>
<h4>Subquery</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="n">Semi</span><span class="w"> </span><span class="nf">Join</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">662.00</span><span class="p">.</span><span class="mf">.1525.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">13.339</span><span class="p">.</span><span class="mf">.25.814</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="n">Cond</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u0</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="kr">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.538.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">0.017</span><span class="p">.</span><span class="mf">.4.416</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Filter</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">537.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">13.303</span><span class="p">.</span><span class="mf">.13.303</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Buckets</span><span class="o">:</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="n">Batches</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">Memory</span><span class="w"> </span><span class="n">Usage</span><span class="o">:</span><span class="w"> </span><span class="mi">352</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="kr">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="n">u0</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">0.011</span><span class="p">.</span><span class="mf">.9.652</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Filter</span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">name</span><span class="p">)</span><span class="o">::</span><span class="n">text</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s">'a%'</span><span class="o">::</span><span class="n">text</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="n">runtime</span><span class="o">:</span><span class="w"> </span><span class="mf">26.216</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mi">9</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<h4>Join</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="nf">Join</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">662.00</span><span class="p">.</span><span class="mf">.1750.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">14.001</span><span class="p">.</span><span class="mf">.30.933</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="n">Cond</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="kr">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.538.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">0.016</span><span class="p">.</span><span class="mf">.5.490</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Filter</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nf">Hash</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">537.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">13.964</span><span class="p">.</span><span class="mf">.13.964</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Buckets</span><span class="o">:</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="n">Batches</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">Memory</span><span class="w"> </span><span class="n">Usage</span><span class="o">:</span><span class="w"> </span><span class="mi">352</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="kr">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="n">time</span><span class="o">=</span><span class="mf">0.011</span><span class="p">.</span><span class="mf">.10.382</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mi">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Filter</span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">name</span><span class="p">)</span><span class="o">::</span><span class="n">text</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s">'a%'</span><span class="o">::</span><span class="n">text</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="n">runtime</span><span class="o">:</span><span class="w"> </span><span class="mf">31.576</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mi">9</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>The problem with examples like these is that they tend to be taken as a rule of thumb.
There may be instances when the subquery is a fine option, and there may be instances
where it needs to be expressed as a JOIN (particularly if using an RDBMS that
does a lousy job with subqueries). At any rate, I hope this shows that you can
approach this type of query either way -- via the ORM with a subquery or via SQL using
a JOIN.</p>
<h2>Annotating records, or getting the highest-rated items</h2>
<p>Suppose you want to list all the foods ordered by rating. Using Django's <code>annotate()</code>
method, it should be easy:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">avg_score</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'ratings__rating'</span><span class="p">))</span> <span class="c1"># does not work correctly</span>
</code></pre></div>
<p>The generated query ends up looking like:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="k">LEFT</span><span class="w"> </span><span class="k">OUTER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">)</span><span class="w"></span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="w"></span>
</code></pre></div>
<p>The missing piece is the additional filter on the content-type -- the JOIN is correct,
but if there are any ratings on <em>other</em> content-types that have object_ids that collide
with food ids, the results will be off.</p>
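<p>To make the collision concrete, here is a minimal sketch using Python's built-in
<code>sqlite3</code> module rather than Django and Postgres. The table and column names follow
the generated SQL above; the rows, and the use of 8 as the id of some other content type,
are made up for illustration.</p>

```python
import sqlite3

# Toy data: table/column names follow the post, the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_rating (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER,
        rating INTEGER
    );
    INSERT INTO app_food (id, name) VALUES (1, 'apple');
    -- Content type 7 is Food. Content type 8 is some other rated model
    -- that happens to have an object with id 1 as well.
    INSERT INTO app_rating (content_type_id, object_id, rating) VALUES
        (7, 1, 4),
        (7, 1, 2),
        (8, 1, 5);
""")

# Same shape as the query generated by annotate(): join on object_id only.
unfiltered = conn.execute("""
    SELECT f.name, AVG(r.rating) AS avg_score
    FROM app_food AS f
    LEFT OUTER JOIN app_rating AS r ON (f.id = r.object_id)
    GROUP BY f.id, f.name
""").fetchall()

# What the average should be: only ratings that really belong to the food.
expected = conn.execute("""
    SELECT AVG(rating) FROM app_rating
    WHERE content_type_id = 7 AND object_id = 1
""").fetchone()[0]

print(unfiltered)  # the rating of 5 from content type 8 is averaged in
print(expected)    # average of 4 and 2 only
```

<p>The join-only query averages three ratings for "apple" instead of two, so the stray
rating from the other content type pulls the score up.</p>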
<p>What happens if we try to get clever and query on the content type explicitly? The
generated SQL is a bit odd, but the results will be correct:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">ctype</span> <span class="o">=</span> <span class="n">ContentType</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_for_model</span><span class="p">(</span><span class="n">Food</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">foods</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">ratings__content_type</span><span class="o">=</span><span class="n">ctype</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">foods</span> <span class="o">=</span> <span class="n">foods</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">avg_score</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'ratings__rating'</span><span class="p">))</span>
</code></pre></div>
<p>This generates the following query, which is correct even though the WHERE clause
repeats the same condition:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="k">LEFT</span><span class="w"> </span><span class="k">OUTER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">)</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"></span>
<span class="p">)</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="w"></span>
</code></pre></div>
<p>As an alternative, you can express this as a subquery using Django's <a href="https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.extra">extra() method</a>
on the queryset. The implementation is fairly straightforward, if a little unconventional,
since it requires writing some SQL. The subquery will be a little different this time
in that it will reference the row from the outer query. Here is how you do this
in Django:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">extra_select</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="go"> SELECT Avg("rating") AS aggregate_score</span>
<span class="go"> FROM "app_rating"</span>
<span class="go"> WHERE</span>
<span class="go"> "content_type_id"=%s AND</span>
<span class="go"> "object_id"="app_food"."id"</span>
<span class="go">"""</span>
<span class="gp">>>> </span><span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span><span class="o">.</span><span class="n">extra</span><span class="p">(</span>
<span class="go"> select={'avg_score': extra_select},</span>
<span class="go"> select_params=[ctype.id],</span>
<span class="go"> order_by=['-avg_score'],</span>
<span class="go">)</span>
</code></pre></div>
<p>Taking a look at the entire query, we can see how Django ties in the extra select
query, as well as how our reference to <code>"app_food"."id"</code> maps to the outer query:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">Avg</span><span class="p">(</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">aggregate_score</span><span class="w"></span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="ss">"content_type_id"</span><span class="o">=</span><span class="mi">7</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"object_id"</span><span class="o">=</span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"></span>
<span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="w"> </span><span class="k">DESC</span><span class="w"></span>
</code></pre></div>
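<p>As a sanity check that the two formulations agree, here is a small sketch that runs both
shapes of query against toy data. It uses Python's built-in <code>sqlite3</code> module as a
stand-in for Postgres; the rows are invented, and the third food is deliberately left
unrated to show one place where the results can differ.</p>

```python
import sqlite3

# Toy data: names follow the post, rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_rating (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER,
        rating INTEGER
    );
    INSERT INTO app_food (id, name) VALUES
        (1, 'apple'), (2, 'bread'), (3, 'chard');
    INSERT INTO app_rating (content_type_id, object_id, rating) VALUES
        (7, 1, 4), (7, 1, 2), (7, 2, 5), (8, 2, 1);
""")

# JOIN formulation, with the content-type filter in the WHERE clause.
join_rows = conn.execute("""
    SELECT f.name, AVG(r.rating) AS avg_score
    FROM app_food AS f
    LEFT OUTER JOIN app_rating AS r ON (f.id = r.object_id)
    WHERE r.content_type_id = 7
    GROUP BY f.id, f.name
    ORDER BY avg_score DESC
""").fetchall()

# Correlated subquery formulation: the inner query refers to the outer row.
subquery_rows = conn.execute("""
    SELECT f.name, (
        SELECT AVG(rating) FROM app_rating
        WHERE content_type_id = ? AND object_id = f.id
    ) AS avg_score
    FROM app_food AS f
    ORDER BY avg_score DESC
""", (7,)).fetchall()

# Foods that have ratings get the same score either way.
rated = [row for row in subquery_rows if row[1] is not None]
print(join_rows == rated)
# The unrated food only shows up in the subquery version, with a NULL score.
print([name for name, score in subquery_rows if score is None])
```

<p>One difference worth keeping in mind: because the content-type filter sits in the WHERE
clause, the JOIN version drops foods that have no ratings at all, while the subquery
version keeps them with a NULL score. Where NULLs land in a descending sort also varies
between databases.</p>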
<p>In this case, the JOIN performed much better than the subquery. Here are the two
query plans:</p>
<h4>Join</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">8595.80</span><span class="p">.</span><span class="mf">.8670.80</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">13</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">135.585</span><span class="p">.</span><span class="mf">.140.832</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Key:</span><span class="w"> </span><span class="p">(</span><span class="n">avg</span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">rating</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Method:</span><span class="w"> </span><span class="n">external</span><span class="w"> </span><span class="n">merge</span><span class="w"> </span><span class="nl">Disk:</span><span class="w"> </span><span class="mh">728</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">GroupAggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">5175.40</span><span class="p">.</span><span class="mf">.5850.40</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">13</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">58.430</span><span class="p">.</span><span class="mf">.85.801</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">5175.40</span><span class="p">.</span><span class="mf">.5250.40</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">13</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">58.413</span><span class="p">.</span><span class="mf">.63.321</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Key:</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">name</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Method:</span><span class="w"> </span><span class="n">external</span><span class="w"> </span><span class="n">merge</span><span class="w"> </span><span class="nl">Disk:</span><span class="w"> </span><span class="mh">760</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="n">Join</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">984.00</span><span class="p">.</span><span class="mf">.2430.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">13</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">14.967</span><span class="p">.</span><span class="mf">.38.724</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.538.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">8</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.019</span><span class="p">.</span><span class="mf">.5.853</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">(</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">462.00</span><span class="p">.</span><span class="mf">.462.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">9</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">14.922</span><span class="p">.</span><span class="mf">.14.922</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Buckets:</span><span class="w"> </span><span class="mh">4096</span><span class="w"> </span><span class="nl">Batches:</span><span class="w"> </span><span class="mh">2</span><span class="w"> </span><span class="n">Memory</span><span class="w"> </span><span class="nl">Usage:</span><span class="w"> </span><span class="mh">615</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.462.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">9</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.011</span><span class="p">.</span><span class="mf">.5.447</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="nl">runtime:</span><span class="w"> </span><span class="mf">142.461</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mh">15</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<h4>Subquery</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">18393657.40</span><span class="p">.</span><span class="mf">.18393732.40</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">9</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">108387.761</span><span class="p">.</span><span class="mf">.108395.111</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Key:</span><span class="w"> </span><span class="p">((</span><span class="n">SubPlan</span><span class="w"> </span><span class="mh">1</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="nl">Method:</span><span class="w"> </span><span class="n">external</span><span class="w"> </span><span class="n">merge</span><span class="w"> </span><span class="nl">Disk:</span><span class="w"> </span><span class="mh">816</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.18390912.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">9</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">9.570</span><span class="p">.</span><span class="mf">.108281.379</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">SubPlan</span><span class="w"> </span><span class="mh">1</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">613.01</span><span class="p">.</span><span class="mf">.613.02</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">3.607</span><span class="p">.</span><span class="mf">.3.607</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">30000</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.613.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">1.804</span><span class="p">.</span><span class="mf">.3.602</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">30000</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">((</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">7</span><span class="p">)</span><span class="w"> </span><span class="n">AND</span><span class="w"> </span><span class="p">(</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">id</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="nl">runtime:</span><span class="w"> </span><span class="mf">108396.599</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mh">9</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>Holy shit, that subquery is killing us -- about 750x slower! Granted I don't have my local
postgres tuned to do large sorts in memory, but still things are clearly not as efficient.
This is yet another example of how your mileage may vary depending on the size of your datasets,
presence of indexes, query plans generated, and amount of working memory available.
Or, in other words, check up on your query plans -- you may be surprised by the results.</p>
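<p>As a rough illustration of "check up on your query plans", here is a minimal sketch using Python's built-in sqlite3 module rather than Postgres, so the plan format differs from the EXPLAIN ANALYZE output shown here. The table layout and the <code>query_plan</code> helper are assumptions made for the example, not part of any library:</p>

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the human-readable step.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_rating (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER,
        rating INTEGER
    );
""")

# Correlated subquery: the inner SELECT is evaluated once per food row.
subquery_sql = """
    SELECT f.id, (
        SELECT AVG(r.rating) FROM app_rating r
        WHERE r.content_type_id = 7 AND r.object_id = f.id
    ) AS avg_score
    FROM app_food f
"""

# The same annotation expressed as a JOIN plus GROUP BY.
join_sql = """
    SELECT f.id, AVG(r.rating) AS avg_score
    FROM app_food f
    LEFT OUTER JOIN app_rating r
        ON r.object_id = f.id AND r.content_type_id = 7
    GROUP BY f.id
"""

for label, sql in (("subquery", subquery_sql), ("join", join_sql)):
    print(label)
    for step in query_plan(conn, sql):
        print("   ", step)
```

<p>The exact plan text depends on the database and its version, which is rather the point: look at what your own database reports instead of guessing.</p>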
<h2>Aggregating records to generate a total score</h2>
<p>The last item I'll take a look at is aggregating records. This is somewhat the
inverse of annotation -- rather than generating a score for every record based on
some condition, I'm generating an aggregate across <em>all</em> records, expressed as a single scalar value. What if we
wanted to know the average rating across all foods starting with "a"? Based on
what we've seen so far, we could probably guess that this approach may not work
exactly right:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">name__startswith</span><span class="o">=</span><span class="s1">'a'</span><span class="p">)</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">avg_score</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'ratings__rating'</span><span class="p">))</span>
</code></pre></div>
<p>And sure enough, looking at the generated SQL the query is again missing the filter
on content-type:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="k">LEFT</span><span class="w"> </span><span class="k">OUTER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">)</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'a%'</span><span class="w"></span>
</code></pre></div>
<p>So we'll do the same thing as above for annotate, which is to explicitly filter on
the content_type exposed by the GenericRelation:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">ctype</span> <span class="o">=</span> <span class="n">ContentType</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_for_model</span><span class="p">(</span><span class="n">Food</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">avg</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="go"> name__startswith='a',</span>
<span class="go"> ratings__content_type=ctype,</span>
<span class="go">).aggregate(avg_score=Avg('ratings__rating'))</span>
</code></pre></div>
<p>Looking at the generated SQL the filter on content-type is present (although duplicated):</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"avg_score"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"></span>
<span class="k">LEFT</span><span class="w"> </span><span class="k">OUTER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">)</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_food"</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'a%'</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_rating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"></span>
<span class="w"> </span><span class="p">))</span><span class="w"></span>
</code></pre></div>
<p>Again, we can express this as a subquery instead of a JOIN. The inner query is
fairly straightforward though not easily expressed by the django ORM. What we
end up with, though, would look something like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="k">Avg</span><span class="p">(</span><span class="ss">"rating"</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">aggregate_score</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_rating"</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="ss">"content_type_id"</span><span class="o">=</span><span class="mi">7</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"object_id"</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"> </span><span class="n">U0</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="n">E</span><span class="s1">'a%'</span><span class="w"></span>
<span class="w"> </span><span class="p">)</span><span class="w"></span>
</code></pre></div>
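<p>To make the difference between these three queries concrete, here is a small self-contained sketch that runs them with Python's sqlite3 module instead of Postgres and the ORM. The sample rows, and the use of content type 8 for some unrelated model, are made up for illustration:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_rating (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER,
        rating INTEGER
    );
    INSERT INTO app_food (id, name) VALUES (1, 'apple'), (2, 'banana');
    -- content type 7 is Food; 8 stands in for some other (hypothetical) model
    INSERT INTO app_rating (content_type_id, object_id, rating) VALUES
        (7, 1, 4),   -- rating for apple
        (7, 1, 2),   -- rating for apple
        (8, 1, 9),   -- rating for an unrelated object that also has id 1
        (7, 2, 5);   -- rating for banana
""")

# JOIN without the content-type filter: the unrelated rating leaks in.
unfiltered_join = """
    SELECT AVG(r.rating) FROM app_food f
    LEFT OUTER JOIN app_rating r ON f.id = r.object_id
    WHERE f.name LIKE 'a%'
"""

# JOIN with the content-type filter.
filtered_join = """
    SELECT AVG(r.rating) FROM app_food f
    LEFT OUTER JOIN app_rating r ON f.id = r.object_id
    WHERE f.name LIKE 'a%' AND r.content_type_id = 7
"""

# Subquery form: aggregate over ratings, restrict object ids with IN.
subquery = """
    SELECT AVG(rating) FROM app_rating
    WHERE content_type_id = 7 AND object_id IN (
        SELECT U0.id FROM app_food U0 WHERE U0.name LIKE 'a%'
    )
"""

results = {
    name: conn.execute(sql).fetchone()[0]
    for name, sql in (
        ("unfiltered_join", unfiltered_join),
        ("filtered_join", filtered_join),
        ("subquery", subquery),
    )
}
print(results)
# {'unfiltered_join': 5.0, 'filtered_join': 3.0, 'subquery': 3.0}
```

<p>With this data the unfiltered JOIN averages in the rating that belongs to a different content type, while the filtered JOIN and the subquery agree.</p>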
<p>How do these different methods compare? They are actually nearly identical: 34.4 ms
for the JOIN versus 32.7 ms for the subquery in the plans below. The only difference
is that the use of the JOIN causes the query planner to use a
Hash Join, which is a bit costlier than the Hash Semi Join used by the subquery.</p>
<h4>Join</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">1775.00</span><span class="p">.</span><span class="mf">.1775.01</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">34.389</span><span class="p">.</span><span class="mf">.34.389</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="n">Join</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">662.00</span><span class="p">.</span><span class="mf">.1750.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">13.804</span><span class="p">.</span><span class="mf">.32.909</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">app_food</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.538.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">8</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.017</span><span class="p">.</span><span class="mf">.5.986</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">(</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">537.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">13.765</span><span class="p">.</span><span class="mf">.13.765</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Buckets:</span><span class="w"> </span><span class="mh">1024</span><span class="w"> </span><span class="nl">Batches:</span><span class="w"> </span><span class="mh">1</span><span class="w"> </span><span class="n">Memory</span><span class="w"> </span><span class="nl">Usage:</span><span class="w"> </span><span class="mh">352</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.011</span><span class="p">.</span><span class="mf">.10.198</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">((</span><span class="n">name</span><span class="p">)</span><span class="o">::</span><span class="n">text</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="p">'</span><span class="n">a</span><span class="o">%</span><span class="p">'</span><span class="o">::</span><span class="n">text</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="nl">runtime:</span><span class="w"> </span><span class="mf">34.449</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mh">10</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<h4>Subquery</h4>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">1550.00</span><span class="p">.</span><span class="mf">.1550.01</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">32.597</span><span class="p">.</span><span class="mf">.32.598</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">1</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="n">Semi</span><span class="w"> </span><span class="n">Join</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">662.00</span><span class="p">.</span><span class="mf">.1525.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">14.598</span><span class="p">.</span><span class="mf">.31.128</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="nl">Cond:</span><span class="w"> </span><span class="p">(</span><span class="n">app_rating</span><span class="p">.</span><span class="n">object_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u0</span><span class="p">.</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_rating</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.538.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">8</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.016</span><span class="p">.</span><span class="mf">.5.988</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">30000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">(</span><span class="n">content_type_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mh">7</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">537.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">14.557</span><span class="p">.</span><span class="mf">.14.557</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Buckets:</span><span class="w"> </span><span class="mh">1024</span><span class="w"> </span><span class="nl">Batches:</span><span class="w"> </span><span class="mh">1</span><span class="w"> </span><span class="n">Memory</span><span class="w"> </span><span class="nl">Usage:</span><span class="w"> </span><span class="mh">352</span><span class="n">kB</span><span class="w"></span>
<span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Seq</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">app_food</span><span class="w"> </span><span class="n">u0</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mf">0.00</span><span class="p">.</span><span class="mf">.537.00</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mh">4</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">actual</span><span class="w"> </span><span class="kt">time</span><span class="o">=</span><span class="mf">0.011</span><span class="p">.</span><span class="mf">.10.873</span><span class="w"> </span><span class="n">rows</span><span class="o">=</span><span class="mh">10000</span><span class="w"> </span><span class="n">loops</span><span class="o">=</span><span class="mh">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">Filter:</span><span class="w"> </span><span class="p">((</span><span class="n">name</span><span class="p">)</span><span class="o">::</span><span class="n">text</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="p">'</span><span class="n">a</span><span class="o">%</span><span class="p">'</span><span class="o">::</span><span class="n">text</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">Total</span><span class="w"> </span><span class="nl">runtime:</span><span class="w"> </span><span class="mf">32.659</span><span class="w"> </span><span class="n">ms</span><span class="w"></span>
<span class="p">(</span><span class="mh">10</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<h2>When the above examples fail, and how you can fix it</h2>
<p><small>Note: if you're using sqlite, this will not actually be a problem for you</small></p>
<p>What happens when your model, for the sake of being SUPER generic, uses a TextField
to represent the "object_id" portion of the GFK? In fact, django's own comments app
does this. So let's rewrite our ratings model:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">TextRating</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">object_id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
    <span class="n">content_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">ContentType</span><span class="p">)</span>
    <span class="n">content_object</span> <span class="o">=</span> <span class="n">GenericForeignKey</span><span class="p">(</span><span class="n">ct_field</span><span class="o">=</span><span class="s1">'content_type'</span><span class="p">,</span> <span class="n">fk_field</span><span class="o">=</span><span class="s1">'object_id'</span><span class="p">)</span>
</code></pre></div>
<p>If you're using any of the three methods I described above, get ready to see
a lot of these:</p>
<div class="highlight"><pre><span></span><code><span class="n">DatabaseError</span><span class="o">:</span><span class="w"> </span><span class="n">operator</span><span class="w"> </span><span class="n">does</span><span class="w"> </span><span class="n">not</span><span class="w"> </span><span class="n">exist</span><span class="o">:</span><span class="w"> </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">integer</span><span class="w"></span>
</code></pre></div>
<p>The reason for this is that, although django is able in many cases to generate some
pretty good SQL, it does <em>not</em> handle casting the object_id correctly:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="p">,</span><span class="w"> </span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="ss">"app_textrating"</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"id"</span><span class="w"></span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="ss">"app_food"</span><span class="w"> </span><span class="n">U0</span><span class="w"></span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">U0</span><span class="p">.</span><span class="ss">"name"</span><span class="p">::</span><span class="nb">text</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'a%'</span><span class="w"></span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"content_type_id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7</span><span class="w"></span>
<span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>Django does not do this cast because it doesn't know that the object_id values
in this table are going to be integers -- it might be able to <em>infer</em>
that based on the fact that we're restricting the queryset to a particular contenttype
whose model uses an integer for its primary key, but alas... What we want to see is this:</p>
<div class="highlight"><pre><span></span><code><span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="k">CAST</span><span class="p">(</span><span class="ss">"app_textrating"</span><span class="p">.</span><span class="ss">"object_id"</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">integer</span><span class="p">)</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="p">)</span><span class="w"></span>
<span class="p">)</span><span class="w"></span>
</code></pre></div>
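<p>Here is a rough sketch of the full query shape we're after, run against sqlite3 so it is self-contained. As the note above says, sqlite would not actually raise the error, so this only demonstrates the CAST form; the sample rows are invented for the example:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_textrating (
        id INTEGER PRIMARY KEY,
        object_id TEXT,            -- primary keys stored as text
        content_type_id INTEGER
    );
    INSERT INTO app_food (id, name) VALUES (1, 'apple'), (2, 'banana');
    INSERT INTO app_textrating (object_id, content_type_id) VALUES
        ('1', 7),   -- points at apple
        ('2', 7),   -- points at banana
        ('1', 8);   -- same id, different (hypothetical) content type
""")

# Cast the text column to integer before comparing it with app_food.id.
cast_sql = """
    SELECT id, object_id, content_type_id
    FROM app_textrating
    WHERE CAST(object_id AS integer) IN (
        SELECT U0.id FROM app_food U0 WHERE U0.name LIKE 'a%'
    ) AND content_type_id = 7
    ORDER BY id
"""
rows = conn.execute(cast_sql).fetchall()
print(rows)  # [(1, '1', 7)]
```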
<p>This is a problem for any of the following queries:</p>
<h4>Filtering a set of ratings using <code>object_id__in</code> or similar:</h4>
<div class="highlight"><pre><span></span><code><span class="n">TextRating</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
    <span class="n">content_type</span><span class="o">=</span><span class="n">food_ctype</span><span class="p">,</span>
    <span class="n">object_id__in</span><span class="o">=</span><span class="n">a_foods</span><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span> <span class="c1"># <-- missing CAST</span>
<span class="p">)</span>
</code></pre></div>
<h4>Annotating records using django's annotate() api:</h4>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">base</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">text_ratings__content_type</span><span class="o">=</span><span class="n">ctype</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">foods</span> <span class="o">=</span> <span class="n">base</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">avg_score</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'text_ratings__rating'</span><span class="p">))</span>
</code></pre></div>
<h4>Aggregating records using django's aggregate() api:</h4>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">base</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">text_ratings__content_type</span><span class="o">=</span><span class="n">ctype</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">foods</span> <span class="o">=</span> <span class="n">base</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">avg_score</span><span class="o">=</span><span class="n">Avg</span><span class="p">(</span><span class="s1">'text_ratings__rating'</span><span class="p">))</span>
</code></pre></div>
<p>The reason the last two don't work is that Django sets up a JOIN for
us between Food.id and TextRating.object_id, and the database engine
will complain that we need to CAST before doing the JOIN.</p>
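<p>In SQL terms, the fix for the JOIN case is to cast inside the ON clause. A minimal sketch with sqlite3 follows; the <code>rating</code> column on the text-based table and the sample data are assumptions for the example, and sqlite itself would accept the join even without the cast:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_food (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_textrating (
        id INTEGER PRIMARY KEY,
        object_id TEXT,
        content_type_id INTEGER,
        rating INTEGER             -- hypothetical column, for the example only
    );
    INSERT INTO app_food (id, name) VALUES (1, 'apple'), (2, 'banana');
    INSERT INTO app_textrating (object_id, content_type_id, rating) VALUES
        ('1', 7, 4), ('1', 7, 2), ('2', 7, 5);
""")

# Per-food average, with the text object_id cast to integer in the JOIN.
annotate_sql = """
    SELECT f.name, AVG(r.rating) AS avg_score
    FROM app_food f
    LEFT OUTER JOIN app_textrating r
        ON f.id = CAST(r.object_id AS integer)
       AND r.content_type_id = 7
    GROUP BY f.id, f.name
    ORDER BY f.id
"""
scores = conn.execute(annotate_sql).fetchall()
print(scores)  # [('apple', 3.0), ('banana', 5.0)]
```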
<p>The workaround I came up with is pretty nasty and involves some abuse of django's
extra(). If you'd like to take a look, the code is available on github:</p>
<p><a href="https://github.com/coleifer/django-generic-aggregation/blob/master/generic_aggregation/utils.py">https://github.com/coleifer/django-generic-aggregation/blob/master/generic_aggregation/utils.py</a></p>
<p>You can safely call the three functions provided by the library and, in the event
the django filter would fail, they will fall back to a method that returns the
correct results. The fallbacks are possibly less efficient, since the queries they
generate differ from django's and use subqueries instead of JOINs:</p>
<ul>
<li>generic_filter</li>
<li>generic_annotate</li>
<li>generic_aggregate</li>
</ul>
<p>If you're interested in reading more, check out the <a href="http://django-generic-aggregation.readthedocs.org/">docs</a>.</p>
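<p>To make the subquery idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (<code>food</code>, <code>text_rating</code>, a text <code>object_id</code>) are assumptions chosen to mirror the models above, the content type filter is left out for brevity, and this is not the SQL the library actually generates. SQLite itself is forgiving about comparing text with integers; the CAST is shown because stricter engines such as PostgreSQL require it:</p>
<div class="highlight"><pre><span></span><code>import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT);
    -- object_id is stored as text, like a generic foreign key
    -- that must be able to point at non-integer primary keys
    CREATE TABLE text_rating (
        id INTEGER PRIMARY KEY,
        object_id TEXT,
        rating INTEGER
    );
    INSERT INTO food (id, name) VALUES (1, 'apple'), (2, 'orange');
    INSERT INTO text_rating (object_id, rating)
        VALUES ('1', 3), ('1', 5), ('2', 2);
""")

# one correlated subquery per row instead of a JOIN; the CAST makes
# the text object_id comparable with the integer primary key
query = """
    SELECT f.name, (
        SELECT AVG(r.rating)
        FROM text_rating AS r
        WHERE CAST(r.object_id AS INTEGER) = f.id
    ) AS avg_score
    FROM food AS f
    ORDER BY f.id
"""
rows = conn.execute(query).fetchall()
print(rows)
</code></pre></div>
<p>Each food row comes back with its own average, which is the same shape of result the annotate() call was meant to produce.</p>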
<h2>Reading more</h2>
<p>Thanks for reading, I hope the post was informative. Feel free to leave any
comments or suggestions below, and if you've got any stories of wacky things you've
done using GFKs please share!</p>
<p>Here are a couple links you might be interested in:</p>
<ul>
<li><a href="https://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#generic-relations">django's documentation</a></li>
<li><a href="http://alexgaynor.net/2010/may/04/cool-new-django-taggit-api/">django-taggit's novel approach to GFKs</a></li>
<li><a href="http://blip.tv/djangocon/why-django-sucks-and-how-we-can-fix-it-4131303">why django sucks and how we can fix it</a> <a href="http://www.scribd.com/doc/37113340/Why-Django-Sucks-and-How-we-Can-Fix-it#page=13">slides</a></li>
</ul>Micawber, a python library for extracting rich content from URLshttp://charlesleifer.com/blog/micawber-a-python-library-for-extracting-rich-content-from-urls/2012-04-19T11:13:47Z2012-04-19T11:13:47Zcharles leifer<p><a href="https://media.charlesleifer.com/blog/photos/micawber-logo-0.png" title="photos/micawber-logo-0.png"><img alt="photos/micawber-logo-0.png" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/micawber-logo-0.png?key=uqA_HlClKG586SU_vDa27w=="/></a></p>
<p><a href="http://oembed.com/">OEmbed</a> is a simple, open API standard for embedding rich content and retrieving content metadata. The way OEmbed works is actually kind of ingenious, because the only things a consumer of the API needs to know are the location of the OEmbed endpoint, and the URL to the piece of content they want to embed.</p>
<p>YouTube, for example, maintains an OEmbed endpoint at <a href="https://www.youtube.com/oembed">youtube.com/oembed</a>. Using the OEmbed endpoint, we can very easily retrieve the HTML for an embedded video player along with metadata about the clip:</p>
<div class="highlight"><pre><span></span><code><span class="err">GET https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=nda_OSWeyn8</span>
</code></pre></div>
<p>Response:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"provider_url"</span><span class="o">:</span> <span class="s2">"https://www.youtube.com/"</span><span class="p">,</span>
<span class="s2">"title"</span><span class="o">:</span> <span class="s2">"Leprechaun in Mobile, Alabama"</span><span class="p">,</span>
<span class="s2">"type"</span><span class="o">:</span> <span class="s2">"video"</span><span class="p">,</span>
<span class="s2">"html"</span><span class="o">:</span> <span class="s2">"<iframe width=\"459\" height=\"344\" src=\"https://www.youtube.com/embed/nda_OSWeyn8?feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>"</span><span class="p">,</span>
<span class="s2">"thumbnail_width"</span><span class="o">:</span> <span class="mf">480</span><span class="p">,</span>
<span class="s2">"height"</span><span class="o">:</span> <span class="mf">344</span><span class="p">,</span>
<span class="s2">"width"</span><span class="o">:</span> <span class="mf">459</span><span class="p">,</span>
<span class="s2">"version"</span><span class="o">:</span> <span class="s2">"1.0"</span><span class="p">,</span>
<span class="s2">"author_name"</span><span class="o">:</span> <span class="s2">"botmib"</span><span class="p">,</span>
<span class="s2">"thumbnail_height"</span><span class="o">:</span> <span class="mf">360</span><span class="p">,</span>
<span class="s2">"thumbnail_url"</span><span class="o">:</span> <span class="s2">"https://i.ytimg.com/vi/nda_OSWeyn8/hqdefault.jpg"</span><span class="p">,</span>
<span class="s2">"provider_name"</span><span class="o">:</span> <span class="s2">"YouTube"</span><span class="p">,</span>
<span class="s2">"author_url"</span><span class="o">:</span> <span class="s2">"https://www.youtube.com/user/botmib"</span>
<span class="p">}</span>
</code></pre></div>
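<p>The request above can also be built programmatically. The sketch below uses only the Python 3 standard library; <code>build_oembed_url</code> is a hypothetical helper name, and <code>format</code> is one of the optional request parameters described in the oembed spec. The important detail is that the content URL gets percent-encoded, so its own query string is not confused with the endpoint's:</p>
<div class="highlight"><pre><span></span><code>from urllib.parse import urlencode


def build_oembed_url(endpoint, content_url, **params):
    # urlencode percent-encodes the content URL so that its own
    # query string is not mistaken for part of the request
    query = {'url': content_url}
    query.update(params)
    return endpoint + '?' + urlencode(query)


request_url = build_oembed_url(
    'https://www.youtube.com/oembed',
    'https://www.youtube.com/watch?v=nda_OSWeyn8',
    format='json',
)
print(request_url)
</code></pre></div>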
<p>The oembed spec defines four types of content along with a number of required attributes
for each content type. This makes it a snap for consumers to use a single interface
for handling things like:</p>
<ul>
<li>youtube videos</li>
<li>flickr photos</li>
<li>hulu videos</li>
<li>slideshare decks</li>
<li><a href="http://embed.ly">and many more</a></li>
</ul>
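<p>As a rough illustration of what "required attributes for each content type" means in practice, here is a small checker. The four types and their required parameters are taken from the oembed spec; the function name <code>missing_fields</code> is my own invention and is not part of any library:</p>
<div class="highlight"><pre><span></span><code># required response parameters per content type, per the oembed spec
REQUIRED_BY_TYPE = {
    'photo': ('url', 'width', 'height'),
    'video': ('html', 'width', 'height'),
    'rich': ('html', 'width', 'height'),
    'link': (),
}
# every response must carry these two
REQUIRED_ALWAYS = ('type', 'version')


def missing_fields(response):
    """Return the required keys absent from an oembed response dict."""
    missing = [key for key in REQUIRED_ALWAYS if key not in response]
    type_specific = REQUIRED_BY_TYPE.get(response.get('type'), ())
    missing.extend(key for key in type_specific if key not in response)
    return missing


video = {'type': 'video', 'version': '1.0', 'html': '...',
         'width': 459, 'height': 344}
print(missing_fields(video))
print(missing_fields({'type': 'photo', 'version': '1.0'}))
</code></pre></div>
<p>The first call prints an empty list, while the incomplete photo response reports the three parameters it lacks.</p>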
<h3>A quick note on embed.ly</h3>
<p>If you click that last link in the list it will send you to <a href="http://embed.ly/">http://embed.ly/</a> --
a service that launched a year or so ago that provides a single endpoint for all
sorts of different content. Many big sites provide their own endpoints, however,
so the decision to use a service like embedly really depends on your individual needs.
I tried out their free tier and found it to be much slower than using the native
endpoints provided by youtube and flickr, but the sheer number of sites they
support makes them a pretty good option. Luckily, you don't have to decide right
now: <a href="http://micawber.readthedocs.org/">micawber</a> supports both workflows.</p>
<h3>Back to micawber</h3>
<p>Micawber was designed for embedding rich content using the oembed API. In many ways
it is a successor to an earlier project, <a href="https://github.com/worldcompany/djangoembed">djangoembed</a>,
which I have not been very good at maintaining. Instead of being limited to django,
micawber can be used with any python project. It supports a low-level API capable of:</p>
<ul>
<li>requesting rich metadata for a URL from a given endpoint</li>
<li>extracting metadata from a block of text or html</li>
<li>parsing a block of text or HTML and replacing URLs with rich content</li>
</ul>
<p>If you're using Flask or Django, there is a higher-level API consisting of a couple of
template filters which do the same things.</p>
<p>I put a demo up on appengine (hoping it doesn't break too bad, this will be my first appengine deploy). Try entering some URLs to things like youtube videos or flickr photos: <a href="http://micawberdemo.appspot.com/">http://micawberdemo.appspot.com/</a></p>
<h3>Providers</h3>
<p>Behind the scenes, your app creates a mapping of partial URL regexes to particular
endpoints, e.g.:</p>
<div class="highlight"><pre><span></span><code><span class="n">http</span><span class="o">://</span><span class="err">\</span><span class="n">S</span><span class="o">*</span><span class="p">.</span><span class="n">youtu</span><span class="p">(</span><span class="err">\</span><span class="p">.</span><span class="n">be</span><span class="o">|</span><span class="n">be</span><span class="err">\</span><span class="p">.</span><span class="n">com</span><span class="p">)</span><span class="o">/</span><span class="n">watch</span><span class="err">\</span><span class="n">S</span><span class="o">*</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">http</span><span class="o">://</span><span class="n">www</span><span class="p">.</span><span class="n">youtube</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">oembed</span><span class="w"></span>
</code></pre></div>
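<p>A mapping like this can be sketched in a few lines with the standard <code>re</code> module. This is only an illustration of the idea, not micawber's implementation: <code>REGISTRY</code> and <code>find_endpoint</code> are hypothetical names, and the second entry points at a made-up example.com endpoint.</p>
<div class="highlight"><pre><span></span><code>import re

# hypothetical registry: URL regex mapped to an oembed endpoint
REGISTRY = {
    r'http://\S*.youtu(\.be|be\.com)/watch\S*': 'http://www.youtube.com/oembed',
    r'http://example\.com/videos/\S*': 'http://example.com/oembed',
}


def find_endpoint(url, registry=REGISTRY):
    """Return the endpoint for the first pattern matching url, else None."""
    for pattern, endpoint in registry.items():
        if re.match(pattern, url):
            return endpoint
    return None


print(find_endpoint('http://www.youtube.com/watch?v=nda_OSWeyn8'))
print(find_endpoint('http://www.google.com/'))
</code></pre></div>
<p>The youtube URL resolves to the youtube endpoint, while the google URL matches nothing and yields None.</p>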
<p>What happens when you ask the youtube oembed endpoint for metadata about a video
simply by providing that video's URL?</p>
<div class="highlight"><pre><span></span><code>curl http://www.youtube.com/oembed?url=http://www.youtube.com/watch?v=nda_OSWeyn8
</code></pre></div>
<p>Results in the following output:</p>
<div class="highlight"><pre><span></span><code>{'author_name': u'botmib',
 'author_url': u'http://www.youtube.com/user/botmib',
 'height': 344,
 'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/nda_OSWeyn8?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
 'provider_name': u'YouTube',
 'provider_url': u'http://www.youtube.com/',
 'thumbnail_height': 360,
 'thumbnail_url': u'http://i3.ytimg.com/vi/nda_OSWeyn8/hqdefault.jpg',
 'thumbnail_width': 480,
 'title': u'Leprechaun in Mobile, Alabama',
 'type': u'video',
 'version': u'1.0',
 'width': 459}
</code></pre></div>
<p>Using these providers it is a snap to add nice thumbnail previews of content
within blocks of text, or to even parse blocks of text or HTML and replace
URLs with rich content (e.g. URL becomes flash player or img tag).</p>
<p>For simplicity, micawber comes with two "bootstrap" functions to get you a prepopulated
list of providers:</p>
<ul>
<li><a href="http://micawber.readthedocs.org/en/latest/api.html#micawber.providers.bootstrap_basic">bootstrap_basic()</a>, which loads up a list of providers with native endpoints</li>
<li><a href="http://micawber.readthedocs.org/en/latest/api.html#micawber.providers.bootstrap_embedly">bootstrap_embedly()</a>, which asks embedly for a list of providers and configures them</li>
</ul>
<h3>Interactive shell session</h3>
<p>Below is an annotated interactive shell session showing how these components work.</p>
<p>Import micawber and load up a list of providers. It comes prepopulated with a handful of providers.</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">micawber</span>
<span class="gp">>>> </span><span class="n">providers</span> <span class="o">=</span> <span class="n">micawber</span><span class="o">.</span><span class="n">bootstrap_basic</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">providers</span>
<span class="go"><micawber.providers.ProviderRegistry at 0x2681690></span>
<span class="gp">>>> </span><span class="n">providers</span><span class="o">.</span><span class="n">_registry</span>
<span class="go">{'http://\S*.youtu(\\.be|be\\.com)/watch\\S*': <micawber.providers.Provider at 0x2681d90>,</span>
<span class="go"> 'http://\S*?flickr.com/\\S*': <micawber.providers.Provider at 0x2681d50>,</span>
<span class="go"> 'http://vimeo.com/\S*': <micawber.providers.Provider at 0x2681e10>,</span>
<span class="go"> 'http://www.hulu.com/watch/\S*': <micawber.providers.Provider at 0x2681dd0>,</span>
<span class="go"> 'http://www.slideshare.net/[^\\/]+/\S*': <micawber.providers.Provider at 0x2681e50>}</span>
</code></pre></div>
<p>Request some metadata about a URL we know about and a dictionary is returned. All metadata returned follows the <a href="http://oembed.com/#section2">oembed spec</a>, which specifies various response parameters:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">providers</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="s1">'http://www.youtube.com/watch?v=nda_OSWeyn8'</span><span class="p">)</span>
<span class="go">{'author_name': u'botmib',</span>
<span class="go"> 'author_url': u'http://www.youtube.com/user/botmib',</span>
<span class="go"> 'height': 344,</span>
<span class="go"> ...</span>
<span class="go">}</span>
</code></pre></div>
<p>Requesting a URL we do not have a provider for raises <code>ProviderException</code>:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">providers</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="s1">'http://www.google.com/'</span><span class="p">)</span>
<span class="go">ProviderException: Provider not found for "http://www.google.com/"</span>
</code></pre></div>
<p>There are higher-level functions which can parse text or HTML, either replacing
the URLs with rich content or extracting the metadata and returning it in a dictionary.
The <a href="http://micawber.readthedocs.org/en/latest/api.html#functions-for-extracting-rich-content-from-text-and-html">extract</a> functions return a 2-tuple containing a list of all URLs in order of appearance, and a dictionary, keyed by URL, holding the metadata for each URL that matched a provider:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">micawber</span><span class="o">.</span><span class="n">extract</span><span class="p">(</span><span class="s2">"http://google.com/ and http://www.youtube.com/watch?v=nda_OSWeyn8"</span><span class="p">,</span> <span class="n">providers</span><span class="p">)</span>
<span class="go">(['http://google.com/', 'http://www.youtube.com/watch?v=nda_OSWeyn8'],</span>
<span class="go"> {'http://www.youtube.com/watch?v=nda_OSWeyn8': {</span>
<span class="go"> 'author_name': u'botmib',</span>
<span class="go"> 'author_url': u'http://www.youtube.com/user/botmib',</span>
<span class="go"> 'height': 344,</span>
<span class="go"> ... etc ...</span>
<span class="go"> }</span>
<span class="go"> })</span>
<span class="gp">>>> </span><span class="nb">print</span> <span class="n">micawber</span><span class="o">.</span><span class="n">parse_text</span><span class="p">(</span><span class="s2">"this is a test</span><span class="se">\n</span><span class="s2">http://www.youtube.com/watch?v=nda_OSWeyn8"</span><span class="p">,</span> <span class="n">providers</span><span class="p">)</span>
<span class="go">this is a test</span>
<span class="go"><iframe width="459" height="344"</span>
<span class="go"> src="http://www.youtube.com/embed/nda_OSWeyn8?fs=1&feature=oembed"</span>
<span class="go"> frameborder="0" allowfullscreen></iframe></span>
</code></pre></div>
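<p>If you're curious how functions like these can work, the general technique is a URL regex plus a lookup per match. The following is a simplified sketch under my own assumptions, not micawber's actual code: <code>extract_urls</code>, <code>replace_urls</code> and <code>fake_lookup</code> are hypothetical names, and the lookup is a stub that avoids any network request.</p>
<div class="highlight"><pre><span></span><code>import re

URL_RE = re.compile(r'https?://\S+')


def extract_urls(text, lookup):
    """Return (all URLs in order of appearance, {url: metadata})."""
    urls, found = [], {}
    for url in URL_RE.findall(text):
        if url not in urls:
            urls.append(url)
        metadata = lookup(url)
        if metadata is not None:
            found[url] = metadata
    return urls, found


def replace_urls(text, lookup):
    """Swap each URL that has metadata for its 'html' value."""
    def substitute(match):
        metadata = lookup(match.group(0))
        # leave URLs without a provider untouched
        return metadata['html'] if metadata else match.group(0)
    return URL_RE.sub(substitute, text)


def fake_lookup(url):
    # stand-in for a real provider request; knows a single URL
    known = {'http://www.youtube.com/watch?v=nda_OSWeyn8':
             {'type': 'video', 'html': '[video player]'}}
    return known.get(url)


text = 'http://google.com/ and http://www.youtube.com/watch?v=nda_OSWeyn8'
print(extract_urls(text, fake_lookup))
print(replace_urls(text, fake_lookup))
</code></pre></div>
<p>Passing the lookup in as a function keeps the parsing logic independent of how metadata is fetched, which also makes it easy to test.</p>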
<p>Finally, if using <a href="http://micawber.readthedocs.org/en/latest/django.html">Django</a> or <a href="http://micawber.readthedocs.org/en/latest/flask.html">Flask</a> there are template filters for doing the same:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.template</span> <span class="kn">import</span> <span class="n">Template</span><span class="p">,</span> <span class="n">Context</span>
<span class="gp">>>> </span><span class="n">t</span> <span class="o">=</span> <span class="n">Template</span><span class="p">(</span><span class="s1">'{% load micawber_tags %}{{ "http://www.youtube.com/watch?v=nda_OSWeyn8"|oembed }}'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">t</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">Context</span><span class="p">())</span>
<span class="go"><iframe width="459" height="344"</span>
<span class="go"> src="http://www.youtube.com/embed/nda_OSWeyn8?fs=1&feature=oembed"</span>
<span class="go"> frameborder="0" allowfullscreen></iframe></span>
</code></pre></div>
<h3>Reading more</h3>
<p>If you're interested in learning more about the project, check out the
<a href="http://micawber.readthedocs.org/">documentation</a>, hosted on readthedocs. You
can also browse the <a href="https://github.com/coleifer/micawber">source code</a>, hosted on
GitHub. There's a live demo hosted on appengine: <a href="http://micawberdemo.appspot.com/">http://micawberdemo.appspot.com/</a></p>
<p>Hope you enjoyed reading about this project, I've had a lot of fun working on it.
Please let me know if you have any questions or suggestions about this project by
leaving a comment or <a href="/contact/">contacting me</a>.</p>
<h2>An example, for your viewing pleasure</h2>
<p><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="200" src="https://www.youtube.com/embed/qeL3tAb7yV4?feature=oembed" title="Beethoven - Piano sonata n°17 op.31 n°2 - Richter studio" width="267"></iframe></p>Integrating the flask microframework with the peewee ORMhttp://charlesleifer.com/blog/integrating-flask-microframework-peewee-orm/2011-09-27T10:52:38Z2011-09-27T10:52:38Zcharles leifer<p><img src="http://media.charlesleifer.com/blog/photos/flask-peewee.png" style="float: right;"/></p>
<p>I'd like to write a post about a project I've been working on for the past month
or so. I've had a great time working on it and am excited to start putting it
to use. The project is called <a href="https://github.com/coleifer/flask-peewee">flask-peewee</a> --
it is a set of utilities that bridges the python microframework <a href="http://flask.pocoo.org">flask</a>
and the lightweight ORM <a href="https://github.com/coleifer/peewee">peewee</a>. It is packaged
as a flask extension and comes with the following batteries included:</p>
<ul>
<li><a href="http://flask-peewee.readthedocs.org/en/latest/_images/fp-admin.jpg">Admin interface</a> a-la django</li>
<li>RESTful API toolkit a-la <a href="https://github.com/toastdriven/django-tastypie">tastypie</a></li>
<li>Authentication system</li>
</ul>
<p>The <a href="http://flask-peewee.readthedocs.org/">documentation</a> provides
in-depth explanations on the usage of these features, but if you are already familiar
with <a href="http://djangoproject.com">django</a> things shouldn't look <em>too</em> strange. The
purpose of this post will be to highlight the main features of the project and
discuss a little bit about their implementation.</p>
<h3>Admin interface</h3>
<p>Users of the django framework often say how valuable the admin interface is. It
is, for many site managers, their primary means of interacting with django-powered
sites, and for developers it is great for rapid prototyping. Considering this,
one of the first things that bit me when I started working with flask was that
there was no admin interface. Several times I've stopped a project before it even
got very far and switched to django just because I wanted the admin interface.</p>
<p>So, to scratch that itch I wrote an admin interface that borrows many concepts from
django. It looks like this:</p>
<p><a href="https://media.charlesleifer.com/blog/photos/s1398466769.85.png" title="Admin for UncommonVision"><img alt="Admin for UncommonVision" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/s1398466769.85.png?key=QyOCmtcBcx0jmPLtJA1X9A=="/></a></p>
<p>What you are looking at is the dashboard view, which by default is at <em>/admin/</em>.
The screenshot shows two <a href="http://flask-peewee.readthedocs.org/en/latest/admin.html#adminpanel">Panels</a>, one which allows administrators to write "Notes", and another
which shows some stats on signups. Below the panels is a list of models. Clicking on the "Message"
link will take you to a <a href="http://flask-peewee.readthedocs.org/en/latest/admin.html#modeladmin">ModelAdmin</a> page, where you can filter/edit/add/delete instances
of the model. It looks like this:</p>
<p><a href="https://media.charlesleifer.com/blog/photos/s1398466710.11.png" title="Flask-Peewee modeladmin example"><img alt="Flask-Peewee modeladmin example" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/s1398466710.11.png?key=OfTH_O51m7Cnu7qqGZ4jSg=="/></a></p>
<p>To code up something like this, you would create a flask app (which is roughly
analogous to a django project). All of this code is taken from the
<a href="https://github.com/coleifer/flask-peewee/tree/master/example">example app</a>, so feel
free to refer there for the complete source code.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="c1"># create a flask app</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="c1"># load app configuration</span>
<span class="n">app</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">from_object</span><span class="p">(</span><span class="s1">'config.Configuration'</span><span class="p">)</span>
</code></pre></div>
<p>We need to tell our models how to connect to the database, so we'll use the
flask-peewee <a href="http://flask-peewee.readthedocs.org/en/latest/database.html">database wrapper</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.db</span> <span class="kn">import</span> <span class="n">Database</span>
<span class="c1"># initialize the db wrapper, passing it a reference to our flask app</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">Database</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
</code></pre></div>
<p>The database wrapper sets up connections when requests are started and closes them
when the request finishes. It also provides a base model class which interacts
with the database specified by the app's configuration.</p>
<p>Now we can start coding up the models:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">peewee</span> <span class="kn">import</span> <span class="o">*</span>

<span class="c1"># assume the other project models, such as "User" are defined right here</span>
<span class="k">class</span> <span class="nc">Message</span><span class="p">(</span><span class="n">db</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">ForeignKeyField</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">TextField</span><span class="p">()</span>
    <span class="c1"># pass the callable itself so the default is evaluated per instance</span>
    <span class="n">pub_date</span> <span class="o">=</span> <span class="n">DateTimeField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">content</span><span class="p">)</span>
</code></pre></div>
<p>In order to protect the admin area, we need to configure some authentication. The
auth system works with a "User" model and provides utilities for detecting the
logged-in user, marking areas as "login-required", as well as configuring login/logout
views.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.auth</span> <span class="kn">import</span> <span class="n">Auth</span>
<span class="c1"># initialize the auth layer, passing in a User model</span>
<span class="n">auth</span> <span class="o">=</span> <span class="n">Auth</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">db</span><span class="p">,</span> <span class="n">user_model</span><span class="o">=</span><span class="n">User</span><span class="p">)</span>
</code></pre></div>
<p>Lastly, we create our admin. The process should look familiar if you've used the
django framework:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.admin</span> <span class="kn">import</span> <span class="n">Admin</span><span class="p">,</span> <span class="n">ModelAdmin</span>
<span class="c1"># initialize the admin layer, passing in a reference to our app and</span>
<span class="c1"># the auth object we're using</span>
<span class="n">admin</span> <span class="o">=</span> <span class="n">Admin</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">auth</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">MessageAdmin</span><span class="p">(</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">columns</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="s1">'content'</span><span class="p">,</span> <span class="s1">'pub_date'</span><span class="p">,)</span>

<span class="n">admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Message</span><span class="p">,</span> <span class="n">MessageAdmin</span><span class="p">)</span>
</code></pre></div>
<p>Under the hood, the "auth" and "admin" objects create flask <a href="http://flask.pocoo.org/docs/blueprints/">blueprints</a>.
A blueprint describes a component of a website and can be bound to a specific url prefix,
like <em>/admin/</em> or <em>/accounts/</em>. Blueprints can specify their own templates and static
media, as well, and are great for encapsulating functionality you wish to reuse across
projects.</p>
<p>For more examples, check out the <a href="https://github.com/coleifer/flask-peewee/tree/master/example">example app</a>
or the <a href="http://flask-peewee.readthedocs.org/en/latest/admin.html">admin documentation</a>.</p>
<h3>REST API</h3>
<p>flask-peewee also comes with tools for exposing your app's models via a RESTful API.
For each model you register with the API, you will get the following urls:</p>
<ul>
<li><em>/api/{model}/</em>: GET and POST</li>
<li><em>/api/{model}/{primary key}/</em>: GET, PUT and DELETE</li>
</ul>
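<p>The URL layout above amounts to a small routing table. Purely as an illustration (this is not how flask-peewee registers its routes, and <code>ROUTES</code> and <code>allowed_methods</code> are hypothetical names), it could be written down like this:</p>
<div class="highlight"><pre><span></span><code>import re

# hypothetical route table mirroring the two URL shapes above
ROUTES = [
    (re.compile(r'^/api/(\w+)/$'), ('GET', 'POST')),
    (re.compile(r'^/api/(\w+)/(\w+)/$'), ('GET', 'PUT', 'DELETE')),
]


def allowed_methods(path):
    """Return the HTTP methods accepted at path, or () if no route matches."""
    for pattern, methods in ROUTES:
        if pattern.match(path):
            return methods
    return ()


print(allowed_methods('/api/message/'))
print(allowed_methods('/api/message/1/'))
</code></pre></div>
<p>The list URL accepts GET and POST, while the detail URL accepts GET, PUT and DELETE.</p>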
<p>The process is very similar to the one we followed for setting up the admin, but the
individual components are subtly different. The main difference is that, rather than
specifying a "global" authentication method (as we did for the admin), we can compose
different types of authentication with individual models exposed by the API. An example
of this would be:</p>
<ul>
<li>users should be able to POST new messages and edit their own existing messages,
but should not be able to alter other users' messages or masquerade as other users
when posting.</li>
<li>only site administrators should be able to modify User accounts via the admin</li>
</ul>
<p>flask-peewee comes with a couple tools to simplify handling those use-cases. I'll
show some code and hopefully it will become clear how the pieces work together.</p>
<p>First we create a "RestAPI" instance. This is the container for our API, much the
same as the <em>Admin</em> object was the container for our admin interface. By default it
will be bound to <em>/api/</em>:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.rest</span> <span class="kn">import</span> <span class="n">RestAPI</span>
<span class="c1"># instantiate an api wrapper, passing in a reference to our flask app</span>
<span class="n">api</span> <span class="o">=</span> <span class="n">RestAPI</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
</code></pre></div>
<p>By default, though, our API will not know how to authenticate requests so all
POST/PUT/DELETE requests will fail with a 401 Unauthorized response. flask-peewee
comes with a <a href="http://flask-peewee.readthedocs.org/en/latest/rest-api.html#authentication">couple authentication backends</a>;
we'll use the <em>UserAuthentication</em> one as the default. This requires that incoming
requests use HTTP basic auth with credentials matching a registered user of the site.
Our code now looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.rest</span> <span class="kn">import</span> <span class="n">RestAPI</span><span class="p">,</span> <span class="n">UserAuthentication</span><span class="p">,</span> <span class="n">RestResource</span>
<span class="c1"># pass in a reference to our project's auth instance so it knows how to</span>
<span class="c1"># authenticate incoming requests</span>
<span class="n">user_auth</span> <span class="o">=</span> <span class="n">UserAuthentication</span><span class="p">(</span><span class="n">auth</span><span class="p">)</span>
<span class="c1"># instantiate an api wrapper, passing in a reference to our flask app</span>
<span class="n">api</span> <span class="o">=</span> <span class="n">RestAPI</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">default_auth</span><span class="o">=</span><span class="n">user_auth</span><span class="p">)</span>
<span class="c1"># register the message model, it will be exposed at /api/message/</span>
<span class="n">api</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Message</span><span class="p">,</span> <span class="n">RestResource</span><span class="p">)</span>
</code></pre></div>
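<p>To make the basic auth requirement concrete, here is a rough client-side sketch that uses
only the standard library. The host, the credentials and the "content" field are made up for
illustration, and the request is built but never sent:</p>

```python
import base64
from urllib.request import Request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Hypothetical host, credentials and payload; nothing is sent over the wire.
req = Request(
    "http://localhost:5000/api/message/",
    data=b'{"content": "hello"}',
    headers={
        "Authorization": basic_auth_header("charles", "secret"),
        "Content-Type": "application/json",
    },
    method="POST",
)
```

<p>Passing <em>req</em> to <em>urllib.request.urlopen</em> would send it to a running app.</p>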
<p>Recall the first bullet point about only allowing users to post and modify their
own messages. flask-peewee comes with a special class that makes this a snap, <a href="http://flask-peewee.readthedocs.org/en/latest/rest-api.html#RestrictOwnerResource">RestrictOwnerResource</a>.
The <em>RestrictOwnerResource</em> is a subclass of <em>RestResource</em> which enforces that
the requesting user be the "owner" of an object before modifying it:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.rest</span> <span class="kn">import</span> <span class="n">RestrictOwnerResource</span>
<span class="k">class</span> <span class="nc">MessageResource</span><span class="p">(</span><span class="n">RestrictOwnerResource</span><span class="p">):</span>
    <span class="n">owner_field</span> <span class="o">=</span> <span class="s1">'user'</span>
<span class="n">api</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Message</span><span class="p">,</span> <span class="n">MessageResource</span><span class="p">)</span>
</code></pre></div>
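<p>The idea behind an owner restriction can be sketched in plain Python. This is not
flask-peewee's implementation; <em>OwnerCheck</em>, <em>Forbidden</em> and <em>check_modify</em> are
hypothetical names, and only <em>owner_field</em> mirrors the setting shown above:</p>

```python
from types import SimpleNamespace


class Forbidden(Exception):
    """Raised when a user tries to modify an object they do not own."""


class OwnerCheck:
    # Name of the attribute that points at the owning user, mirroring
    # the owner_field setting used on the resource above.
    owner_field = "user"

    def check_modify(self, obj, current_user):
        # Compare the object's owner with the user making the request.
        if getattr(obj, self.owner_field) != current_user:
            raise Forbidden("only the owner may modify this object")
        return True


# A stand-in for a Message row owned by "alice".
message = SimpleNamespace(user="alice", content="hi")
checker = OwnerCheck()
```

<p>Here <em>checker.check_modify(message, "alice")</em> passes, while the same call for any
other user raises <em>Forbidden</em>.</p>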
<p>Going back to the second bullet point about only allowing administrators to modify
user objects via the API, the <em>AdminAuthentication</em> backend requires that the
authenticated user be a "site administrator". Here's what the implementation would
look like:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flaskext.rest</span> <span class="kn">import</span> <span class="n">AdminAuthentication</span>
<span class="n">admin_auth</span> <span class="o">=</span> <span class="n">AdminAuthentication</span><span class="p">(</span><span class="n">auth</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">UserResource</span><span class="p">(</span><span class="n">RestResource</span><span class="p">):</span>
    <span class="n">exclude</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'password'</span><span class="p">,</span> <span class="s1">'email'</span><span class="p">,)</span>
<span class="n">api</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">UserResource</span><span class="p">,</span> <span class="n">auth</span><span class="o">=</span><span class="n">admin_auth</span><span class="p">)</span>
</code></pre></div>
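<p>The pattern of a default authentication plus per-model overrides can be sketched with a toy
registry. Everything here (<em>ToyAPI</em>, <em>authorize</em>, users as dicts) is hypothetical and
only illustrates the composition, assuming that reads stay open and only writes are checked:</p>

```python
def user_auth(user):
    """Any logged-in user may write."""
    return user is not None


def admin_auth(user):
    """Only users flagged as admin may write."""
    return user is not None and bool(user.get("admin", False))


class ToyAPI:
    """Stand-in for the registration idea only; not flask-peewee's RestAPI."""

    def __init__(self, default_auth):
        self.default_auth = default_auth
        self.registry = {}

    def register(self, model_name, auth=None):
        # Fall back to the API-wide default when no auth is given.
        self.registry[model_name] = auth or self.default_auth

    def authorize(self, model_name, method, user):
        if method == "GET":
            return True  # assumption: reads are not restricted
        return self.registry[model_name](user)


api = ToyAPI(default_auth=user_auth)
api.register("message")                # uses the default
api.register("user", auth=admin_auth)  # overrides it for one model
```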
<h3>Finding out more</h3>
<p>If you're interested in learning a bit more, check out the following links:</p>
<ul>
<li><a href="https://github.com/coleifer/flask-peewee">flask-peewee on github</a></li>
<li><a href="http://flask-peewee.readthedocs.org/">documentation</a></li>
<li><a href="http://flask.pocoo.org/">flask microframework</a></li>
<li><a href="http://docs.peewee-orm.com/">peewee orm</a></li>
</ul>
<p>Here's a <a href="https://gist.github.com/3bee9ad91aea3b56b11c">gist</a> showing some simple blueprints using flask-peewee (pastebin, bookmarks, etc).</p>
<p>Thanks for reading!</p>
<h1><a href="http://charlesleifer.com/blog/connecting-anything-to-anything-with-django/">Connecting anything to anything with Django</a></h1>
<p>charles leifer, 2011-02-17T19:18:03Z</p>
<h3>Edit 7/11/2011</h3>
<p>I've added <a href="http://readthedocs.org/docs/django-generic-m2m/en/latest/">documentation</a> and an <a href="http://readthedocs.org/docs/django-generic-m2m/en/latest/example.html">example app</a>.</p>
<h2>Introduction</h2>
<p>I'm writing this post to introduce a new project I've released, <a href="https://github.com/coleifer/django-generic-m2m">django-generic-m2m</a>,
which, as its name indicates, is a generic ManyToMany implementation for
django models. The goal of this project was to provide a uniform API for both
creating and querying generically-related content in a flexible manner. One
use-case for this project would be creating semantic "tags" between diverse
objects in the database.</p>
<h2>Connecting Models</h2>
<p>It's all about connecting models together and, if you want, attaching some
metadata about the meaning of that relationship (i.e. a tag).</p>
<p><a href="https://media.charlesleifer.com/blog/photos/generic-model-relationships.png" title="Generic model relationships"><img alt="Generic model relationships" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/generic-model-relationships.png?key=yYa3VDwpw21u3yKWVhJ7uA=="/></a></p>
<p>To this end, django-generic-m2m does three things to make this behavior easier:</p>
<ol>
<li>wraps up all querying and connecting logic in a single attribute that acts
on both model instances and the model class</li>
<li>allows any model to be used as the intermediary "through" model</li>
<li>provides an optimized lookup when GenericForeignKeys are used</li>
</ol>
<h3>An example</h3>
<p>Referring back to the diagram, let's create some models (these are the same
models used in the <a href="https://github.com/coleifer/django-generic-m2m/blob/master/genericm2m/genericm2m_tests/tests.py">testcases</a>):</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">genericm2m.models</span> <span class="kn">import</span> <span class="n">RelatedObjectsDescriptor</span>
<span class="k">class</span> <span class="nc">Food</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">)</span>
    <span class="n">related</span> <span class="o">=</span> <span class="n">RelatedObjectsDescriptor</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>

<span class="k">class</span> <span class="nc">Beverage</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1"># ... same as above</span>

<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1"># ... same as above ...</span>
</code></pre></div>
<p>The "related" attribute is the way the generic many-to-many is exposed for each
model. Behind-the-scenes it is using <strong>genericm2m.models.RelatedObject</strong>, which
looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">RelatedObject</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    A generic many-to-many implementation where diverse objects are related</span>
<span class="sd">    across a single model to other diverse objects -> using a dual GFK</span>
<span class="sd">    """</span>
    <span class="c1"># SOURCE OBJECT:</span>
    <span class="n">parent_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">ContentType</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s2">"child_</span><span class="si">%(class)s</span><span class="s2">"</span><span class="p">)</span>
    <span class="n">parent_id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span><span class="n">db_index</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">parent</span> <span class="o">=</span> <span class="n">GenericForeignKey</span><span class="p">(</span><span class="n">ct_field</span><span class="o">=</span><span class="s2">"parent_type"</span><span class="p">,</span> <span class="n">fk_field</span><span class="o">=</span><span class="s2">"parent_id"</span><span class="p">)</span>

    <span class="c1"># ACTUAL RELATED OBJECT:</span>
    <span class="n">object_type</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">ContentType</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s2">"related_</span><span class="si">%(class)s</span><span class="s2">"</span><span class="p">)</span>
    <span class="n">object_id</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span><span class="n">db_index</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="nb">object</span> <span class="o">=</span> <span class="n">GenericForeignKey</span><span class="p">(</span><span class="n">ct_field</span><span class="o">=</span><span class="s2">"object_type"</span><span class="p">,</span> <span class="n">fk_field</span><span class="o">=</span><span class="s2">"object_id"</span><span class="p">)</span>

    <span class="n">alias</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">,</span> <span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">creation_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">auto_now_add</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">ordering</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'-creation_date'</span><span class="p">,)</span>

    <span class="k">def</span> <span class="nf">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1"> related to </span><span class="si">%s</span><span class="s1"> ("</span><span class="si">%s</span><span class="s1">")'</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">parent</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">object</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">alias</span><span class="p">)</span>
</code></pre></div>
<p>There's not really too much that should be weird about this model. It contains
two GenericForeignKeys, one to represent the "from" object, the source of the
connection, and another to represent the "to" object (what "from" is being
connected with). Additionally, the model is storing a little bit of metadata
about the relationship, specifically an "alias" which is just a character string,
and a creation_date to mark when this relationship was created.</p>
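<p>Stripped of the Django machinery, a "dual GFK" row is just two (type, id) pairs plus some
metadata. A minimal sketch of that shape, with <em>Ref</em> and <em>Link</em> as made-up stand-ins
rather than anything from genericm2m:</p>

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Ref:
    """A generic reference: which kind of object, and which row."""
    type_name: str
    object_id: int


@dataclass
class Link:
    """One connection: the 'from' side, the 'to' side and some metadata."""
    parent: Ref
    object: Ref
    alias: str = ""
    creation_date: datetime = field(default_factory=datetime.now)


# pizza (food #1) connected to beer (beverage #1); the ids are invented.
link = Link(Ref("food", 1), Ref("beverage", 1), alias="Beer and pizza are good")
```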
<p>So now that there are some models to work with, here's a contrived interactive
shell session with some annotations to show how objects can be connected. First
we need to create some model instances, though:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">pizza</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'pizza'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">cereal</span> <span class="o">=</span> <span class="n">Food</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'cereal'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">beer</span> <span class="o">=</span> <span class="n">Beverage</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'beer'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">soda</span> <span class="o">=</span> <span class="n">Beverage</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'soda'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">milk</span> <span class="o">=</span> <span class="n">Beverage</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'milk'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">mario</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'Mario'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">chocula</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'Count Chocula'</span><span class="p">)</span>
</code></pre></div>
<p>Now that we have some Food, Beverage and Person objects, create some connections
between them:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">rel_obj</span> <span class="o">=</span> <span class="n">pizza</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">beer</span><span class="p">,</span> <span class="n">alias</span><span class="o">=</span><span class="s1">'Beer and pizza are good'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">type</span><span class="p">(</span><span class="n">rel_obj</span><span class="p">)</span>
<span class="go"><class 'genericm2m.models.RelatedObject'></span>
</code></pre></div>
<p>The object that represents the connection is an instance of whatever is passed
to the RelatedObjectsDescriptor when it is added to a model, but the default is
"genericm2m.models.RelatedObject". Here are the interesting properties of the
new related object:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">rel_obj</span><span class="o">.</span><span class="n">parent</span>
<span class="go"><Food: pizza></span>
<span class="gp">>>> </span><span class="n">rel_obj</span><span class="o">.</span><span class="n">object</span>
<span class="go"><Beverage: beer></span>
<span class="gp">>>> </span><span class="n">rel_obj</span><span class="o">.</span><span class="n">alias</span>
<span class="go">'Beer and pizza are good'</span>
</code></pre></div>
<p>These relationships can be queried:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">pizza</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[<RelatedObject: pizza related to beer ("Beer and pizza are good")>]</span>
</code></pre></div>
<p>When the "to" side of the relationship is a GenericForeignKey (GFK), as is the case here, the RelatedObjectsDescriptor
will return a special QuerySet class that provides an optimized lookup of any
GFK-ed objects:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="nb">type</span><span class="p">(</span><span class="n">pizza</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="go"><class 'genericm2m.models.GFKOptimizedQuerySet'></span>
<span class="gp">>>> </span><span class="n">pizza</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">()</span><span class="o">.</span><span class="n">generic_objects</span><span class="p">()</span>
<span class="go">[<Beverage: beer>]</span>
</code></pre></div>
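<p>The general idea behind this kind of optimization is to group the referenced ids by type
and do one lookup per type instead of one per row. The sketch below shows that batching with
in-memory dicts standing in for tables; it is not the library's actual code, and
<em>TABLES</em>, <em>fetch_many</em> and this <em>generic_objects</em> function are made up:</p>

```python
from collections import defaultdict

# In-memory stand-ins for database tables, keyed by primary key.
TABLES = {
    "beverage": {1: "beer", 2: "soda", 3: "milk"},
    "person": {1: "Mario", 2: "Count Chocula"},
}


def fetch_many(type_name, ids):
    """Pretend 'SELECT ... WHERE id IN (...)' against a single table."""
    table = TABLES[type_name]
    return {pk: table[pk] for pk in ids}


def generic_objects(refs):
    """Resolve (type, id) pairs with one lookup per distinct type."""
    ids_by_type = defaultdict(set)
    for type_name, pk in refs:
        ids_by_type[type_name].add(pk)
    found = {t: fetch_many(t, ids) for t, ids in ids_by_type.items()}
    # Re-assemble the results in the original order.
    return [found[t][pk] for t, pk in refs]


refs = [("beverage", 3), ("person", 2), ("beverage", 1)]
result = generic_objects(refs)
```

<p>Three references across two types cost two lookups here rather than three.</p>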
<p>If the object on the back-side of the relationship also has a RelatedObjectsDescriptor
with the same intermediary model, reverse lookups are possible:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">beer</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">related_to</span><span class="p">()</span>
<span class="go">[<RelatedObject: pizza related to beer ("Beer and pizza are good")>]</span>
</code></pre></div>
<p>Create some more connections - any combination of models can be used. Below
I'm connecting a Food (cereal) to both a Beverage object (milk) and a Person object (Count Chocula):</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">cereal</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">milk</span><span class="p">)</span>
<span class="go"><RelatedObject: cereal related to milk ("")></span>
<span class="gp">>>> </span><span class="n">cereal</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">chocula</span><span class="p">)</span>
<span class="go"><RelatedObject: cereal related to Count Chocula ("")></span>
<span class="gp">>>> </span><span class="n">cereal</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[<RelatedObject: cereal related to Count Chocula ("")>,</span>
<span class="go"> <RelatedObject: cereal related to milk ("")>]</span>
<span class="gp">>>> </span><span class="n">chocula</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[]</span>
<span class="gp">>>> </span><span class="n">chocula</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">related_to</span><span class="p">()</span>
<span class="go">[<RelatedObject: cereal related to Count Chocula ("")>]</span>
</code></pre></div>
<p>Also worth noting is that the RelatedObjectsDescriptor works on both the instance-level
and the class-level, so if we wanted to see all objects related to foods:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Food</span><span class="o">.</span><span class="n">related</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[<RelatedObject: cereal related to Count Chocula ("")>,</span>
<span class="go"> <RelatedObject: cereal related to milk ("")>,</span>
<span class="go"> <RelatedObject: pizza related to beer ("Beer and pizza are good")>]</span>
</code></pre></div>
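<p>The instance-or-class behavior comes from Python's descriptor protocol: <em>__get__</em>
receives <em>None</em> as the instance when the attribute is read from the class. A toy version,
where <em>LINKS</em>, <em>Manager</em> and <em>Related</em> are invented for the sketch and a plain list
stands in for the database:</p>

```python
LINKS = []  # (parent, obj) tuples; stands in for the RelatedObject table


class Manager:
    def __init__(self, model, instance=None):
        self.model = model
        self.instance = instance

    def connect(self, obj):
        if self.instance is None:
            raise TypeError("connect() needs a model instance")
        LINKS.append((self.instance, obj))

    def all(self):
        if self.instance is not None:
            # Instance access: only this object's connections.
            return [link for link in LINKS if link[0] is self.instance]
        # Class access: connections from any instance of the model.
        return [link for link in LINKS if isinstance(link[0], self.model)]


class Related:
    def __get__(self, instance, owner):
        # instance is None when accessed as Food.related
        return Manager(owner, instance)


class Food:
    related = Related()

    def __init__(self, name):
        self.name = name


pizza, cereal = Food("pizza"), Food("cereal")
pizza.related.connect("beer")
cereal.related.connect("milk")
```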
<h3>Using a custom through model</h3>
<p>It's possible to use a custom "through" model in place of the default RelatedObject.
If you know you're only going to be using a couple models, this can be a handy way
to save queries. Looking at the tests, here's another silly example where we
have a "RelatedBeverage" model that our Food model will use:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">RelatedBeverage</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">food</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="s1">'Food'</span><span class="p">)</span>
    <span class="n">beverage</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="s1">'Beverage'</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">ordering</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'-id'</span><span class="p">,)</span>

<span class="k">class</span> <span class="nc">Food</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1"># ... same as above except for this new attribute:</span>
    <span class="n">related_beverages</span> <span class="o">=</span> <span class="n">RelatedObjectsDescriptor</span><span class="p">(</span><span class="n">RelatedBeverage</span><span class="p">,</span> <span class="s1">'food'</span><span class="p">,</span> <span class="s1">'beverage'</span><span class="p">)</span>
</code></pre></div>
<p>The "related_beverages" attribute is an instance of "RelatedObjectsDescriptor",
but it is instantiated with a couple of arguments:</p>
<ul>
<li>RelatedBeverage: the model to be used to hold the "connections"</li>
<li>'food': the field name on the above model which maps to the "from" object</li>
<li>'beverage': the field name which maps to the "to" object</li>
</ul>
<p>Continuing the shell session from above with the same models, foods can be
connected to beverages using the new "related_beverages" attribute:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">pizza</span><span class="o">.</span><span class="n">related_beverages</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">soda</span><span class="p">)</span>
<span class="go"><RelatedBeverage: RelatedBeverage object></span>
</code></pre></div>
<p>Querying provides the same interface, but since the "to" object is a direct
ForeignKey to Beverage, a normal django QuerySet is used:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">pizza</span><span class="o">.</span><span class="n">related_beverages</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[<RelatedBeverage: RelatedBeverage object>]</span>
<span class="gp">>>> </span><span class="nb">type</span><span class="p">(</span><span class="n">pizza</span><span class="o">.</span><span class="n">related_beverages</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="go"><class 'django.db.models.query.QuerySet'></span>
</code></pre></div>
<p>A TypeError will be raised if you try to connect an invalid object, such as a
Person to the "related_beverages":</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">pizza</span><span class="o">.</span><span class="n">related_beverages</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">mario</span><span class="p">)</span>
<span class="go">*** TypeError: Unable to query ...</span>
</code></pre></div>
<p>And lastly, just like before, it's possible to query on the class to get all the
RelatedBeverage objects for our foods:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Food</span><span class="o">.</span><span class="n">related_beverages</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
<span class="go">[<RelatedBeverage: RelatedBeverage object>]</span>
</code></pre></div>
<h2>Reading more</h2>
<p>Until I write some docs, the <a href="https://github.com/coleifer/django-generic-m2m/blob/master/genericm2m/genericm2m_tests/tests.py">tests</a>
are going to be the best place to see the entire API. A good chunk of this code
is based on ideas already present in django, specifically the <a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/fields/related.py#L377">ForeignRelatedObjectsDescriptor</a>,
which dynamically creates a Manager to handle instance-specific functionality. Also
worth checking out: <a href="http://code.djangoproject.com/browser/django/trunk/django/contrib/contenttypes/generic.py#L243">GenericRelatedObjectManager</a>.</p>
<p>Some hints for optimizing the GFK lookup came from bconstantin's <a href="https://github.com/bconstantin/django_polymorphic/blob/master/polymorphic/query.py#L101">django_polymorphic</a>
project which has a "_get_real_instances" method that does some cool stuff. Also, thanks to <a href="http://alexgaynor.net/">Alex</a>
for helping me out today getting that stuff working.</p>
<p>I'm very interested in extending the functionality of this library, so any and
all suggestions, patches, etc would be greatly appreciated! Thanks for reading!</p>
<h1><a href="http://charlesleifer.com/blog/peewee-a-lightweight-python-orm---original-post/">Peewee, a lightweight Python ORM - Original Post</a></h1>
<p>charles leifer, 2010-11-28T15:01:00Z</p>
<h2>Edit</h2>
<ul>
<li>I rewrote peewee from the ground up. The query examples in this post are no longer supported.</li>
<li>Edit, Jul 24, 2011: added support for PostgreSQL and MySQL (in addition to SQLite).</li>
<li>Edit, June 8, 2011: added <a href="http://github.com/coleifer/peewee">support for MySQL</a></li>
</ul>
<p>For the past month or so I've been working on writing my own <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a>
in Python. The project grew out of a need for a lightweight persistence layer for use in <a href="http://flask.pocoo.org">Flask</a> web apps. As I've grown so familiar with the <a href="http://docs.djangoproject.com/en/1.2/topics/db/queries/">Django ORM</a> over the past year,
many of the ideas in <a href="http://github.com/coleifer/peewee">Peewee</a> are analogous
to the concepts in Django.
My goal from the beginning has been to keep the implementation <strong>simple</strong>
without sacrificing functionality, and to ultimately create something hackable
that others might be able to read and contribute to.</p>
<p>Weighing in at about 1000 lines of code, Peewee doesn't come close to matching
Django's ORM (15K LOC) in terms of API cleanliness or functionality, but it does
hit many of the basic use-cases for an app that needs lightweight persistence and
querying. This has definitely been one of the most rewarding projects I've
worked on!</p>
<h2>Benchmarks</h2>
<p>In terms of speed, peewee is roughly 20% faster than django when creating
rows or grabbing simple lists of objects. Peewee is 77% faster than Django
for simple one-row calls to ".get()", and almost 50% faster when doing
".get_or_create()". When doing a single join,
peewee is only 23% faster, but when executing a comparable query using a subquery
instead of a join, peewee is almost 60% faster than django!</p>
<div class="highlight"><pre><span></span><code>.                          | django_bench | peewee_bench | django diff |
test_creation              |    11.441360 |     9.212356 |  19.481986% |
test_get_user_count        |     0.048042 |     0.023086 |  51.946383% |
test_list_users            |     2.612983 |     2.037555 |  22.021881% |
test_list_users_ordered    |     3.022387 |     2.397535 |  20.674121% |
test_get_user              |     0.232575 |     0.053378 |  77.049199% |
test_get_or_create         |     0.789696 |     0.406243 |  48.557027% |
test_list_blogs_for_user   |     1.423771 |     0.646669 |  54.580533% |
test_list_entries_for_user |     1.684540 |     1.292385 |  23.279645% |
test_list_entries_subquery |     2.413966 |     0.979424 |  59.426772% |
</code></pre></div>
<p>Benchmark code can be viewed <a href="https://github.com/coleifer/peewee/tree/master/bench/">here</a>.
The benchmarks were run against Django 1.2.3 final. Both benchmarks used an
on-disk SQLite database.</p>
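<p>The last column of the table appears to be the share of Django's running time that peewee
saves. A quick check of that reading against two rows (the function name is my own, not
something from the benchmark code):</p>

```python
def time_saved_percent(django_time, peewee_time):
    """Percentage of Django's running time that peewee saves."""
    return (django_time - peewee_time) / django_time * 100


# Timings copied from the table above.
creation = time_saved_percent(11.441360, 9.212356)
get_user = time_saved_percent(0.232575, 0.053378)
```

<p>Both values agree with the table to within rounding of the printed timings.</p>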
<h2>Aspects of the design</h2>
<h3>Cribbed from Django</h3>
<ul>
<li>Declarative model definitions</li>
<li>Arbitrarily complex querying with "Q" objects</li>
<li>Using a double-underscore to denote a special query lookup</li>
<li>Exposing the reverse side of a ForeignKey relationship as a descriptor</li>
<li>Iterating over a query result causes evaluation</li>
</ul>
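<p>Two of these ideas, double-underscore lookups and composable "Q" objects, are easy to
sketch in miniature. This is neither Django's nor peewee's implementation; the operator table
is invented, and a real ORM would bind values as query parameters instead of formatting them
into the string:</p>

```python
# Invented, minimal operator table for the sketch.
OPERATORS = {"eq": "=", "lt": "<", "gt": ">", "in": "IN"}


def parse_lookup(key):
    """Split 'age__gt' into ('age', '>'); a bare name means equality."""
    field, _, op = key.partition("__")
    return field, OPERATORS[op or "eq"]


class Q:
    def __init__(self, **lookups):
        parts = []
        for key, value in lookups.items():
            field, op = parse_lookup(key)
            # repr() is for illustration only, never for real SQL.
            parts.append(f"{field} {op} {value!r}")
        self.sql = " AND ".join(parts)

    def _combine(self, other, joiner):
        combined = Q()
        combined.sql = f"({self.sql} {joiner} {other.sql})"
        return combined

    def __or__(self, other):
        return self._combine(other, "OR")

    def __and__(self, other):
        return self._combine(other, "AND")


editors = Q(is_staff=True) | Q(is_superuser=True)
```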
<h3>Off in left-field</h3>
<ul>
<li>Joins are denoted explicitly</li>
<li>Where clauses</li>
<li>Pagination as opposed to limiting/offsetting/slicing</li>
<li>Only support for SQLite (at the moment)</li>
</ul>
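<p>Pagination is really just limit/offset with friendlier arguments. A sketch of the
arithmetic, assuming 1-based page numbers (the helper name is made up and this is not peewee's
code):</p>

```python
def page_to_limit_offset(page, per_page=20):
    """Translate a 1-based page number into (limit, offset)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return per_page, (page - 1) * per_page


# Page 3 with 20 rows per page skips the first 40 rows.
limit, offset = page_to_limit_offset(3, 20)
first_row, last_row = offset + 1, offset + limit
```

<p>So page 3 at 20 per page covers rows 41 through 60.</p>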
<h3>Examples of some queries</h3>
<div class="highlight"><pre><span></span><code><span class="c1"># a simple query selecting a user</span>
<span class="n">User</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="s1">'charles'</span><span class="p">)</span>
<span class="c1"># get the staff and super users</span>
<span class="n">editors</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">select</span><span class="p">()</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">Q</span><span class="p">(</span><span class="n">is_staff</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="o">|</span> <span class="n">Q</span><span class="p">(</span><span class="n">is_superuser</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="c1"># get tweets by editors using a subquery</span>
<span class="n">Tweet</span><span class="o">.</span><span class="n">select</span><span class="p">()</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">user__in</span><span class="o">=</span><span class="n">editors</span><span class="p">)</span>
<span class="c1"># get tweets by editors using a join</span>
<span class="n">Tweet</span><span class="o">.</span><span class="n">select</span><span class="p">()</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">User</span><span class="p">)</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">Q</span><span class="p">(</span><span class="n">is_staff</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="o">|</span> <span class="n">Q</span><span class="p">(</span><span class="n">is_superuser</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="c1"># how many active users are there?</span>
<span class="n">User</span><span class="o">.</span><span class="n">select</span><span class="p">()</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">active</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<span class="c1"># paginate the user table and show me page 3 (users 41-60)</span>
<span class="n">User</span><span class="o">.</span><span class="n">select</span><span class="p">()</span><span class="o">.</span><span class="n">order_by</span><span class="p">((</span><span class="s1">'username'</span><span class="p">,</span> <span class="s1">'asc'</span><span class="p">))</span><span class="o">.</span><span class="n">paginate</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
<span class="c1"># order users by number of tweets</span>
<span class="n">User</span><span class="o">.</span><span class="n">select</span><span class="p">({</span>
    <span class="n">User</span><span class="p">:</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">],</span>
    <span class="n">Tweet</span><span class="p">:</span> <span class="p">[</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="s1">'num_tweets'</span><span class="p">)]</span>
<span class="p">})</span><span class="o">.</span><span class="n">group_by</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">Tweet</span><span class="p">)</span><span class="o">.</span><span class="n">order_by</span><span class="p">((</span><span class="s1">'num_tweets'</span><span class="p">,</span> <span class="s1">'desc'</span><span class="p">))</span>
</code></pre></div>
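<p>The pagination example above ("page 3 (users 41-60)") comes down to a bit of limit/offset arithmetic. The sketch below is not peewee's implementation; <code>page_to_limit_offset</code> is a hypothetical helper, and it assumes page numbers are 1-based, as the comment in the example suggests:</p>

```python
def page_to_limit_offset(page, per_page=20):
    """Translate a 1-based page number into (limit, offset) values."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return per_page, (page - 1) * per_page


limit, offset = page_to_limit_offset(3, 20)
# Rows are numbered from 1 here, so the page covers offset + 1 .. offset + limit.
first_row, last_row = offset + 1, offset + limit
print(limit, offset, first_row, last_row)
```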
<h2>Example App</h2>
<p><a href="https://media.charlesleifer.com/blog/photos/tweepee.jpg" title="tweepee"><img alt="tweepee" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/tweepee.jpg?key=S5RqVLRc5yTZySIKn8gilw=="/></a></p>
<p>I originally wrote peewee to provide some lightweight persistence for
<a href="http://flask.pocoo.org/">flask</a> apps. As an example app, I've written a
(cheesy) twitter-alike --
the <a href="https://github.com/coleifer/peewee/blob/master/examples/twitter/app.py">entire app</a>
is under 250 lines of code and exemplifies quite a few of peewee's features.</p>
<p>For instructions on running the example app yourself, or for an in-depth
walkthrough, check the <a href="http://docs.peewee-orm.com/en/latest/peewee/example.html">example app docs</a>.</p>
<h2>Documentation</h2>
<p>If you're interested in reading more, please check out the
<a href="http://docs.peewee-orm.com/">documentation</a>. The docs are currently broken up into 3 main sections:</p>
<ul>
<li><a href="http://docs.peewee-orm.com/en/latest/peewee/example.html">Example App</a></li>
<li><a href="http://docs.peewee-orm.com/en/latest/peewee/models.html">Model API</a></li>
<li><a href="http://docs.peewee-orm.com/en/latest/peewee/querying.html">Querying API</a></li>
</ul>
<h2>Conclusion</h2>
<p>I've had a great time working on this code and plan to keep developing it. Please feel free to <a href="http://github.com/coleifer/peewee">contribute</a> if you're interested! I'll close with a quote from the <a href="http://www.faqs.org/docs/artu/">Art of Unix Programming</a>:</p>
<blockquote>
<p><em>"Software design and implementation should be a joyous art, a kind of high-level play."</em></p>
</blockquote>
<h2>EDIT:</h2>
<p>Based on reader interest, I put together a set of benchmarks for SQLAlchemy's ORM. The only benchmark I had trouble replicating was the <em>test_list_entries_subquery</em> one, but the other ones should be right. Any SQLAlchemy users out there, I'd appreciate it if you could sanity check my <a href="https://github.com/coleifer/peewee/blob/master/bench/sqlalc_bench/bench.py">benchmarking code</a>! SQLAlchemy outperformed peewee when listing the blogs for a user (selecting on the back side of a ForeignKey), but was quite a bit slower in the other tests:</p>
<div class="highlight"><pre><span></span><code>. |django_bench |peewee_bench |sqlalc_bench | djang diff | sqlal diff |
test_creation | 10.585136 | 9.183795 | 16.681107 | 13.238762% | 44.944931% |
test_get_user_count | 0.040043 | 0.019856 | 0.085400 | 50.413509% | 76.749471% |
test_list_users | 2.769415 | 2.124650 | 3.540615 | 23.281629% | 39.992065% |
test_list_users_ordered | 3.154166 | 2.533434 | 3.901886 | 19.679759% | 35.071555% |
test_get_user | 0.236253 | 0.053331 | 0.221075 | 77.426364% | 75.876565% |
test_get_or_create | 0.644203 | 0.434922 | 0.618659 | 32.486832% | 29.699242% |
test_list_blogs_for_user | 1.683750 | 0.684124 | 0.661601 | 59.369032% | -3.404306% |
test_list_entries_for_user | 1.866349 | 1.330273 | 2.662108 | 28.723246% | 50.029331% |
test_list_entries_subquery | 2.621584 | 1.029036 | 0.002033 | 60.747545% | N/A |
</code></pre></div>Django Patterns: Model Inheritancehttp://charlesleifer.com/blog/django-patterns-model-inheritance/2010-10-09T14:41:33Z2010-10-09T14:41:33Zcharles leifer<p>This post discusses the two flavors of model inheritance supported by Django,
some of their use-cases as well as some potential gotchas.</p>
<h2>Overview</h2>
<p>When the queryset refactor landed a couple years ago, Django's ORM grew support for model inheritance. Model inheritance comes in two flavors, abstract and ... not. What are the important differences in how Django handles these two types of inheritance?</p>
<h3>Multi-table inheritance (not abstract)</h3>
<p>Directly extending a model results in two tables where the shared fields are
stored in one table (the parent model's table) and the fields unique to the
child model are stored on the child model's table. The child model holds a one-to-one link (a foreign key) to the parent model, and queries against the child automatically include the join.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Media</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">)</span>
    <span class="n">pub_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>

<span class="k">class</span> <span class="nc">Photo</span><span class="p">(</span><span class="n">Media</span><span class="p">):</span> <span class="c1"># note that Photo extends Media</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ImageField</span><span class="p">(</span><span class="n">upload_to</span><span class="o">=</span><span class="s1">'photos'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Video</span><span class="p">(</span><span class="n">Media</span><span class="p">):</span>
    <span class="n">video</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">FileField</span><span class="p">(</span><span class="n">upload_to</span><span class="o">=</span><span class="s1">'videos'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">VideoWithThumbnail</span><span class="p">(</span><span class="n">Video</span><span class="p">,</span> <span class="n">Photo</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Querying this object will result in 3 inner joins on filters/gets</span>
<span class="sd">    Saving/deleting will require at least 4 queries, but in my testing</span>
<span class="sd">    saving actually required 10 queries and deleting 13!</span>
<span class="sd">    """</span>
    <span class="k">pass</span>
</code></pre></div>
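<p>To make the storage layout a bit more concrete, here is a rough sketch using the standard library's <code>sqlite3</code> module instead of Django. The table and column names (<code>media_media</code>, <code>media_photo</code>, <code>media_ptr_id</code>) assume the models live in an app called <code>media</code> and follow Django's usual naming; the exact schema Django generates may differ in its details:</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE media_media (
        id INTEGER PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        pub_date DATETIME NOT NULL
    );
    -- The child table stores only its own field plus a link to the parent row.
    CREATE TABLE media_photo (
        media_ptr_id INTEGER PRIMARY KEY REFERENCES media_media (id),
        image VARCHAR(100) NOT NULL
    );
""")

# Saving one "photo" touches both tables.
conn.execute("INSERT INTO media_media (id, title, pub_date) "
             "VALUES (1, 'Sunset', '2010-10-09 00:00:00')")
conn.execute("INSERT INTO media_photo (media_ptr_id, image) "
             "VALUES (1, 'photos/sunset.jpg')")

# Reading a photo back requires joining the child table to the parent table.
photo_rows = conn.execute("""
    SELECT m.id, m.title, p.image
    FROM media_photo AS p
    INNER JOIN media_media AS m ON m.id = p.media_ptr_id
""").fetchall()

# Querying the parent table alone sees the row, but not the image column.
media_rows = conn.execute("SELECT id, title FROM media_media").fetchall()
print(photo_rows)
print(media_rows)
```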
<p>Because of the way these items are stored in the database, it is possible to
query against <em>all</em> media objects, whether they're photos, videos, or just
plain-old <code>Media</code> objects. Querying <code>Media</code>, the base class, will return <code>Media</code> instances without either the special <code>image</code> or <code>video</code> fields:</p>
<div class="highlight"><pre><span></span><code><span class="gp">&gt;&gt;&gt; </span><span class="n">Media</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span> <span class="c1"># get all the media objects, photos or videos</span>
<span class="go">[&lt;Media: Media object&gt;, &lt;Media: Media object&gt;]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Photo</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span> <span class="c1"># just the photos</span>
<span class="go">[&lt;Photo: Photo object&gt;]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Video</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span> <span class="c1"># just the videos</span>
<span class="go">[&lt;Video: Video object&gt;]</span>
</code></pre></div>
<p>I find multi-table inheritance useful in the following circumstances:</p>
<ul>
<li>Query against all objects of a type, i.e. all media</li>
<li>Relate (via a ForeignKey/M2M) to all objects of a type</li>
</ul>
<p>I see two main downsides to this type of inheritance:</p>
<ul>
<li>More queries: every insert/update/delete must cascade to all the tables
in the inheritance chain</li>
<li>More joins: every select must join against all tables
in the inheritance chain.</li>
</ul>
<p>There are a couple other things to watch out for:</p>
<ul>
<li>Since the parent model and all the descendants have unique content types,
Generic ForeignKeys can be a bit cumbersome.</li>
<li>Django's model signals do not cascade to child models, so a post_save handler registered with <em>sender=Media</em> would not be called when a Photo object gets saved.</li>
<li>You can't override fields inherited from a parent model in a subclass (true for mixins as well). This makes sense if you think about how this data is being stored.</li>
</ul>
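<p>The signal gotcha is easier to see with a toy dispatcher. The sketch below is <em>not</em> Django's signal implementation; <code>ToySignal</code> is a hypothetical stand-in that only illustrates the idea that receivers are looked up by the exact sender class, so a handler registered for a parent class never hears about a subclass:</p>

```python
class ToySignal(object):
    """Toy stand-in for a signal: receivers are keyed by the exact sender class."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver, sender):
        self.receivers.append((sender, receiver))

    def send(self, sender):
        # Match on identity only; parent classes of `sender` are not consulted.
        return [receiver(sender) for registered, receiver in self.receivers
                if registered is sender]


class Media(object):
    pass


class Photo(Media):
    pass


post_save = ToySignal()
post_save.connect(lambda sender: 'media handler', sender=Media)
post_save.connect(lambda sender: 'photo handler', sender=Photo)

print(post_save.send(sender=Photo))
print(post_save.send(sender=Media))
```

<p>One option, if you need a handler to fire for every kind of media, is to connect it once per concrete subclass.</p>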
<p>If you're interested, there's a neat project called <a href="http://github.com/bconstantin/django_polymorphic">django_polymorphic</a>
that optimizes this type of inheritance and lets you query the base class and always
returns the "most specific subclass" transparently.</p>
<p>The rest of this post will deal with Abstract models.</p>
<h3>Abstract Models, or "Mixins"</h3>
<p>OK, multi-table inheritance is admittedly a pretty complicated affair and there are quite a few things to watch out for. I find defining a model as abstract and using it as a mixin much more intuitive, especially at the database level.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">AbstractMedia</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">abstract</span> <span class="o">=</span> <span class="kc">True</span> <span class="c1"># &lt;--- denotes our model as abstract</span>

<span class="k">class</span> <span class="nc">Photo</span><span class="p">(</span><span class="n">AbstractMedia</span><span class="p">):</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ImageField</span><span class="p">(</span><span class="n">upload_to</span><span class="o">=</span><span class="s1">'photos'</span><span class="p">)</span>
</code></pre></div>
<p>Basically, an abstract model doesn't get a table. This has several implications:</p>
<ul>
<li>Subclasses contain all the fields on their table (no joining/parent-fk)</li>
<li>Abstract model can't be queried against</li>
<li>Abstract model cannot have a ForeignKey or M2M to it</li>
</ul>
<p>As Eric Florenzano pointed out in his talk
<a href="http://djangocon.blip.tv/file/4112452/">"Why Django Sucks and How We Can Fix It"</a>,
abstract models introduce a whole new set of problems:</p>
<ul>
<li>trading implementation for configuration - this is probably more in reference to the idea that abstract models provide an elegant solution to the <a href="http://python.mirocommunity.org/video/1867/djangocon-2010-rethinking-the-">"reusable app problem"</a></li>
<li>extra level of indirection</li>
<li>what fields does my model have, and what base class provided them?</li>
</ul>
<p>That being said, there are definitely valid use-cases for abstract models.
Commonly you'll hear abstract models referred to as "mixins" and this pretty
well describes what I see as their main strength. They allow bits
of functionality to be wrapped up in a class and reused in many models without forcing you to incur the extra database overhead.</p>
<p>I read in <em>Head First: Design Patterns</em> to "favor composition over inheritance", which I take to mean that it's better to build your objects out of smaller pieces than to extend/override your way into a tall class hierarchy.</p>
<h3>Looking at some examples</h3>
<p>Mixins can do just about everything that normal models can do, so you can use
them to encapsulate bits of common model functionality. One thing I find myself
doing a lot is auto-generating a slug for my models that have title fields.
This can be wrapped up neatly using a mixin:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span><span class="p">,</span> <span class="n">IntegrityError</span><span class="p">,</span> <span class="n">transaction</span>
<span class="kn">from</span> <span class="nn">django.template.defaultfilters</span> <span class="kn">import</span> <span class="n">slugify</span>

<span class="k">class</span> <span class="nc">TitleSlugModel</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">)</span>
    <span class="n">slug</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">SlugField</span><span class="p">(</span><span class="n">unique</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">abstract</span> <span class="o">=</span> <span class="kc">True</span>

    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Based on the Tag save() method in django-taggit, this method simply</span>
<span class="sd">        stores a slugified version of the title, ensuring that the unique</span>
<span class="sd">        constraint is observed</span>
<span class="sd">        """</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">slug</span> <span class="o">=</span> <span class="n">slug</span> <span class="o">=</span> <span class="n">slugify</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">title</span><span class="p">)</span>
        <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">savepoint</span> <span class="o">=</span> <span class="n">transaction</span><span class="o">.</span><span class="n">savepoint</span><span class="p">()</span>
                <span class="n">res</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">TitleSlugModel</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
                <span class="n">transaction</span><span class="o">.</span><span class="n">savepoint_commit</span><span class="p">(</span><span class="n">savepoint</span><span class="p">)</span>
                <span class="k">return</span> <span class="n">res</span>
            <span class="k">except</span> <span class="n">IntegrityError</span><span class="p">:</span>
                <span class="n">transaction</span><span class="o">.</span><span class="n">savepoint_rollback</span><span class="p">(</span><span class="n">savepoint</span><span class="p">)</span>
                <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">slug</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">_</span><span class="si">%d</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">slug</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">ContentObject</span><span class="p">(</span><span class="n">TitleSlugModel</span><span class="p">):</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
</code></pre></div>
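<p>The retry loop in <code>save()</code> can be tried out without a database. In the sketch below a plain <code>set</code> stands in for the unique constraint and <code>simple_slugify</code> is a rough approximation of Django's <code>slugify</code>; both helper names are hypothetical. The real mixin leans on the database raising <code>IntegrityError</code>, which also covers concurrent writers, something an in-memory set cannot do:</p>

```python
import re


def simple_slugify(title):
    """Rough stand-in for Django's slugify(): lowercase, hyphen-separated."""
    slug = re.sub(r'[^\w\s-]', '', title).strip().lower()
    return re.sub(r'[-\s]+', '-', slug)


def unique_slug(title, taken):
    """Return a slug for `title` that is not already in the `taken` set."""
    base = candidate = simple_slugify(title)
    i = 0
    while candidate in taken:
        # Mirrors the retry branch above: bump the counter and append it.
        i += 1
        candidate = '%s_%d' % (base, i)
    taken.add(candidate)
    return candidate


taken = set()
slugs = [unique_slug('Hello, World!', taken) for _ in range(3)]
print(slugs)
```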
<p>Another common thing I find myself doing is storing a pub/created/modified date.
In this example the <em>ContentObject</em> model will be composed of two abstract models,
the <em>TitleSlugModel</em> and the new <em>DateAwareModel</em>. Note that both of the abstract base classes
override the save() method, but because of Python's method resolution
order, everything gets called as we would expect.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span><span class="p">,</span> <span class="n">IntegrityError</span><span class="p">,</span> <span class="n">transaction</span>
<span class="kn">from</span> <span class="nn">django.template.defaultfilters</span> <span class="kn">import</span> <span class="n">slugify</span>

<span class="k">class</span> <span class="nc">TitleSlugModel</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="c1"># everything the same as above</span>
    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'title slug model save() called'</span><span class="p">)</span>
        <span class="c1"># ... same as above except adding a "print" call at the top</span>
        <span class="c1"># of the method</span>

<span class="k">class</span> <span class="nc">DateAwareModel</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">pub_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">modified_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">created_date</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">abstract</span> <span class="o">=</span> <span class="kc">True</span>

    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'date aware model save() called'</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">pk</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">created_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">modified_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">DateAwareModel</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">ContentObject</span><span class="p">(</span><span class="n">TitleSlugModel</span><span class="p">,</span> <span class="n">DateAwareModel</span><span class="p">):</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'content model save() called'</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">ContentObject</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div>
<p>Here's some sample output from the shell showing that all 3 save() methods are
called and our model gets a slug and a modified date.</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">media.models</span> <span class="kn">import</span> <span class="o">*</span>
<span class="gp">>>> </span><span class="n">content_obj</span> <span class="o">=</span> <span class="n">ContentObject</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s1">'testing'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">content_obj</span><span class="o">.</span><span class="n">pub_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2010</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">content_obj</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="go">content model save() called</span>
<span class="go">title slug model save() called</span>
<span class="go">date aware model save() called</span>
<span class="gp">>>> </span><span class="n">content_obj</span><span class="o">.</span><span class="n">modified_date</span>
<span class="go">datetime.datetime(2010, 10, 9, 13, 31, 57, 782511)</span>
<span class="gp">>>> </span><span class="n">content_obj</span><span class="o">.</span><span class="n">slug</span>
<span class="go">u'testing'</span>
</code></pre></div>
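<p>The call order shown above falls out of Python's method resolution order, and the same behaviour can be reproduced without Django. In this minimal sketch the class names are simplified stand-ins for the models (with <code>Base</code> playing the role of <code>models.Model</code>), and each <code>save()</code> just records that it ran before delegating to <code>super()</code>:</p>

```python
calls = []


class Base(object):
    def save(self):
        calls.append('Base')


class TitleSlug(Base):
    def save(self):
        calls.append('TitleSlug')
        # super() follows the MRO of the instance's class, so from a
        # Content instance this reaches DateAware, not Base.
        super(TitleSlug, self).save()


class DateAware(Base):
    def save(self):
        calls.append('DateAware')
        super(DateAware, self).save()


class Content(TitleSlug, DateAware):
    def save(self):
        calls.append('Content')
        super(Content, self).save()


Content().save()
print([cls.__name__ for cls in Content.__mro__])
print(calls)
```

<p>The chain only works because every class in it calls <code>super().save()</code>; if one of the mixins forgot to, the classes after it in the MRO would be silently skipped.</p>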
<h2>Conclusion</h2>
<p>Thanks for reading, I hope you found this post informative! Both multi-table
inheritance and abstract models have their place in the Django
developer's toolkit. There's potential for some serious overhead when using
multi-table inheritance, but you gain the ability to query against all objects
of a base type. Abstract base classes avoid the database overhead by not
creating explicit links in the database, but you lose the ability to query across
subclasses. As always, any comments, feedback, suggestions, errata, etc are
appreciated.</p>
<h2>Links</h2>
<ul>
<li><a href="http://www.eflorenzano.com/blog/post/exploring-mixins-django-model-inheritance/">Exploring mixins with Django Model Inheritance</a></li>
<li><a href="http://toastdriven.com/fresh/abstract-model-metadata/">Abstract Model Metadata</a></li>
<li><a href="http://howiworkdaily.com/post/2008/jun/17/django-tutorial-abstract-base-classes-vs-model-inh/">Abstract Base Class vs. Multi-table Inheritance</a></li>
<li><a href="http://djangocon.blip.tv/file/4112452/">Why Django Sucks, and How We Can Fix It</a></li>
<li>bconstantin's <a href="http://github.com/bconstantin/django_polymorphic">django_polymorphic</a></li>
</ul>Django Patterns: Pluggable Backendshttp://charlesleifer.com/blog/django-patterns-pluggable-backends/2010-09-15T11:20:02Z2010-09-15T11:20:02Zcharles leifer<p>As the first installment in a series on common patterns in <a href="http://www.djangoproject.com/">Django</a> development, I'd like to discuss the <em>Pluggable Backend</em> pattern. The pattern addresses the common problem of providing extensible support for multiple implementations of a lower-level function, for example caching, database querying, etc.</p>
<h2>Problem</h2>
<p>The use of this pattern often coincides with places where the application needs to be configurable to use one of many possible solutions, as in the case of database engine support. Consider the following:</p>
<p><a href="https://media.charlesleifer.com/blog/photos/pluggable-interfaces.png" title="Pluggable Interfaces"><img alt="Pluggable Interfaces" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/pluggable-interfaces.png?key=TAtC1eETNq9dL81sscSZYg=="/></a></p>
<p>The application needs to support <em>Backend A</em> and <em>Backend B</em> but if you look
closely at the methods exposed there are some discrepancies:</p>
<ul>
<li>Backend A has slightly more verbose method names than Backend B</li>
<li>Backend B does not accept a default value on the get() method</li>
</ul>
<p>In Django we see this pattern all over the place:</p>
<ul>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/contrib/auth/backends.py">django.contrib.auth</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/contrib/messages/storage">django.contrib.messages</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/contrib/sessions/backends">django.contrib.sessions</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/core/cache/backends">django.core.cache</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/core/files/storage.py">django.core.files.storage</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/core/mail/backends">django.core.mail</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/core/serializers">django.core.serializers</a></li>
<li><a href="http://code.djangoproject.com/browser/django/branches/releases/1.2.X/django/db/backends">django.db</a></li>
</ul>
<h2>Analysis</h2>
<p>Here is a first stab at getting our Application to talk to backend A and B:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">ApplicationBackend</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">backend</span><span class="p">):</span>
        <span class="c1"># here we are</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">backend</span> <span class="o">=</span> <span class="n">backend</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="p">,</span> <span class="n">BackendA</span><span class="p">):</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get_data</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="p">,</span> <span class="n">BackendB</span><span class="p">):</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">BackendB</span><span class="o">.</span><span class="n">KeyError</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">default</span>

    <span class="k">def</span> <span class="nf">set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="o">...</span> <span class="n">etc</span> <span class="o">...</span>
</code></pre></div>
<p>Notice how tightly coupled our Application is to backends A and B. If backend C
comes along, we are back in our code adding extra <code>elif</code> checks all over the
place. What if an end-user wants to write support for a proprietary backend?
Then they have to go into our code and add the special-casing -- there has to
be a better way!</p>
<h2>Solution</h2>
<p>The solution is to add a layer of abstraction between your application and the
backend that unifies the APIs.</p>
<p><a href="https://media.charlesleifer.com/blog/photos/pluggable-adapters.png" title="Pluggable Adapters"><img alt="Pluggable Adapters" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/pluggable-adapters.png?key=KJxRCeYlGAAxGdC2l--jYw=="/></a></p>
<p>Most commonly, you will specify your API in a BaseBackend (in this case the
BaseAdapter class is the BaseBackend). You will also specify some way
of routing between the interface you expose through the Application and the
Adapter that communicates directly with the Backend. In Django this is usually
done by specifying a path to your module then dynamically importing the backend
at runtime.</p>
<p>Let's see some code:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">ApplicationBackend</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Our application ships all logic off to the adapter, which has a</span>
<span class="sd">    single, unified interface</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">adapter</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">adapter</span> <span class="o">=</span> <span class="n">adapter</span> <span class="c1"># we'll cover dynamic loading below</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">adapter</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">adapter</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, our application only has to know how to talk to the BaseAdapter, which in this case implements two methods, <code>get()</code> and <code>set()</code>. Which adapter our application uses is configured at instantiation. Here is what the <code>BaseAdapter</code> looks like. It provides a default implementation of <code>get()</code> and <code>set()</code>, but could just as easily raise a <code>NotImplementedError</code> and force every subclass to define its own implementation:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">BaseAdapter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">backend</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_backend</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="sd">"""</span>
<span class="sd">        Since Python does not have interfaces, it's common to raise</span>
<span class="sd">        NotImplementedErrors when specifying a base class that you wish to</span>
<span class="sd">        act as an interface. If you did not want to specify a default</span>
<span class="sd">        behavior but leave all implementation up to your adapters, you</span>
<span class="sd">        would raise an exception here</span>
<span class="sd">        """</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="sd">"""Same applies here as for the get() method"""</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</code></pre></div>
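<p>For comparison, here is a minimal sketch of the stricter variant mentioned above, where the base class supplies no default behavior. It assumes Python 3 and the standard library's <code>abc</code> module rather than hand-raised <code>NotImplementedError</code>s; the names <code>AbstractAdapter</code> and <code>DictAdapter</code> are made up for illustration and are not part of the original example.</p>

```python
from abc import ABC, abstractmethod


class AbstractAdapter(ABC):
    """Hypothetical variant of BaseAdapter that provides no default behavior."""

    @abstractmethod
    def get(self, key, default=None):
        """Return the value stored under key, or default if it is missing."""

    @abstractmethod
    def set(self, key, value):
        """Store value under key."""


class DictAdapter(AbstractAdapter):
    """Toy adapter backed by a plain dict, used only for illustration."""

    def __init__(self):
        self.backend = {}

    def get(self, key, default=None):
        return self.backend.get(key, default)

    def set(self, key, value):
        self.backend[key] = value
```

<p>With this approach a subclass that forgets to implement <code>get()</code> or <code>set()</code> fails when it is instantiated, instead of later when the missing method is first called.</p>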
<p>Now it is just a matter of writing our specific adapters for <em>Backend A</em> and <em>Backend B</em>:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">AdapterA</span><span class="p">(</span><span class="n">BaseAdapter</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Since BackendA uses different method names, we need to override the</span>
<span class="sd">    default behavior specified by the BaseAdapter</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="nf">get_backend</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">connect_to_backend_a</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get_data</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">set_data</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">AdapterB</span><span class="p">(</span><span class="n">BaseAdapter</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Since BackendB does not support a default value for the get() operation,</span>
<span class="sd">    we'll be sure to wrap the call in a try/except, catching the error and</span>
<span class="sd">    returning default.</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="nf">get_backend</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">connect_to_backend_b</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="c1"># call the backend directly: BaseAdapter.get() would pass a</span>
            <span class="c1"># default argument, which BackendB does not accept</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
        <span class="k">except</span> <span class="bp">self</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">KeyError</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">default</span>
</code></pre></div>
<p>In the code snippets above, our Application loaded its adapter at initialization. Django rarely does this, favoring a setting instead. Let's look at an example of how you might allow a module path to be used to
specify the default backend:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="kn">from</span> <span class="nn">importlib</span> <span class="kn">import</span> <span class="n">import_module</span>

<span class="c1"># provide a sane default?</span>
<span class="n">APP_ADAPTER</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="s1">'APP_ADAPTER'</span><span class="p">,</span> <span class="s1">'app.backends.adapter_a.AdapterA'</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">get_adapter</span><span class="p">():</span>
    <span class="c1"># grab the classname off of the backend string</span>
    <span class="n">package</span><span class="p">,</span> <span class="n">klass</span> <span class="o">=</span> <span class="n">APP_ADAPTER</span><span class="o">.</span><span class="n">rsplit</span><span class="p">(</span><span class="s1">'.'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="c1"># dynamically import the module, in this case app.backends.adapter_a</span>
    <span class="n">module</span> <span class="o">=</span> <span class="n">import_module</span><span class="p">(</span><span class="n">package</span><span class="p">)</span>
    <span class="c1"># pull the class off the module and return</span>
    <span class="k">return</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">klass</span><span class="p">)</span>
</code></pre></div>
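<p>The same lookup can be tried outside of Django. The sketch below is a standalone version that uses only the standard library's <code>importlib</code>; the helper name <code>load_class</code> is hypothetical, and a standard-library class stands in for a real adapter so that the snippet runs anywhere.</p>

```python
from importlib import import_module


def load_class(dotted_path):
    """Import and return the class named by a 'package.module.ClassName' string."""
    module_path, class_name = dotted_path.rsplit('.', 1)
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError:
        # Report a bad class name the same way as a bad module name.
        raise ImportError(
            'Module %r does not define %r' % (module_path, class_name))


# collections.OrderedDict is only a stand-in for an adapter class here.
adapter_class = load_class('collections.OrderedDict')
adapter = adapter_class()
```

<p>Note that, like <code>get_adapter()</code>, this returns the class rather than an instance, so the caller still has to instantiate it.</p>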
<h2>Strengths and Weaknesses</h2>
<p>This pattern allows loose coupling between the API you expose and the underlying code that does the actual work. The loose coupling also makes our interface extensible, as additional implementations
can be written without needing to touch the actual application code.</p>
<p>The biggest weakness is feature loss, as this is generally a lowest-common-denominator solution. Suppose backend A has some
awesome features that are not supported by backend B - in the interest of
maintaining a consistent interface you're stuck either leaving those features
out or implementing them yourself in AdapterB.</p>
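<p>To make the second option concrete, here is a rough sketch of emulating a missing feature in an adapter. Everything in it is an assumption for illustration: the stand-in backends, the <code>incr()</code> operation, and the adapter class names are not part of the original example.</p>

```python
class FakeBackendA(object):
    """Stand-in for a backend that has a native increment operation."""

    def __init__(self):
        self.data = {}

    def incr(self, key, delta=1):
        self.data[key] = self.data.get(key, 0) + delta
        return self.data[key]


class FakeBackendB(object):
    """Stand-in for a backend that only offers get() and set()."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data[key]  # raises KeyError when the key is missing

    def set(self, key, value):
        self.data[key] = value


class CounterAdapterA(object):
    def __init__(self, backend):
        self.backend = backend

    def incr(self, key, delta=1):
        # Backend A supports this directly, so just delegate.
        return self.backend.incr(key, delta)


class CounterAdapterB(object):
    def __init__(self, backend):
        self.backend = backend

    def incr(self, key, delta=1):
        # Backend B has no incr(), so emulate it with a read followed by
        # a write. Unlike a native operation, this is not atomic.
        try:
            current = self.backend.get(key)
        except KeyError:
            current = 0
        new_value = current + delta
        self.backend.set(key, new_value)
        return new_value
```

<p>Both adapters now expose the same <code>incr()</code> call, but the emulated version carries a caveat the native one does not, which is the kind of trade-off this pattern tends to hide from the application.</p>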
<h2>Other Uses</h2>
<p>There are other uses for this pattern besides talking to various
cache/db/storage backends. Spammers recently targeted some of my company's
sites, hitting things like Comments and Blog Entries. We needed a solution that
would work for these content types, as well as for any we decided to add down
the road. The various models that needed filtering had a lot in common - like
the user that created them (and their email address and IP), the content field
that contained the spammy links, etc, but the fields were named something
different, or were on a related model. We could have used introspection or done
a special-case solution, but instead I opted for a pluggable approach.</p>
<p><a href="https://media.charlesleifer.com/blog/photos/spam-filter.png" title="Spam Filter"><img alt="Spam Filter" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/spam-filter.png?key=C4nGNUbZOoFRodupHXxehQ=="/></a></p>
<p>The big difference between this example and the example above is that the routing logic is baked right into the <code>BaseBackend</code>, which in this case is the <code>SpamFilter</code> itself. So the <code>SpamFilter</code> class contains not only the logic for handling spammy content, but also contains a registry of more specialized spam filters. The workflow is something like this:</p>
<ul>
<li>Create custom filters for any models you need to special case, in this case comments and blog entries</li>
<li>Register those filters with the <code>SpamFilter</code> so it can use them when comments or blog entries come through</li>
<li>Whenever new user-generated-content is created, send it to the <code>SpamFilter.check_spam()</code> method</li>
<li>The <code>SpamFilter</code> instance will see if it has a filter for the new piece of content, falling back to a default implementation</li>
</ul>
<p>The code works like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">SpamFilter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">_filters</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">add_filter</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">filter_class</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_filters</span><span class="p">[</span><span class="n">model</span><span class="p">]</span> <span class="o">=</span> <span class="n">filter_class</span>

    <span class="k">def</span> <span class="nf">remove_filter</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
        <span class="k">del</span> <span class="bp">self</span><span class="o">.</span><span class="n">_filters</span><span class="p">[</span><span class="n">model</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">get_filter_for_object</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="c1"># return the proper filter to use for this model instance - if one</span>
        <span class="c1"># does not exist, fall back to the default implementation provided</span>
        <span class="c1"># by the SpamFilter class (which uses introspection)</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">filter_class</span><span class="p">)</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_filters</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">model_instance</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
                <span class="k">return</span> <span class="n">filter_class</span><span class="p">()</span>
        <span class="k">return</span> <span class="bp">self</span>

    <span class="k">def</span> <span class="nf">check_spam</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="c1"># grab the correct filter to use for this model</span>
        <span class="n">spam_filter</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_filter_for_object</span><span class="p">(</span><span class="n">model_instance</span><span class="p">)</span>
        <span class="c1"># use our custom backend to get the right fields off the object</span>
        <span class="n">user</span> <span class="o">=</span> <span class="n">spam_filter</span><span class="o">.</span><span class="n">get_user</span><span class="p">(</span><span class="n">model_instance</span><span class="p">)</span>
        <span class="n">content</span> <span class="o">=</span> <span class="n">spam_filter</span><span class="o">.</span><span class="n">get_content</span><span class="p">(</span><span class="n">model_instance</span><span class="p">)</span>
        <span class="c1"># call out to Akismet, or whatever here</span>
        <span class="n">object_is_spam</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">make_api_call</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">content</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">object_is_spam</span><span class="p">:</span>
            <span class="c1"># if the object is spam, allow the spam filter to specify a</span>
            <span class="c1"># callback that will do the appropriate thing. with comments</span>
            <span class="c1"># this generally means marking is_public = False, with blogs</span>
            <span class="c1"># it means setting the status to a special spam flag</span>
            <span class="n">spam_filter</span><span class="o">.</span><span class="n">object_is_spam</span><span class="p">(</span><span class="n">model_instance</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">object_is_spam</span>

    <span class="k">def</span> <span class="nf">get_user</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="c1"># introspect the model - subclasses should override</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span> <span class="c1"># placeholder: introspection logic omitted</span>

    <span class="k">def</span> <span class="nf">get_content</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="c1"># introspect the model - subclasses should override</span>
        <span class="k">raise</span> <span class="ne">NotImplementedError</span> <span class="c1"># placeholder: introspection logic omitted</span>

    <span class="k">def</span> <span class="nf">object_is_spam</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="c1"># default behavior when spam is found is to mail the managers</span>
        <span class="n">mail_managers</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">CommentFilter</span><span class="p">(</span><span class="n">SpamFilter</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_user</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">model_instance</span><span class="o">.</span><span class="n">user</span>

    <span class="k">def</span> <span class="nf">get_content</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">model_instance</span><span class="o">.</span><span class="n">comment</span>

    <span class="k">def</span> <span class="nf">object_is_spam</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model_instance</span><span class="p">):</span>
        <span class="n">model_instance</span><span class="o">.</span><span class="n">is_public</span> <span class="o">=</span> <span class="kc">False</span>
        <span class="n">model_instance</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="n">spam_filter</span> <span class="o">=</span> <span class="n">SpamFilter</span><span class="p">()</span>
<span class="c1"># register the comment filter using the add_filter() method defined above</span>
<span class="n">spam_filter</span><span class="o">.</span><span class="n">add_filter</span><span class="p">(</span><span class="n">Comment</span><span class="p">,</span> <span class="n">CommentFilter</span><span class="p">)</span>
</code></pre></div>
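<p>To see the registry dispatch in action without Django or a spam service, here is a stripped-down, runnable sketch of the same workflow. The plain <code>Comment</code> class, the keyword check that replaces the API call, and the <code>str()</code> fallback in the default filter are all stand-ins invented for this illustration.</p>

```python
class Comment(object):
    """Stand-in for a Django model; a real one would subclass models.Model."""

    def __init__(self, user, comment):
        self.user = user
        self.comment = comment
        self.is_public = True


class SpamFilter(object):
    _filters = {}  # class-level registry: model class -> filter class

    def add_filter(self, model, filter_class):
        self._filters[model] = filter_class

    def get_filter_for_object(self, obj):
        # Use a registered filter if one matches, else fall back to self.
        for model, filter_class in self._filters.items():
            if isinstance(obj, model):
                return filter_class()
        return self

    def check_spam(self, obj):
        spam_filter = self.get_filter_for_object(obj)
        content = spam_filter.get_content(obj)
        # Placeholder check; a real filter would call a service here.
        is_spam = 'buy pills' in content.lower()
        if is_spam:
            spam_filter.object_is_spam(obj)
        return is_spam

    def get_content(self, obj):
        return str(obj)  # simplistic default for unregistered types

    def object_is_spam(self, obj):
        pass  # a real default might notify the site managers


class CommentFilter(SpamFilter):
    def get_content(self, obj):
        return obj.comment

    def object_is_spam(self, obj):
        obj.is_public = False


spam_filter = SpamFilter()
spam_filter.add_filter(Comment, CommentFilter)

spam = Comment('mallory', 'Buy pills now!')
ham = Comment('alice', 'Nice post.')
spam_result = spam_filter.check_spam(spam)
ham_result = spam_filter.check_spam(ham)
```

<p>One thing to be aware of: because <code>_filters</code> is a class attribute, every <code>SpamFilter</code> instance (and subclass instance) shares the same registry. That is convenient for a site-wide filter, but it also means registrations are effectively global.</p>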
<h2>Conclusion</h2>
<p>I hope you found this information useful. It is one of the more common patterns I see both in Django and in the wider sphere of reusable apps. I'm planning a couple more entries in this vein, <em>Django Patterns</em>, so keep an eye out for new posts! As always, any comments, feedback, suggestions, errata, etc are appreciated. Thanks for reading.</p>
<h2>More Examples</h2>
<ul>
<li><a href="http://github.com/worldcompany/djangoembed">djangoembed</a>
<a href="http://github.com/worldcompany/djangoembed/tree/master/oembed/image_processors/">thumbnailing</a>
and <a href="http://github.com/worldcompany/djangoembed/tree/master/oembed/parsers/">parsing</a></li>
<li><a href="http://haystacksearch.org/">haystack</a> <a href="http://github.com/toastdriven/django-haystack/tree/master/haystack/backends/">search backends</a></li>
<li><a href="http://code.google.com/p/queues/">queues</a></li>
</ul>Generating aggregate data across generic relationshttp://charlesleifer.com/blog/generating-aggregate-data-across-generic-relations/2010-05-22T19:22:24Z2010-05-22T19:22:24Zcharles leifer<p><strong>Edit:</strong> I've created a <a href="http://github.com/coleifer/django-generic-aggregation">github repo</a> for performing generic aggregation and annotation based on the code from this entry</p>
<p><a href="http://docs.djangoproject.com/en/dev/topics/db/aggregation/">Aggregation</a> support was added to Django's ORM in version 1.1, allowing you to generate Sums, Counts, and more without having to write any SQL. According to the <a href="http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#generic-relations-and-aggregation">docs</a>, aggregation is not supported for generic relations. This entry describes how to work around this using the <strong>.extra()</strong> method.</p>
<h2>The state of the art</h2>
<p>To take an example from the docs, it is possible to span relationships when performing aggregations:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Store</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">min_price</span><span class="o">=</span><span class="n">Min</span><span class="p">(</span><span class="s1">'books__price'</span><span class="p">),</span> <span class="n">max_price</span><span class="o">=</span><span class="n">Max</span><span class="p">(</span><span class="s1">'books__price'</span><span class="p">))</span>
</code></pre></div>
<p>Here we are querying the Store object and annotating the result set with two extra attributes, 'min_price' and 'max_price', which contain the minimum and maximum price of books that are sold at that store. Conversely, if we want to find the minimum and maximum book price over the entire queryset, we would write:</p>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="n">Store</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">min_price</span><span class="o">=</span><span class="n">Min</span><span class="p">(</span><span class="s1">'books__price'</span><span class="p">),</span> <span class="n">max_price</span><span class="o">=</span><span class="n">Max</span><span class="p">(</span><span class="s1">'books__price'</span><span class="p">))</span>
</code></pre></div>
<p>The aggregate() method returns a dictionary as opposed to a queryset. This is an incredibly clean API!</p>
<h2>Suppose you want to aggregate across a <a href="http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/">GFK</a></h2>
<p>This is tricky. A <a href="http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#id1">generic foreign key</a> is made up of two attributes: a foreign key to ContentType and an object ID. The models that are GFKed to do not contain a reverse relationship to the GFK model by default. This is a little obscure, but basically it means that you can create a Django Comment <strong>on</strong> any object, since Comment supports generic relations, but you can't go <strong>from</strong> any model to its associated comments (without creating a <a href="http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#reverse-generic-relations">reverse generic relation</a>).</p>
<p>So assume you have a simple weblog and would like to sort entries by number of comments, most-commented first. Unfortunately, this does not work:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Entry</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="o">...</span>
    <span class="n">comments</span> <span class="o">=</span> <span class="n">generic</span><span class="o">.</span><span class="n">GenericRelation</span><span class="p">(</span><span class="n">Comment</span><span class="p">)</span> <span class="c1"># reverse generic relation</span>
    <span class="o">...</span>

<span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">count</span><span class="o">=</span><span class="n">Count</span><span class="p">(</span><span class="s1">'comments'</span><span class="p">))</span> <span class="c1"># does not work!</span>
</code></pre></div>
<p>There's a <a href="http://code.djangoproject.com/ticket/10870">ticket</a> for this, but it is marked for the 1.3 milestone. In the meantime, I solved this with the <strong>extra</strong> method. Here's what it might look like:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="kn">from</span> <span class="nn">django.contrib.comments.models</span> <span class="kn">import</span> <span class="n">Comment</span>
<span class="kn">from</span> <span class="nn">django.contrib.contenttypes.models</span> <span class="kn">import</span> <span class="n">ContentType</span>
<span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Entry</span>
<span class="n">ctype</span> <span class="o">=</span> <span class="n">ContentType</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_for_model</span><span class="p">(</span><span class="n">Entry</span><span class="p">)</span>
<span class="n">qs</span> <span class="o">=</span> <span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">extra</span><span class="p">(</span><span class="n">select</span><span class="o">=</span><span class="p">{</span>
    <span class="s1">'count'</span><span class="p">:</span> <span class="s2">"""</span>
<span class="s2">        SELECT COUNT(*) AS comment_count</span>
<span class="s2">        FROM django_comments</span>
<span class="s2">        WHERE</span>
<span class="s2">            content_type_id=</span><span class="si">%s</span><span class="s2"> AND</span>
<span class="s2">            object_pk=CAST(blog_entries.id as text)</span>
<span class="s2">    """</span>
    <span class="p">},</span>
    <span class="n">select_params</span><span class="o">=</span><span class="p">[</span><span class="n">ctype</span><span class="o">.</span><span class="n">pk</span><span class="p">],</span>
    <span class="n">order_by</span><span class="o">=</span><span class="p">[</span><span class="s1">'-count'</span><span class="p">])</span>
</code></pre></div>
<p>This code essentially performs a subquery for every entry returned, which calculates the number of comments associated with that entry. Two 'magical' things are happening here:</p>
<ol>
<li>Inside the CAST() we're referring to <strong>blog_entries.id</strong> -- this is a field that is being retrieved by default, outside of the inner query.</li>
<li>The <strong>order_by</strong> is inside the <strong>extra()</strong> -- note how the COUNT() function is returning its result as "comment_count", but the select dictionary is keyed using "count". It is the key we specify that can be used for custom ordering, as opposed to whatever is used in the query.</li>
</ol>
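<p>If you want to poke at the SQL side of this without a Django project, the sketch below runs a query of roughly the same shape against an in-memory SQLite database. The two tables are heavily simplified stand-ins for the real ones, the content type id <code>7</code> is made up, and the outer query is written by hand rather than generated by the ORM.</p>

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE blog_entries (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE django_comments (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_pk TEXT
    );
    INSERT INTO blog_entries (id, title)
        VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO django_comments (content_type_id, object_pk)
        VALUES (7, '2'), (7, '2'), (7, '1'), (9, '3');
""")

ENTRY_CONTENT_TYPE_ID = 7  # hypothetical ContentType pk for Entry

# The subquery refers to blog_entries.id from the outer query, and the
# outer ORDER BY uses the alias given to the subquery as a whole.
rows = conn.execute("""
    SELECT
        blog_entries.title,
        (SELECT COUNT(*)
         FROM django_comments
         WHERE content_type_id = ?
           AND object_pk = CAST(blog_entries.id AS TEXT)) AS "count"
    FROM blog_entries
    ORDER BY "count" DESC
""", (ENTRY_CONTENT_TYPE_ID,)).fetchall()
```

<p>The comment attached to entry 3 belongs to a different content type, so it is not counted; that is the job the <code>content_type_id</code> condition does in the real query.</p>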
<h2>Genericizing it</h2>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">generic_annotate</span><span class="p">(</span><span class="n">queryset</span><span class="p">,</span> <span class="n">gfk_field</span><span class="p">,</span> <span class="n">aggregate_field</span><span class="p">,</span> <span class="n">aggregator</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">Sum</span><span class="p">,</span> <span class="n">desc</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">ordering</span> <span class="o">=</span> <span class="n">desc</span> <span class="ow">and</span> <span class="s1">'-score'</span> <span class="ow">or</span> <span class="s1">'score'</span>
<span class="n">content_type</span> <span class="o">=</span> <span class="n">ContentType</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_for_model</span><span class="p">(</span><span class="n">queryset</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
<span class="c1"># collect the params we'll be using</span>
<span class="n">params</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">aggregator</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="c1"># the function that's doing the aggregation</span>
<span class="n">aggregate_field</span><span class="p">,</span> <span class="c1"># the field containing the value to aggregate</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">db_table</span><span class="p">,</span> <span class="c1"># table holding gfk'd item info</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">ct_field</span><span class="p">,</span> <span class="c1"># the content_type field on the GFK</span>
<span class="n">content_type</span><span class="o">.</span><span class="n">pk</span><span class="p">,</span> <span class="c1"># the content_type id we need to match</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">fk_field</span><span class="p">,</span> <span class="c1"># the object_id field on the GFK</span>
<span class="n">queryset</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">db_table</span><span class="p">,</span> <span class="c1"># the table and pk from the main</span>
<span class="n">queryset</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">pk</span><span class="o">.</span><span class="n">name</span> <span class="c1"># part of the query</span>
<span class="p">)</span>
<span class="n">queryset</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">extra</span><span class="p">(</span><span class="n">select</span><span class="o">=</span><span class="p">{</span>
<span class="s1">'score'</span><span class="p">:</span> <span class="s2">"""</span>
<span class="s2"> SELECT </span><span class="si">%s</span><span class="s2">(</span><span class="si">%s</span><span class="s2">) AS aggregate_score</span>
<span class="s2"> FROM </span><span class="si">%s</span><span class="s2"></span>
<span class="s2"> WHERE</span>
<span class="s2"> </span><span class="si">%s</span><span class="s2">_id=</span><span class="si">%s</span><span class="s2"> AND</span>
<span class="s2"> </span><span class="si">%s</span><span class="s2">=</span><span class="si">%s</span><span class="s2">.</span><span class="si">%s</span><span class="s2"></span>
<span class="s2"> """</span> <span class="o">%</span> <span class="n">params</span>
<span class="p">},</span>
<span class="n">order_by</span><span class="o">=</span><span class="p">[</span><span class="n">ordering</span><span class="p">])</span>
<span class="k">return</span> <span class="n">queryset</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">misc</span> <span class="kn">import</span> <span class="n">generic_annotate</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Entry</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">tagging.models</span> <span class="kn">import</span> <span class="n">TaggedItem</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="gp">>>> </span><span class="n">qs</span> <span class="o">=</span> <span class="n">generic_annotate</span><span class="p">(</span><span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">(),</span> <span class="n">TaggedItem</span><span class="o">.</span><span class="n">object</span><span class="p">,</span> <span class="s1">'id'</span><span class="p">,</span> <span class="n">Count</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">qs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">score</span>
<span class="go">5L</span>
<span class="gp">>>> </span><span class="n">qs</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">score</span>
<span class="go">4L</span>
<span class="gp">>>> </span><span class="n">qs</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">tags</span>
<span class="go">u'databases django many-to-many python'</span>
</code></pre></div>
<p>Note that this example works for situations in which you'd use annotate(), but it doesn't allow you to mimic aggregate(), which is useful for generating summary-type data about items in a queryset. This is not too difficult to accomplish - simply invert the logic a bit so that we generate the aggregate on the outside and the result set on the inside:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">generic_aggregate</span><span class="p">(</span><span class="n">queryset</span><span class="p">,</span> <span class="n">gfk_field</span><span class="p">,</span> <span class="n">aggregate_field</span><span class="p">,</span> <span class="n">aggregator</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">Sum</span><span class="p">):</span>
<span class="n">content_type</span> <span class="o">=</span> <span class="n">ContentType</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_for_model</span><span class="p">(</span><span class="n">queryset</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
<span class="n">queryset</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'pk'</span><span class="p">)</span> <span class="c1"># just the pks</span>
<span class="n">sql</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">as_sql</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span> <span class="o">%</span> <span class="n">queryset</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">as_sql</span><span class="p">()[</span><span class="mi">1</span><span class="p">]</span>
<span class="c1"># collect the params we'll be using</span>
<span class="n">params</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">aggregator</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="c1"># the function that's doing the aggregation</span>
<span class="n">aggregate_field</span><span class="p">,</span> <span class="c1"># the field containing the value to aggregate</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">db_table</span><span class="p">,</span> <span class="c1"># table holding gfk'd item info</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">ct_field</span><span class="p">,</span> <span class="c1"># the content_type field on the GFK</span>
<span class="n">content_type</span><span class="o">.</span><span class="n">pk</span><span class="p">,</span> <span class="c1"># the content_type id we need to match</span>
<span class="n">gfk_field</span><span class="o">.</span><span class="n">fk_field</span><span class="p">,</span> <span class="c1"># the object_id field on the GFK</span>
<span class="n">sql</span>
<span class="p">)</span>
<span class="n">query</span> <span class="o">=</span> <span class="s2">"""</span>
<span class="s2"> SELECT </span><span class="si">%s</span><span class="s2">(</span><span class="si">%s</span><span class="s2">) AS aggregate_score</span>
<span class="s2"> FROM </span><span class="si">%s</span><span class="s2"></span>
<span class="s2"> WHERE</span>
<span class="s2"> </span><span class="si">%s</span><span class="s2">_id=</span><span class="si">%s</span><span class="s2"> AND</span>
<span class="s2"> </span><span class="si">%s</span><span class="s2"> IN (</span>
<span class="s2"> </span><span class="si">%s</span><span class="s2"></span>
<span class="s2"> )</span>
<span class="s2"> """</span> <span class="o">%</span> <span class="n">params</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
<span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
<span class="n">row</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()</span>
<span class="k">return</span> <span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">misc</span> <span class="kn">import</span> <span class="n">generic_aggregate</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Entry</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">tagging.models</span> <span class="kn">import</span> <span class="n">TaggedItem</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>
<span class="gp">>>> </span><span class="n">generic_aggregate</span><span class="p">(</span><span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">(),</span> <span class="n">TaggedItem</span><span class="o">.</span><span class="n">object</span><span class="p">,</span> <span class="s1">'id'</span><span class="p">,</span> <span class="n">Count</span><span class="p">)</span>
<span class="go">106L # the total number of times a tag was added to an entry</span>
</code></pre></div>
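<p>The two query shapes above can be tried outside of Django. The following is a minimal sketch using the standard-library <code>sqlite3</code> module; the table names <code>blog_entry</code> and <code>tagged_item</code>, the sample rows and the content type id <code>7</code> are assumptions made up for illustration, not the real schema. The first query puts the aggregate on the inside, one score per entry, as generic_annotate does. The second puts the aggregate on the outside and the result set on the inside, as generic_aggregate does.</p>

```python
import sqlite3

# Hypothetical stand-in tables: blog_entry plays the role of Entry and
# tagged_item plays the role of TaggedItem (content_type_id + object_id = GFK).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_entry (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tagged_item (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER
    );
    INSERT INTO blog_entry (id, title)
        VALUES (1, 'first'), (2, 'second'), (3, 'third');
    -- 7 is an assumed content type id for entries; 9 is some other model
    INSERT INTO tagged_item (content_type_id, object_id)
        VALUES (7, 1), (7, 1), (7, 1), (7, 2), (9, 1);
""")
ENTRY_CT = 7

# annotate()-style: a correlated subquery computes one score per entry
annotated = conn.execute("""
    SELECT e.id,
           (SELECT COUNT(t.id)
            FROM tagged_item t
            WHERE t.content_type_id = ? AND t.object_id = e.id) AS score
    FROM blog_entry e
    ORDER BY score DESC
""", (ENTRY_CT,)).fetchall()

# aggregate()-style: a single summary value over every entry in the inner query
total = conn.execute("""
    SELECT COUNT(t.id) AS aggregate_score
    FROM tagged_item t
    WHERE t.content_type_id = ?
      AND t.object_id IN (SELECT id FROM blog_entry)
""", (ENTRY_CT,)).fetchone()[0]

print(annotated)
print(total)
```

<p>Unlike the helpers above, this sketch passes the content type id as a bound parameter rather than interpolating it into the SQL string. Table and column names still have to be interpolated, since they cannot be bound.</p>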
<h2>More!</h2>
<ul>
<li>Check out the project that generated this entry, <a href="http://github.com/coleifer/django-simple-ratings">django-simple-ratings</a> -- more to come on this!</li>
<li><a href="http://code.djangoproject.com/ticket/10870">Ticket 10870</a></li>
<li><a href="http://docs.djangoproject.com/en/dev/topics/db/aggregation/">Aggregation</a> docs</li>
<li><a href="http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#id1">GFK</a> docs</li>
</ul>
<p><strong>Self-referencing many-to-many through</strong> (http://charlesleifer.com/blog/self-referencing-many-many-through/, 2010-01-16T18:36:37Z, charles leifer)</p>
<p>Django's ManyToMany through attribute allows you to describe relationships between objects. I've written a post about this (<a href="/blog/describing-relationships-djangos-manytomany-through/">Describing Relationships, Django's ManyToMany Through</a>), so I won't cover the details of its implementation or usage here. What I want to talk about in this post is how to create ManyToMany relationships between objects of the same kind and, beyond that, how those relationships can be described using through models.</p>
<h3>Asymmetrical Relationships - the Twitter model</h3>
<p>On Twitter you follow people. Maybe some people follow you, but each relationship runs in one direction only: it is asymmetrical. In Django you can implement this using a ManyToMany relationship. We don't need a special <em>through</em> model for this, but suppose we wanted to attach some metadata to those relationships. Below is sample code for a Twitter-style database of people and their relationships with one another. The relationships carry a <em>status</em> column denoting whether a particular user is <em>following</em> another or <em>blocking</em> another:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="n">relationships</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="s1">'self'</span><span class="p">,</span> <span class="n">through</span><span class="o">=</span><span class="s1">'Relationship'</span><span class="p">,</span>
<span class="n">symmetrical</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">related_name</span><span class="o">=</span><span class="s1">'related_to'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>
<span class="n">RELATIONSHIP_FOLLOWING</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">RELATIONSHIP_BLOCKED</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">RELATIONSHIP_STATUSES</span> <span class="o">=</span> <span class="p">(</span>
<span class="p">(</span><span class="n">RELATIONSHIP_FOLLOWING</span><span class="p">,</span> <span class="s1">'Following'</span><span class="p">),</span>
<span class="p">(</span><span class="n">RELATIONSHIP_BLOCKED</span><span class="p">,</span> <span class="s1">'Blocked'</span><span class="p">),</span>
<span class="p">)</span>
<span class="k">class</span> <span class="nc">Relationship</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">from_person</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Person</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s1">'from_people'</span><span class="p">)</span>
<span class="n">to_person</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Person</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s1">'to_people'</span><span class="p">)</span>
<span class="n">status</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">(</span><span class="n">choices</span><span class="o">=</span><span class="n">RELATIONSHIP_STATUSES</span><span class="p">)</span>
</code></pre></div>
<p>Taking a look at the models, what's important to note is that on the Person model I've created a ManyToMany to <code>self</code> through <code>Relationship</code>. The <code>symmetrical</code> attribute is set to False. When you're using intermediary models in Django this is a must, because Django won't know exactly how to describe the other side of the relationship, since the through model may have any number of fields besides ForeignKeys. Which brings up the next model, Relationship. Relationship has two foreign keys to Person, and a status, which indicates the type of relationship 'from_person' has to 'to_person'. Now, let's add some methods to the Person model to make it easier to talk about how these relationships can be used:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">add_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">person</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="n">relationship</span><span class="p">,</span> <span class="n">created</span> <span class="o">=</span> <span class="n">Relationship</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span>
<span class="n">from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
<span class="n">to_person</span><span class="o">=</span><span class="n">person</span><span class="p">,</span>
<span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="p">)</span>
<span class="k">return</span> <span class="n">relationship</span>
<span class="k">def</span> <span class="nf">remove_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">person</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="n">Relationship</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="n">from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
<span class="n">to_person</span><span class="o">=</span><span class="n">person</span><span class="p">,</span>
<span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="p">)</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">return</span>
</code></pre></div>
<p>Adding and removing relationships requires no magic - we can deal directly with the Relationship model and create or delete instances of it. If we wanted to find out who is following a user, though, it's sort of obnoxious to query Relationship and then extract the people from the returned queryset. This is where the ManyToMany comes in. We can query the 'relationships' (and its partner 'related_to') to look at Relationship objects and return the people they refer to. Here are some more methods for the Person model:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get_relationships</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">relationships</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="n">to_people__status</span><span class="o">=</span><span class="n">status</span><span class="p">,</span>
<span class="n">to_people__from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_related_to</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">related_to</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="n">from_people__status</span><span class="o">=</span><span class="n">status</span><span class="p">,</span>
<span class="n">from_people__to_person</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_following</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_relationships</span><span class="p">(</span><span class="n">RELATIONSHIP_FOLLOWING</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_followers</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_related_to</span><span class="p">(</span><span class="n">RELATIONSHIP_FOLLOWING</span><span class="p">)</span>
</code></pre></div>
<p>Looking at the actual SQL helps me understand what these ORM incantations actually mean. Creating a relationship between two users is a simple INSERT into the relationships table. But reading relationships out and referring them back to people in a meaningful and efficient way is the biggest win of using the ManyToMany. Here is the SQL for getting who a person is following:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="n">twitter_person</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">twitter_person</span><span class="p">.</span><span class="n">name</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="n">twitter_person</span><span class="w"></span>
<span class="k">INNER</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="w"></span>
<span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="p">(</span><span class="n">twitter_person</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">to_person_id</span><span class="p">)</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">from_person_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>This is opposed to what the ORM would run if we got a Relationship queryset and then iterated over it to find out who the 'to_person' was:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">from_person_id</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">to_person_id</span><span class="p">,</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">status</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="n">twitter_relationship</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">AND</span><span class="w"></span>
<span class="w"> </span><span class="n">twitter_relationship</span><span class="p">.</span><span class="n">from_person_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="c1">-- followed by this for every twitter user returned:</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">twitter_person</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">X</span><span class="w"></span>
</code></pre></div>
<p>It's generally much more efficient to use the JOIN and execute just one query. The 'get_relationships' and 'get_related_to' methods are simple wrappers around filter() that build the appropriate query. Here's an example of what you might do:</p>
<div class="highlight"><pre><span></span><code><span class="go">In [1]: from twitter.models import Person</span>
<span class="go">In [2]: john = Person.objects.create(name='John')</span>
<span class="go">In [3]: paul = Person.objects.create(name='Paul')</span>
<span class="go">In [4]: from twitter.models import RELATIONSHIP_FOLLOWING</span>
<span class="go">In [5]: john.add_relationship(paul, RELATIONSHIP_FOLLOWING)</span>
<span class="go">Out[5]: <Relationship: Relationship object></span>
<span class="go">In [6]: john.get_following()</span>
<span class="go">Out[6]: [<Person: Paul>]</span>
<span class="go">In [7]: paul.get_followers()</span>
<span class="go">Out[7]: [<Person: John>]</span>
<span class="go">In [8]: paul.add_relationship(john, RELATIONSHIP_FOLLOWING)</span>
<span class="go">Out[8]: <Relationship: Relationship object></span>
<span class="go">In [9]: paul.get_following()</span>
<span class="go">Out[9]: [<Person: John>]</span>
<span class="go">In [10]: yoko = Person.objects.create(name='Yoko')</span>
<span class="go">In [11]: john.add_relationship(yoko, RELATIONSHIP_FOLLOWING)</span>
<span class="go">Out[11]: <Relationship: Relationship object></span>
<span class="go">In [12]: paul.remove_relationship(john, RELATIONSHIP_FOLLOWING)</span>
<span class="go">In [13]: john.get_following()</span>
<span class="go">Out[13]: [<Person: Paul>, <Person: Yoko>]</span>
<span class="go">In [14]: paul.get_following()</span>
<span class="go">Out[14]: []</span>
</code></pre></div>
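<p>To make the one-query-versus-many point concrete, here is a rough sketch using the standard-library <code>sqlite3</code> module instead of Django. The table and column names mirror the SQL shown earlier, but the <code>CountingDB</code> helper and the sample rows are assumptions introduced only for this illustration. It fetches the people John follows twice: once with a single JOIN, and once by reading the relationship rows and then looking up each person separately.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twitter_person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE twitter_relationship (
        id INTEGER PRIMARY KEY,
        from_person_id INTEGER,
        to_person_id INTEGER,
        status INTEGER
    );
    INSERT INTO twitter_person (id, name)
        VALUES (1, 'John'), (2, 'Paul'), (3, 'Yoko');
    -- John (1) follows Paul (2) and Yoko (3); status 1 means "following"
    INSERT INTO twitter_relationship (from_person_id, to_person_id, status)
        VALUES (1, 2, 1), (1, 3, 1);
""")


class CountingDB:
    """Hypothetical helper: wraps a connection and counts queries issued."""

    def __init__(self, connection):
        self.connection = connection
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.connection.execute(sql, params).fetchall()


# Approach 1: one query with a JOIN
db = CountingDB(conn)
joined = db.query("""
    SELECT p.name
    FROM twitter_person p
    INNER JOIN twitter_relationship r ON p.id = r.to_person_id
    WHERE r.from_person_id = ? AND r.status = ?
    ORDER BY p.id
""", (1, 1))
join_queries = db.count

# Approach 2: fetch the relationship rows, then one lookup per person
db = CountingDB(conn)
rows = db.query("""
    SELECT to_person_id
    FROM twitter_relationship
    WHERE from_person_id = ? AND status = ?
    ORDER BY to_person_id
""", (1, 1))
looked_up = [
    db.query("SELECT name FROM twitter_person WHERE id = ?", (pid,))[0]
    for (pid,) in rows
]
loop_queries = db.count

print(joined, join_queries)
print(looked_up, loop_queries)
```

<p>Both approaches return the same people, but the second issues one extra query for every relationship row, so its cost grows with the number of people followed.</p>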
<p>Now, let's add one more thing to the mix. Say that if two people are following each other, we'll call them 'friends'. You can implement this by combining the two queries for get_followers and get_following:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get_friends</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">relationships</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="n">to_people__status</span><span class="o">=</span><span class="n">RELATIONSHIP_FOLLOWING</span><span class="p">,</span>
<span class="n">to_people__from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
<span class="n">from_people__status</span><span class="o">=</span><span class="n">RELATIONSHIP_FOLLOWING</span><span class="p">,</span>
<span class="n">from_people__to_person</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
</code></pre></div>
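<p>In SQL terms, a 'friends' lookup amounts to joining the relationship table twice, once for each direction. The sketch below shows that idea with the standard-library <code>sqlite3</code> module. The aliases <code>outgoing</code> and <code>incoming</code> and the sample data are assumptions for illustration, and the exact SQL Django generates for the filter above may differ.</p>

```python
import sqlite3

FOLLOWING = 1
BLOCKED = 2

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twitter_person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE twitter_relationship (
        id INTEGER PRIMARY KEY,
        from_person_id INTEGER,
        to_person_id INTEGER,
        status INTEGER
    );
    INSERT INTO twitter_person (id, name)
        VALUES (1, 'John'), (2, 'Paul'), (3, 'Yoko');
    -- John and Paul follow each other; John follows Yoko, Yoko blocks John
    INSERT INTO twitter_relationship (from_person_id, to_person_id, status)
        VALUES (1, 2, 1), (2, 1, 1), (1, 3, 1), (3, 1, 2);
""")


def get_friends(person_id):
    """Names of people who both follow and are followed by person_id."""
    rows = conn.execute("""
        SELECT p.name
        FROM twitter_person p
        INNER JOIN twitter_relationship outgoing
            ON p.id = outgoing.to_person_id
        INNER JOIN twitter_relationship incoming
            ON p.id = incoming.from_person_id
        WHERE outgoing.from_person_id = ? AND outgoing.status = ?
          AND incoming.to_person_id = ? AND incoming.status = ?
        ORDER BY p.id
    """, (person_id, FOLLOWING, person_id, FOLLOWING)).fetchall()
    return [name for (name,) in rows]


john_friends = get_friends(1)
paul_friends = get_friends(2)
yoko_friends = get_friends(3)
print(john_friends, paul_friends, yoko_friends)
```

<p>John follows Yoko, but Yoko's row pointing back at John has the blocked status, so only Paul counts as John's friend here.</p>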
<h3>Symmetrical Relationships - the Facebook model</h3>
<p>Django's ManyToManyField allows you to specify a 'symmetrical' attribute, but you cannot use this when also specifying a 'through' model. We can actually use most of the model definitions from above -- the only change will be to the ManyToMany field:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="n">relationships</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="s1">'self'</span><span class="p">,</span> <span class="n">through</span><span class="o">=</span><span class="s1">'Relationship'</span><span class="p">,</span>
<span class="n">symmetrical</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">related_name</span><span class="o">=</span><span class="s1">'related_to+'</span><span class="p">)</span>
</code></pre></div>
<p>It's hard to spot the difference: note the plus sign at the end of <code>related_name</code>, which tells Django that the reverse relationship should not be exposed. Since the relationships are symmetrical, this is the desired behavior. After all, if I am friends with person A, then person A is friends with me. Django won't create the symmetrical relationships for you, so a bit needs to be added to the add_relationship and remove_relationship methods to explicitly handle the other side of the relationship:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">add_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">person</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">symm</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">relationship</span><span class="p">,</span> <span class="n">created</span> <span class="o">=</span> <span class="n">Relationship</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span>
<span class="n">from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
<span class="n">to_person</span><span class="o">=</span><span class="n">person</span><span class="p">,</span>
<span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="p">)</span>
<span class="k">if</span> <span class="n">symm</span><span class="p">:</span>
<span class="c1"># avoid recursion by passing `symm=False`</span>
<span class="n">person</span><span class="o">.</span><span class="n">add_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="k">return</span> <span class="n">relationship</span>
<span class="k">def</span> <span class="nf">remove_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">person</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">symm</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">Relationship</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<span class="n">from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
<span class="n">to_person</span><span class="o">=</span><span class="n">person</span><span class="p">,</span>
<span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="p">)</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
<span class="k">if</span> <span class="n">symm</span><span class="p">:</span>
<span class="c1"># avoid recursion by passing `symm=False`</span>
<span class="n">person</span><span class="o">.</span><span class="n">remove_relationship</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
</code></pre></div>
<p>Now, whenever we create (or remove) a relationship going one way, its complement is created (or removed) as well. Since the relationships go in both directions, we can get rid of the following/followers stuff and simply use:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get_relationships</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">relationships</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
        <span class="n">to_people__status</span><span class="o">=</span><span class="n">status</span><span class="p">,</span>
        <span class="n">to_people__from_person</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
</code></pre></div>
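<p>To see the symmetric bookkeeping without a database, here is a minimal in-memory sketch. The <code>Person</code> class and its <code>_edges</code> set are hypothetical stand-ins for the Django models, not the ORM itself; only the <code>symm</code> flag logic mirrors the methods above.</p>

```python
class Person:
    """In-memory stand-in for the Django model (an assumption, not the ORM)."""

    def __init__(self, name):
        self.name = name
        # Set of (other_person, status) pairs pointing away from this person.
        self._edges = set()

    def add_relationship(self, person, status, symm=True):
        self._edges.add((person, status))
        if symm:
            # avoid recursion by passing `symm=False`
            person.add_relationship(self, status, False)

    def remove_relationship(self, person, status, symm=True):
        self._edges.discard((person, status))
        if symm:
            # avoid recursion by passing `symm=False`
            person.remove_relationship(self, status, False)

    def get_relationships(self, status):
        # Because every edge has a complement, one direction is enough.
        return {p for p, s in self._edges if s == status}


alice, bob = Person("alice"), Person("bob")
alice.add_relationship(bob, "friend")
```

<p>After the single <code>add_relationship</code> call, each person shows up in the other's <code>get_relationships("friend")</code>, and one <code>remove_relationship</code> call clears both directions.</p>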
<h3>Using it in the admin</h3>
<p>You may want to access the Relationships in the context of a Person in the admin. Since the Relationship model has two foreign keys to Person, the underlying code that instantiates the inlines will blow up unless you specify which ForeignKey to use. Here's how to make it work:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># admin.py</span>
<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="kn">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">twitter.models</span> <span class="kn">import</span> <span class="n">Person</span><span class="p">,</span> <span class="n">Relationship</span>
<span class="k">class</span> <span class="nc">RelationshipInline</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">StackedInline</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Relationship</span>
    <span class="n">fk_name</span> <span class="o">=</span> <span class="s1">'from_person'</span>
<span class="k">class</span> <span class="nc">PersonAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">inlines</span> <span class="o">=</span> <span class="p">[</span><span class="n">RelationshipInline</span><span class="p">]</span>
<span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">Person</span><span class="p">,</span> <span class="n">PersonAdmin</span><span class="p">)</span>
</code></pre></div>
<h3>Doing other cool stuff</h3>
<p>The pattern here can be used to do a lot more than just describe who's friends with who. One possible improvement is to normalize status types into their own proper model, so status types can be defined more dynamically. You could even make a Relationship's status a ManyToMany itself. Another possible use would be if you had a tree-like structure but wanted to describe relationships between Nodes that may not be direct descendants of one another. Anyways, that's about it for this post - I hope you found it useful!</p>
<h3>Relationships App</h3>
<ul>
<li><a href="http://github.com/coleifer/django-relationships">django-relationships</a></li>
</ul>Looking at registration patterns in Djangohttp://charlesleifer.com/blog/looking-registration-patterns-django/2010-01-10T14:37:57Z2010-01-10T14:37:57Zcharles leifer<p>Most developers who have written a Django application are familiar with the admin interface. In this post I'll talk about the way the <code>admin</code> module uses a registration pattern to allow tools like <code>admin.autodiscover()</code> and <code>admin.site.urls</code> to do their magic.</p>
<p>Registration patterns are useful when developing flexible and extensible libraries. By specifying an interface and allowing you to register your custom implementations, the library code remains decoupled from your own custom code.</p>
<p>To get an idea of how these patterns work, let's take a look at the <code>django.contrib.admin.sites</code> module. There we find a class called <code>AdminSite</code>, which is instantiated at the bottom of the file (essentially a singleton that is used by default across your apps). The first lines of the <code>__init__</code> method reveal that at the heart of this class there's an attribute called <code>_registry</code>, a dictionary mapping <code>Model</code> classes to <code>ModelAdmin</code> instances.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">app_name</span><span class="o">=</span><span class="s1">'admin'</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">_registry</span> <span class="o">=</span> <span class="p">{}</span> <span class="c1"># model_class class -> admin_class instance</span>
</code></pre></div>
<p>When we import <code>admin</code> and run <code>admin.site.register()</code>, the <em>register</em> method on <code>AdminSite</code> is called, which performs some validation and then adds the model/modeladmin to its internal dictionary:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Instantiate the admin class to save in the registry</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_registry</span><span class="p">[</span><span class="n">model</span><span class="p">]</span> <span class="o">=</span> <span class="n">admin_class</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span>
</code></pre></div>
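<p>Stripped of Django specifics, the explicit registration step can be sketched in a few lines. This is a simplified illustration, not Django's implementation: <code>Site</code>, <code>BasicAdmin</code>, <code>Article</code> and <code>AlreadyRegisteredError</code> are hypothetical names made up for the example.</p>

```python
class AlreadyRegisteredError(Exception):
    """Raised when a model is registered twice (hypothetical name)."""


class Site:
    def __init__(self):
        self._registry = {}  # model class -> admin instance

    def register(self, model, admin_class):
        # Validate first, then instantiate the admin class and store it.
        if model in self._registry:
            raise AlreadyRegisteredError(model.__name__)
        self._registry[model] = admin_class(model, self)

    def unregister(self, model):
        del self._registry[model]


class BasicAdmin:
    def __init__(self, model, site):
        self.model = model
        self.site = site


class Article:
    pass


# A module-level instance plays the role of the default "singleton".
site = Site()
site.register(Article, BasicAdmin)
```

<p>The library side only ever touches <code>site._registry</code>; it never needs to import or know about <code>Article</code>, which is what keeps the two decoupled.</p>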
<p>When we include <code>admin.site.urls</code> in the urlconf, the <code>urls</code> property refers to the method <code>get_urls()</code>, at the bottom of which is this code:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Add in each model's views.</span>
<span class="k">for</span> <span class="n">model</span><span class="p">,</span> <span class="n">model_admin</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_registry</span><span class="o">.</span><span class="n">iteritems</span><span class="p">():</span>
    <span class="n">urlpatterns</span> <span class="o">+=</span> <span class="n">patterns</span><span class="p">(</span><span class="s1">''</span><span class="p">,</span>
        <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^</span><span class="si">%s</span><span class="s1">/</span><span class="si">%s</span><span class="s1">/'</span> <span class="o">%</span> <span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">app_label</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">module_name</span><span class="p">),</span>
            <span class="n">include</span><span class="p">(</span><span class="n">model_admin</span><span class="o">.</span><span class="n">urls</span><span class="p">))</span>
    <span class="p">)</span>
</code></pre></div>
<p>In this way, URL patterns are created for all the models we've registered. Inside the <code>ModelAdmin</code>, there is a property called <code>urls</code> which similarly refers to a <code>get_urls()</code> method - this method exposes the CRUD views. Thus we have a system whereby any number of models can register admin classes with the admin system. We don't have to create url patterns for all the models we wish to use, and we can lean on the <code>ModelAdmin</code> class to provide the functionality we need, extending it as much or as little as we want without having to change the way the AdminSite works.</p>
<p>The last bit of the admin I want to talk about is its <code>autodiscover()</code> method, which lives in <code>contrib.admin.__init__</code>. On line 4, we can see that it is importing the AdminSite class and the singleton <code>site</code> instance:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.contrib.admin.sites</span> <span class="kn">import</span> <span class="n">AdminSite</span><span class="p">,</span> <span class="n">site</span>
</code></pre></div>
<p>Here is the code that does the importing. It figures out the path of each app in <code>INSTALLED_APPS</code>, looks for an <em>admin</em> module inside each package, and finally imports it. When the admin module is imported, all those calls to <code>admin.site.register()</code> get executed and the AdminSite's internal registry gets populated.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">imp</span>
<span class="kn">from</span> <span class="nn">django.conf</span> <span class="kn">import</span> <span class="n">settings</span>
<span class="c1"># import_module is imported at the top of the same file:</span>
<span class="kn">from</span> <span class="nn">django.utils.importlib</span> <span class="kn">import</span> <span class="n">import_module</span>
<span class="k">for</span> <span class="n">app</span> <span class="ow">in</span> <span class="n">settings</span><span class="o">.</span><span class="n">INSTALLED_APPS</span><span class="p">:</span>
    <span class="c1"># For each app, we need to look for an admin.py inside that app's</span>
    <span class="c1"># package. We can't use os.path here -- recall that modules may be</span>
    <span class="c1"># imported different ways (think zip files) -- so we need to get</span>
    <span class="c1"># the app's __path__ and look for admin.py on that path.</span>
    <span class="c1"># Step 1: find out the app's __path__ Import errors here will (and</span>
    <span class="c1"># should) bubble up, but a missing __path__ (which is legal, but weird)</span>
    <span class="c1"># fails silently -- apps that do weird things with __path__ might</span>
    <span class="c1"># need to roll their own admin registration.</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">app_path</span> <span class="o">=</span> <span class="n">import_module</span><span class="p">(</span><span class="n">app</span><span class="p">)</span><span class="o">.</span><span class="n">__path__</span>
    <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="c1"># Step 2: use imp.find_module to find the app's admin.py. For some</span>
    <span class="c1"># reason imp.find_module raises ImportError if the app can't be found</span>
    <span class="c1"># but doesn't actually try to import the module. So skip this app if</span>
    <span class="c1"># its admin.py doesn't exist</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">imp</span><span class="o">.</span><span class="n">find_module</span><span class="p">(</span><span class="s1">'admin'</span><span class="p">,</span> <span class="n">app_path</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="c1"># Step 3: import the app's admin file. If this has errors we want them</span>
    <span class="c1"># to bubble up.</span>
    <span class="n">import_module</span><span class="p">(</span><span class="s2">"</span><span class="si">%s</span><span class="s2">.admin"</span> <span class="o">%</span> <span class="n">app</span><span class="p">)</span>
<span class="c1"># autodiscover was successful, reset loading flag.</span>
<span class="n">LOADING</span> <span class="o">=</span> <span class="kc">False</span>
</code></pre></div>
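<p>The same find-then-import idea can be tried outside Django. The sketch below is an assumption-laden rewrite for current Python, where the <code>imp</code> module no longer exists (it was removed in Python 3.12), so it uses <code>importlib.util.find_spec</code> instead. The function name and its arguments are made up for illustration, and it scans two standard-library packages rather than <code>INSTALLED_APPS</code>.</p>

```python
import importlib
import importlib.util


def autodiscover(package_names, module_name):
    """Import `<package>.<module_name>` for every package that has one."""
    imported = []
    for package in package_names:
        target = "%s.%s" % (package, module_name)
        # find_spec imports the parent package but not the target module,
        # and returns None when the submodule does not exist.
        if importlib.util.find_spec(target) is None:
            continue
        # Errors raised while importing an existing module bubble up.
        imported.append(importlib.import_module(target))
    return imported


# `json` ships a `decoder` submodule, `email` does not.
found = autodiscover(["json", "email"], "decoder")
```

<p>As in the admin, the useful work happens as a side effect of the import: whatever registration calls sit at module level in the discovered module run when it is imported.</p>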
<h3>Another approach: Metaclasses and Django's Models</h3>
<p>Django's models are a pretty awesome feat of programming. They, too, implement a registry pattern. Taking a look at <code>django.db.models.base</code>, at the top of the file there's a class definition for <code>ModelBase</code>, which subclasses <code>type</code>. It overrides the <a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/base.py#L27"><code>__new__</code></a> method of <code>type</code>, adding custom behavior to all models (which specify ModelBase as their metaclass) when they are created. At the bottom of the <code>__new__</code> method, on <a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/base.py#L190">line 190</a>, there is the following:</p>
<div class="highlight"><pre><span></span><code><span class="n">register_models</span><span class="p">(</span><span class="n">new_class</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">app_label</span><span class="p">,</span> <span class="n">new_class</span><span class="p">)</span>
<span class="c1">#...</span>
<span class="k">return</span> <span class="n">get_model</span><span class="p">(</span><span class="n">new_class</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">app_label</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
</code></pre></div>
<p>Following the code to <code>db.models.loading</code>, we see a class definition for an object called <code>AppCache</code> and, at the bottom of the file, that class being instantiated (<a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/loading.py#L202">cache = AppCache()</a>) - this is similar to the way AdminSite works. When <code>register_models()</code> is called in <code>ModelBase</code>, it populates AppCache's internal registry (the <code>app_models</code> attribute) with a key/value of app_label -> SortedDict(), which in turn is a dictionary of model name / model class pairs. The <code>__new__</code> method of ModelBase calls its parent class's <code>__new__</code> method to create a class, adds a ton of functionality to that newly-created class, and then registers it with AppCache. If you look at Model's class definition, which follows ModelBase in <code>db.models.base</code>, you will see:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Model</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">__metaclass__</span> <span class="o">=</span> <span class="n">ModelBase</span>
</code></pre></div>
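<p>Here is a stripped-down sketch of the same idea that runs on Python 3. It is not Django's code: the module-level <code>registry</code> dict, the <code>app_label</code> class attribute and the lower-cased key are simplifying assumptions. Note that Python 3 ignores the <code>__metaclass__</code> attribute used in the excerpt above, so the metaclass is passed as a keyword in the class statement instead.</p>

```python
registry = {}  # app_label -> {model name -> model class}


class ModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # Let `type` build the class first, then register the result.
        new_class = super().__new__(mcs, name, bases, attrs)
        if bases:
            # Only subclasses are registered; the root `Model` has no bases.
            app_label = attrs.get("app_label", "default")
            registry.setdefault(app_label, {})[name.lower()] = new_class
        return new_class


class Model(metaclass=ModelBase):
    pass


class Article(Model):
    app_label = "blog"


class Comment(Model):
    app_label = "blog"
```

<p>Nothing calls a register function here; simply defining <code>Article</code> and <code>Comment</code> is enough to make them appear under <code>registry["blog"]</code>.</p>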
<p>Whenever a subclass of Model is created, it uses ModelBase to create itself, allowing all this functionality to happen automatically, behind the scenes. To see how this stuff gets used, take a look at syncdb's code (core.management.commands.syncdb). It calls <code>get_apps()</code> on the AppCache to get the models module of each installed app, then <code>get_models(app)</code>, which reads the model classes back out of the registry described above (app_label -> SortedDict() of model name / model class pairs):</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Create the tables for each model</span>
<span class="k">for</span> <span class="n">app</span> <span class="ow">in</span> <span class="n">models</span><span class="o">.</span><span class="n">get_apps</span><span class="p">():</span>
    <span class="n">app_name</span> <span class="o">=</span> <span class="n">app</span><span class="o">.</span><span class="vm">__name__</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
    <span class="n">model_list</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">get_models</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">model_list</span><span class="p">:</span>
        <span class="c1"># Create the model's database table, if it doesn't already exist.</span>
        <span class="c1"># ...</span>
</code></pre></div>
<p><a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/__init__.py#L4">Line 4</a> of <code>django.db.models.__init__</code> imports AppCache functions, so they are available directly from <code>django.db.models</code>:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db.models.loading</span> <span class="kn">import</span> <span class="n">get_apps</span><span class="p">,</span> <span class="n">get_app</span><span class="p">,</span> <span class="n">get_models</span><span class="p">,</span> <span class="n">get_model</span><span class="p">,</span> <span class="n">register_models</span>
</code></pre></div>
<p>This is a very powerful pattern, and it is talked about in greater detail in Marty Alchin's book <a href="http://prodjango.com/">Pro Django</a>.</p>
<h3>The plug-in approach to metaclasses</h3>
<p>A similar approach to using Metaclasses is discussed in <em>Pro Django</em> as a way of creating a plug-in system. A real-life example of this technique can be found in the code for <a href="http://code.google.com/p/oohembed/">oohEmbed</a>.</p>
<p>Looking at <a href="http://code.google.com/p/oohembed/source/browse/app/provider/base.py#7">line 7</a> of <code>base.py</code>, there is the following Metaclass definition:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">ProviderMount</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">attrs</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="s1">'plugins'</span><span class="p">):</span>
            <span class="bp">cls</span><span class="o">.</span><span class="n">plugins</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">cls</span><span class="o">.</span><span class="n">plugins</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">cls</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">get_providers</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">p</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">plugins</span><span class="p">]</span>
</code></pre></div>
<p>It does not override the <code>__new__</code> method, as ModelBase does, but it does do some interesting stuff in the <code>__init__</code>. It creates a list attribute on the class called <code>plugins</code> when Provider is defined, then for all subclasses of Provider the metaclass adds the class being created to this list. When <code>get_providers</code> is called, it iterates over the plugins, instantiating them and returning them as a list.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Provider</span><span class="p">:</span>
    <span class="n">__metaclass__</span> <span class="o">=</span> <span class="n">ProviderMount</span>
</code></pre></div>
<p>The Provider class specifies ProviderMount as its metaclass, and taking a look at <code>videoprovider.py</code>, on <a href="http://code.google.com/p/oohembed/source/browse/app/provider/videoprovider.py#11">line 11</a> a YouTubeProvider is defined, inheriting from Provider. In the app's main.py, Provider is imported and all the provider subclasses are retrieved and instantiated:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">provider</span> <span class="kn">import</span> <span class="o">*</span>
<span class="k">class</span> <span class="nc">EndPoint</span><span class="p">(</span><span class="n">webapp</span><span class="o">.</span><span class="n">RequestHandler</span><span class="p">):</span>
    <span class="n">providers</span> <span class="o">=</span> <span class="n">Provider</span><span class="o">.</span><span class="n">get_providers</span><span class="p">()</span>
</code></pre></div>
<p>Whenever a request comes in, the get handler iterates over the providers, searching for the right one.</p>
<div class="highlight"><pre><span></span><code><span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">providers</span><span class="p">:</span>
</code></pre></div>
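<p>Put together, the plug-in mount can be sketched end to end in Python 3 syntax. The metaclass follows the excerpt above; everything else is an assumption for illustration, in particular the <code>matches()</code> method, the two example providers and the <code>find_provider()</code> helper, none of which are taken from oohEmbed.</p>

```python
class ProviderMount(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, "plugins"):
            # First class using this metaclass: it becomes the mount point.
            cls.plugins = []
        else:
            # Every later subclass adds itself to the shared list.
            cls.plugins.append(cls)

    def get_providers(cls, *args, **kwargs):
        return [p(*args, **kwargs) for p in cls.plugins]


class Provider(metaclass=ProviderMount):
    def matches(self, url):  # hypothetical plug-in interface
        raise NotImplementedError


class YouTubeProvider(Provider):
    def matches(self, url):
        return "youtube.com" in url


class VimeoProvider(Provider):
    def matches(self, url):
        return "vimeo.com" in url


def find_provider(url):
    # Iterate over the instantiated providers, searching for the right one.
    for provider in Provider.get_providers():
        if provider.matches(url):
            return provider
    return None
```

<p>Adding support for another site is then just a matter of defining one more <code>Provider</code> subclass in a module that gets imported.</p>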
<h3>Implementing it yourself</h3>
<p>The way you choose to implement this pattern depends on your needs. The Admin method has an explicit registration step, as well as the ability to programmatically unregister a model. I've seen this technique used to unregister a default ModelAdmin (for example one included in contrib) and register your own version that adds some functionality. The metaclass method removes the explicit step of registration/discovery and handles autodiscovery at class-creation, but still relies on import-time side-effects.</p>
<p>However you end up doing it, I hope this post was informative! Any corrections / points of clarification are appreciated.</p>
<h3>Further Reading</h3>
<ul>
<li><a href="http://prodjango.com">Pro Django</a></li>
<li><a href="http://code.djangoproject.com/browser/django/trunk/django/contrib/admin/sites.py">django.contrib.admin.sites</a></li>
<li><a href="http://code.djangoproject.com/browser/django/trunk/django/contrib/admin/__init__.py">django.contrib.admin.__init__</a></li>
<li><a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/base.py">django.db.models.base</a></li>
<li><a href="http://code.djangoproject.com/browser/django/trunk/django/db/models/loading.py">django.db.models.loading</a></li>
<li><a href="http://code.google.com/p/oohembed/source/browse/app/provider/base.py">oohEmbed source</a></li>
</ul>Describing Relationships: Django's ManyToMany Throughhttp://charlesleifer.com/blog/describing-relationships-djangos-manytomany-through/2009-11-03T09:34:52Z2009-11-03T09:34:52Zcharles leifer<p>In this post I'll describe extending many-to-many relationships in Django to support additional columns on the junction table. I'll be using the following table structure as the starting place. Imagine an RSS aggregator consisting of feeds, articles and categories. Articles come from a feed and may belong to any number of categories.</p>
<p><a href="https://media.charlesleifer.com/blog/photos/many-to-many.png" title="ManyToMany"><img alt="ManyToMany" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/many-to-many.png?key=yDY4K6Ajp46FizWVtEqspA=="/></a></p>
<p>Suppose I wanted to implement a white-listing feature, wherein articles would only get added to categories if they matched a list of keywords assigned to that category. This is a good time to use a <em>many-to-many through</em> relationship. The idea here is that we are describing the intersection of two objects:</p>
<p><a href="https://media.charlesleifer.com/blog/photos/many-to-many-thru.png" title="ManyToMany Through"><img alt="ManyToMany Through" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/many-to-many-thru.png?key=DWhqweaWNjO32s1AMnMotg=="/></a></p>
<p class="caption">The intersection of feeds and categories contains an extra piece of data: which filter to apply to that feed before inserting articles into the category.</p>
<h3>Django Implementation</h3>
<p>Here is how the schema translates into Django model definitions:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Feed</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">255</span><span class="p">)</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">URLField</span><span class="p">()</span>
    <span class="c1"># Note the `through` keyword argument:</span>
    <span class="n">categories</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">Category</span><span class="p">,</span> <span class="n">through</span><span class="o">=</span><span class="s1">'FeedCategoryRelationship'</span><span class="p">)</span>
    <span class="n">source</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Source</span><span class="p">)</span>
    <span class="n">last_download</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateField</span><span class="p">(</span><span class="n">auto_now</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">new_articles_added</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">PositiveSmallIntegerField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">editable</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
    <span class="n">active</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="o">...</span>
<span class="k">class</span> <span class="nc">FeedCategoryRelationship</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">feed</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Feed</span><span class="p">)</span>
    <span class="n">category</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Category</span><span class="p">)</span>
    <span class="n">white_list</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">WhiteListFilter</span><span class="p">,</span> <span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="o">...</span>
</code></pre></div>
<p>Accessing related data is a snap:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">perform_download</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">"""Download articles associated with this feed"""</span>
    <span class="k">for</span> <span class="n">category</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">categories</span><span class="o">.</span><span class="n">all</span><span class="p">():</span>
        <span class="c1"># Directly query the junction model:</span>
        <span class="n">relationship_queryset</span> <span class="o">=</span> <span class="n">FeedCategoryRelationship</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">feed</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span> <span class="n">category</span><span class="o">=</span><span class="n">category</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">relationship</span> <span class="ow">in</span> <span class="n">relationship_queryset</span><span class="o">.</span><span class="n">all</span><span class="p">():</span>
            <span class="n">whitelist</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="k">for</span> <span class="n">white_list</span> <span class="ow">in</span> <span class="n">relationship</span><span class="o">.</span><span class="n">white_list</span><span class="o">.</span><span class="n">all</span><span class="p">():</span>
                <span class="n">whitelist</span> <span class="o">+=</span> <span class="n">white_list</span><span class="o">.</span><span class="n">keywords</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span>
            <span class="o">...</span>
</code></pre></div>
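<p>The post does not show what happens to the collected keywords, so the following is only a guess at the filtering step, written as plain Python with no ORM involved. The function names, the case-insensitive substring match on the article title, and the rule that an empty whitelist lets everything through are all assumptions.</p>

```python
def build_whitelist(keyword_strings):
    """Flatten comma-separated keyword strings into a set of keywords."""
    whitelist = set()
    for keywords in keyword_strings:
        for keyword in keywords.split(","):
            keyword = keyword.strip().lower()
            if keyword:
                whitelist.add(keyword)
    return whitelist


def passes_whitelist(title, whitelist):
    # Assumption: a relationship without filters accepts every article.
    if not whitelist:
        return True
    title = title.lower()
    return any(keyword in title for keyword in whitelist)


# Each string stands in for one filter's comma-separated `keywords` value.
whitelist = build_whitelist(["django, python", "orm"])
accepted = [
    title
    for title in ["New Django release", "Cooking tips", "ORM internals"]
    if passes_whitelist(title, whitelist)
]
```

<p>Because the filters hang off the junction row rather than the category, the same category can apply a different whitelist to each feed that supplies it.</p>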
<h3>Admin Interface</h3>
<p><a href="https://media.charlesleifer.com/blog/photos/django-admin.png" title="ManyToMany Through Admin"><img alt="ManyToMany Through Admin" class="img-responsive" src="https://m.charlesleifer.com/t/800x-/blog/photos/django-admin.png?key=uRXTDXuemPUMRX5WKSH5Nw=="/></a></p>
<p>By default, the <code>FeedCategoryRelationship</code> is not exposed in either the <code>Category</code> or the <code>Feed</code> admin. To include the junction table in the <code>Feed</code> and <code>Category</code> admin interfaces, you can create an <code>Inline</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">FeedCategoryRelationshipInline</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">TabularInline</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">FeedCategoryRelationship</span>
    <span class="n">extra</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">class</span> <span class="nc">FeedAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">inlines</span> <span class="o">=</span> <span class="p">(</span><span class="n">FeedCategoryRelationshipInline</span><span class="p">,)</span>
<span class="k">class</span> <span class="nc">CategoryAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">inlines</span> <span class="o">=</span> <span class="p">(</span><span class="n">FeedCategoryRelationshipInline</span><span class="p">,)</span>
    <span class="n">prepopulated_fields</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">"slug"</span><span class="p">:</span> <span class="p">(</span><span class="s2">"name"</span><span class="p">,)</span> <span class="p">}</span>
</code></pre></div>
<h3>Further Reading</h3>
<ul>
<li><a href="http://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships">Extra fields on many-to-many relationships</a></li>
<li><a href="http://docs.djangoproject.com/en/dev/ref/contrib/admin/#working-with-many-to-many-intermediary-models">Working with many-to-many intermediary models</a></li>
</ul>