This post discusses the two flavors of model inheritance supported by Django, some of their use-cases as well as some potential gotchas.

Overview

When the queryset refactor landed a couple of years ago, Django's ORM grew support for model inheritance. Model inheritance comes in two flavors, abstract and ... not abstract. This is a bit like the "has-a/is-a" distinction sometimes talked about in object-oriented programming. There are some important differences in how Django handles these two types of inheritance.

Multi-table inheritance (not abstract)

Directly extending a concrete model results in two tables: the shared fields are stored in the parent model's table, and the fields unique to the child model are stored in the child model's table. The child's table links back to its parent row through an automatically created one-to-one field (here media_ptr, which also serves as the child's primary key), and Django follows that join automatically whenever the child is queried.

from django.db import models

class Media(models.Model):
    title = models.CharField(max_length=255)
    pub_date = models.DateTimeField()

class Photo(Media): # note that Photo extends Media
    image = models.ImageField(upload_to='photos')

class Video(Media):
    video = models.FileField(upload_to='videos')

class VideoWithThumbnail(Video, Photo):
    """
    Querying this object will result in 3 inner joins on filters/gets

    Saving/deleting will require at least 4 queries, but in my testing
    saving actually required 10 queries and deleting 13!
    """
    pass

Because of the way these items are stored in the database, it is possible to query against all media objects, whether they're photos, videos, or just plain-old "media" objects. Querying Media, the base class, will return Media instances without either the special "photo" or "video" fields:

>>> Media.objects.all() # get all the media objects, photos or videos
[<Media: Media object>, <Media: Media object>]

>>> Photo.objects.all() # just the photos
[<Photo: Photo object>]

>>> Video.objects.all() # just the videos
[<Video: Video object>]
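To make that storage layout concrete, here is a rough sketch of the two tables behind Media and Photo, built with Python's sqlite3 module so it runs without Django. The table names media and photo are simplified assumptions (Django would prefix them with the app label), while media_ptr_id mirrors the column backing the automatic parent link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (
        id INTEGER PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        pub_date DATETIME NOT NULL
    );
    -- the child table's primary key doubles as the link to its parent row
    CREATE TABLE photo (
        media_ptr_id INTEGER PRIMARY KEY REFERENCES media (id),
        image VARCHAR(100) NOT NULL
    );
""")

# Saving a "photo" writes a parent row first, then the child row.
cur = conn.execute(
    "INSERT INTO media (title, pub_date) VALUES (?, ?)", ("beach", "2010-10-09")
)
conn.execute(
    "INSERT INTO photo (media_ptr_id, image) VALUES (?, ?)",
    (cur.lastrowid, "photos/beach.jpg"),
)
# A plain "media" object only has a parent row.
conn.execute(
    "INSERT INTO media (title, pub_date) VALUES (?, ?)", ("notes", "2010-10-10")
)

# Querying the base class reads one table and sees every object.
all_titles = [row[0] for row in conn.execute("SELECT title FROM media ORDER BY id")]

# Querying the subclass has to join back to the parent for the shared fields.
photos = conn.execute(
    "SELECT m.title, p.image FROM photo AS p "
    "INNER JOIN media AS m ON m.id = p.media_ptr_id"
).fetchall()

print(all_titles)  # ['beach', 'notes']
print(photos)      # [('beach', 'photos/beach.jpg')]
```

This is why Media.objects.all() sees everything while Photo.objects.all() only sees rows that have a matching photo entry.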

I find multi-table inheritance (MTI) useful in the following circumstances:

  • Querying against all objects of a type, e.g. all media
  • Relating (via a ForeignKey or M2M) to all objects of a type
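A sketch of the second point, again using sqlite3 with simplified, assumed table names: because every photo and video also has a row in the parent table, a hypothetical comment table needs only one foreign key, to media.id, to point at either kind of object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE photo (media_ptr_id INTEGER PRIMARY KEY REFERENCES media (id), image TEXT);
    CREATE TABLE video (media_ptr_id INTEGER PRIMARY KEY REFERENCES media (id), video TEXT);
    -- hypothetical related model: one foreign key covers every kind of media
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        media_id INTEGER NOT NULL REFERENCES media (id),
        body TEXT NOT NULL
    );
    INSERT INTO media (id, title) VALUES (1, 'beach'), (2, 'surfing');
    INSERT INTO photo (media_ptr_id, image) VALUES (1, 'photos/beach.jpg');
    INSERT INTO video (media_ptr_id, video) VALUES (2, 'videos/surfing.mp4');
    INSERT INTO comment (media_id, body) VALUES (1, 'nice shot'), (2, 'great wave');
""")

# One query lists comments on photos and videos alike.
rows = conn.execute(
    "SELECT m.title, c.body FROM comment AS c "
    "JOIN media AS m ON m.id = c.media_id ORDER BY c.id"
).fetchall()
print(rows)  # [('beach', 'nice shot'), ('surfing', 'great wave')]
```

In Django terms, the comment model would simply declare a ForeignKey to Media.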

I see two main downsides to this type of inheritance:

  • More queries: every insert/update/delete must touch every table in the inheritance chain
  • More joins: every select on a child model must join against the tables of all its ancestors
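To put a rough number on "more queries", this sketch uses sqlite3's set_trace_callback to count the SQL statements needed to store one photo in an MTI-style layout versus a single flat table (the layout an abstract base class produces). The schemas and the count_statements helper are simplified, hypothetical stand-ins; Django itself may issue additional queries on top of these.

```python
import sqlite3

def count_statements(conn, work):
    """Run work(conn) and return how many SQL statements SQLite executed."""
    executed = []
    conn.set_trace_callback(executed.append)
    work(conn)
    conn.set_trace_callback(None)
    return len(executed)

# isolation_level=None keeps sqlite3 from adding implicit BEGIN statements.
mti = sqlite3.connect(":memory:", isolation_level=None)
mti.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE photo (media_ptr_id INTEGER PRIMARY KEY REFERENCES media (id), image TEXT);
""")

flat = sqlite3.connect(":memory:", isolation_level=None)
flat.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT, image TEXT)")

def save_mti_photo(conn):
    # one INSERT per table in the inheritance chain
    cur = conn.execute("INSERT INTO media (title) VALUES ('beach')")
    conn.execute(
        "INSERT INTO photo (media_ptr_id, image) VALUES (?, 'beach.jpg')",
        (cur.lastrowid,),
    )

def save_flat_photo(conn):
    # everything lives in one row of one table
    conn.execute("INSERT INTO photo (title, image) VALUES ('beach', 'beach.jpg')")

mti_count = count_statements(mti, save_mti_photo)
flat_count = count_statements(flat, save_flat_photo)
print(mti_count, flat_count)  # 2 1
```

Each extra level of inheritance adds another table, and with it another statement per write and another join per read.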

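There are a couple of other things to watch out for:

  • Since the parent model and each of its descendants have their own content type, Generic ForeignKeys can be a bit cumbersome: something attached to a Photo is recorded against Photo's content type, not Media's.
  • Django's model signals do not cascade to child models, so a post_save handler registered with sender=Media will not be called when a Photo object gets saved.
  • You can't override fields inherited from a parent model. At the time of writing this was true for mixins as well, though Django 1.10 later allowed fields inherited from abstract base classes to be overridden.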
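The signal gotcha comes from Django matching a receiver's sender against the exact class passed when the signal is sent, not against its base classes. The ToySignal class below is a hypothetical plain-Python stand-in (not Django's Signal) that mimics that exact-match behavior, along with a common workaround: connect without a sender and filter with issubclass().

```python
class ToySignal:
    """Hypothetical stand-in for a dispatcher that matches senders exactly."""

    def __init__(self):
        self._receivers = []  # (sender or None, callback) pairs

    def connect(self, callback, sender=None):
        self._receivers.append((sender, callback))

    def send(self, sender, **kwargs):
        for wanted, callback in self._receivers:
            # identity check: a subclass counts as a different sender
            if wanted is None or wanted is sender:
                callback(sender=sender, **kwargs)


class Media:
    pass


class Photo(Media):
    pass


toy_post_save = ToySignal()
exact_calls, any_media_calls = [], []

def on_media_saved(sender, **kwargs):
    exact_calls.append(sender.__name__)

def on_any_media_saved(sender, **kwargs):
    # workaround: listen to every sender, keep only Media and its subclasses
    if issubclass(sender, Media):
        any_media_calls.append(sender.__name__)

toy_post_save.connect(on_media_saved, sender=Media)
toy_post_save.connect(on_any_media_saved)

toy_post_save.send(sender=Photo)  # roughly what saving a Photo emits
toy_post_save.send(sender=Media)

print(exact_calls)      # ['Media']
print(any_media_calls)  # ['Photo', 'Media']
```

The sender-less receiver runs for every save in the project, so the issubclass() check should stay cheap.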

If you're interested, there's a neat project called django_polymorphic that optimizes this type of inheritance and lets you query the base class while transparently getting back instances of the "most specific subclass".

The rest of this post will deal with Abstract models.

Abstract Models, or "Mixins"

OK, MTI is admittedly a pretty complicated affair and there are quite a few things to watch out for. I find defining a model as abstract and using it as a mixin much more intuitive, especially at the database level.

class AbstractMedia(models.Model):
    title = models.CharField(max_length=255)

    class Meta:
        abstract = True # <--- denotes our model as abstract

class Photo(AbstractMedia):
    image = models.ImageField(upload_to='photos')

Basically, an abstract model doesn't get a table. This has several implications:

  • Each subclass stores all of its fields, inherited ones included, in its own table (no joins, no parent foreign key)
  • An abstract model can't be queried against
  • Other models can't have a ForeignKey or M2M pointing to an abstract model
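In table terms, the sketch below (sqlite3 again, with simplified, assumed table names) shows the abstract version if a hypothetical Video model is added alongside Photo: each concrete model gets its own table with its own copy of title, there is no table for AbstractMedia itself, and "all media" has to be stitched together by hand, for example with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- every concrete subclass carries the inherited columns itself
    CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT NOT NULL, image TEXT);
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT NOT NULL, video TEXT);
    INSERT INTO photo (title, image) VALUES ('beach', 'photos/beach.jpg');
    INSERT INTO video (title, video) VALUES ('surfing', 'videos/surfing.mp4');
""")

# Reading a photo needs no join: everything lives in one row.
photo = conn.execute("SELECT title, image FROM photo").fetchone()

# There is no parent table, so "all media" means combining tables manually.
all_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM photo UNION ALL SELECT title FROM video ORDER BY title"
    )
]

print(photo)       # ('beach', 'photos/beach.jpg')
print(all_titles)  # ['beach', 'surfing']
```

Note that both rows have id 1: ids are only unique within each table, which is one way to see why nothing can hold a ForeignKey to the abstract parent.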

As Eric Florenzano pointed out in his talk "Why Django Sucks and How We Can Fix It", abstract models introduce a whole new set of problems:

  • trading implementation for configuration (this probably refers to the idea that abstract models provide an elegant solution to the "reusable app problem")
  • extra level of indirection
  • what fields does my model have, again?

That being said, there are definitely valid use cases for abstract models. You'll often hear abstract models referred to as "mixins", and that term describes what I see as their main strength pretty well: they allow bits of functionality to be wrapped up in a class and reused in many models without forcing you to incur the extra database overhead.

I read the advice to "favor composition over inheritance" in "Head First Design Patterns", which I take to mean that it's better to build your objects out of smaller pieces than to extend and override your way into a large, brittle class hierarchy.

Looking at some examples

Mixins can do just about everything that normal models can do, so you can use them to encapsulate bits of common model functionality. One thing I find myself doing a lot is auto-generating a slug for my models that have title fields. This can be wrapped up neatly using a mixin:

from django.db import models, IntegrityError, transaction
from django.template.defaultfilters import slugify

class TitleSlugModel(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField(unique=True)

    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        """
        Based on the Tag save() method in django-taggit, this method simply
        stores a slugified version of the title, ensuring that the unique
        constraint is observed
        """
        self.slug = slug = slugify(self.title)
        i = 0
        while True:
            try:
                savepoint = transaction.savepoint()
                res = super(TitleSlugModel, self).save(*args, **kwargs)
                transaction.savepoint_commit(savepoint)
                return res
            except IntegrityError:
                transaction.savepoint_rollback(savepoint)
                i += 1
                # give up eventually: an IntegrityError raised by some other
                # constraint (e.g. a NULL pub_date) would otherwise loop forever
                if i > 100:
                    raise
                self.slug = '%s_%d' % (slug, i)


class ContentObject(TitleSlugModel):
    content = models.TextField()
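The retry idea behind that save() method can be tried outside Django. This sqlite3 sketch uses a hypothetical save_with_unique_slug() helper and a simplified slugify() stand-in (not Django's) to insert rows whose titles produce the same slug, catching sqlite3.IntegrityError and appending a counter. It also gives up after a fixed number of attempts, since an IntegrityError from some other constraint would otherwise keep the loop retrying forever.

```python
import re
import sqlite3

def slugify(value):
    """Simplified stand-in for Django's slugify (an assumption, not the real one)."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)

def save_with_unique_slug(conn, title, max_attempts=100):
    """Insert a row, appending _1, _2, ... to the slug until it is unique."""
    base = slug = slugify(title)
    for i in range(1, max_attempts + 1):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO content (title, slug) VALUES (?, ?)", (title, slug)
                )
            return slug
        except sqlite3.IntegrityError:
            slug = "%s_%d" % (base, i)
    raise RuntimeError("could not find a unique slug for %r" % title)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT, slug TEXT UNIQUE)"
)

slugs = [save_with_unique_slug(conn, "Testing") for _ in range(3)]
print(slugs)  # ['testing', 'testing_1', 'testing_2']
```

Each collision costs one failed INSERT, so titles that collide often get progressively slower to save; for those, querying for existing slugs up front may be cheaper.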

Another common thing I find myself doing is storing pub/created/modified dates. In this example the ContentObject model is composed of two abstract models, TitleSlugModel and the new DateAwareModel. Note that both abstract base classes (ABCs) override the save() method, but because of Python's method resolution order, and because each save() passes control along with super(), everything gets called as we would expect.

import datetime
from django.db import models, IntegrityError, transaction
from django.template.defaultfilters import slugify

class TitleSlugModel(models.Model):
    # everything the same as above

    def save(self, *args, **kwargs):
        print('title slug model save() called')

        # ... same as above except adding a "print" statement at the top
        # of the method


class DateAwareModel(models.Model):
    pub_date = models.DateTimeField()
    modified_date = models.DateTimeField()
    created_date = models.DateTimeField()

    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        print('date aware model save() called')

        now = datetime.datetime.now()  # one timestamp for both fields
        if not self.pk:
            self.created_date = now
        self.modified_date = now

        return super(DateAwareModel, self).save(*args, **kwargs)


class ContentObject(TitleSlugModel, DateAwareModel):
    content = models.TextField()

    def save(self, *args, **kwargs):
        print('content model save() called')
        super(ContentObject, self).save(*args, **kwargs)

Here's some sample output from the shell showing that all 3 save() methods are called and our model gets a slug and a modified date.

>>> import datetime
>>> from media.models import *
>>> content_obj = ContentObject(title='testing')
>>> content_obj.pub_date = datetime.datetime(2010, 10, 9)
>>> content_obj.save()
content model save() called
title slug model save() called
date aware model save() called

>>> content_obj.modified_date
datetime.datetime(2010, 10, 9, 13, 31, 57, 782511)

>>> content_obj.slug
u'testing'
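The call order in that session follows ContentObject's method resolution order (MRO), and it only works because every save() passes control along with super(). The plain-Python sketch below uses stand-in classes (no Django involved; Base plays the role of models.Model, and ForgetfulSlugModel is hypothetical) to print the MRO and to show what happens when one mixin forgets to call super().

```python
calls = []

class Base:
    """Stand-in for models.Model: the end of the save() chain."""
    def save(self):
        calls.append("Base")

class TitleSlugModel(Base):
    def save(self):
        calls.append("TitleSlugModel")
        super().save()

class DateAwareModel(Base):
    def save(self):
        calls.append("DateAwareModel")
        super().save()

class ContentObject(TitleSlugModel, DateAwareModel):
    def save(self):
        calls.append("ContentObject")
        super().save()

mro_names = [cls.__name__ for cls in ContentObject.__mro__]
print(mro_names)  # ['ContentObject', 'TitleSlugModel', 'DateAwareModel', 'Base', 'object']

ContentObject().save()
first_run = list(calls)
print(first_run)  # ['ContentObject', 'TitleSlugModel', 'DateAwareModel', 'Base']

# A mixin that skips super() silently cuts off everything after it in the MRO.
class ForgetfulSlugModel(Base):
    def save(self):
        calls.append("ForgetfulSlugModel")  # no super().save() here

class BrokenContentObject(ForgetfulSlugModel, DateAwareModel):
    pass

calls.clear()
BrokenContentObject().save()
print(calls)  # ['ForgetfulSlugModel']
```

In the broken case DateAwareModel.save() never runs, so in a real model the dates would never be set and nothing would be written to the database.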

Conclusion

Thanks for reading, I hope you found this post informative! Both multi-table inheritance and abstract models have their place in the Django developer's toolkit. There's potential for some serious overhead when using multi-table inheritance, but you gain the ability to query against all objects of a base type. Abstract base classes avoid the database overhead by not creating explicit links in the database, but you lose the ability to query across subclasses. As always, any comments, feedback, suggestions, errata, etc. are appreciated.


Comments (5)

  • Martin Diers | October 2010, at 18:20

    I found a valid use case for abstract Models when I re-wrote django-mptt some time ago, using inheritance (this was before the developer of the app did the same). Here is a case where you really don't care what the extra fields are, because they are "managed" fields, so to speak, which only have meaning in the context of the functionality provided by the abstract class, but in themselves add no additional "definition" to the subclasses.


  • Ken Swift | October 2010, at 04:37

    As far as Django ORM is great for modelling database in python and for updating/creating simple rows etc, in my opinion it is not answer to all problems. I often see people trying to do everything at orm/python side. Why invoking 3 save methods when you can query your database only once? Why writing raw sql is so hard for django users? Why writing function/triggers at database side is so hard? You could easily write some template trigger for slug/date creation which would be much more efficient. Anyhow, presented pattern is very useful, but we need to keep it in mind that it is not the way to solve everything.


  • nielsle | October 2010, at 06:55

    Thank you for a nice write-up.

    By the way, it would be supercool to have a django interface to the non-rectangular arrays that you find in postgresql and oracle.


  • Charles | October 2010, at 14:33

    Ken - I totally agree with you, the ORM can do quite a bit but sometimes SQL is the only way to do something efficiently -- at work we use triggers to denormalize things like comment counts, and I can see them being useful for the title/date stuff I wrote about. As of 1.2 Django supports writing raw sql: http://docs.djangoproject.com/en/dev/topics/db/sql/


  • james canyon | October 2010, at 18:52

    this is an awesome series of articles. keep it up!

