Relating one data structure to 0..n of another is the cornerstone of relational database design.

One to many

One news feed may contain many articles
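
To make the one-to-many case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made up for illustration, not the actual Django News schema. The point is only that the "many" side (article) carries a foreign key to the "one" side (feed).

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE feed (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE article (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        feed_id INTEGER NOT NULL REFERENCES feed(id)  -- the "many" side points at the "one"
    );
""")

# Hypothetical sample data: one feed with two articles.
conn.execute("INSERT INTO feed (id, name) VALUES (1, 'Example Feed')")
conn.executemany(
    "INSERT INTO article (title, feed_id) VALUES (?, ?)",
    [("First story", 1), ("Second story", 1)],
)


def articles_for_feed(feed_id):
    """Return the titles of every article that belongs to the given feed."""
    rows = conn.execute(
        "SELECT title FROM article WHERE feed_id = ? ORDER BY id", (feed_id,)
    )
    return [title for (title,) in rows]
```

Calling articles_for_feed(1) walks from the one feed to its many articles. A feed with no articles simply yields an empty list.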

By using an intermediary table, a "Many to Many" relationship can be modeled. If I wanted to create some categories for these articles, where:

  1. an article could belong to many categories
  2. a category could contain many articles

I would use a many-to-many relationship.


Categories have many articles, articles may have many categories
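
The intermediary table can be sketched in a few lines with Python's built-in sqlite3 module. Everything named here (article, category, article_category, the sample titles) is an assumption for illustration. Each row of the intermediary table pairs exactly one article with one category, and the relationship can be walked from either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);

    -- The intermediary table: one row per (article, category) pairing.
    CREATE TABLE article_category (
        article_id  INTEGER NOT NULL REFERENCES article(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (article_id, category_id)
    );

    -- Hypothetical sample data.
    INSERT INTO article  VALUES (1, 'ORM tricks'), (2, 'Generators explained');
    INSERT INTO category VALUES (1, 'Django'), (2, 'Python');
    INSERT INTO article_category VALUES (1, 1), (1, 2), (2, 2);
""")


def categories_for_article(article_id):
    """Walk the relationship from the article side."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM category AS c
        JOIN article_category AS ac ON ac.category_id = c.id
        WHERE ac.article_id = ?
        ORDER BY c.id
        """,
        (article_id,),
    )
    return [name for (name,) in rows]


def articles_in_category(category_id):
    """Walk the same relationship from the category side."""
    rows = conn.execute(
        """
        SELECT a.title
        FROM article AS a
        JOIN article_category AS ac ON ac.article_id = a.id
        WHERE ac.category_id = ?
        ORDER BY a.id
        """,
        (category_id,),
    )
    return [title for (title,) in rows]
```

The composite primary key is a design choice in this sketch: it stops the same article from being filed under the same category twice.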

It is sometimes useful to attach metadata to the relationship between objects. Say I wanted to implement a white-listing feature, wherein articles would only get added to categories if they matched a list of keywords assigned to that category. This is a good time to use a many-to-many through relationship. The idea here is that we are describing the intersection of two objects:

ManyToMany Through

The intersection of feeds and categories contains an extra piece of data: which filter to apply to that feed before inserting articles into the category
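
As a rough sketch of the idea, the intermediary table can carry its own column alongside the two foreign keys. The example below uses Python's built-in sqlite3 module. The table names, the keywords column and the sample rows are assumptions for illustration, and storing the filter as a comma-separated string is a simplification of a real filter table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- The relationship itself carries data: the keyword filter that
    -- applies to this particular feed/category pairing.
    CREATE TABLE feed_category (
        feed_id     INTEGER NOT NULL REFERENCES feed(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        keywords    TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (feed_id, category_id)
    );

    -- Hypothetical sample data: one feed filed under two categories,
    -- with a filter on only one of the pairings.
    INSERT INTO feed     VALUES (1, 'Tech feed');
    INSERT INTO category VALUES (1, 'Python'), (2, 'Everything');
    INSERT INTO feed_category VALUES (1, 1, 'python,django'), (1, 2, '');
""")


def filters_for_feed(feed_id):
    """Map each category name to the keywords stored on its relationship row."""
    rows = conn.execute(
        """
        SELECT c.name, fc.keywords
        FROM feed_category AS fc
        JOIN category AS c ON c.id = fc.category_id
        WHERE fc.feed_id = ?
        ORDER BY c.id
        """,
        (feed_id,),
    )
    return {
        name: [word for word in keywords.split(",") if word]
        for name, keywords in rows
    }
```

The same feed ends up with a different filter per category, which is exactly the information that belongs to neither the feed nor the category on its own.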

Django Implementation

This is essentially the database schema for a project I started this afternoon - Django News. Having already written an RSS aggregator in PHP, it was mostly a matter of figuring out how to implement the same features in Python. I also wanted to add a couple of extra features, like infinite-depth categories, a black-list of keywords assignable to feed/category pairs, and white/black-listing of HTML. Some feeds play nice, containing only links and paragraph tags, while others contain script, embed, img and other tags that you really don't want on your site, so blocking certain HTML elements can be very useful.
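
To give a feel for the HTML white-listing idea, here is a minimal sketch built on the standard library's html.parser. It is not the Django News implementation. The allowed tag and attribute sets and the name strip_disallowed_tags are assumptions for illustration. It keeps only white-listed tags, drops their non-white-listed attributes, and throws away the contents of script and style blocks. It does not check URL schemes in href, so treat it as a demonstration and not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical white-list: which tags survive, and which attributes they may keep.
ALLOWED_TAGS = {"p", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
# Tags whose text content should be discarded along with the tag itself.
DROP_CONTENT_TAGS = {"script", "style"}


class TagWhitelistFilter(HTMLParser):
    """Rebuild the markup, emitting only white-listed tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.dropping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.dropping = True
        elif tag in ALLOWED_TAGS:
            allowed = ALLOWED_ATTRS.get(tag, set())
            attr_text = "".join(
                f' {name}="{escape(value)}"'
                for name, value in attrs
                if name in allowed and value is not None
            )
            self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.dropping = False
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.dropping:
            # Re-escape text, since the parser has already decoded entities.
            self.parts.append(escape(data, quote=False))


def strip_disallowed_tags(html_text):
    """Return html_text with every tag outside the white-list removed."""
    parser = TagWhitelistFilter()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Text inside a removed tag such as b is kept, only the tag itself goes. The script and style contents are the exception, since showing raw JavaScript or CSS as article text is never what you want.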

The biggest stumbling block I anticipated was implementing the many-to-many through relationship between feeds, categories, and keyword filters. Django, however, comes to the rescue with a great implementation:

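from django.db import models

# Category, Source and WhiteListFilter are other models in the same app (not shown here).
class Feed(models.Model):
    name = models.CharField(max_length=255)
    url = models.URLField()
    # 'through' names the model that stores each feed/category pairing
    categories = models.ManyToManyField(Category, through='FeedCategoryRelationship')
    # on_delete is required since Django 2.0; CASCADE matches the older implicit behaviour
    source = models.ForeignKey(Source, on_delete=models.CASCADE)
    last_download = models.DateField(auto_now=True)
    new_articles_added = models.PositiveSmallIntegerField(default=0, editable=False)
    active = models.BooleanField(default=True)

class FeedCategoryRelationship(models.Model):
    feed = models.ForeignKey(Feed, on_delete=models.CASCADE)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    # The extra data carried by the relationship: its keyword filters
    white_list = models.ManyToManyField(WhiteListFilter, blank=True)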

Accessing related data is a snap:

# A method on the Feed model
def perform_download(self):
    """Download articles associated with this feed"""
    for category in self.categories.all():
        # Look up the through rows linking this feed to the category
        relationships = FeedCategoryRelationship.objects.filter(feed=self, category=category)

        for relationship in relationships:
            # Collect the keywords from every filter attached to this relationship
            whitelist = []
            for white_list in relationship.white_list.all():
                whitelist += white_list.keywords.split(',')
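
Once the keywords are collected, applying them could look something like the sketch below. This is plain Python with no Django involved. The function names, the case-insensitive substring matching, and the rule that an empty white-list lets everything through are all assumptions for illustration, not the logic Django News actually uses.

```python
def build_whitelist(keyword_strings):
    """Flatten comma-separated keyword strings into one list of clean keywords."""
    whitelist = []
    for keywords in keyword_strings:
        for keyword in keywords.split(","):
            keyword = keyword.strip().lower()
            if keyword:  # skip blanks left by stray commas
                whitelist.append(keyword)
    return whitelist


def passes_whitelist(text, whitelist):
    """Accept the text if any keyword occurs in it.

    Assumption: an empty white-list means "no filtering", so everything passes.
    """
    if not whitelist:
        return True
    lowered = text.lower()
    return any(keyword in lowered for keyword in whitelist)
```

Stripping whitespace matters in practice, since a keyword list typed as "python, django" would otherwise produce " django" with a leading space and quietly match less than intended.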

Admin Interface

ManyToMany Through Admin

By default, the FeedCategoryRelationship is not exposed in either the Category or the Feed admin, so we add it using an inline:

from django.contrib import admin

# FeedCategoryRelationship comes from this app's models module
class FeedCategoryRelationshipInline(admin.TabularInline):
    model = FeedCategoryRelationship
    extra = 1  # show one blank row for adding a new relationship

class FeedAdmin(admin.ModelAdmin):
    inlines = (FeedCategoryRelationshipInline,)

class CategoryAdmin(admin.ModelAdmin):
    inlines = (FeedCategoryRelationshipInline,)
    prepopulated_fields = {"slug": ("name",)}
