When I first developed this blog (using PHP, of course), I wrote my entries in raw HTML: links, img tags, and any other markup. Besides making entries difficult to read when I wanted to edit them, it felt like double duty -- writing both the HTML for the blog entry and the HTML for the site that displayed it. To get around this I switched to BBCode, a markup that replaces regular HTML tags with simpler, less crufty ones. Using PHP and a couple of regular expressions I was able to implement a lightweight, fast BBCode parser. I was even able to get automatic syntax highlighting through PHP's built-in syntax highlighting, which I called from a regular-expression replace callback. This system worked great until I switched to Django.

I've been using Django for close to 2 months now and am firmly convinced that it is the framework of choice for developing dynamic sites. If you saw my last post, you know that I spent last week in a 5-day training given by Jacob Kaplan-Moss, one of the co-creators of Django. While there I worked on a practical example site, implementing new features throughout the week. One of those features was a replacement for my PHP BBCode system. Writing the regular expressions to convert BBCode to markup was not too difficult. The only tricky part was the syntax highlighting. Luckily, Python has a great library called Pygments which can highlight virtually any language -- a huge bonus from my point of view, as currently I can only highlight PHP!

I created a template filter to parse the BBCode and apply the syntax highlighting -- here is how I did it:

import re

from django import template
from django.utils.safestring import mark_safe
from pygments import formatters, highlight
from pygments.lexers import guess_lexer, get_lexer_by_name
from pygments.util import ClassNotFound

# Django only picks up filters registered on a module-level Library instance.
register = template.Library()

@register.filter
def bbcode(text):
    # Raw strings keep the regex escapes and the \1 backreferences intact.
    text = re.sub(re.compile(r'\[i\](.+?)\[\/i\]', re.DOTALL),
                  r'<i>\1</i>',
                  text)
    text = re.sub(re.compile(r'\[b\](.+?)\[\/b\]', re.DOTALL),
                  r'<strong>\1</strong>',
                  text)
    # [code] blocks are handed to Pygments through a replace callback.
    text = re.sub(re.compile(r'\[code\](.+?)\[\/code\]', re.DOTALL),
                  highlight_callback,
                  text)
    text = re.sub(re.compile(r'\[center\](.+?)\[\/center\]', re.DOTALL),
                  r'<center>\1</center>',
                  text)
    text = re.sub(re.compile(r'\[img\](.+?)\[\/img\]', re.DOTALL),
                  r'<img src="\1" />',
                  text)
    text = re.sub(re.compile(r'\[img\s*width\s*=\s*(\d+)\](.+?)\[\/img\]', re.DOTALL),
                  r'<img src="\2" style="width: \1px;" />',
                  text)
    text = re.sub(re.compile(r'\[url\s*=\s*((https?://)?[^\s]+)\](.+?)\[\/url\]', re.DOTALL),
                  r'<a href="\1">\3</a>',
                  text)
    text = re.sub(re.compile(r'\[url\](.+?)\[\/url\]', re.DOTALL),
                  r'<a href="\1">\1</a>',
                  text)
    # Mark the result as safe so Django does not escape the generated HTML.
    return mark_safe(text)

def highlight_callback(match_object):
    code = match_object.group(1)
    try:
        # Let Pygments guess the language from the code itself.
        lexer = guess_lexer(code)
    except ClassNotFound:
        # Fall back to Python when no lexer matches.
        lexer = get_lexer_by_name("python", stripall=True)
    formatter = formatters.HtmlFormatter()
    return highlight(code, lexer, formatter)

To use it in your templates, load the tag library that contains the filter with {% load %} and then do something like this: {{ object.body|bbcode|linebreaks }}
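One thing worth keeping in mind: because the filter finishes with mark_safe, any raw HTML typed into an entry goes straight through to the page. That is fine when you are the only author, but if the text comes from anyone else you will probably want to escape it before converting the tags. Here is a minimal sketch of that idea using only the standard library. The helper name bbcode_escaped is hypothetical and only the [b] tag is handled, so treat it as an illustration rather than a drop-in replacement for the filter above:

```python
import html
import re

# Hypothetical helper, not part of the original filter: escape first, then convert tags.
BOLD = re.compile(r'\[b\](.+?)\[\/b\]', re.DOTALL)

def bbcode_escaped(text):
    # html.escape turns <, >, & and quotes into entities, so raw HTML in the
    # entry is displayed as text instead of being interpreted by the browser.
    text = html.escape(text)
    # Square brackets are left alone by html.escape, so BBCode tags still match.
    return BOLD.sub(r'<strong>\1</strong>', text)

print(bbcode_escaped('[b]bold[/b] <script>alert(1)</script>'))
# prints: <strong>bold</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```

If you go this route, the [code] callback would need to undo the escaping (html.unescape) before handing the text to Pygments, since the Pygments HTML formatter escapes its own output.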

You might be wondering what the re.DOTALL is all about. It makes the '.' character, which is central to all the tag regular expressions, match any character, including newlines. In short, it allows tags to span multiple lines. The other potential gotcha is the mark_safe at the end of the template filter -- without it, Django's auto-escaping will escape the generated HTML. If you feel like it, you can extend the system to accept ANY 'wrapper' style tags (ones that don't take a parameter) with something like this:

text = re.sub(re.compile(r'\[([a-z]+)\](.+?)\[\/\1\]', re.DOTALL),
              r'<\1>\2</\1>',
              text)
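If you want to see both ideas in isolation, the following standalone sketch can be pasted into an interpreter. The first half shows what re.DOTALL changes for a tag that spans two lines. The second half is a variation on the generic wrapper rule that only converts tags from an allow-list, so something like [script] is left as plain text. The names ALLOWED, WRAPPER and wrap are my own for this example, and the allow-list contents are just a guess at what you might want:

```python
import re

pattern_plain = re.compile(r'\[b\](.+?)\[\/b\]')
pattern_dotall = re.compile(r'\[b\](.+?)\[\/b\]', re.DOTALL)

text = '[b]line one\nline two[/b]'

# Without DOTALL, '.' stops at the newline, so the tag never matches.
print(pattern_plain.search(text))  # prints: None

# With DOTALL, '.' also matches the newline and the whole tag is replaced.
converted = pattern_dotall.sub(r'<strong>\1</strong>', text)

# Hypothetical allow-list for the generic wrapper rule.
ALLOWED = {'b', 'i', 'u', 'em', 'strong', 'blockquote'}
WRAPPER = re.compile(r'\[([a-z]+)\](.+?)\[\/\1\]', re.DOTALL)

def wrap(match):
    tag, body = match.group(1), match.group(2)
    if tag not in ALLOWED:
        # Leave unknown tags exactly as they were typed.
        return match.group(0)
    return '<{0}>{1}</{0}>'.format(tag, body)

print(WRAPPER.sub(wrap, '[em]hi[/em] [script]x[/script]'))
# prints: <em>hi</em> [script]x[/script]
```

Passing a function instead of a replacement string is the same trick the [code] rule uses for highlighting.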

Don't forget to use caching!
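The reason is that the filter re-runs every regular expression and the Pygments highlighter each time the page is rendered. Here is a rough sketch of the idea, with a plain dictionary standing in for a real cache backend. The names cached_render and _rendered are hypothetical, and in a Django project you would more likely go through django.core.cache (cache.get and cache.set) or template fragment caching rather than a module-level dict:

```python
import hashlib

# Stand-in for a real cache backend; a plain dict lives only as long as the process.
_rendered = {}

def cached_render(text, render):
    # Key on a hash of the source text so an edited entry gets re-rendered.
    key = hashlib.sha1(text.encode('utf-8')).hexdigest()
    if key not in _rendered:
        _rendered[key] = render(text)
    return _rendered[key]

# Example with a dummy renderer that counts how often it is called.
calls = []

def fake_render(text):
    calls.append(text)
    return text.upper()

first = cached_render('hello', fake_render)
second = cached_render('hello', fake_render)
print(first, second, len(calls))  # prints: HELLO HELLO 1
```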
