Code formatting Django tag
Long ago I wrote a Perl script for representing programming code in HTML using lists. Recently I’ve been converting my pages from HTML to XHTML and have had to redo that script: I had mistakenly nested lists by placing sublists as direct children of the parent list. That isn’t right: sublists need to be children of a list item.
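As a rough sketch of that nesting rule, the check below uses xml.dom.minidom to find any list that sits directly inside another list instead of inside a list item. The function name misnested_lists and the sample markup are made up for illustration; this is not the Perl script described above.

```python
from xml.dom import minidom

LIST_TAGS = ("ul", "ol")


def misnested_lists(xhtml):
    """Return the list elements whose parent is itself a list element."""
    document = minidom.parseString(xhtml)
    found = []
    for tag in LIST_TAGS:
        for element in document.getElementsByTagName(tag):
            parent = element.parentNode
            # The root element's parent is the Document node, so check the node type first.
            if parent.nodeType == parent.ELEMENT_NODE and parent.tagName in LIST_TAGS:
                found.append(element)
    return found


# Wrong: the sublist is a direct child of the parent list.
wrong = "<ul><li>if x:</li><ul><li>y = 1</li></ul></ul>"
# Right: the sublist is a child of a list item.
right = "<ul><li>if x:<ul><li>y = 1</li></ul></li></ul>"

wrong_count = len(misnested_lists(wrong))
right_count = len(misnested_lists(right))
```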
It occurred to me that, since my new pages also use Django, I should be able to write a Django template tag that formats code on the fly; I would then be able to make any changes and have them apply automatically to all code snippets on my site. After learning a bit about xml.dom.minidom for my excerpting partial XHTML project, I realized it was perfect for the task.

    from django import template
    import re
    from xml.dom import minidom

    #display code as a list
    def do_code(parser, token):
        nodelist = parser.parse(('endcode',))
        parser.delete_first_token()
        return codeNode(nodelist)

    class codeNode(template.Node):
        def __init__(self, nodelist):
            self.nodelist = nodelist

        def render(self, context):
            #lines = self.rawSource().split("\n")
            lines = self.nodelist.render(context).split("\n")
            document = minidom.Document()
            code = self.createList(document, lines)
            code.setAttribute('class', 'code')
            code = code.toprettyxml()
            return code

        def level(self, line):
            if line.startswith("\t"):
                lineLevel = len(re.findall('^\t+', line)[0])
            else:
                lineLevel = 0
            return lineLevel

        def clean(self, line, lineLevel):
            cleanLine = line.strip()
            #only reset line level for lines with something in them
            if cleanLine:
                lineLevel = self.level(line)
            return cleanLine, lineLevel

        def createList(self, document, lines, listLevel=0):
            ul = document.createElement("ul")
            #no sections from empty lines at the start of the code
            #but afterwards, empty lines mean that the next element needs a section class
            started = False
            markSection = False
            currentLine = 0
            maxLine = len(lines)
            lineLevel = listLevel
            while currentLine < maxLine:
                line = lines[currentLine]
                cleanLine, lineLevel = self.clean(line, lineLevel)
                #if the indentation has grown, send the sublines out to make a new list
                if lineLevel > listLevel:
                    subLines = []
                    while currentLine < maxLine and lineLevel > listLevel:
                        subLines.append(line)
                        currentLine = currentLine + 1
                        if currentLine < maxLine:
                            line = lines[currentLine]
                            cleanLine, lineLevel = self.clean(line, lineLevel)
                        else:
                            cleanLine = ''
                            lineLevel = 0
                    markSection, subUL = self.createList(document, subLines, listLevel+1)
                    if not started:
                        li = document.createElement("li")
                        ul.appendChild(li)
                    ul.childNodes[-1].appendChild(subUL)
                #what's left is text; create text node and put it in an LI
                if cleanLine:
                    li = document.createElement("li")
                    li.appendChild(document.createTextNode(cleanLine))
                    #after blank lines, give the list item a special class
                    if markSection:
                        li.setAttribute('class', 'section')
                        markSection = False
                    ul.appendChild(li)
                    started = True
                elif started:
                    markSection = True
                currentLine = currentLine + 1
            #sublists need to return both the list and whether or not there were blank lines left over
            if listLevel > 0:
                return markSection, ul
            else:
                return ul

    register = template.Library()
    register.tag('code', do_code)

One thing I originally did a bit differently here was to not render the nodelist that comes back from parser.parse(). Instead, I pulled the raw source out (probably incorrectly, but I can’t find any documentation on it), so that I didn’t have to worry about Django tags being inside my code: they could be displayed, too. This was handled by the rawSource() method, which loops through each node in the nodelist, determines the starting and ending points of the code in the raw source, and extracts that portion. It was useful enough that I extracted it into a parent class, RawNode, that can be used for other template tags. Unfortunately, the raw source is only available if DEBUG=True is set in settings.py, which is not a viable option, so I’ve modified the above code to render the nodelist instead. For historical reasons, here is the old class for retrieving the raw source:

    #this node is for template tags that need to be able to return their raw source
    class RawNode(template.Node):
        #get the raw code
        def rawSource(self):
            code, codeRange = self.source
            start = codeRange[1]
            end = start
            for node in self.nodelist:
                if hasattr(node, "source"):
                    stringNode, codeRange = node.source
                    end = codeRange[1]
            return code.source[start:end]

Instead, you’ll need to use the “templatetag” template tag to display braces in code snippets:
- <p>Here’s an example of a Django tag {% templatetag openblock %} link whitehouse {% templatetag closeblock %}.</p>
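Writing those templatetag calls by hand gets tedious for longer snippets. As a minimal sketch, the helper below rewrites Django delimiters into the matching templatetag calls before a snippet is pasted into a template. The name escape_django_tags is hypothetical and is not part of Django or of the tag above; the argument names (openblock, closeblock, and so on) are the ones the built-in templatetag tag accepts.

```python
import re

# Map each Django delimiter to the templatetag argument that outputs it.
TEMPLATETAG_NAMES = {
    "{%": "openblock",
    "%}": "closeblock",
    "{{": "openvariable",
    "}}": "closevariable",
    "{#": "opencomment",
    "#}": "closecomment",
}

# One alternation pattern so the text is scanned in a single pass and the
# replacement text is never re-scanned.
DELIMITER = re.compile("|".join(re.escape(d) for d in TEMPLATETAG_NAMES))


def escape_django_tags(text):
    """Replace Django template delimiters with templatetag calls."""
    def replace(match):
        return "{% templatetag " + TEMPLATETAG_NAMES[match.group(0)] + " %}"
    return DELIMITER.sub(replace, text)


escaped = escape_django_tags("{% link whitehouse %}")
```

The single-pass substitution matters here: replacing one delimiter at a time would also rewrite the braces in the templatetag calls that earlier replacements had just inserted.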
Code to be displayed is put between a {% code %} tag and an {% endcode %} tag. The only code that can’t be displayed as written is Django tags: because the contents of the block are rendered, any tag inside it has to be written using templatetag, as above.
Otherwise, this is pretty straightforward. The “createList” method loops through every line; if a line is indented further than expected, createList calls itself with all of the sub-lines and increments the expected indentation level. Sub-levels are added as childNodes to the most recent element (which should always be an <LI>). Blank lines mean that the next list item is marked with the class “section”, which can be styled in the CSS file to put extra space in front of it.
When the overall list returns to the render method, it adds the “code” class to the top-level element (in this case, a <UL>). The whole thing is converted from XML nodes to a string of XHTML.
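The recursion is easier to follow outside of Django. The following is a simplified, standalone sketch of the same idea using only xml.dom.minidom; it is not the tag's actual code. The names indent_level, build_list, and render_code are made up for this example, it slices out the indented run instead of collecting lines one at a time, and it uses toxml() rather than toprettyxml() so that the output is compact.

```python
from xml.dom import minidom


def indent_level(line):
    """Count the leading tabs on a line."""
    return len(line) - len(line.lstrip("\t"))


def build_list(document, lines, list_level=0):
    """Turn tab-indented lines into a <ul>, nesting deeper lines in sublists."""
    ul = document.createElement("ul")
    mark_section = False
    i = 0
    while i < len(lines):
        line = lines[i]
        text = line.strip()
        if not text:
            # A blank line marks the next item as a section start,
            # but only once the list has some content.
            if ul.hasChildNodes():
                mark_section = True
            i += 1
        elif indent_level(line) > list_level:
            # Collect the run of deeper lines (blank lines inside it included).
            j = i
            while j < len(lines) and (not lines[j].strip()
                                      or indent_level(lines[j]) > list_level):
                j += 1
            # Leave trailing blank lines for this level to handle.
            while not lines[j - 1].strip():
                j -= 1
            sublist = build_list(document, lines[i:j], list_level + 1)
            # Starting mid-block: make up an empty list item to hold the sublist.
            if not ul.hasChildNodes():
                ul.appendChild(document.createElement("li"))
            ul.lastChild.appendChild(sublist)
            i = j
        else:
            li = document.createElement("li")
            li.appendChild(document.createTextNode(text))
            if mark_section:
                li.setAttribute("class", "section")
                mark_section = False
            ul.appendChild(li)
            i += 1
    return ul


def render_code(source):
    """Build the nested list for source and serialize it to a string."""
    document = minidom.Document()
    ul = build_list(document, source.split("\n"))
    ul.setAttribute("class", "code")
    return ul.toxml()


html = render_code("def f():\n\treturn 1\n\nx = 2")
```

For that four-line input, the indented line ends up in a sublist inside the list item for def f():, and the item after the blank line gets the section class.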
It can start mid-block as well; if it encounters immediate indentation, it will recurse until it gets to content, and then make up empty list items to hold the lists.

    markSection, subUL = self.createList(document, subLines, listLevel+1)
    if not started:
        li = document.createElement("li")
        ul.appendChild(li)
    ul.childNodes[-1].appendChild(subUL)
    #what's left is text; create text node and put it in an LI
    if cleanLine:

If this works, I’ll have to take a look at some of the syntax highlighters available for Python, such as Pygments.
- Django using too much memory? Turn off DEBUG=True!
- DEBUG=False can save hundreds of megabytes in Django command-line scripts, and probably in Django web processes.
- Excerpting partial XHTML using minidom
- You can use xml.dom.minidom to parse partial XHTML as long as you use a few tricks and don’t mind that getElementById doesn’t work.
- Pygments
- “Pygments is a generic syntax highlighter for general use in all kinds of software such as forum systems, wikis or other applications that need to prettify source code.”
- Representing code in HTML
- A minor epiphany that may not be new to others on how to display programming and HTML code in HTML.

June 13, 2009: I’ve modified the rawNode method to more reliably (I hope) get the full raw source of the tag.
June 14, 2009: It turns out that the raw source is only available when DEBUG=True is on. That’s too bad.