Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Code formatting Django tag

Jerry Stratton, April 22, 2009

Long ago I wrote a Perl script for representing programming code in HTML using lists. Recently I’ve been converting my pages from HTML to XHTML and have had to redo that script: I had mistakenly nested lists by placing sublists as direct children of the parent list. That isn’t right: sublists need to be children of a list item.

It occurred to me that, since my new pages also use Django, I should be able to write a Django template tag that formats code on the fly; I would then be able to make any changes and have them apply automatically to all code snippets on my site. After learning a bit about xml.dom.minidom for my excerpting partial XHTML project, I realized it was perfect for the task.

  • from django import template
  • import re
  • from xml.dom import minidom
  • #display code as a list
  • def do_code(parser, token):
    • nodelist = parser.parse(('endcode',))
    • parser.delete_first_token()
    • return codeNode(nodelist)
  • class codeNode(template.Node):
    • def __init__(self, nodelist):
      • self.nodelist = nodelist
    • def render(self, context):
      • #lines = self.rawSource().split("\n")
      • lines = self.nodelist.render(context).split("\n")
      • document = minidom.Document()
      • code = self.createList(document, lines)
      • code.setAttribute('class', 'code')
      • code = code.toprettyxml()
      • return code
    • def level(self, line):
      • if line.startswith("\t"):
        • lineLevel = len(re.findall('^\t+', line)[0])
      • else:
        • lineLevel = 0
      • return lineLevel
    • def clean(self, line, lineLevel):
      • cleanLine = line.strip()
      • #only reset line level for lines with something in them
      • if cleanLine:
        • lineLevel = self.level(line)
      • return cleanLine, lineLevel
    • def createList(self, document, lines, listLevel=0):
      • ul = document.createElement("ul")
      • #no sections from empty lines at the start of the code
      • #but afterwards, empty lines mean that the next element needs a section class
      • started=False
      • markSection = False
      • currentLine = 0
      • maxLine = len(lines)
      • lineLevel = listLevel
      • while currentLine < maxLine:
        • line = lines[currentLine]
        • cleanLine, lineLevel = self.clean(line, lineLevel)
        • #if the indentation has grown, send the sublines out to make a new list
        • if lineLevel > listLevel:
          • subLines = []
          • while currentLine < maxLine and lineLevel > listLevel:
            • subLines.append(line)
            • currentLine = currentLine + 1
            • if currentLine < maxLine:
              • line = lines[currentLine]
              • cleanLine, lineLevel = self.clean(line, lineLevel)
            • else:
              • cleanLine = ''
              • lineLevel = 0
          • markSection, subUL = self.createList(document, subLines, listLevel+1)
          • if not started:
            • li = document.createElement("li")
            • ul.appendChild(li)
          • ul.childNodes[-1].appendChild(subUL)
        • #what's left is text; create text node and put it in an LI
        • if cleanLine:
          • li = document.createElement("li")
          • li.appendChild(document.createTextNode(cleanLine))
          • #after blank lines, give the list item a special class
          • if markSection:
            • li.setAttribute('class', 'section')
            • markSection = False
          • ul.appendChild(li)
          • started = True
        • elif started:
          • markSection = True
        • currentLine = currentLine + 1
      • #sublists need to return both the list and whether or not there were blank lines left over
      • if listLevel > 0:
        • return markSection, ul
      • else:
        • return ul
  • register = template.Library()
  • register.tag('code', do_code)

One thing I originally did a bit differently here was that I didn’t render the nodelist that comes back from parser.parse(). Instead, I pulled the raw source out (probably incorrectly, but I can’t find any documentation on it). That way, I didn’t have to worry about Django tags being inside my code: they could be displayed, too. This was handled with the rawSource() method, which determines the starting and ending point of the code in the raw source and extracts that portion. It was useful enough that I extracted it into a parent class, RawNode, that can be used for other template tags.

Unfortunately, the raw source is only available if DEBUG=True is set in settings.py. That is not a viable option, so I’ve modified the above code to render the nodelist instead. For historical reasons, here is the old class for retrieving the raw source:

  • #this node is for template tags that need to be able to return their raw source
  • class RawNode(template.Node):
    • #get the raw code
    • def rawSource(self):
      • code, codeRange = self.source
      • start = codeRange[1]
      • end = start
      • for node in self.nodelist:
        • if hasattr(node, "source"):
          • stringNode, codeRange = node.source
          • end = codeRange[1]
      • return code.source[start:end]

Instead, you’ll need to use the “templatetag” template tag to display braces in code snippets:

  • <p>Here’s an example of a Django tag {% templatetag openblock %} link whitehouse {% templatetag closeblock %}.</p>

Code to be displayed is put between a {% code %} and {% endcode %} tag. The only code that can’t be displayed directly is Django tags; those have to be written with the templatetag tag, as in the example above.
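Writing the templatetag form by hand gets tedious for longer snippets. Here is a minimal sketch of a helper that converts Django delimiters before a snippet is pasted between the code tags. The function name escape_django_delimiters and the lookup table are hypothetical and not part of the tag above; the argument names (openblock, closeblock, and so on) are the ones the built-in templatetag tag accepts.

```python
import re

# Map each Django delimiter to the argument that the built-in
# "templatetag" tag uses to output it literally.
TEMPLATETAG_NAMES = {
    "{%": "openblock",
    "%}": "closeblock",
    "{{": "openvariable",
    "}}": "closevariable",
    "{#": "opencomment",
    "#}": "closecomment",
}

# One pattern for all delimiters, so replacements are never re-scanned.
DELIMITER = re.compile(r"\{%|%\}|\{\{|\}\}|\{#|#\}")


def escape_django_delimiters(source):
    """Hypothetical helper: rewrite Django delimiters as templatetag calls."""
    def replace(match):
        return "{% templatetag " + TEMPLATETAG_NAMES[match.group(0)] + " %}"
    return DELIMITER.sub(replace, source)


escaped = escape_django_delimiters("{% link whitehouse %}")
```

Doing the substitution in a single pass matters: replacing the delimiters one at a time would mangle the templatetag calls that earlier replacements had just inserted.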

Otherwise, this is pretty straightforward. The “createList” method loops through every line; if a line is indented further than expected, createList calls itself with all of the sub-lines and increments the expected indentation level. Sub-levels are added as childNodes to the most recent element (which should always be an <LI>). Blank lines mean that the next list item is marked with the class “section”, which can be styled in the CSS file to put extra space in front of it.
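To try the nesting logic outside of Django, here is a simplified standalone sketch of the same idea. It is not the createList method above: the names build_list and indent_level are made up for this example, and blank lines are handled more simply than in the tag. It assumes tab indentation, as the tag does.

```python
from xml.dom import minidom


def indent_level(line):
    """Count the leading tab characters on a line."""
    return len(line) - len(line.lstrip("\t"))


def build_list(document, lines, list_level=0):
    """Return a <ul> whose nesting mirrors the tab indentation of lines."""
    ul = document.createElement("ul")
    mark_section = False
    index = 0
    while index < len(lines):
        line = lines[index]
        if not line.strip():
            # A blank line after some content flags the next item as a section.
            mark_section = ul.hasChildNodes()
            index += 1
        elif indent_level(line) > list_level:
            # Collect the run of deeper lines; blank lines stay with the run.
            run = []
            while index < len(lines) and (
                not lines[index].strip()
                or indent_level(lines[index]) > list_level
            ):
                run.append(lines[index])
                index += 1
            # Trailing blank lines belong to this level, not to the sublist.
            while not run[-1].strip():
                run.pop()
                mark_section = True
            if not ul.hasChildNodes():
                # Nothing to attach to yet: make an empty item as the parent.
                ul.appendChild(document.createElement("li"))
            # The sublist becomes a child of the most recent list item.
            ul.lastChild.appendChild(build_list(document, run, list_level + 1))
        else:
            li = document.createElement("li")
            li.appendChild(document.createTextNode(line.strip()))
            if mark_section:
                li.setAttribute("class", "section")
                mark_section = False
            ul.appendChild(li)
            index += 1
    return ul


document = minidom.Document()
sample = ["def f():", "\treturn 1", "", "x = f()"]
markup = build_list(document, sample).toxml()
```

For the four sample lines, the indented line ends up in a sublist inside the first item, and the line after the blank gets the “section” class.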

When the overall list returns to the render method, it adds the “code” class to the top-level element (in this case, a <UL>). The whole thing is converted from XML nodes to a string of XHTML.
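The last step can be tried on its own. This is a small sketch using only xml.dom.minidom, with a made-up one-item list standing in for the real output. It shows the class being set on the top-level element and the difference between toxml() and the toprettyxml() call used in the tag.

```python
from xml.dom import minidom

document = minidom.Document()

# A stand-in for the list that createList would return.
ul = document.createElement("ul")
li = document.createElement("li")
li.appendChild(document.createTextNode("print(1)"))
ul.appendChild(li)

# The render method marks the top-level list so the CSS can target it.
ul.setAttribute("class", "code")

# toxml() gives one compact string; toprettyxml() adds newlines and
# indentation between elements, which is easier to read in page source.
compact = ul.toxml()
pretty = ul.toprettyxml()
```

Either call works on an element as well as on a whole document, so there is no need to attach the list to the document before serializing it.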

It can start mid-block as well; if it encounters immediate indentation, it will recurse until it gets to content, and then make up empty list items to hold the lists.

      • markSection, subUL = self.createList(document, subLines, listLevel+1)
      • if not started:
        • li = document.createElement("li")
        • ul.appendChild(li)
      • ul.childNodes[-1].appendChild(subUL)
    • #what's left is text; create text node and put it in an LI
    • if cleanLine:
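The effect of that empty list item is easiest to see in isolation. This is a minimal sketch with minidom only; the sublist is built by hand here, where the tag would get it from the recursive call.

```python
from xml.dom import minidom

document = minidom.Document()
ul = document.createElement("ul")

# A sublist built elsewhere, e.g. from a snippet that begins already indented.
sub_ul = document.createElement("ul")
item = document.createElement("li")
item.appendChild(document.createTextNode("return x"))
sub_ul.appendChild(item)

# No list item exists yet, so create an empty one to act as the parent.
# Appending the sublist directly to the <ul> would not be valid XHTML.
if not ul.hasChildNodes():
    ul.appendChild(document.createElement("li"))
ul.lastChild.appendChild(sub_ul)

markup = ul.toxml()
```

The sublist ends up as a child of a list item, never as a direct child of the outer list, which is the nesting mistake that prompted the rewrite in the first place.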

If this works, I’ll have to take a look at some of the syntax highlighters available for Python, such as Pygments.

June 13, 2009: I’ve modified the rawSource method to more reliably (I hope) get the full raw source of the tag.

June 14, 2009: It turns out that the raw source is only available when DEBUG=True is on. That’s too bad.
