Mimsy Were the Borogoves

Converting HTML lists to text on the fly

Jerry Stratton, June 25, 2009

About four years ago, I had the epiphany that since programming code is a list of commands, it makes sense to display that list as an HTML list. Before that, I generally presented code using the PRE and/or CODE tags, which was always problematic. Any long lines tended to go off the screen (see Regroup vs. IfChanged in Django templates for an example). Since my purpose in displaying these snippets of code was to show how I was solving some problem, having the code disappear was counterproductive. Switching to HTML lists made the code readable.

The main problem with using HTML lists, however, is that browsers don’t copy lists as indented text. They tend to store the formatted version as indented lists, but the text version is just a jumbled mass of text—ironic, since I moved away from the PRE and CODE tags to avoid displaying a jumbled mass of text.

After a few years, I realized I might be able to solve this with a JavaScript function, but I knew I would never go back and add the function to all of my previous code snippets. And I kept expecting browsers to start copying hierarchical lists hierarchically, even as text.

Now that I have a code template tag in Django, it’s easy for me to add new functionality to old code snippets. I’ve written a “simple” JavaScript function to add a “toggle code” option to the code snippets on this site. The function went through three iterations.

Converting a list to indented plain text

After my experience with excerpting HTML, it seemed like this would be pretty easy, and it was.

[toggle code]

  • function copyCode(codeDIV) {
    • var codeList = codeDIV.getElementsByTagName('ul')[0];
    • var code = getCode(codeList.childNodes, 0, '');
    • window.alert(code);
  • }
  • //take a list-oriented HTML display and return indented plain text
  • function getCode(codeElements, codeDepth, leadingTabs) {
    • var subCode = '';
    • //increase leading tabs if necessary
    • if ((codeDepth-1)/2 > leadingTabs.length) {
      • leadingTabs = leadingTabs + "\t";
    • }
    • //loop through each element
    • for (var child=0;child<codeElements.length;child++) {
      • var element = codeElements[child];
      • if (element.nodeType == element.TEXT_NODE) {
        • //strip white space from text
        • var elementText = element.data.replace(/^\s+|\s+$/g,'');
        • if (elementText) {
          • subCode = subCode + leadingTabs + elementText + "\n";
        • }
      • } else if (element.tagName != 'SPAN') {
        • if (element.className == 'section') {
          • subCode = subCode + "\n";
        • }
        • subCode = subCode + getCode(element.childNodes, codeDepth+1, leadingTabs);
      • }
    • }
    • return subCode;
  • }

This requires a clickable non-UL element in the same DIV as the code list; clicking it passes the DIV to copyCode. Since I’m using Django, my template for displaying code looks like:

[toggle code]

  • <div class="code">
    • <p class="codecopier" onclick="copyCode(this.parentNode)">[copy code]</p>
    • {{ code }}
  • </div>

The only tricky bit is that, at least the way I’ve designed it here, two levels deep is one level of indentation. One level for the LI, and one level for the text node.
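To make the depth arithmetic concrete, here is a rough sketch that runs the same traversal over plain objects instead of real DOM nodes, so it can be tried outside a browser. The helpers text and el and the name getCodeSketch are hypothetical, made up for this illustration; the section-class handling is left out.

```javascript
// Hypothetical stand-ins for DOM nodes, so the sketch runs without a browser.
var TEXT_NODE = 3;
function text(data) {
  return { nodeType: TEXT_NODE, data: data, childNodes: [] };
}
function el(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children };
}

// Same traversal as getCode, with the depth arithmetic spelled out.
function getCodeSketch(nodes, depth, tabs) {
  var out = '';
  // LI elements sit at depths 0, 2, 4, ...; their text sits at 1, 3, 5, ...
  // A tab is added only on entering a nested UL (depth 2, 4, ...).
  if ((depth - 1) / 2 > tabs.length) {
    tabs += '\t';
  }
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.nodeType === TEXT_NODE) {
      // strip white space and skip text nodes that are only white space
      var trimmed = node.data.replace(/^\s+|\s+$/g, '');
      if (trimmed) {
        out += tabs + trimmed + '\n';
      }
    } else if (node.tagName !== 'SPAN') {
      out += getCodeSketch(node.childNodes, depth + 1, tabs);
    }
  }
  return out;
}

// A two-level list: one nested UL means one tab of indentation.
var list = el('UL', [
  el('LI', [text('if (x) {'), el('UL', [el('LI', [text('y();')])])]),
  el('LI', [text('}')])
]);
var result = getCodeSketch(list.childNodes, 0, '');
// result is 'if (x) {\n\ty();\n}\n'
```

The nested line is three levels down from the starting call (UL, LI, text) but comes out with a single tab.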

Present the code in a plain text window?

My first thought was to keep it simple: open up a text/plain document and display the code there for copying. The Document object has an “open” method that specifically takes the MIME type of the new document. By specifying text/plain, there’s no problem displaying character entities or less than symbols.

[toggle code]

  • function copyCode(codeDIV, pageTitle) {
    • var codeList = codeDIV.getElementsByTagName('ul')[0];
    • var code = getCode(codeList.childNodes, 0, '');
    • var codeWindow = window.open('', 'hobo.codeWindow', 'location=no, status=no, toolbar=no');
    • var codeDocument = codeWindow.document.open('text/plain');
    • codeDocument.title = 'Plaintext code';
    • if (pageTitle) {
      • codeDocument.title += ' for ' + pageTitle;
    • }
    • codeDocument.write(code);
    • codeDocument.close();
    • codeWindow.focus();
  • }

And the HTML is:

[toggle code]

  • <div class="code">
    • <p class="codecopier" onclick="copyCode(this.parentNode, '{{ page.title|escapejs }}')">[copy code]</p>
    • {{ code }}
  • </div>

This works great. It opens up a new window that can be both copied and saved, and if it is saved the page title is automatically in the new document. But it only works in Firefox. IE specifically forbids any MIME type other than text/html; it will error out if any other MIME type is requested. Safari (and presumably Webkit) just ignores the MIME type parameter altogether in favor of text/html.

Toggle between readable and copyable display?

Next, I thought about replacing the UL’s outer HTML with the new plain text. Save the original UL list as a property on the DIV object, convert the UL to plain text and set the outerHTML property to be the new plain text (wrapped inside of a PRE). By looking for the saved list property, I can tell whether I need to display the viewable list or the copyable code.

Because the code is being displayed in HTML, however, I also need to escape any ampersands and less than symbols, or the browser will display them as HTML code.
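As a small sketch of that escaping step on its own: the helper name escapeForPre is hypothetical, not something from code.js. The ampersand has to be handled first; otherwise the ampersands in the freshly created &lt; and &gt; entities would themselves be escaped.

```javascript
// Hypothetical helper: escape plain text so it can sit inside <pre>...</pre>.
function escapeForPre(code) {
  return code
    .replace(/&/g, '&amp;') // first, so the entities added below stay intact
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var escaped = escapeForPre('if (a < b && c > d) {');
// escaped is 'if (a &lt; b &amp;&amp; c &gt; d) {'
```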

[toggle code]

  • function copyCode(codeDIV) {
    • //if savedList exists, set outerHTML from the PRE to the UL
    • if (codeDIV.savedList) {
      • var codePRE = codeDIV.getElementsByTagName('pre')[0];
      • codePRE.outerHTML = codeDIV.savedList.outerHTML;
      • codeDIV.savedList = 0;
    • } else {
      • //there should only be one list under this parent.
      • var codeList = codeDIV.getElementsByTagName('ul')[0];
      • var code = getCode(codeList.childNodes, 0, '');
      • if (code) {
        • codeDIV.savedList = codeList;
        • //a string pattern only replaces the first < or & character, so use global regular expressions.
        • var escapedCode = code.replace(/&/g, '&amp;');
        • escapedCode = escapedCode.replace(/</g, '&lt;');
        • escapedCode = escapedCode.replace(/>/g, '&gt;');
        • codeList.outerHTML = '<pre>' + escapedCode + '</pre>';
      • }
    • }
  • }

And for the HTML:

[toggle code]

  • <div class="code">
    • <p class="codecopier" onclick="copyCode(this.parentNode)">[toggle code]</p>
    • {{ code }}
  • </div>

The tricky part here, at least for me, is that the description of JavaScript’s string replace method is confusing. It accepts either a string or a regular expression to look for, plus a string to replace with. Given a plain string to look for, it replaces only the first match, and a regular expression without any flags does the same. I was almost ready to give up and have this feature work only for Firefox/Gecko users before I realized what the issue was. To replace every match, I need to use a regular expression and append the g flag (as in /</g) to make it global.
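A quick way to see the difference, using a made-up line of code as the input:

```javascript
var line = 'a < b < c';

// String pattern: only the first match is replaced.
var firstOnly = line.replace('<', '&lt;');

// Regular expression without the g flag: still only the first match.
var stillFirstOnly = line.replace(/</, '&lt;');

// Regular expression with the g flag: every match is replaced.
var everyMatch = line.replace(/</g, '&lt;');

// firstOnly and stillFirstOnly are 'a &lt; b < c'
// everyMatch is 'a &lt; b &lt; c'
```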

But wait!

Now it works in Safari, but it doesn’t work in Firefox. Why? Because outerHTML is not part of the JavaScript standard. I don’t even know where I remembered it from. Maybe I just made it up based on remembering innerHTML, and unluckily it worked. Regardless, if I’m going to have it only work in one browser, it’s going to be the standards-based one, so unless I can find another solution it’s back to document.open("text/plain").

So what about innerHTML? I can’t replace the innerHTML of a UL with a PRE, but I can replace it with an LI that contains a PRE.

[toggle code]

  • function copyCode(codeDIV) {
    • //there should only be one list under this parent.
    • var codeList = codeDIV.getElementsByTagName('ul')[0];
    • //if savedList exists, set innerHTML from the PRE version to the list item version
    • if (codeDIV.savedList) {
      • codeList.innerHTML = codeDIV.savedList;
      • codeDIV.savedList = 0;
    • } else {
      • var code = getCode(codeList.childNodes, 0, '');
      • if (code) {
        • codeDIV.savedList = codeList.innerHTML;
        • //a string pattern only replaces the first < or & character, so use global regular expressions.
        • code = code.replace(/&/g, '&amp;');
        • code = code.replace(/</g, '&lt;');
        • code = code.replace(/>/g, '&gt;');
        • codeList.innerHTML = '<li class="copyable"><pre>' + code + '</pre></li>';
      • }
    • }
  • }

And for restoration purposes, instead of saving the entire UL, it saves only the innerHTML. This works in Safari and Firefox, so presumably in all Webkit and Gecko browsers.

Only include code.js once

If you view source on this page, you’ll see that the copyCode and getCode functions are included via a file called code.js. That file is included in front of the first code snippet on the page, but not in front of other code snippets.

I’m doing that by setting a codeSnippetCount property on the page itself in the code templatetag:

[toggle code]

  • page = context['page']
  • if hasattr(page, 'codeSnippetCount'):
    • page.codeSnippetCount = page.codeSnippetCount+1
  • else:
    • page.codeSnippetCount = 1

Then, in the template, I check for “ifequal page.codeSnippetCount 1”, and only display the SCRIPT tag if that’s true.

[toggle code]

  • {% ifequal page.codeSnippetCount 1 %}
    • <script type="text/javascript" src="{{ centralstart }}/library/scripts/code.js"></script>
  • {% endifequal %}

This was a meandering journey through writing a fairly simple JavaScript function, wasn’t it?
