Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, Swift, BASIC, and whatever else I happen to feel like hacking at.

GeekTool, TaskPaper, and XML

Jerry Stratton, October 24, 2010

I’ve been using GeekTool more and more lately. It’s easy enough, for example, to display a file on the Desktop, and I use that for my auto-generated TaskPaper recurring tasks file.

However, I also have a hand-maintained TaskPaper file, and one of the tags I use in that file is “top”, for things I want to do next. Having the list of recurring tasks on the Desktop made dealing with those tasks much easier—and much more likely.

In fact, it has come to the point where Mr. Procrastinator doesn’t even open TaskPaper. If it isn’t an icon or a Geeklet on the Desktop, it’s not getting done.

So I wrote a script to display the top tasks in a Geeklet, too, and it’s worked great.

Read the projects

The first part’s easy. Just read the files listed on the command line into a string.

  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import sys, re, codecs
  • from optparse import OptionParser
  • from xml.dom import minidom
  • parser = OptionParser('%prog [-t] <file1> [<files>]')
  • parser.add_option('-t', '--tag', help='show tasks with tag(s)', action='append')
  • parser.add_option('-x', '--xml', help='show XML version of TaskPaper document', action='store_true')
  • (options, files) = parser.parse_args()
  • if not files:
    • parser.print_help()
    • sys.exit()
  • # read every TaskPaper file into one string
  • tasks = ''
  • for file in files:
    • tasks = tasks + codecs.open(file, 'r', 'utf-8').read()

Nothing special here: it just sets up the command-line options and loops through the TaskPaper files to read them in.
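To make the option handling concrete, here is a minimal sketch of how action='append' collects repeated --tag options into a list. It is written in Python 3 syntax (the script itself is Python 2) and passes an explicit argument list to parse_args rather than reading the real command line. The tag names and the file name are made up for illustration.

```python
from optparse import OptionParser

parser = OptionParser('%prog [-t] <file1> [<files>]')
parser.add_option('-t', '--tag', help='show tasks with tag(s)', action='append')
parser.add_option('-x', '--xml', help='show XML version of TaskPaper document', action='store_true')

# hypothetical command line: two tags and one file
options, files = parser.parse_args(['-t', 'top', '--tag', 'home', 'Tasks.taskpaper'])

print(options.tag)   # every -t/--tag occurrence is appended to one list
print(options.xml)   # stays None because --xml was not given
print(files)         # leftover positional arguments, i.e. the files to read
```

Because each tag lands in the same list, a later step can loop over options.tag and narrow the results one tag at a time.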


I decided to write the script in two steps. Step one was to convert the TaskPaper document to XML. Once in XML, I can manipulate it in Python like any XML document. Since TaskPaper documents are hierarchical, a recursive function based on the indentation level should, and did, work great.

  • taskLines = tasks.split("\n")
  • def prettyXML(document):
    • fix = re.compile(r'((?<=>)(\n[\t]*)(?=[^<\t]))|(?<=[^>\t])(\n[\t]*)(?=<)')
    • fixed_output = re.sub(fix, '', document.toprettyxml())
    • return fixed_output
  • #parse TaskPaper file into XML
  • taskDocument = minidom.parseString('<projects />')
  • projectList = taskDocument.documentElement
  • def getIndentation(line):
    • if line.startswith("\t"):
      • indentation = len(re.findall('^\t+', line)[0])
    • else:
      • indentation = 0
    • return indentation
  • taskParser = re.compile(r'([^@]+) @(.*)$', re.IGNORECASE)
  • doneParser = re.compile(r'\((.+)\)')
  • def parseTask(task):
    • results = re.search(taskParser, task)
    • tags = ''
    • done = False
    • if results:
      • task = results.group(1)
      • tags = results.group(2)
      • tags = tags.split(' @')
      • tagList = []
      • for tag in tags:
        • if tag.startswith('done('):
          • done = re.findall(doneParser, tag)[0]
        • else:
          • tagList.append(tag)
      • tags = ' '.join(tagList)
    • return task, tags, done
  • def parseTasks(taskLines, parent, indentation=0):
    • taskIndent = indentation
    • item = None
    • while taskLines:
      • taskLine = taskLines[0].rstrip()
      • if not taskLine:
        • taskLines.pop(0)
        • continue
      • taskIndent = getIndentation(taskLine)
      • if taskIndent > indentation:
        • parseTasks(taskLines, item, taskIndent)
      • elif taskIndent < indentation:
        • return
      • else:
        • taskLine = taskLine[indentation:]
        • if taskLine.endswith(':'):
          • item = taskDocument.createElement('project')
          • item.setAttribute('name', taskLine[:-1])
        • elif taskLine.startswith('- '):
          • item = taskDocument.createElement('task')
          • taskLine, tags, done = parseTask(taskLine[2:])
          • taskValue = taskDocument.createTextNode(taskLine)
          • item.appendChild(taskValue)
          • if tags:
            • item.setAttribute('tag', tags)
          • if done:
            • item.setAttribute('done', done)
        • else:
          • item = taskDocument.createElement('note')
          • noteValue = taskDocument.createTextNode(taskLine)
          • item.appendChild(noteValue)
        • parent.appendChild(item)
        • taskLines.pop(0)
  • parseTasks(taskLines, projectList)
  • if options.xml:
    • print prettyXML(taskDocument)
    • sys.exit()

The meat of this section of the script is parseTasks. It steps through each line in the TaskPaper file(s) and either adds it to the parent element as a project, task, or note; or, if the indentation level has increased, it recursively calls itself again with the latest item as the new parent. And if the indentation level drops below the current recursion level, it returns to the caller.

This function relies on the fact that Python passes lists and other objects to functions by reference rather than by copy, so every recursion level works on the same list and the same XML nodes.

  • When the first item is popped out of the list at any recursion level, all other recursion levels immediately “get” the new version of the list.
  • When a parent has a child appended to it at a lower recursion level, there’s no need to pass the parent back up the chain: XML nodes in Python are objects, and objects are passed by reference.
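The two points above can be seen in a stripped-down sketch. This is not the script's parser: it builds plain dictionaries instead of XML nodes, and parseLines, the dictionary keys, and the sample lines are all invented for the illustration. The control flow follows parseTasks, though, and it is in Python 3 syntax.

```python
def getIndentation(line):
    """Count the leading tabs on a line."""
    return len(line) - len(line.lstrip('\t'))

def parseLines(lines, parent, indentation=0):
    """Consume lines from the shared list, nesting children under the last item."""
    item = None
    while lines:
        line = lines[0].rstrip()
        if not line:
            lines.pop(0)
            continue
        lineIndent = getIndentation(line)
        if lineIndent > indentation:
            # deeper: the recursive call pops lines from the very same list object
            parseLines(lines, item['children'], lineIndent)
        elif lineIndent < indentation:
            # shallower: return; the caller sees the already-shortened list
            return
        else:
            item = {'text': line[indentation:], 'children': []}
            parent.append(item)
            lines.pop(0)

# made-up sample input
lines = ['Home:', '\t- Mow lawn', '\t\tUse the new blade', '\t- Fix fence', 'Work:', '\t- File report']
tree = []
parseLines(lines, tree)

print(lines)                           # empty: every level popped from one list
print([p['text'] for p in tree])       # top-level items only
```

Nothing is ever returned from the recursive calls. The list shrinks and the tree grows purely through mutation of shared objects.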

The “prettyXML” function is based on a regular expression by BrendanM. It fixes minidom’s toprettyxml so that it doesn’t add unwanted whitespace around text nodes.

You can run the above script; just make sure you add “--xml” as an option so that it displays the resulting XML.
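The tag handling can also be tried on its own, without any files. The sketch below repeats parseTask in Python 3 syntax, lightly condensed, and feeds it a few made-up task lines to show how the text, the tags, and the done date come apart.

```python
import re

taskParser = re.compile(r'([^@]+) @(.*)$')
doneParser = re.compile(r'\((.+)\)')

def parseTask(task):
    """Split a task line into its text, its tags, and its done date (or False)."""
    tags = ''
    done = False
    results = taskParser.search(task)
    if results:
        task = results.group(1)
        tagList = []
        for tag in results.group(2).split(' @'):
            if tag.startswith('done('):
                # keep the date inside the parentheses
                done = doneParser.findall(tag)[0]
            else:
                tagList.append(tag)
        tags = ' '.join(tagList)
    return task, tags, done

# made-up sample lines, already stripped of the leading "- "
tagged = parseTask('Buy milk @top @errand')
finished = parseTask('File taxes @done(2010-10-24)')
plain = parseTask('Plain task')

print(tagged)
print(finished)
print(plain)
```

Tags end up as one space-separated string, which is why the filtering step splits the “tag” attribute on spaces again.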

Filter wanted tags

The next step is to get only the tasks with the desired tag(s). That’s easy enough with getElementsByTagName. Put this above the “if options.xml” so that the script can print out the pruned tree as XML:

  • #filter wanted tasks
  • def removeTask(task):
    • parent = task.parentNode
    • if parent:
      • parent.removeChild(task)
      • if not parent.getElementsByTagName('task'):
        • removeTask(parent)
  • if options.tag:
    • for tag in options.tag:
      • tasks = projectList.getElementsByTagName('task')
      • for task in tasks:
        • if task.getAttribute('done'):
          • removeTask(task)
          • continue
        • taskTags = task.getAttribute('tag').split(' ')
        • if tag not in taskTags:
          • removeTask(task)
  • tasks = projectList.getElementsByTagName('task')
  • if not tasks:
    • print 'No matching tasks found'
    • sys.exit()
  • if options.xml:
    • print prettyXML(taskDocument)
    • sys.exit()

It starts with all tasks and then, for each tag, winnows out the tasks that don’t carry that tag, along with any task marked done. If a parent (typically a project) ends up with no tasks beneath it, that parent is removed, too.
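Here is a small sketch of that pruning on a hand-written document, in Python 3 syntax. The projects, tasks, and the single wanted tag are invented for the example; removeTask is the same function as in the script.

```python
from xml.dom import minidom

# made-up sample document in the same shape the script builds
taskDocument = minidom.parseString(
    '<projects>'
    '<project name="Home"><task tag="top">Mow lawn</task><task>Fix fence</task></project>'
    '<project name="Work"><task tag="later">File report</task></project>'
    '</projects>'
)
projectList = taskDocument.documentElement

def removeTask(task):
    """Remove a node, then remove its parent if no tasks remain under it."""
    parent = task.parentNode
    if parent:
        parent.removeChild(task)
        if not parent.getElementsByTagName('task'):
            removeTask(parent)

wanted = 'top'
# getElementsByTagName hands back a list built at call time,
# so removing nodes while looping over it is safe
for task in projectList.getElementsByTagName('task'):
    if wanted not in task.getAttribute('tag').split(' '):
        removeTask(task)

remaining = [task.firstChild.data for task in projectList.getElementsByTagName('task')]
projects = [project.getAttribute('name') for project in projectList.getElementsByTagName('project')]
print(remaining)
print(projects)
```

The “Work” project disappears because its only task was filtered out, while “Home” survives with its one matching task.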

Display the pruned tree

After that, it’s just a matter of recursively displaying the projects, tasks, and notes in the XML tree.

  • #display tree as TaskPaper
  • def displayItem(item, indentation=0):
    • if item.tagName == 'project':
      • line = "\t"*indentation+item.getAttribute('name')+':'
    • elif item.tagName == 'task':
      • line = "\t"*indentation+'- '+item.childNodes[0].data
    • elif item.tagName == 'note':
      • line = "\t"*indentation+u'• ' + item.childNodes[0].data
    • print line.encode('utf-8')
    • for child in item.childNodes:
      • if child.nodeType == item.ELEMENT_NODE:
        • displayItem(child, indentation+1)
  • for project in projectList.childNodes:
    • displayItem(project)
    • print

The only odd bit is that, since I’m using a utf-8 bullet for notes, and utf-8 text might appear in tasks, I need to encode each line before I print it. Otherwise, while it will appear to work in the terminal, it won’t work in GeekTool. I suspect there’s a better way, but this works.
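One possible alternative, offered as a sketch rather than something tested in GeekTool, is to wrap the output stream in a UTF-8 writer once instead of encoding every line by hand. The example uses an in-memory byte buffer as a stand-in for standard output so the result can be inspected, and the sample lines are made up. In a Python 2 script the same wrapper would presumably go around sys.stdout.

```python
import codecs
import io

# BytesIO stands in for a byte-oriented standard output
rawOutput = io.BytesIO()
output = codecs.getwriter('utf-8')(rawOutput)

# made-up lines such as displayItem might produce
lines = [u'Home:', u'\t- Mow lawn', u'\t\t\u2022 Use the new blade']
for line in lines:
    # the writer encodes on the way out, so no per-line encode() is needed
    output.write(line + u'\n')

encoded = rawOutput.getvalue()
print(encoded)
```

The design trade-off is small: one wrapper at the top of the script versus an encode call wherever a line is printed.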

I can now grab a tagged subset of my to-do list using something like:

  • $HOME/bin/taskpaper --tag top "$HOME/Documents/Tasks.taskpaper"

I have it run every ten minutes in a GeekTool “shell” geeklet, and the tasks I want to focus on next are always sitting there on the Desktop.
