Mimsy Were the Borogoves

Parsing JSKit/Echo XML comments files—Monday, January 30th, 2012

I just switched over from my temporary JSKit comments to custom local comments. The main reason I went with JSKit to begin with rather than just not have comments is that they provide the comments in an XML file. This meant that I was able to convert the JSKit/Echo comments on my site to the new system.

I wrote the conversion script in Python because my comments database uses Django on the back end.


  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • from optparse import OptionParser
  • import sys, urlparse, datetime
  • import xml.dom.minidom as minidom
  • parser = OptionParser(u'%prog [options] <jskit file>')
  • (options, args) = parser.parse_args()
  • if not args:
    • parser.print_help()
  • def getEntry(comment, key):
    • entry = comment.getElementsByTagName(key)
    • if entry:
      • return entry[0].firstChild.data.strip()
    • return None
  • def getValue(comment, key):
    • possibilities = comment.getElementsByTagName('jskit:attribute')
    • entry = None
    • for possibility in possibilities:
      • if possibility.getAttribute('key') == key:
        • entry = possibility
        • break
    • if entry:
      • value = entry.getAttribute('value').strip()
      • return value
    • return None
  • def getPosterURL(webpresence):
    • if '],[' in webpresence:
      • webpresences = webpresence.split('],[')
    • else:
      • webpresences = [webpresence]
    • for webpresence in webpresences:
      • webpresence = webpresence.strip('["]')
      • if webpresence:
        • service, serviceURL = webpresence.split('","')
        • if service in ['login-twitter', 'login-blogspot']:
          • return serviceURL
        • if service not in ['login-openid', 'login-gfc']:
          • print 'Unknown service:', service, serviceURL
          • sys.exit()
    • return None
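
To see what shape of input the two lookup helpers above expect, here is a minimal sketch that runs them against a made-up fragment. The element names, attribute keys, values, and namespace URI in the sample are assumptions for illustration, not taken from a real JSKit export. The sketch is written for Python 3 rather than the Python 2 of the script above, and getEntry gains a small guard for empty elements.

```python
import xml.dom.minidom as minidom

# Hypothetical fragment: the namespace URI, keys, and values are invented.
SAMPLE = """<rss xmlns:jskit="http://example.com/hypothetical-jskit">
  <item>
    <author> Jane Example </author>
    <jskit:attribute key="IP" value=" 192.0.2.1 "/>
    <jskit:attribute key="status" value="approved"/>
  </item>
</rss>"""

def getEntry(comment, key):
    # Text of the first <key> element inside the comment, or None.
    entry = comment.getElementsByTagName(key)
    if entry and entry[0].firstChild is not None:  # guard added for empty elements
        return entry[0].firstChild.data.strip()
    return None

def getValue(comment, key):
    # Value of the first <jskit:attribute> whose key attribute matches, or None.
    for possibility in comment.getElementsByTagName('jskit:attribute'):
        if possibility.getAttribute('key') == key:
            return possibility.getAttribute('value').strip()
    return None

comment = minidom.parseString(SAMPLE).getElementsByTagName('item')[0]
print(getEntry(comment, 'author'))
print(getValue(comment, 'IP'))
print(getValue(comment, 'missing'))
```

Note that minidom parses with namespaces enabled, so the jskit prefix has to be declared somewhere in the file or parsing fails with an “unbound prefix” error.
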
Fresh new Negative Space—Saturday, January 28th, 2012

I’ve just moved all of Negative Space over to fresh, new servers on WebFaction, so if you see anything missing, please let me know!

I’ve been running all of my domains except hoboes.com through WebFaction for about a year. The main reasons for the move were the support/price ratio compared to Pair.com, easy support of Django (you will finally see the “temporary” comments system disappearing in favor of the custom system used on The Biblyon Broadsheet due to this), and the ease of adding new domains and new email addresses, as well as the ease of getting this data back out if I need it. WebFaction has an API that lets you script the addition, removal, and listing of most of its services, including email addresses.

As part of my ongoing move to a Django CMS for all of Negative Space and its satellite domains, I put the last major portions of the site into the CMS last week: my Alexandre Dumas pages and my J. M. Barrie pages. They’re very possibly the earliest content-heavy HTML sections of Negative Space. And they were the last remnant of the old division of Negative Space files between /pub/ and /html/, a practice which dated from the days when ftp sites put their “public” data in /pub/; when the web came around, /pub/ contained mostly text, so I added /html/ as a place to put web pages.

Nowadays, /html/ is the default so there’s no need to clutter up my URL namespaces with “/html” preceding every path. Now that I’ve gotten rid of it, of course, this would be the perfect time for HTML to be superseded by something else…

Because everything’s now in a database, I was able to move the site more easily than in past moves. In the past, I’ve had to do a network copy, which is easy enough, except that I also had to make sure I didn’t move OS-specific files, make sure I didn’t upload originals (which I used Mac OS X’s labels to differentiate), and make sure I didn’t upload files marked as out-of-date or that someone had asked to be removed and which I agreed to remove (again, using labels). Last night, I told the CMS to publish everything that’s marked in the database as hosted on www.hoboes.com, and it did.

Tomato: Easy routing software—Monday, November 14th, 2011

A quick plug for router software: I installed Tomato on my Linksys router about a year ago. I haven’t written about it yet because I’m loath to recommend that people completely remove the official code from their router. But Tomato is so much better than the default router software that I feel like I’m cheating by not telling anyone.

Tomato is simpler than the default, more powerful than the default, and easier to use. If it’s compatible with your router, I recommend it, especially over the default Linksys router software. When I finally gave up on it, the Linksys software had become buggy as hell; often, it would display some sort of smarty-like code in the form fields.

  • Wake-On-Lan is built in. This makes it easy to let my personal servers go to sleep until I need them.
  • The device list makes it easy to see who is currently on my network. The easiest way to add someone to the Wireless Client Filter is to (a) copy their MAC address from the logs, (b) paste it into the client filter, and (c) give the new entry a name so that I can recognize it later.
  • The firewall is a lot easier to work with. For one thing, it’s always on. You don’t have to figure out how to turn the firewall on, because it’s on by default. Adding new entries to the port forwarding list so that selected applications can be accessed remotely is a snap. It’s very easy for me to turn my CMS on remotely, for example.

Installing it is about as simple as it can be. It’s just like upgrading your default firmware. It’s the same process, in fact: tell your stock software you’re doing an upgrade, then choose the Tomato firmware. Follow the instructions in the included readme file.

Access restriction is also a lot easier. I’ve mentioned before that I occasionally use it to block the really annoying ad servers; Tomato’s access restriction interface is a lot more versatile than the old four-per-screen settings of the default Linksys software.

Tomato’s programmer, Jonathan Zarate, describes Tomato as a “small, lean, and simple” replacement. It’s all that and more. Read the FAQ for any issues you might find important—and to make sure it works with your router—but I’ve been very pleasantly surprised at how well Tomato improves my Linksys router.

Image dimensions and orientation in Mac OS X Python—Thursday, October 20th, 2011

Often you’ll want to get the height and width of attachments that are images. I was able to find two command-line programs on Mac OS X that will provide an image’s dimensions: /usr/bin/sips and /usr/bin/mdls. “SIPS” stands for Scriptable Image Processing System, while “mdls” lists metadata.

In my tests, sips was more accurate: the metadata appears to not get filled out immediately. It’s fast enough for humans, but not fast enough for scripts. The height and width metadata were empty when using mdls in the script, whereas sips was always able to provide it.

Add these two methods to the Attachment class:


  • def dimensions(self):
    • if not self.isImageSaved():
      • return None, None
    • #get dimensions from sips
    • dimensionArgs = ['/usr/bin/sips', '-g', 'pixelHeight', '-g', 'pixelWidth', self.path]
    • sips = subprocess.Popen(dimensionArgs, stdout=subprocess.PIPE)
    • dimensions = sips.stdout.read()
    • dimensions = re.findall(r'pixel(Height|Width): ([0-9]+)', dimensions)
    • if len(dimensions) == 2:
      • label, height = dimensions[0]
      • label, width = dimensions[1]
      • height = int(height)
      • width = int(width)
      • if height and width:
        • return width, height
    • return False, False
  • def isImageSaved(self):
    • if not self.path:
      • print 'Attachment', self.file, 'has not yet been saved.'
      • return False
    • if self.fileKind != 'image':
      • print 'Attachment', self.path, 'is not an image:', self.fileKind
      • return False
    • return True

The “isImageSaved” method just checks that the attachment has been saved and that it is in fact an image. The “dimensions” method returns width, height if it can find them, using “/usr/bin/sips -g pixelHeight -g pixelWidth image path”. It uses a regular expression to parse the response from sips. Pretty basic stuff.
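
The parsing step can be tried without a Mac by feeding the regular expression a canned response. This is a minimal sketch in Python 3, assuming sips prints the file path followed by one indented “pixelHeight: N” or “pixelWidth: N” line per requested property. The sample text and the parseDimensions name are illustrative and not part of the Attachment class. Unlike the method above, it matches each value to its label, so it does not depend on the order of the lines.

```python
import re

# Assumed shape of `sips -g pixelHeight -g pixelWidth` output (illustrative only).
SAMPLE_OUTPUT = """/Users/me/Pictures/example.jpg
  pixelHeight: 480
  pixelWidth: 640
"""

def parseDimensions(output):
    # findall returns (label, value) pairs; dict() keys them by label.
    found = dict(re.findall(r'pixel(Height|Width): ([0-9]+)', output))
    if 'Height' in found and 'Width' in found:
        return int(found['Width']), int(found['Height'])
    return None, None

print(parseDimensions(SAMPLE_OUTPUT))
print(parseDimensions('no dimensions here'))
```

In Python 3 the bytes read from the subprocess pipe would need decoding to text before being matched this way.
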

At the bottom of the script, in the “for message in unPublishedMessages:” loop, add an attachments loop:

Using appscript with Apple Mail to get emails and attachments—Sunday, October 16th, 2011

I had a need recently to watch over a folder in Apple Mail, looking for incoming messages and saving attachments. Apparently, there used to be an easier way of getting attachments via appscript, but something in Snow Leopard changed it.

The core of this script is going to be grabbing the messages of a known folder in OS X Mail, and then looping through them.


  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import appscript
  • mail = appscript.app(u'Mail')
  • mailbox = mail.mailboxes[u'Archives'].mailboxes[u'Alarms']
  • messages = mailbox.messages[(appscript.its.read_status == False).AND(appscript.its.deleted_status == False)]
  • unPublishedMessages = messages.get()
  • #loop through all unread messages
  • for message in unPublishedMessages:
    • print message.subject.get()

This very simple script just loops through all unread messages in the “Alarms” folder of the “Archives” folder, and prints the subject.

Here’s a longer version that gets a bit more information about each message:


  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import appscript
  • import sys
  • from optparse import OptionParser
  • parser = OptionParser()
  • parser.add_option('-c', '--commit', help='commit action with mails', action='store_true')
  • (options, args) = parser.parse_args()
  • mail = appscript.app(u'Mail')
  • mailbox = mail.mailboxes[u'Archives'].mailboxes[u'Alarms']
  • messages = mailbox.messages[(appscript.its.read_status == False).AND(appscript.its.deleted_status == False)]
  • messageCount = messages.count(each=appscript.k.item)
  • if not messageCount:
    • print 'No messages to process.'
    • sys.exit()
  • unPublishedMessages = messages.get()
  • #make sure the earliest ones are done first, in case they reference each other
  • def messageSortKey(message):
    • return message.date_sent.get()
  • unPublishedMessages.sort(key=messageSortKey)
Django: fix_ampersands and abbreviations—Sunday, May 22nd, 2011

I’ve been slowly converting all of my Django fields to use UTF-8 instead of named entities. The combination of using named entities, talking about code, and talking about D&D makes it difficult to know when the ampersand should be converted and when it shouldn’t.

For the most part the conversion to UTF-8 is going well, except that there’s no easy way to know when &lt; and &gt; need to be converted and when they don’t: sometimes I’m talking about HTML, and sometimes I’m using it. So for those two characters, I’ll need to continue using the ampersand entity directly in my blog content.

It turns out, though, that the fix_ampersands filter handles this. The documentation makes it sound like fix_ampersands converts all ampersands, but in fact it uses a simple regular expression to exclude existing named entities and numeric character references.

Unfortunately, this still leaves a few edge cases. Any use of ampersand abbreviations, such as Q&A, R&R, M&Ms, or R&D, runs the risk of triggering one of them. For me, this comes up mainly when talking about role-playing games such as D&D; V&V; and T&T. Those first two ampersands look like named entities to fix_ampersands because django/utils/html.py uses the simple and fast expedient of a very simple regular expression:

  • unencoded_ampersands_re = re.compile(r'&(?!(\w+|#\d+);)')

In my case, a simple change to the regex will handle the edge case examples above:

  • unencoded_ampersands_re = re.compile(r'&(?!(\w{2,}|#\d+);)')

There are no one-character named entities, so this regular expression treats anything that looks like a single-character named entity as an unencoded ampersand and fixes it. Rather than “\w+” it uses “\w{2,}”, so that only named entities of two or more characters are excluded from replacement.
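
The difference between the two expressions above is easy to check in isolation. This is a minimal sketch in Python 3, assuming the filter replaces each match with “&amp;”; the variable names and sample strings are only for illustration.

```python
import re

stock = re.compile(r'&(?!(\w+|#\d+);)')       # expression quoted from django/utils/html.py
patched = re.compile(r'&(?!(\w{2,}|#\d+);)')  # variant that fixes one-character "entities"

samples = ['D&D; V&V; and T&T.', 'Q&A', 'AT&amp;T', '&#38;']
for text in samples:
    # Show the input, then what each expression would produce.
    print(text, '|', stock.sub('&amp;', text), '|', patched.sub('&amp;', text))
```

With the stock expression only the ampersand in “T&T.” is encoded, because “&D;” and “&V;” pass for entities; the patched expression encodes all three. Both leave “&amp;” and “&#38;” alone.
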

This won’t fix the problem when the abbreviation looks like a two-character (or more) entity, but in my case the problem has only shown up for one-character abbreviations. In Python it’s possible to “fix” this without hacking the core code directly. At the end of settings.py, I added:

  • #modify fix_ampersands to handle single-character abbreviations
  • import django.utils.html, re
  • django.utils.html.unencoded_ampersands_re = re.compile(r'&(?!(\w{2,}|#\d+);)')

This overwrites the regular expression used by django.utils.html with one that no longer mistakes single-character abbreviations for named entities.

Copying an iTunes playlist—Saturday, May 14th, 2011

When you give, you receive. For my Pioneer 3200BT review I needed to copy 32 gigabytes of music to an SD card, to see how fast the unit would load that many tracks. Making a giant playlist in iTunes was easy. But iTuneMyWalkman, an otherwise very useful application, was taking forever just to collect the list of songs from iTunes—it had taken over two hours and hadn’t even started a copy yet. And iTunes itself doesn’t let you drag that many items to the Finder.

But the iTunes script I wrote as a joke for Palm does almost everything I needed: it loops through all tracks in a named playlist. All I had to do was add a function to copy each track to a specified folder.

At the top, along with the other imports, add:

  • import urllib, shutil

The urllib library is necessary because iTunes stores track locations in its XML file as file:// encoded URLs. The shutil library will copy a file.
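
As a rough illustration of those two jobs, here is a minimal sketch in Python 3 (where the URL helpers live in urllib.parse, unlike the Python 2 urllib imported above). It assumes a track location of the form file://localhost/path/with%20escapes. The locationToPath and copyLocation names are hypothetical, and this is not the copyTrack function from the script.

```python
import os
import shutil
import tempfile
from urllib.parse import quote, unquote, urlparse

def locationToPath(location):
    """Turn a file:// URL into a plain filesystem path."""
    return unquote(urlparse(location).path)

def copyLocation(location, folder):
    """Copy the file that a file:// URL points at into folder; return the new path."""
    return shutil.copy(locationToPath(location), folder)

# Temporary folders stand in for the music library and the SD card.
library = tempfile.mkdtemp()
card = tempfile.mkdtemp()
track = os.path.join(library, 'My Song.mp3')
with open(track, 'w') as handle:
    handle.write('not really audio')

# Build an escaped URL for the fake track, then copy it by URL.
location = 'file://localhost' + quote(track)
copied = copyLocation(location, card)
print(location)
print(copied)
```

The unquote step matters because spaces and other special characters in file names arrive as percent escapes, which shutil would otherwise treat as literal characters.
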

In the OptionParser section, add a new option:

  • parser.add_option('-c', '--copy', help='Copy tracks')

Because --copy means it has to loop through all tracks, just like --space and --tracks, change the “if options.showTracks or options.tallySpace:” line to:

  • if options.showTracks or options.tallySpace or options.copy:

Underneath the other two track-based options, add an “if” for copying:


  • if options.tallySpace and 'Size' in track:
    • spaceUsed = spaceUsed + int(track['Size'])
  • if options.copy:
    • copyTrack(track)

And, finally, among the other functions, add a copyTrack function:

Put a relative clock on your Desktop with GeekTool—Friday, April 1st, 2011
Today on the Desktop

In this example, the current time is today.

Most clocks show time in a complicated display of day, month, year, hours, minutes, and seconds. But how often have you just wanted to know whether the time is today, yesterday, or last week? This Python-based GeekTool geeklet will display the relative date on your desktop.


  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import optparse, datetime
  • days = ('yesterday', 'today', 'tomorrow')
  • parser = optparse.OptionParser('%prog')
  • parser.add_option('-d', '--day', type='choice', choices=days, default='today')
  • (options, slugs) = parser.parse_args()
  • today = datetime.datetime.fromordinal(datetime.date.today().toordinal())
  • future20 = datetime.datetime.now() + datetime.timedelta(minutes=20)
  • oneDay = datetime.timedelta(days=1)
  • oneWeek = datetime.timedelta(days=7)
  • if options.day == 'today':
    • date = today
  • elif options.day == 'tomorrow':
    • date = today + datetime.timedelta(days=1)
  • elif options.day == 'yesterday':
    • date = today - datetime.timedelta(days=1)
  • else:
    • date = future20
  • #the future
  • if date >= today:
    • if date == future20:
      • print "20 minutes into the future"
    • elif date == today:
      • print "Today"
    • elif date <= today + oneDay:
      • print "Tomorrow"
    • elif date <= today + oneWeek:
      • print "Next", date.strftime('%A')
    • elif (date.year == today.year and date.month != today.month) or (date.year == today.year+1 and date.month <= today.month):
      • print "Next", date.strftime('%B')
    • else:
      • timeToFuture = date - today
      • print "In", timeToFuture.days, "Days"
  • #the past
  • elif date >= today - oneDay:
    • print "Yesterday"
  • elif date >= today - oneWeek:
    • print "Last", date.strftime('%A')
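
The branches above are easier to try out against a fixed date when wrapped in a function. This is a simplified sketch in Python 3: the relativeDay name is hypothetical, and the “20 minutes into the future” and “Next month” branches are left out, so it is not a drop-in replacement for the geeklet.

```python
import datetime

oneDay = datetime.timedelta(days=1)
oneWeek = datetime.timedelta(days=7)

def relativeDay(date, today):
    """Describe date relative to today; both are datetimes at midnight."""
    if date >= today:
        if date == today:
            return 'Today'
        if date <= today + oneDay:
            return 'Tomorrow'
        if date <= today + oneWeek:
            return 'Next ' + date.strftime('%A')
        return 'In %d Days' % (date - today).days
    if date >= today - oneDay:
        return 'Yesterday'
    if date >= today - oneWeek:
        return 'Last ' + date.strftime('%A')
    return None  # the script above prints nothing for older dates

today = datetime.datetime(2011, 4, 1)  # fixed reference date, a Friday
for offset in (0, 1, 3, -1, -3):
    print(offset, relativeDay(today + offset * oneDay, today))
```

Passing the reference date in as a parameter, rather than reading the clock inside the function, keeps the output reproducible.
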
