Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, Swift, BASIC, and whatever else I happen to feel like hacking at.

Django syndication feed guid

Jerry Stratton, December 30, 2006

The Django syndication feed framework makes it very easy to take any set of objects and turn them into an RSS feed. However, it makes an assumption about these items that is (for me, at least) often wrong: that each item has a unique URL. The framework is hard-coded to put the URL into the feed as the guid for each item. This is fine as long as URLs never repeat. When they do repeat, the guid is no longer a unique ID.

The problem

For example, on Negative Space I have a News class. News items don’t go on a page of their own; they go on other pages. They are ephemeral notes that appear at the top of a page for as long as I specify and then disappear. Different notes can appear on the same page. That News items don’t have their own URL isn’t itself a problem for Django’s feeds, because I can override the URL for any item. In this case, I override the URL to be the URL of the page that the News item appears on.

However, this means that later News items on that page don’t have a unique guid. Some RSS feed readers will see that they’ve already displayed this guid and not display the later News items.

I also have a “latest pages” feed. This feed is a list of the most recently modified pages on my site. Here, at least, at any specific time each item in the feed has a unique URL, but over time they do not. Any feed reader that remembers what it has viewed before will also see some items come up that, according to its database, it has already displayed.

The solution

As far as I can tell there is no way to fix this within a feed object, other than by mucking with the URL. Whereas other tags have something like this that looks for an attribute or method on the feed object:

  • pubdate = self.__get_dynamic_attr('item_pubdate', item),

the GUID is hard-coded to be the item’s link:

  • unique_id = link,

Using the low-level framework seemed to be overkill just to solve this problem. Even subclassing the Feed object would have meant copying a fairly long function (get_feed) just for one minor change. So I modified django/contrib/syndication/feeds.py itself and documented my change in my notes, so that I can reapply it when the next version of Django comes out.

Between the link= and the enc = lines (currently lines 93 and 94) I added:

  • unique_id = self.__get_dynamic_attr('item_guid', item)
  • if not unique_id:
    • unique_id = link

Then I replaced “unique_id = link” with “unique_id = unique_id”.

Now, Django will first look for an item_guid attribute or method before falling back to the link as the unique ID for an RSS feed item.
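The lookup-then-fall-back pattern can be sketched outside Django. This is a simplified stand-in, not Django’s actual __get_dynamic_attr; the helper get_dynamic_attr, the function unique_id_for, and the two feed classes are hypothetical names used only for illustration.

```python
def get_dynamic_attr(feed, name, item, default=None):
    """Return feed.<name>, calling it with the item if it is callable."""
    attr = getattr(feed, name, None)
    if attr is None:
        return default
    if callable(attr):
        return attr(item)
    return attr


def unique_id_for(feed, item, link):
    """Prefer the feed's item_guid; fall back to the item's link."""
    unique_id = get_dynamic_attr(feed, "item_guid", item)
    if not unique_id:
        unique_id = link
    return unique_id


class PlainFeed:
    """Hypothetical feed with no item_guid, so the link is used."""


class GuidFeed:
    """Hypothetical feed that supplies its own guid for each item."""

    def item_guid(self, item):
        return "news-%s" % item["id"]


item = {"id": 7}
link = "http://example.com/page/"
print(unique_id_for(PlainFeed(), item, link))  # falls back to the link
print(unique_id_for(GuidFeed(), item, link))   # uses item_guid
```

Feeds that never define item_guid keep the old behavior, which is why the fallback to the link matters.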

In my News item feed, I implemented the item_guid method as:

  • def item_guid(self, news_item):
    • page = self.myPage(news_item)
    • hasher = md5.new()
    • hasher.update(page.get_absolute_url() + str(news_item.livedate))
    • hash = hasher.hexdigest()
    • return hash
  • def myPage(self, news_item):
    • pages = NewsPage.objects.filter(news_item = news_item)
    • page = pages[0:1].get().page
    • return page

This creates what will be (in this case) a unique ID by combining the URL of the page that the news item appears on with the date that the news item went live. It then returns a digest of that combination, because I prefer digests for unique keys.
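The md5 module used above was removed in Python 3, where hashlib replaces it. A minimal sketch of the same digest step, assuming the page URL and live date are already in hand (make_guid and the sample values are hypothetical):

```python
import hashlib
from datetime import datetime


def make_guid(page_url, live_date):
    """Digest the page URL plus the go-live date into a hex string."""
    raw = page_url + str(live_date)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Two notes on the same page get different guids because their
# live dates differ.
first = make_guid("/page/", datetime(2006, 12, 1, 9, 0))
second = make_guid("/page/", datetime(2006, 12, 30, 9, 0))
print(first != second)
```

The digest is only as unique as its inputs: two notes on the same page with the same live date would still collide.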

Because these items don’t have URLs that work as valid GUIDs, the standard guid tag doesn’t work either. Not only must GUIDs be unique, but if they aren’t URLs the guid tag must be marked with the isPermaLink="false" attribute.

This is also hardcoded, into django/utils/feedgenerator.py. That file has:

  • if item['unique_id'] is not None:
    • handler.addQuickElement(u"guid", item['unique_id'])

What it really needs to do is check to see if item['unique_id'] is a valid URL or not, and set isPermaLink to false if not (actually, it really needs to have some way of being told what this should be).
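One way such a check might look, sketched with the standard library’s XMLGenerator rather than Django’s handler. The helper names looks_like_url, add_guid, and render_guid are hypothetical, and treating only http and https values as permalinks is an assumption made for this sketch.

```python
import io
from urllib.parse import urlparse
from xml.sax.saxutils import XMLGenerator


def looks_like_url(value):
    """Assumption: only absolute http(s) URLs count as permalinks."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def add_guid(handler, unique_id):
    """Write a guid element, flagging non-URL ids as non-permalinks."""
    attrs = {} if looks_like_url(unique_id) else {"isPermaLink": "false"}
    handler.startElement("guid", attrs)
    handler.characters(unique_id)
    handler.endElement("guid")


def render_guid(unique_id):
    """Render a single guid element to a string for inspection."""
    out = io.StringIO()
    add_guid(XMLGenerator(out, "utf-8"), unique_id)
    return out.getvalue()


print(render_guid("http://example.com/news/1/"))
print(render_guid("9e107d9d372bb6826bd81d3542a419d6"))
```

A URL-shaped guid is not necessarily a permalink, so a heuristic like this is no substitute for letting the feed say explicitly what it means.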

In my case, I know that none of my feeds are going to have links in the guid tags, so I just set it to:

  • if item['unique_id'] is not None:
    • handler.addQuickElement(u"guid", item['unique_id'], {u"isPermaLink": "false"})

And my feeds now validate.

August 6, 2012: Fixing Django’s feed generator without hacking Django

I installed security update 1.4.1 for Django yesterday, and when I went to hack feedgenerator.py I thought I’d take another look at somehow subclassing or otherwise overriding the offending code. It’s been a long time since I wrote that hack and maybe I’ve learned enough about Django and/or Python to stop having to hack Django’s source every time I upgrade.

The offending code is in add_item_elements in django.utils.feedgenerator.Rss201rev2Feed. When creating a feed, however, I don’t subclass Rss201rev2Feed, I subclass django.contrib.syndication.views.Feed. In fact, all of my feeds inherit from a base subclass called NSFeed.

Feed uses Rss201rev2Feed by way of DefaultGenerator. It’s just a property, feed_type, on the Feed class. So I overrode the feed_type property with my own subclass of Rss201rev2Feed and was able to override add_item_elements. I tested it by just putting in one line, “pass”, and checking the feed contents; it was just a bunch of empty items, as hoped for. Replacing “pass” with a “super” call to get the parent method’s functionality restored the feed.

Unfortunately, add_item_elements does a lot of work—it adds everything via a series of if/then statements. It uses an XMLGenerator subclass—the “handler” variable—to add elements to itself depending only on the dict entries in the “item” variable. My first thought was to let the parent add_item_elements do its work and then just add the isPermaLink attribute to the newly-added guid element. As far as I can tell, however, XMLGenerator is focused purely on XML generation, with no methods for XML modifications.

Fortunately guid is an optional element. If the item dict’s unique_id is None, add_item_elements doesn’t create one. So I can add the guid element to handler myself, with isPermaLink="false", and then set unique_id to None before passing the item through to the parent. The item already has its guid element, and the parent doesn’t add another.
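The shape of that override can be sketched without Django installed. Everything here is a stand-in: QuickXMLGenerator imitates the handler’s addQuickElement, ParentFeed imitates the relevant part of Rss201rev2Feed, and NonPermalinkGuidFeed and render_item are hypothetical names. In a real project the subclass would extend Rss201rev2Feed and be assigned to feed_type on the Feed subclass.

```python
import io
from xml.sax.saxutils import XMLGenerator


class QuickXMLGenerator(XMLGenerator):
    """Stand-in for Django's handler: XMLGenerator plus addQuickElement."""

    def addQuickElement(self, name, contents=None, attrs=None):
        self.startElement(name, attrs or {})
        if contents is not None:
            self.characters(contents)
        self.endElement(name)


class ParentFeed:
    """Stand-in for Rss201rev2Feed: writes guid only if unique_id is set."""

    def add_item_elements(self, handler, item):
        handler.addQuickElement("title", item["title"])
        if item["unique_id"] is not None:
            handler.addQuickElement("guid", item["unique_id"])


class NonPermalinkGuidFeed(ParentFeed):
    """Write the guid ourselves, then hide it from the parent."""

    def add_item_elements(self, handler, item):
        if item["unique_id"] is not None:
            handler.addQuickElement(
                "guid", item["unique_id"], {"isPermaLink": "false"}
            )
            # Copy the dict so the caller's item is left untouched.
            item = dict(item, unique_id=None)
        super().add_item_elements(handler, item)


def render_item(feed, item):
    """Render one item element to a string for inspection."""
    out = io.StringIO()
    handler = QuickXMLGenerator(out, "utf-8")
    handler.startElement("item", {})
    feed.add_item_elements(handler, item)
    handler.endElement("item")
    return out.getvalue()


print(render_item(NonPermalinkGuidFeed(), {"title": "Note", "unique_id": "abc123"}))
```

Copying the item dict instead of assigning None in place is a choice made for this sketch, so that nothing else reading the item afterward loses its unique_id.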

Note that as far as I can tell, none of these classes are documented beyond their signature, so they’re likely subject to change in any Django revision.

February 8, 2011: feedgenerator potentially improved

Hey! I was browsing my referrers today and noticed I was getting hits from Django’s bug tracker. Looks like this hack won’t be necessary to create valid RSS feeds in some future version after 1.3.

andreiko’s solution involves adding a new property to a Feed; the feed object will use that property to determine whether the guid provided is a permalink or not.

Here’s the sample:

  • class Rss(Feed):
    • title = "Chicagocrime.org site news"
    • link = "http://chicagocrime.org/rss/"
    • description = "Updates on changes and additions to chicagocrime.org."
    • guid_is_permalink = False

Looks like a great solution. Once this issue is fixed, I won’t have to hack the Django source when new versions come out—this is the last remaining hack that I use.

September 9, 2008: Django 1.0 feedgenerator and unique IDs

I’ll have a longer post on upgrading from Django 0.96.2 to Django 1.0 later, hopefully this weekend. But here’s a note on generating RSS feeds with a unique ID. The new version of feedgenerator.py in django/utils supports adding unique_id to items; but it still doesn’t check to see if the unique ID is a permaLink; the default assumption is still that it is.

For my purpose, this is easy to fix. Lines 244 and 245 are:

  • if item['unique_id'] is not None:
    • handler.addQuickElement(u"guid", item['unique_id'])

I’m going to make the assumption that I’m only passing in a link once. If I’m passing a link as unique_id, I won’t pass it in as link. So if link is not None, unique_id is not a permalink:

  • if item['unique_id'] is not None:
    • if item['link'] is not None:
      • handler.addQuickElement(u"guid", item['unique_id'], {u"isPermaLink": "false"})
    • else:
      • handler.addQuickElement(u"guid", item['unique_id'])

Note that link is a required parameter for SyndicationFeed.add_item.
