Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Django syndication feed guid

Jerry Stratton, December 30, 2006

The Django syndication feed framework makes it very easy to take any set of objects and turn them into an RSS feed. However, it makes an assumption about these items that is (for me, at least) often wrong: that each item has a unique URL. The framework is hard-coded to put the URL into the feed as the guid for each item. This works as long as no two items ever share a URL. When URLs repeat, the guid is no longer a unique ID.

The problem

For example, on Negative Space I have a News class. News items don’t get a page of their own; they appear on other pages. They are ephemeral notes that appear at the top of a page for as long as I specify and then disappear, and several different notes can appear on the same page. That News items don’t have their own URL isn’t itself a problem for Django’s feeds, because I can override the URL for any item. In this case, I override it to be the URL of the page that the News item appears on.

However, this means that later News items on that page don’t have a unique guid. Some RSS feed readers will see that they’ve already displayed this guid and not display the later News items.

I also have a “latest pages” feed, which lists the most recently modified pages on my site. At any given moment each item in this feed has a unique URL, but over time the same URLs come back: a page that is modified again reappears with the URL it had before. Any feed reader that remembers what it has viewed before will see some of these items come up that, according to its database, it has already displayed.
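
To make the failure concrete, here’s a minimal sketch of a feed reader that skips any guid it has already seen, which is how the readers described above behave. The reader, the new_items function, and the example.com URL are all hypothetical; real readers have their own storage and rules.

```python
# Hypothetical sketch of a feed reader that remembers the guids it has shown.
# It is not any particular reader's code; it only illustrates the failure mode.

def new_items(items, seen_guids):
    """Return items whose guid has not been seen yet, recording each new guid."""
    fresh = []
    for item in items:
        if item["guid"] not in seen_guids:
            seen_guids.add(item["guid"])
            fresh.append(item)
    return fresh


# Two different news notes on the same page: with guid = link,
# both end up with the same guid.
page_url = "http://example.com/about/"  # hypothetical URL
first = {"title": "First note", "guid": page_url}
second = {"title": "Second note", "guid": page_url}

seen = set()
print([i["title"] for i in new_items([first], seen)])   # ['First note']
print([i["title"] for i in new_items([second], seen)])  # [] (second note is dropped)
```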

The solution

As far as I can tell, there is no way to fix this within a feed object other than by mucking with the URL. Other tags are filled in by a line like this, which looks for an attribute or method on the feed object:

    pubdate = self.__get_dynamic_attr('item_pubdate', item),

The GUID, however, is hard-coded to be the item’s link:

    unique_id = link,

Using the low-level framework seemed like overkill just to solve this problem. Even subclassing the Feed object would have meant copying a fairly long function (get_feed) for one minor change. So I modified django/contrib/syndication/feeds.py itself and documented the change in my notes, so that I can reapply it when the next version of Django comes out.

Between the “link =” and “enc =” lines (currently lines 93 and 94), I added:

    unique_id = self.__get_dynamic_attr('item_guid', item)
    if not unique_id:
        unique_id = link

Then I replaced “unique_id = link” with “unique_id = unique_id”.

Now, Django will first look for an item_guid attribute or method before falling back to the link as the unique ID for an RSS feed item.
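
Outside of Django, the lookup-then-fallback pattern looks roughly like this. It’s a minimal standalone sketch, not Django’s code: get_dynamic_attr and guid_for are hypothetical names. I’m assuming the same convention as __get_dynamic_attr, where a method that takes one argument (besides self) is called with the item.

```python
import inspect


def get_dynamic_attr(feed, attname, item, default=None):
    """Look up attname on feed; call it if it's a method, else return it as-is."""
    try:
        attr = getattr(feed, attname)
    except AttributeError:
        return default
    if callable(attr):
        # Bound methods don't list self, so one parameter means "takes the item".
        if len(inspect.signature(attr).parameters) == 1:
            return attr(item)
        return attr()
    return attr


def guid_for(feed, item, link):
    # Prefer an item_guid attribute or method; otherwise fall back to the link.
    unique_id = get_dynamic_attr(feed, "item_guid", item)
    if not unique_id:
        unique_id = link
    return unique_id


class PlainFeed:
    pass  # no item_guid, so the link is used


class GuidFeed:
    def item_guid(self, item):
        return "guid-" + item


print(guid_for(PlainFeed(), "a", "http://example.com/a/"))  # http://example.com/a/
print(guid_for(GuidFeed(), "a", "http://example.com/a/"))   # guid-a
```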

In my News item feed, I implemented the item_guid method as:

    import hashlib  # replaces the old md5 module, which Python 3 removed

    def item_guid(self, news_item):
        # Digest of the hosting page's URL plus the item's go-live date.
        page = self.myPage(news_item)
        key = page.get_absolute_url() + str(news_item.livedate)
        return hashlib.md5(key.encode('utf-8')).hexdigest()

    def myPage(self, news_item):
        # The first page this news item appears on.
        pages = NewsPage.objects.filter(news_item=news_item)
        return pages[0:1].get().page

This creates what will be (in this case) a unique ID by combining the URL of the page that the news item appears on with the date that the news item went live. It then returns a digest of that combination, because I prefer digests for unique keys.
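
Here’s the same idea without Django, as a minimal sketch. news_guid is a hypothetical helper that takes the page URL and the live date directly instead of looking them up through NewsPage, and the URL is made up. Two notes on the same page get different guids as long as their live dates differ.

```python
import hashlib
from datetime import datetime


def news_guid(page_url, livedate):
    """Hex MD5 digest of the page URL followed by the item's go-live date."""
    key = page_url + str(livedate)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


page = "http://example.com/about/"  # hypothetical page URL
a = news_guid(page, datetime(2006, 12, 1, 9, 0))
b = news_guid(page, datetime(2006, 12, 15, 9, 0))
print(a != b)  # True: same page, different live dates
print(len(a))  # 32 hex characters
```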

Because these GUIDs aren’t URLs, the standard guid tag doesn’t work either. GUIDs must not only be unique: in RSS 2.0 a guid is assumed to be a permalink URL unless the guid tag is marked with the isPermaLink="false" attribute.

This, too, is hard-coded, in django/utils/feedgenerator.py. That file has:

    if item['unique_id'] is not None:
        handler.addQuickElement(u"guid", item['unique_id'])

What it really needs to do is check whether item['unique_id'] is a valid URL and, if it isn’t, set isPermaLink to false. (Better still, it needs some way of being told what the value should be.)
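
A minimal sketch of that check, assuming that “valid URL” means an absolute http or https URL (a simplification). looks_like_permalink and guid_attrs are hypothetical helpers, not part of feedgenerator.py. Since RSS 2.0 treats a guid as a permalink when isPermaLink is absent, only the non-URL case needs the attribute.

```python
from urllib.parse import urlparse


def looks_like_permalink(unique_id):
    """Heuristic: absolute http(s) URLs are permalinks; anything else is not."""
    parts = urlparse(unique_id)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def guid_attrs(unique_id):
    # RSS 2.0 assumes isPermaLink="true" when the attribute is missing,
    # so only guids that aren't URLs need it spelled out.
    if looks_like_permalink(unique_id):
        return {}
    return {"isPermaLink": "false"}


print(guid_attrs("http://example.com/about/"))         # {}
print(guid_attrs("0cc175b9c0f1b6a831c399e269772661"))  # {'isPermaLink': 'false'}
```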

In my case, I know that none of my feeds are going to have links in the guid tags, so I just set it to:

    if item['unique_id'] is not None:
        handler.addQuickElement(u"guid", item['unique_id'], {u"isPermaLink": "false"})

And my feeds now validate.
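
For reference, here’s what that patched line produces. Django’s SimplerXMLGenerator (in django/utils/xmlutils.py) builds on the standard library’s xml.sax.saxutils.XMLGenerator, and addQuickElement amounts to a start tag, the contents, and an end tag. This standalone sketch makes those three calls directly; guid_element is a hypothetical helper.

```python
import io
from xml.sax.saxutils import XMLGenerator


def guid_element(unique_id):
    """Serialize one <guid> the way the patched line does: always isPermaLink="false"."""
    out = io.StringIO()
    gen = XMLGenerator(out, "utf-8")
    gen.startElement("guid", {"isPermaLink": "false"})
    gen.characters(unique_id)  # text is XML-escaped
    gen.endElement("guid")
    return out.getvalue()


print(guid_element("0cc175b9c0f1b6a831c399e269772661"))
# <guid isPermaLink="false">0cc175b9c0f1b6a831c399e269772661</guid>
```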

February 7, 2011: feedgenerator potentially improved

Hey! I was browsing my referrers today and noticed I was getting hits from Django’s bug tracker. It looks like this hack won’t be necessary to create valid RSS feeds in some as-yet-unknown version after 1.3.

andreiko’s solution involves adding a new property to a Feed; the feed object will use that property to determine whether the guid provided is a permalink or not.

Here’s the sample:

    class Rss(Feed):
        title = "Chicagocrime.org site news"
        link = "http://chicagocrime.org/rss/"
        description = "Updates on changes and additions to chicagocrime.org."
        guid_is_permalink = False
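
I haven’t looked at how the patch itself turns that flag into markup. Here’s a minimal sketch of one way a feed generator might do it, using a hypothetical helper, guid_attrs_from_flag. It leaves the attribute off when the feed says nothing and spells it out when the feed sets the flag.

```python
def guid_attrs_from_flag(guid_is_permalink):
    """Map a feed-level flag to <guid> attributes (hypothetical helper).

    None leaves the attribute off, so RSS 2.0 readers assume a permalink;
    True or False is written out explicitly.
    """
    if guid_is_permalink is None:
        return {}
    return {"isPermaLink": "true" if guid_is_permalink else "false"}


class HypotheticalFeed:
    # Mirrors the sample above, minus the Django base class.
    guid_is_permalink = False


print(guid_attrs_from_flag(getattr(HypotheticalFeed, "guid_is_permalink", None)))
# {'isPermaLink': 'false'}
```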

Looks like a great solution. Once this issue is fixed, I won’t have to hack the Django source when new versions come out—this is the last remaining hack that I use.

September 9, 2008: Django 1.0 feedgenerator and unique IDs

I’ll have a longer post on upgrading from Django 0.96.2 to Django 1.0 later, hopefully this weekend. But here’s a note on generating RSS feeds with a unique ID. The new version of feedgenerator.py in django/utils supports adding a unique_id to items, but it still doesn’t check whether that unique ID is a permalink; it still assumes that it is.

For my purposes, this is easy to fix. Lines 244 and 245 are:

    if item['unique_id'] is not None:
        handler.addQuickElement(u"guid", item['unique_id'])

I’m going to make the assumption that I’m only passing in a link once. If I’m passing a link as unique_id, I won’t pass it in as link. So if link is not None, unique_id is not a permalink:

    if item['unique_id'] is not None:
        if item['link'] is not None:
            handler.addQuickElement(u"guid", item['unique_id'], {u"isPermaLink": "false"})
        else:
            handler.addQuickElement(u"guid", item['unique_id'])

Note that link is a required parameter for SyndicationFeed.add_item.
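
To check both branches of that rule without running Django, here’s a standalone sketch that applies it with the standard library’s XMLGenerator. write_guid and render are hypothetical helpers, and the item dicts carry only the two keys the rule looks at.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_guid(gen, item):
    """If the item also has a link, its unique_id is not a permalink."""
    if item["unique_id"] is not None:
        if item["link"] is not None:
            gen.startElement("guid", {"isPermaLink": "false"})
        else:
            gen.startElement("guid", {})
        gen.characters(item["unique_id"])
        gen.endElement("guid")


def render(item):
    out = io.StringIO()
    write_guid(XMLGenerator(out, "utf-8"), item)
    return out.getvalue()


print(render({"link": "http://example.com/about/", "unique_id": "0cc175b9"}))
# <guid isPermaLink="false">0cc175b9</guid>
print(render({"link": None, "unique_id": "http://example.com/about/"}))
# <guid>http://example.com/about/</guid>
```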
