Django Twitter tag and RSS object
Python’s minidom makes it easy to parse RSS feeds, since RSS feeds are themselves just very simple XML. I wanted to parse my Twitter RSS feed into a context usable by Django templates.
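To show how little minidom needs, here is a minimal sketch in Python 3 (the code in this post is Python 2) that parses a small, made-up RSS string and pulls out each item's title and link. The sample feed and the helper names text_of and item_summaries are illustrative only.

```python
import xml.dom.minidom

# A tiny, made-up RSS 2.0 document for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example timeline</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>example: first post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>example: second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def text_of(node, tag):
    """Return the stripped text of the first <tag> under node, or None."""
    elements = node.getElementsByTagName(tag)
    if elements and elements[0].firstChild is not None:
        return elements[0].firstChild.data.strip()
    return None


def item_summaries(rss_text):
    """Return (title, link) pairs for every <item> in the feed."""
    document = xml.dom.minidom.parseString(rss_text)
    return [(text_of(item, "title"), text_of(item, "link"))
            for item in document.getElementsByTagName("item")]


print(item_summaries(SAMPLE_RSS))
# [('example: first post', 'http://example.com/1'), ('example: second post', 'http://example.com/2')]
```

Because getElementsByTagName searches in document order, asking the whole document for "title" returns the channel's title first, before any item titles.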
I broke the feed down into a Feed, its Channel, and the individual Items. Channels and Items are both XML nodes with a title, link, and description, so I made them inherit from a Node class that knows how to read those elements.
```python
#!/usr/bin/python
#provide an RSS feed object for use in Django or Mako templates
import datetime, os.path, time, urllib, xml.dom.minidom
from xml.parsers.expat import ExpatError

class Node(object):

    def __init__(self, node):
        self.node = node
        self.title = self.getValue('title')
        self.link = self.getValue('link')
        self.description = self.getValue('description')

    def __str__(self):
        return self.title or ''

    #return the stripped text of the first matching element, or None
    def getValue(self, tag):
        data = None
        elements = self.node.getElementsByTagName(tag)
        if elements and elements[0].firstChild:
            data = elements[0].firstChild.data.strip()
        return data

class Channel(Node):

    #return the channel's items, optionally limited to displayCount of them
    def items(self, displayCount):
        items = self.node.getElementsByTagName("item")
        if displayCount:
            items = items[:displayCount]
        feedItems = []
        for item in items:
            feedItems.append(Item(item))
        return feedItems

class Item(Node):

    def __init__(self, item):
        super(Item, self).__init__(item)
        self.pubDate = self.getValue('pubDate')

    #provide a datetime for use by Django's date filters
    def stamp(self):
        #Mon, 16 Mar 2009 13:02:19 +0000
        return datetime.datetime.strptime(self.pubDate, '%a, %d %b %Y %H:%M:%S +0000')

class Feed(object):

    def __init__(self, feedURL, cache=None):
        self.feedURL = feedURL
        if cache:
            self.cache = '/tmp/' + cache + '.rss'
        else:
            self.cache = None

    #is the cache fresh enough to use?
    def freshCache(self):
        if self.cache and os.path.exists(self.cache):
            #use cache if it is less than sixty minutes old
            freshTime = time.time() - 60*60
            if os.path.getmtime(self.cache) > freshTime:
                return True
        return False

    def readCache(self, forceRead=False):
        feed = None
        if self.cache and (forceRead or self.freshCache()):
            try:
                #parse() opens and closes the file itself
                feed = xml.dom.minidom.parse(self.cache)
            except (IOError, ExpatError):
                feed = None
        return feed

    #fetch the feed; fall back to the old cache if the fetch fails
    def reCache(self):
        try:
            feed = xml.dom.minidom.parse(urllib.urlopen(self.feedURL))
        except ExpatError, message:
            print "ExpatError opening URL:", message
            feed = None
        except IOError, message:
            print "IOError opening URL:", message
            feed = None

        if self.cache:
            if not feed:
                feed = self.readCache(forceRead=True)
            if feed:
                xmlString = feed.toprettyxml(encoding="utf-8")
                #if last created by a different user, remove it first
                if os.path.exists(self.cache) and not os.access(self.cache, os.W_OK):
                    os.remove(self.cache)
                cacheFile = open(self.cache, 'w')
                cacheFile.write(xmlString)
                cacheFile.close()

        return feed

    def context(self, displayCount=None):
        context = {}
        feed = self.readCache()
        if not feed:
            feed = self.reCache()
        if feed:
            channel = Channel(feed.getElementsByTagName("channel")[0])
            context['items'] = channel.items(displayCount)
            context['title'] = channel.title
        context['feedURL'] = self.feedURL
        return context
```
The Feed class caches the RSS feed when it can, so that it requests the feed no more than once an hour. If a request fails, it falls back to the older cached copy.
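Stripped of the class, the hourly throttle is just a comparison between the cache file's modification time and a cutoff. Here is a minimal Python 3 sketch of that check on its own; the names is_fresh and max_age are illustrative, not part of the original code.

```python
import os
import tempfile
import time


def is_fresh(path, max_age=60 * 60):
    """Return True if path exists and was modified less than max_age seconds ago."""
    if not os.path.exists(path):
        return False
    return os.path.getmtime(path) > time.time() - max_age


# Usage: a file written just now is fresh; backdate it two hours and it is not.
fd, cache_path = tempfile.mkstemp(suffix=".rss")
os.close(fd)
print(is_fresh(cache_path))  # True
two_hours_ago = time.time() - 2 * 60 * 60
os.utime(cache_path, (two_hours_ago, two_hours_ago))
print(is_fresh(cache_path))  # False
os.remove(cache_path)
```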
I saved this file as rss.py in an app I’ve called “resources”. Then I added a “tweet” tag to my templatetags:
```python
import resources.rss
from django import template
register = template.Library()

def tweet(count=1):
    feed = resources.rss.Feed('http://twitter.com/statuses/user_timeline/20020901.rss', "twitter")
    context = feed.context(displayCount=count)
    context['webURL'] = 'http://twitter.com/hoboes'
    return template.loader.render_to_string("parts/tweet.html", context)
register.simple_tag(tweet)
```
This uses a dedicated Django template snippet to render the tweets:
```html
<ul class="twitter">
{% for tweet in items %}
    <li><a href="{{ tweet.link }}">{{ tweet.title|stripLeadingText:"hoboes:" }}</a></li>
{% endfor %}
</ul>
```
There’s a filter in there called “stripLeadingText” that I use to remove my Twitter name from the title:
```python
def stripLeadingText(text, toStrip):
    text = text.strip()
    if text.startswith(toStrip):
        text = text[len(toStrip):]
        text = text.strip()
    return text
register.filter('stripLeadingText', stripLeadingText)
```
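On Python 3.9 or later, str.removeprefix does the slicing for you. A minimal sketch of the same filter logic as a plain function (the name strip_leading_text is illustrative, and Django registration is left out so it runs on its own):

```python
def strip_leading_text(text, to_strip):
    """Remove to_strip from the start of text, trimming whitespace around it."""
    # removeprefix leaves the string unchanged when the prefix is absent.
    return text.strip().removeprefix(to_strip).strip()


print(strip_leading_text("hoboes: Testing the tweet tag", "hoboes:"))
# Testing the tweet tag
```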
I can then use a “tweet” template tag to display one or more tweets:
```html
{% tweet %}
{% tweet 2 %}
{% tweet 10 %}
```
You can also, of course, provide the tweets directly to any template via your views, or turn this into a tweet/endtweet loop for custom tweet HTML on every page.
A couple of caveats:
- You don’t want to use /tmp for your cache files. I just used it so that the example will most likely work.
- Remember to compare dates as UTC time, using, for example, datetime.datetime.utcnow().
- I’ve also seen pubDate formats which use GMT instead of +0000. You may need to account for that if you use multiple feeds and each uses a different format.
- If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp. I make either one or two requests per day, and I’m not sure ETags matter on Twitter anyway.
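The two date caveats can both be handled by letting the standard library parse the RFC 822 style pubDate instead of using a fixed strptime pattern. A minimal Python 3 sketch, assuming the pubDate strings look like Twitter’s: email.utils.parsedate_to_datetime accepts both the +0000 and GMT forms and returns a timezone-aware datetime, which can then be compared against the current UTC time. The function name pub_date_to_utc is illustrative.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def pub_date_to_utc(pub_date):
    """Parse an RSS pubDate (RFC 822 style) into an aware UTC datetime."""
    stamp = parsedate_to_datetime(pub_date)
    if stamp.tzinfo is None:
        # "-0000" means "no zone information"; treat it as UTC here.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc)


numeric = pub_date_to_utc("Mon, 16 Mar 2009 13:02:19 +0000")
named = pub_date_to_utc("Mon, 16 Mar 2009 13:02:19 GMT")
print(numeric == named)  # True

# Compare against "now" in UTC, not local time.
age = datetime.now(timezone.utc) - numeric
print(age > timedelta(days=365))  # True
```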
August 14, 2009: Using ETag and If-Modified-Since
In “Django Twitter tag and RSS object,” I wrote: “If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp.”
I ended up needing that for another project. The main difference is that you have to manage HTTP headers, which means using urllib2 instead of urllib.
This change requires two new methods and a modified reCache method. Also change the import at the top of the file from urllib to urllib2.
Here’s the new reCache:
```python
def reCache(self):
    feedStream = self.openFeed()
    feed = None
    if feedStream:
        try:
            feed = xml.dom.minidom.parse(feedStream)
        except ExpatError, message:
            print "ExpatError opening URL:", message, self.feedURL
            feed = None
        except IOError, message:
            print "IOError opening URL:", message, self.feedURL
            feed = None

    if self.cache:
        if not feed:
            feed = self.readCache(forceRead=True)
        if feed:
            xmlString = feed.toprettyxml(encoding="utf-8")
            #if last created by a different user, remove it first
            self.ensureWritability(self.cache)
            cacheFile = open(self.cache, 'w')
            cacheFile.write(xmlString)
            cacheFile.close()

    return feed
```
This uses two new methods, openFeed and ensureWritability. The easy one is ensureWritability: it’s the same code as before, which removes the cache file if the process running this code can’t write to it, so that a fresh copy can be created. I’ve moved it into a separate method because now we’re going to have to cache an ETag as well.
```python
def ensureWritability(self, filepath):
    if os.path.exists(filepath) and not os.access(filepath, os.W_OK):
        os.remove(filepath)
```
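The openFeed method that reCache calls isn’t reproduced in this post. As a rough illustration of what such a method has to do, here is a minimal Python 3 sketch using urllib.request (the Python 3 successor to urllib2). It assumes the previous response’s ETag and Last-Modified values were saved somewhere and are passed back in; the function names and that arrangement are hypothetical. It sends them as If-None-Match and If-Modified-Since. A 304 Not Modified reply, which urllib.request raises as an HTTPError, means the cached copy is still current, so the sketch returns None and the caller can fall back to the cache, the same way reCache falls back to readCache(forceRead=True) when openFeed returns nothing.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def open_feed(url, etag=None, last_modified=None, timeout=30):
    """Fetch url conditionally (hypothetical helper, not the original openFeed).

    Returns (body_bytes, new_etag, new_last_modified), or None when the
    server answers 304 Not Modified and the cached copy should be used.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None  # not modified: keep using the cache
        raise
```

The body bytes can be handed to xml.dom.minidom.parseString, and the returned ETag and Last-Modified values would need to be saved (for example, in a file next to the feed cache) for use on the next request.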