Django Twitter tag and RSS object
Python’s minidom makes it easy to parse RSS feeds, since RSS feeds are themselves just very simple XML. I wanted to parse my Twitter RSS feed into a context usable by Django templates.
I broke the feed down into the Feed, a Channel, and individual Items. Channels and Items are both XML nodes, so I made them inherit from a Node class that understands what is available in RSS.
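For orientation, here is a minimal sketch of the minidom calls that the classes rely on, run against a tiny inline feed. The sample XML and the helper name `first_text` are made up for illustration, and the sketch targets Python 3 rather than the Python 2 used in the rest of this post.

```python
import xml.dom.minidom

# A made-up, minimal RSS document; real feeds carry more elements.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example timeline</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>example: first post</title>
      <link>http://example.com/1</link>
      <description>example: first post</description>
      <pubDate>Mon, 16 Mar 2009 13:02:19 +0000</pubDate>
    </item>
    <item>
      <title>example: second post</title>
      <link>http://example.com/2</link>
      <description></description>
      <pubDate>Mon, 16 Mar 2009 12:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def first_text(node, tag):
    """Return the stripped text of the first <tag> under node, or None if it is empty."""
    child = node.getElementsByTagName(tag)[0].firstChild
    return child.data.strip() if child else None


document = xml.dom.minidom.parseString(SAMPLE_RSS)
channel = document.getElementsByTagName("channel")[0]
items = channel.getElementsByTagName("item")

channel_title = first_text(channel, "title")
item_titles = [first_text(item, "title") for item in items]
# An empty element has no firstChild, so the helper returns None.
empty_description = first_text(items[1], "description")
```

The channel and each item are both plain XML nodes with the same title, link, and description children, which is why a shared base class is a natural fit.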
    #!/usr/bin/python
    #provide an RSS feed object for use in Django or Mako templates
    import datetime, os.path, time, urllib, xml.dom.minidom
    from xml.parsers.expat import ExpatError

    class Node(object):
        def __init__(self, node):
            self.node = node
            self.title = self.getValue('title')
            self.link = self.getValue('link')
            self.description = self.getValue('description')

        def __str__(self):
            return self.title

        def getValue(self, tag):
            node = self.node.getElementsByTagName(tag)[0].firstChild
            data = None
            if node:
                data = self.node.getElementsByTagName(tag)[0].firstChild.data
                data = data.strip()
            return data

    class Channel(Node):
        def items(self, displayCount):
            items = self.node.getElementsByTagName("item")
            if displayCount:
                items = items[:displayCount]
            feedItems = []
            for item in items:
                feedItem = Item(item)
                feedItems.append(feedItem)
            return feedItems

    class Item(Node):
        def __init__(self, item):
            super(Item, self).__init__(item)
            self.pubDate = self.getValue('pubDate')

        #provide a datetime for use by Django's date filters
        def stamp(self):
            #Mon, 16 Mar 2009 13:02:19 +0000
            return datetime.datetime.strptime(self.pubDate, '%a, %d %b %Y %H:%M:%S +0000')

    class Feed(object):
        def __init__(self, feedURL, cache=None):
            self.feedURL = feedURL
            if cache:
                self.cache = '/tmp/' + cache + '.rss'
            else:
                self.cache = None

        #is the cache fresh enough to use?
        def freshCache(self):
            if self.cache and os.path.exists(self.cache):
                #use cache if it is less than sixty minutes old
                freshTime = time.time() - 60*60
                if os.path.getmtime(self.cache) > freshTime:
                    return True
            return False

        def readCache(self, forceRead=False):
            feed = None
            if forceRead or self.freshCache():
                try:
                    feed = xml.dom.minidom.parse(open(self.cache))
                except:
                    feed = None
            return feed

        def reCache(self):
            try:
                feed = xml.dom.minidom.parse(urllib.urlopen(self.feedURL))
            except ExpatError, message:
                print "ExpatError opening URL:", message
                feed = None
            except IOError, message:
                print "IOError opening URL:", message
                feed = None
            if self.cache:
                if not feed:
                    feed = self.readCache(forceRead=True)
                if feed:
                    xmlString = feed.toprettyxml(encoding="utf-8")
                    #if last created by a different user, remove it first
                    if os.path.exists(self.cache) and not os.access(self.cache, os.W_OK):
                        os.remove(self.cache)
                    cacheFile = open(self.cache, 'w')
                    cacheFile.write(xmlString)
                    cacheFile.close()
            return feed

        def context(self, displayCount=None):
            context = {}
            feed = self.readCache()
            if not feed:
                feed = self.reCache()
            if feed:
                channel = Channel(feed.getElementsByTagName("channel")[0])
                feedItems = channel.items(displayCount)
                context['items'] = feedItems
                context['title'] = channel.title
                context['feedURL'] = self.feedURL
            return context
The Feed class caches, if possible, the output of the RSS feed, and tries not to make a request more often than once an hour.
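The once-an-hour throttle comes down to comparing the cache file's modification time against the clock. Here is a minimal standalone sketch of that check in Python 3; the function name `is_fresh` and its parameters are illustrative and not part of the Feed class.

```python
import os
import tempfile
import time


def is_fresh(path, max_age_seconds=60 * 60, now=None):
    """Return True if path exists and was modified within max_age_seconds."""
    if not os.path.exists(path):
        return False
    if now is None:
        now = time.time()
    return os.path.getmtime(path) > now - max_age_seconds


# Demonstrate with a throwaway file whose modification time we control.
with tempfile.TemporaryDirectory() as directory:
    cache_path = os.path.join(directory, "twitter.rss")
    with open(cache_path, "w") as handle:
        handle.write("<rss/>")
    fresh_now = is_fresh(cache_path)

    # Push the modification time two hours into the past.
    two_hours_ago = time.time() - 2 * 60 * 60
    os.utime(cache_path, (two_hours_ago, two_hours_ago))
    fresh_after_aging = is_fresh(cache_path)

    missing = is_fresh(os.path.join(directory, "missing.rss"))
```

Keeping the maximum age as a parameter makes it easy to throttle different feeds differently.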
I saved this file in an app I have called “resources”. Then I added a “tweet” tag to my templatetags:
    import resources.rss
    from django import template
    register = template.Library()

    def tweet(count=1):
        feed = resources.rss.Feed('http://twitter.com/statuses/user_timeline/20020901.rss', "twitter")
        context = feed.context(displayCount=count)
        context['webURL'] = 'http://twitter.com/hoboes'
        return template.loader.render_to_string("parts/tweet.html", context)
    register.simple_tag(tweet)
This uses a dedicated Django template snippet to render the tweets:
    <ul class="twitter">
        {% for tweet in items %}
            <li><a href="{{ tweet.link }}">{{ tweet.title|stripLeadingText:"hoboes:" }}</a></li>
        {% endfor %}
    </ul>
There’s a filter in there called “stripLeadingText” that I use to remove my Twitter name from the title:
    def stripLeadingText(text, toStrip):
        text = text.strip()
        if text.startswith(toStrip):
            text = text[len(toStrip):]
            text = text.strip()
        return text
    register.filter('stripLeadingText', stripLeadingText)
I can then use a “tweet” template tag to display one or more tweets:
    {% tweet %}
    {% tweet 2 %}
    {% tweet 10 %}
You can also, of course, provide the tweets directly to any template via your views, or turn this into a tweet/endtweet loop for custom tweet HTML on every page.
A couple of caveats:
- You don’t want to use /tmp for your cache files. I just used it so that the example will most likely work.
- Remember to compare dates as UTC time, using, for example, datetime.datetime.utcnow().
- I’ve also seen pubDate formats which use GMT instead of +0000. You may need to account for that if you use multiple feeds and each uses a different format.
- If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp. I make either one or two requests per day, and I’m not sure ETags matter on Twitter anyway.
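The two date caveats above can be handled together. This is a minimal sketch, assuming Python 3, where the standard library's `email.utils.parsedate_to_datetime` accepts both the `+0000` and the `GMT` forms of pubDate. The helper name `parse_pub_date` is hypothetical, and treating a date without a usable offset as UTC is my assumption, not something the feed guarantees.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(value):
    """Parse an RFC 822 style pubDate into a timezone-aware UTC datetime."""
    stamp = parsedate_to_datetime(value)
    if stamp.tzinfo is None:
        # No usable offset in the string; assume UTC (an assumption, not a rule).
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc)


numeric = parse_pub_date("Mon, 16 Mar 2009 13:02:19 +0000")
named = parse_pub_date("Mon, 16 Mar 2009 13:02:19 GMT")

# Compare against an aware "now" so both sides of the subtraction are in UTC.
age = datetime.now(timezone.utc) - numeric
```

Because the result is timezone-aware, comparing it with a naive datetime raises an error instead of silently giving a wrong answer.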
August 14, 2009: Using ETag and If-Modified-Since
In Django Twitter tag and RSS object I wrote “If to-the-second timeliness is important, you’ll want to pay attention to the ETags that Twitter sends as well as throttling based on the cache’s timestamp.”
I ended up needing that for another project. The main difference is that you have to manage HTTP headers, and to manage HTTP headers you have to use urllib2 instead of urllib.
This change will require the addition of two methods, and modifying the reCache method. Also, change the import at the top of the file from urllib to urllib2.
Here’s the new reCache:
    def reCache(self):
        feedStream = self.openFeed()
        feed = None
        if feedStream:
            try:
                feed = xml.dom.minidom.parse(feedStream)
            except ExpatError, message:
                print "ExpatError opening URL:", message, self.feedURL
                feed = None
            except IOError, message:
                print "IOError opening URL:", message, self.feedURL
                feed = None
        if self.cache:
            if not feed:
                feed = self.readCache(forceRead=True)
            if feed:
                xmlString = feed.toprettyxml(encoding="utf-8")
                #if last created by a different user, remove it first
                self.ensureWritability(self.cache)
                cacheFile = open(self.cache, 'w')
                cacheFile.write(xmlString)
                cacheFile.close()
        return feed
This uses two new functions. One is easy: ensureWritability is the same code as before to make sure that the process that’s running this code can write to the cache file. I’ve moved it off into a separate method, because now we’re going to have to cache an ETag also.
    def ensureWritability(self, filepath):
        if os.path.exists(filepath) and not os.access(filepath, os.W_OK):
            os.remove(filepath)
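The second new method, openFeed, is not shown above. The following is not that method. It is a rough sketch of the general conditional-request technique, assuming Python 3's `urllib.request` (the successor to urllib2). The function names, the example URL, and the choice to return the validators alongside the stream are all my own assumptions.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a request that asks the server to skip the body if nothing changed."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def open_feed(url, etag=None, last_modified=None):
    """Return (stream, etag, last_modified), with stream None on 304 Not Modified."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        stream = urllib.request.urlopen(request)
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Unchanged since the validators were saved; keep using the cache.
            return None, etag, last_modified
        raise
    return stream, stream.headers.get("ETag"), stream.headers.get("Last-Modified")


# Building the request does not touch the network.
request = build_conditional_request(
    "http://example.com/feed.rss",
    etag='"abc123"',
    last_modified="Mon, 16 Mar 2009 13:02:19 GMT",
)
```

The ETag and Last-Modified values returned by a successful fetch would need to be saved next to the cached feed, which is why the cache-writability check was moved into its own method.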
