Thinking Python: Django cache expiration time
We have a task/project manager in which each task can have any other task as its parent, and different tasks can be applied to different employees. To show a project for any one employee involves collecting the tasks assigned to them, and collecting connected tasks into projects. And then counting up the completed tasks vs. the incompleted tasks to show a progress bar and an estimated time of completion.
Our skunkworks Django server is a bit old, and can be slow to display complex projects. I’ve tried to convince them not to create complex projects (most of the offenders are never-ending projects that should be broken into smaller projects that can actually be completed), but that’s an uphill battle. So I started looking into Django’s caching.
The first thing I noticed is that Django’s caching is built solely around caches expiring themselves. There’s nothing within the cache object for dynamic expiration based on new data coming in. It doesn’t even have a means of getting the timestamp of when the cache was created.
I ended up looking into the cache code, and found that cache.get() looks at the expiration time before returning the cached data. So I added a get_stamp() method to CacheClass in django/core/cache/backends/filebased.py:
[toggle code]
-
def get_stamp(self, key, timeout=None):
- fname = self._key_to_file(key)
-
try:
- f = open(fname, 'rb')
- exp = pickle.load(f)
- f.close()
-
if timeout:
- exp = exp - timeout
- return exp
-
except(IOError, OSError, EOFError, pickle.PickleError):
- pass
- return 0
As you can see, it runs into a problem immediately: what I want is when the cache was created, but Django’s caches are very simple. They store only the absolutely necessary information from the perspective of the cache: when does the cache expire, and what does the cache contain? It does not store the time the cache was created. So this method accepts the same timeout value that cache.get() accepts in order to calculate when the cache was most likely created.
Here’s how I used it:
[toggle code]
- from django.core.cache import cache
- import datetime
-
def personalProjects(self):
- #cache for a long time (one day)--if there's new stuff, we'll recache anyway
- cacheTime = 60*60*24
- cacheKey = 'personal-projects-' + str(self.id)
- rawTasks = Task.objects.filter(assignees=self)
- #if anything has changed since the last cache, recreate the cache
- recentTasks = rawTasks.filter(edited__gte=datetime.datetime.fromtimestamp(cache.get_stamp(cacheKey, cacheTime)))
-
if not recentTasks:
- openProjects = cache.get(cacheKey)
-
if openProjects:
- return openProjects
- …
- #cache both the time of this cache, and the open projects
- cache.set(cacheKey, openProjects, cacheTime)
- return openProjects
It seems to work fine. However, this seems like pretty basic functionality. Whenever I see a major project, like Django, missing what I consider to be basic functionality, I know there’s a good chance I’m missing something. So I posted the proposed addition to CacheClass to the Django developers newsgroup. The first workaround proposed (by Thomas Adamcik) was a very Pythonesque solution: use tuples. Instead of caching the data, cache a tuple of the current timestamp and the data.
Simply using cache.set('your_cache_key', (datetime.now(), your_value)) (or time() instead of datetime.now()) should allow you to keep track of this in a way which doesn’t require modifying any core Django functionality :)
The problem with modifying core functionality is that you have to do it every time you upgrade, unless (and until) your modifications make it into the distributed code. So I try to avoid this whenever possible. Using tuples instead of returning the modified expiration time means that not only don’t I have to modify Django’s code, I can store the real cache time instead of guessing it based on expiration time:
[toggle code]
- from django.core.cache import cache
- import datetime
-
def personalProjects(self):
- #cache for a long time (one day)--if there's new stuff, we'll recache anyway
- cacheTime = 60*60*24
- cacheKey = 'personal-project-' + str(self.id)
- (cacheStamp, cachedProjects) = cache.get(cacheKey, (None, None))
- rawTasks = Task.objects.filter(assignees=self)
-
if cachedProjects and cacheStamp:
- #if nothing has changed since the last cache, return the last cache
- recentTasks = rawTasks.filter(edited__gte=cacheStamp)
-
if not recentTasks:
- return cachedProjects
- …
- #cache both the time of this cache, and the open projects
- cache.set(cacheKey, (datetime.datetime.now(), openProjects), cacheTime)
- return openProjects
Note that I changed the cache key from personal-projects-id to personal-project-id. If the cache key remained the same, existing caches would fail when returned, since they aren’t yet tuples.
- get_stamp for CacheClass?: Jerry Stratton
- “As I’m working with caches, I’ve found myself wanting to know when the cache was created, to compare against the last time some data was updated.”
More Python
- Multiple Input Fields with multiple inheritance
- We needed to display one TextField as either a TextInput or a Textarea, depending on the value in the field. Multiple inheritance makes it easy, if a bit wonky.
- PyTown
- General rambling in code regarding Python, Mailman, and Django.
- Django Twitter tag and RSS object
- I wanted to embed my twitter feed into my Django blog, and didn’t see any simple RSS readers for Python that did what I wanted.
- Excerpting partial XHTML using minidom
- You can use xml.dom.minidom to parse partial XHTML as long as you use a few tricks and don’t mind that getElementById doesn’t work.
- Media duration in Python on Mac OS X
- It turns out to be very easy to get the duration of MP3 files, MPEGs, and other media files on the OS X command line.
- 18 more pages with the topic Python, and other related pages
More Django
- Reusing Django’s filter_horizontal
- Just as with pop-ups, it’s possible to use the built-in JavaScript for filtering multiple-selection popups on custom forms.
- Django formsets and date/time fields
- Date/Time fields in Django formsets appear to have incompatible default values, resulting in forms using them always looking as though they’ve got a new entry when they don’t.
- Multiple Input Fields with multiple inheritance
- We needed to display one TextField as either a TextInput or a Textarea, depending on the value in the field. Multiple inheritance makes it easy, if a bit wonky.
- Django tutorial mostly ready
- My long-promised Django tutorial is pretty much ready. It’s still designed around an in-person tutorial, but you should be able to get started using it even if you’re on your own.
- Django: Beyond the SQL
- Django is a great application framework for Python and web applications. You can use it to greatly speed up your database and application development both on the web and on the command line. This tutorial is currently a very rough draft; it probably won’t be very useful without the assistance of someone who knows Django running the tutorial. If I ever run this tutorial a second time, I’ll probably update it with screenshots to make it more usable for individuals.
- 22 more pages with the topic Django, and other related pages
