Thinking Python: Django cache expiration time
We have a task/project manager in which each task can have any other task as its parent, and different tasks can be assigned to different employees. Showing the projects for any one employee involves collecting the tasks assigned to them, grouping connected tasks into projects, and then counting the completed tasks against the incomplete ones to show a progress bar and an estimated time of completion.
Our skunkworks Django server is a bit old, and can be slow to display complex projects. I’ve tried to convince people not to create complex projects (most of the offenders are never-ending projects that should be broken into smaller projects that can actually be completed), but that’s an uphill battle. So I started looking into Django’s caching.
The first thing I noticed is that Django’s caching is built solely around caches expiring themselves. There’s nothing within the cache object for dynamic expiration based on new data coming in. It doesn’t even have a means of getting the timestamp of when the cache was created.
I ended up looking into the cache code, and found that cache.get() looks at the expiration time before returning the cached data. So I added a get_stamp() method to CacheClass in django/core/cache/backends/filebased.py:
    def get_stamp(self, key, timeout=None):
        #return the expiration time or, given the timeout used to set it, the estimated creation time
        fname = self._key_to_file(key)
        try:
            #the expiration time is the first pickled object in the cache file
            with open(fname, 'rb') as f:
                exp = pickle.load(f)
            if timeout:
                exp = exp - timeout
            return exp
        except (IOError, OSError, EOFError, pickle.PickleError):
            pass
        return 0
As you can see, it runs into a problem immediately: what I want is when the cache was created, but Django’s caches are very simple. They store only the absolutely necessary information from the perspective of the cache: when does the cache expire, and what does the cache contain? They do not store the time the cache was created. So this method accepts the same timeout value that was used to set the cache, and subtracts it from the expiration time to calculate when the cache was most likely created.
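The arithmetic can be tried outside Django. The sketch below assumes the file layout that get_stamp() relies on, a pickled expiration timestamp followed by the pickled value. The names write_entry, read_stamp and demo are hypothetical stand-ins for illustration, not Django functions.

```python
import os
import pickle
import tempfile
import time


def write_entry(fname, value, timeout):
    """Write the expiration timestamp, then the value (assumed layout)."""
    with open(fname, 'wb') as f:
        pickle.dump(time.time() + timeout, f, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)


def read_stamp(fname, timeout=None):
    """Return the expiration time, or the estimated creation time if timeout is given."""
    try:
        with open(fname, 'rb') as f:
            # Only the first pickled object (the expiration time) is read.
            exp = pickle.load(f)
    except (IOError, OSError, EOFError, pickle.PickleError):
        return 0
    if timeout:
        exp = exp - timeout
    return exp


def demo():
    """Round-trip one entry and return (before, estimated_creation, after)."""
    one_day = 60 * 60 * 24
    fd, fname = tempfile.mkstemp()
    os.close(fd)
    try:
        before = time.time()
        write_entry(fname, ['project A', 'project B'], one_day)
        after = time.time()
        created = read_stamp(fname, one_day)
    finally:
        os.remove(fname)
    return before, created, after
```

The estimate is only as good as the timeout passed in: if it differs from the one used when the entry was written, the computed creation time is off by the difference.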
Here’s how I used it:
    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-projects-' + str(self.id)
        rawTasks = Task.objects.filter(assignees=self)
        #if anything has changed since the last cache, recreate the cache
        #(get_stamp returns 0 when there is no cache, so every task counts as recent)
        recentTasks = rawTasks.filter(edited__gte=datetime.datetime.fromtimestamp(cache.get_stamp(cacheKey, cacheTime)))
        if not recentTasks:
            openProjects = cache.get(cacheKey)
            if openProjects:
                return openProjects
        …
        #cache the open projects
        cache.set(cacheKey, openProjects, cacheTime)
        return openProjects
It seems to work fine. However, this seems like pretty basic functionality. Whenever I see a major project, like Django, missing what I consider to be basic functionality, I know there’s a good chance I’m missing something. So I posted the proposed addition to CacheClass to the Django developers newsgroup. The first workaround proposed (by Thomas Adamcik) was a very Pythonesque solution: use tuples. Instead of caching the data, cache a tuple of the current timestamp and the data.
“Simply using cache.set('your_cache_key', (datetime.now(), your_value)) (or time() instead of datetime.now()) should allow you to keep track of this in a way which doesn’t require modifying any core Django functionality :)”
The problem with modifying core functionality is that you have to do it every time you upgrade, unless (and until) your modifications make it into the distributed code. So I try to avoid this whenever possible. Using tuples instead of returning the modified expiration time means that not only do I not have to modify Django’s code, I can also store the real time the cache was created instead of guessing it from the expiration time:
    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-project-' + str(self.id)
        (cacheStamp, cachedProjects) = cache.get(cacheKey, (None, None))
        rawTasks = Task.objects.filter(assignees=self)
        if cachedProjects and cacheStamp:
            #if nothing has changed since the last cache, return the last cache
            recentTasks = rawTasks.filter(edited__gte=cacheStamp)
            if not recentTasks:
                return cachedProjects
        …
        #cache both the time of this cache, and the open projects
        cache.set(cacheKey, (datetime.datetime.now(), openProjects), cacheTime)
        return openProjects
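The same pattern can be pulled out of the view into a pair of small helpers. This is only a sketch: DictCache is a hypothetical in-memory stand-in that exposes get() and set() the way Django’s cache object does (and ignores timeouts), and set_stamped, get_if_fresh and last_edited are names invented for the example.

```python
import datetime


class DictCache(object):
    """Hypothetical stand-in with Django-like get/set; it ignores timeouts."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def set_stamped(cache, key, value, timeout):
    """Store (creation time, value) so the creation time travels with the data."""
    cache.set(key, (datetime.datetime.now(), value), timeout)


def get_if_fresh(cache, key, last_edited):
    """Return the cached value unless something was edited at or after the stamp."""
    stamp, value = cache.get(key, (None, None))
    if stamp is None:
        # Nothing cached yet.
        return None
    if last_edited is not None and last_edited >= stamp:
        # Data changed after the cache was written, so treat it as stale.
        return None
    return value
```

Here last_edited would be the newest edited value among the employee’s tasks, which plays the same role as the edited__gte filter in the method above.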
Note that I changed the cache key from personal-projects-id to personal-project-id. If the cache key remained the same, existing cache entries would fail to unpack when retrieved, since they aren’t yet tuples.
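An alternative to renaming the key is to unpack defensively and treat anything that isn’t a stamped tuple as a cache miss. The helper below is a hypothetical sketch, not what I actually deployed; it assumes the stamp is always a datetime.datetime, as in the code above.

```python
import datetime


def unpack_stamped(entry):
    """Return (stamp, value) if entry is a stamped tuple, else (None, None)."""
    if (isinstance(entry, tuple) and len(entry) == 2
            and isinstance(entry[0], datetime.datetime)):
        return entry
    # Old-style entries (or no entry at all) count as a miss.
    return (None, None)
```

It would be called as (cacheStamp, cachedProjects) = unpack_stamped(cache.get(cacheKey)). An old entry that happened to be a two-item tuple starting with a datetime would still be misread, so changing the key remains the simpler fix.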
- get_stamp for CacheClass?: Jerry Stratton
- “As I’m working with caches, I’ve found myself wanting to know when the cache was created, to compare against the last time some data was updated.”
