Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Thinking Python: Django cache expiration time

Jerry Stratton, September 4, 2009

We have a task/project manager in which each task can have any other task as its parent, and different tasks can be assigned to different employees. Showing the projects for any one employee involves collecting the tasks assigned to them, grouping connected tasks into projects, and then counting the completed tasks against the incomplete ones to show a progress bar and an estimated time of completion.

Our skunkworks Django server is a bit old, and can be slow to display complex projects. I’ve tried to convince people not to create complex projects (most of the offenders are never-ending projects that should be broken into smaller projects that can actually be completed), but that’s an uphill battle. So I started looking into Django’s caching.

The first thing I noticed is that Django’s caching is built solely around caches expiring themselves. There’s nothing within the cache object for dynamic expiration based on new data coming in. It doesn’t even have a means of getting the timestamp of when the cache was created.

I ended up looking into the cache code, and found that cache.get() looks at the expiration time before returning the cached data. So I added a get_stamp() method to CacheClass in django/core/cache/backends/filebased.py:

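    def get_stamp(self, key, timeout=None):
        fname = self._key_to_file(key)
        try:
            f = open(fname, 'rb')
            #the first pickled object in the cache file is the expiration time
            exp = pickle.load(f)
            f.close()
            if timeout:
                #back up from the expiration time to the likely creation time
                exp = exp - timeout
            return exp
        except (IOError, OSError, EOFError, pickle.PickleError):
            pass
        #no readable cache file: report the epoch
        return 0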

As you can see, it runs into a problem immediately: what I want is when the cache was created, but Django’s caches are very simple. They store only the absolutely necessary information from the perspective of the cache: when does the cache expire, and what does the cache contain? They do not store the time the cache was created. So this method accepts the same timeout value that cache.set() accepts, and subtracts it from the expiration time to calculate when the cache was most likely created.
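The arithmetic can be tried outside of Django. This is only a minimal sketch, assuming the layout that get_stamp() relies on: a pickled expiration timestamp at the start of the file, followed by the pickled value. The names write_entry and read_stamp are made up for illustration and are not part of Django.

```python
import os
import pickle
import tempfile
import time


def write_entry(path, value, timeout, now=None):
    """Write an entry in the assumed layout: expiration time, then value."""
    now = time.time() if now is None else now
    with open(path, 'wb') as f:
        pickle.dump(now + timeout, f)
        pickle.dump(value, f)


def read_stamp(path, timeout=None):
    """Return the expiration time, or the estimated creation time when the
    original timeout is supplied. Returns 0 if there is no readable entry."""
    try:
        with open(path, 'rb') as f:
            exp = pickle.load(f)
    except (IOError, OSError, EOFError, pickle.PickleError):
        return 0
    if timeout:
        exp = exp - timeout
    return exp


directory = tempfile.mkdtemp()
path = os.path.join(directory, 'entry.cache')
created = 1252000000.0   # a fixed "now" so the arithmetic is easy to follow
one_day = 60 * 60 * 24

write_entry(path, ['project A'], one_day, now=created)
expires = read_stamp(path)                      # created + one_day
estimated_created = read_stamp(path, one_day)   # back to created
missing = read_stamp(os.path.join(directory, 'no-such-entry.cache'), one_day)
```

The estimate is only right if the caller passes the same timeout that was used when the entry was set, which is why it is a guess rather than a real creation time.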

Here’s how I used it:

    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-projects-' + str(self.id)
        rawTasks = Task.objects.filter(assignees=self)
        #if anything has changed since the last cache, recreate the cache
        cacheStamp = datetime.datetime.fromtimestamp(cache.get_stamp(cacheKey, cacheTime))
        recentTasks = rawTasks.filter(edited__gte=cacheStamp)
        if not recentTasks:
            openProjects = cache.get(cacheKey)
            if openProjects:
                return openProjects
        #build openProjects from rawTasks here (that code is omitted from this example)
        #cache the open projects; the expiration time doubles as the timestamp
        cache.set(cacheKey, openProjects, cacheTime)
        return openProjects

It seems to work fine. However, this seems like pretty basic functionality. Whenever I see a major project, like Django, missing what I consider to be basic functionality, I know there’s a good chance I’m missing something. So I posted the proposed addition to CacheClass to the Django developers newsgroup. The first workaround proposed (by Thomas Adamcik) was a very Pythonesque solution: use tuples. Instead of caching the data, cache a tuple of the current timestamp and the data.

Simply using cache.set('your_cache_key', (datetime.now(), your_value)) (or time() instead of datetime.now()) should allow you to keep track of this in a way which doesn’t require modifying any core Django functionality :)

The problem with modifying core functionality is that you have to do it every time you upgrade, unless (and until) your modifications make it into the distributed code. So I try to avoid this whenever possible. Using tuples instead of returning the modified expiration time means that not only do I not have to modify Django’s code, I can also store the real cache time instead of guessing it based on expiration time:

    from django.core.cache import cache
    import datetime

    def personalProjects(self):
        #cache for a long time (one day)--if there's new stuff, we'll recache anyway
        cacheTime = 60*60*24
        cacheKey = 'personal-project-' + str(self.id)
        (cacheStamp, cachedProjects) = cache.get(cacheKey, (None, None))
        rawTasks = Task.objects.filter(assignees=self)
        if cachedProjects and cacheStamp:
            #if nothing has changed since the last cache, return the last cache
            recentTasks = rawTasks.filter(edited__gte=cacheStamp)
            if not recentTasks:
                return cachedProjects
        #build openProjects from rawTasks here (that code is omitted from this example)
        #cache both the time of this cache, and the open projects
        cache.set(cacheKey, (datetime.datetime.now(), openProjects), cacheTime)
        return openProjects

Note that I changed the cache key from personal-projects-id to personal-project-id. If the cache key remained the same, unpacking existing cache entries would fail, since they were stored before the change and aren’t yet tuples.
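The tuple approach can also be sketched without Django at all. In this sketch, DictCache is a made-up stand-in for Django’s cache object (get and set only, with timeouts ignored), and cached_with_stamp, build, and changed_since are hypothetical names. Checking the shape of the cached value before unpacking it is an alternative to renaming the key; it is not what the code above does.

```python
import datetime


class DictCache(object):
    """Made-up stand-in for Django's cache: get/set only, timeouts ignored."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def cached_with_stamp(cache, key, timeout, build, changed_since):
    """Return the cached value unless changed_since(stamp) reports newer data.

    build() recomputes the value; changed_since(stamp) returns True if
    anything was edited at or after stamp. Both are supplied by the caller.
    """
    entry = cache.get(key)
    # Entries stored before the switch to tuples are not (stamp, value) pairs.
    if isinstance(entry, tuple) and len(entry) == 2:
        stamp, value = entry
        if isinstance(stamp, datetime.datetime) and not changed_since(stamp):
            return value
    value = build()
    cache.set(key, (datetime.datetime.now(), value), timeout)
    return value


cache = DictCache()
calls = []


def build():
    calls.append(1)   # count how often the expensive rebuild runs
    return ['project A', 'project B']


one_day = 60 * 60 * 24
first = cached_with_stamp(cache, 'k', one_day, build, lambda stamp: False)
second = cached_with_stamp(cache, 'k', one_day, build, lambda stamp: False)  # from cache
third = cached_with_stamp(cache, 'k', one_day, build, lambda stamp: True)    # rebuilt

# An old-style entry (not a tuple) is rebuilt instead of failing to unpack.
cache.set('old', ['pre-tuple value'], one_day)
fourth = cached_with_stamp(cache, 'old', one_day, build, lambda stamp: False)
```

In the real method, changed_since would be the rawTasks.filter(edited__gte=cacheStamp) check, and build would be whatever assembles the open projects.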
