Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Design for the future, but don’t code for the future

Jerry Stratton, February 24, 2008

One of the reasons our current coding policy treads dangerously close to “not invented here” syndrome is that several years ago we ended up getting a lot of bad code from a web design firm. One of the “features” in their code that we were fortunately able to get removed was that it created a session for every single page just in case we needed it in the future. I noticed it because it broke any of our pages that were already using sessions.

When I realized that it was creating a session file on the server for every visitor to our site, I immediately asked them to remove it. After an exchange in which they tried to convince us (a) to use their code, (b) not to worry about the extra server load, and (c) that they’d be happy to recode it to use memory-based sessions, we finally convinced them to take the code out.
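For contrast, a session can be created only when a page actually stores something in it. This is a minimal Python sketch of that idea, not the firm’s code or any framework’s API: the `LazySession` class is hypothetical, and a plain dict stands in for whatever the server uses to store sessions.

```python
import uuid


class LazySession:
    """Hypothetical session wrapper that allocates storage only on first write."""

    def __init__(self, store, session_id=None):
        self._store = store    # dict standing in for server-side session storage
        self._id = session_id  # stays None until the visitor needs a session

    @property
    def session_id(self):
        return self._id

    def get(self, key, default=None):
        # Reading never creates a session; a visitor without one gets the default.
        if self._id is None or self._id not in self._store:
            return default
        return self._store[self._id].get(key, default)

    def set(self, key, value):
        # The first write is the only place a session comes into existence.
        if self._id is None:
            self._id = uuid.uuid4().hex
        self._store.setdefault(self._id, {})[key] = value
```

A page that only reads, or never touches the session at all, leaves the store empty; only pages that call `set` cost anything on the server.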

Ivan Sagalaev’s troubles reminded me of that exchange:

The problem was in the usage of sessions (not in them per se though but in the way we used them). We store there one-off user messages that are shown once and then removed. For this we had this code in a context processor… And since it is in a context processor it was executed on every request.

Due to the interaction between a few of their systems, this feature caused their web site to slow to a crawl under an advertising-induced load. The kicker:

By the way… The bitter irony of the story is that we in fact don’t use this messages subsystem. The service was killed by a feature that didn’t exist.

This is one of the reasons why our programming style guide links to Wil Shipley’s line about how less source code is better:

There are some interesting corollaries here. For instance, if you’re writing a class to display some text in red (for some reason), don’t add a bunch of methods “for the future” that allow you to draw the text in blue or green or purple. Because that’s more code than you need right now, and “less code is better.”

The lesson I’m getting at is, don’t try to make code general until you actually need it in more than one place. The worst libraries in the world are the ones people write without actually writing any code that uses them to do actual work for actual users.

As an aside, Ivan also writes about a different problem he calls the “dog-pile” effect, where cached pages can cause a cascading load increase when caches are regenerated. That is, if the server is getting a lot of requests, then when a cached page comes up for recaching it will get hit by more than one request during the recache process, and each of those requests starts its own recache. If recaching is expensive, load spikes during this process.

It’s a tricky issue; on this blog I solve it by cheating. When a page comes up for recache, the first thing the process does is mark it as having already been recached. This should minimize the number of processes performing a recache; later processes just keep getting the old page until the first process writes out the new version.
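As a rough illustration of the mark-first trick, here is a minimal Python sketch. It is not the code this blog runs: it assumes pages are cached as files, that the “mark” is simply the file’s modification time, and that `MAX_AGE`, `needs_recache`, and `fetch_page` are made-up names for the example.

```python
import os
import tempfile
import time

MAX_AGE = 300  # seconds a cached page counts as fresh (assumed value)


def needs_recache(path, now):
    """A page needs recaching if it is missing or older than MAX_AGE."""
    try:
        return now - os.path.getmtime(path) > MAX_AGE
    except FileNotFoundError:
        return True


def fetch_page(path, render, now=None):
    """Return the cached page, regenerating it with render() if it is stale."""
    now = time.time() if now is None else now
    if not needs_recache(path, now):
        with open(path) as f:
            return f.read()

    if os.path.exists(path):
        # Mark the page as recached *before* doing the expensive work, so that
        # later requests see a fresh timestamp and keep serving the old page.
        # The check and the mark are not atomic, so this reduces the number of
        # simultaneous recaches rather than guaranteeing exactly one.
        os.utime(path, (now, now))
    # If the file does not exist yet there is no old page to serve, so every
    # request arriving before the first write will render.

    html = render()

    # Write to a temporary file and rename it into place, so readers never
    # see a half-written page.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp, path)
    return html
```

While `render()` is running, any other request that calls `fetch_page` for the same path finds a fresh timestamp and reads the old file, which is the “cheating” described above.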

In theory this opens up the possibility that a page will get marked as having been recached, the process will stop before it’s completed, and then an old page is live when it shouldn’t be. In the eight years I’ve been using this software, I’ve yet to see that happen, but of course I probably don’t get nearly as many requests as they do.
