Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, Swift, BASIC, and whatever else I happen to feel like hacking at.

No premature optimization

Jerry Stratton, June 15, 2022

I’m still learning amazing things about Perl. I have a recipe search (it’s in the book) that includes tags such as “s” for “sour milk”, so I can find recipes that use sour milk when I have some. In another window, as I was writing this, I decided I didn’t want to have to remember the codes, so I added the ability to use --tag "sour milk" (or whatever) in addition to --tag S. It works perfectly.

But it can’t be working. I have a test line that should print on every pass through the loop, and it only prints once. For all the thousands of recipes it’s searching, I only see “HELLO WORLD” one time.

```perl
sub tagMatches {
    my @recipeTags = @_;
    #allow either 1-2 letter code or full name
    foreach my $term (@tagsWanted) {
        if (length($term) > 2) {
            print "HELLO WORLD\n";
            my $tagKey = lc($term);
            $term = $tagIndex{$tagKey};
            die("Unknown tag $tagKey\n") if $term eq '';
        }
        $term = uc($term);
        return 0 if !grep(/^$term$/, @recipeTags);
    }
    return 1;
}
```

That’s a simplified version of the actual subroutine. It can’t be run without the rest of the code. Here’s a simpler example that can be run on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @secrets = ('door', 'floor', 'food', 'grass', 'hand', 'heart', 'key', 'lamp', 'smile', 'tree');

# Outer loop: each secret word. Inner loop: each word from the command line.
foreach my $secret (@secrets) {
    foreach my $word (@ARGV) {
        # Normalize only words containing capitals or punctuation (! through /, or ?)
        if ($word =~ m/[A-Z!-\/?]/) {
            print "Normalizing $word\n";
            $word = lc($word);
            $word =~ s/[!-\/?]//g;
        }
        print "You've said the secret word, “$word”!\n" if $word eq $secret;
    }
}
```
Lucid Code

This extremely contrived example has a list of secrets. For each secret, it loops over every item on the command line. It ensures that each item is lowercase and has no punctuation before testing to see if that item is the secret word.

The secrets array has ten items in it. The script thus loops over @ARGV ten times, once for each secret. For someone who learned programming in the old school, this screams for pre-optimizing. There is no reason to normalize every item ten times when it only needs to be done once. I should run the conversion once, ahead of time, and create a new array.
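Done the old-school way, that ahead-of-time conversion might look something like the sketch below. It is a minimal illustration, not code from the original script: the `normalize` helper and the `@normalized` array are hypothetical names. It normalizes every word unconditionally, since lowercasing and stripping punctuation from an already-clean word changes nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @secrets = ('door', 'floor', 'food', 'grass', 'hand', 'heart', 'key', 'lamp', 'smile', 'tree');

# Hypothetical helper: return a lowercased copy of a word with the same
# punctuation range as the original (! through /, plus ?) removed.
# The argument is copied into $word, so the caller's value is not changed.
sub normalize {
    my ($word) = @_;
    $word = lc($word);
    $word =~ s/[!-\/?]//g;
    return $word;
}

# The "pre-optimization": normalize each command-line word exactly once,
# up front, into a new array; @ARGV itself is left untouched.
my @normalized = map { normalize($_) } @ARGV;

foreach my $secret (@secrets) {
    foreach my $word (@normalized) {
        print "You've said the secret word, \"$word\"!\n" if $word eq $secret;
    }
}
```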

Shouldn’t I?

The obvious answer is that both computers and scripting languages today are so fast that such optimization won’t make a visible difference.

But there’s a less obvious answer, too. Run the script. How many times do we get the message about normalization?

```
./groucho The green was the light the tree did not consume.
Normalizing The
Normalizing consume.
You've said the secret word, “tree”!
%
```

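Once. Each word is only normalized once. Perl “recognizes” what I’m doing here, though not through any special optimization: a foreach loop makes its loop variable an alias for each element of the array, not a copy. When the inner loop assigns to $word, it rewrites @ARGV itself, so on the remaining nine passes every word is already lowercase and free of punctuation, and the test never matches again. The same thing was happening to @tagsWanted in the recipe search. I’ve been programming in Perl for decades, and I had no idea it did this. The only reason I noticed it this time is that (a) I often don’t bother pre-optimizing code unless I need the speed, and (b) I happened to be running some tests inside the loop. Clearly, the loop wasn’t working. And yet just as clearly, it was producing the correct output.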

And so I learned that a plain foreach loop already does what I thought needed doing, automatically.
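The behavior comes from how Perl’s perlsyn documentation describes foreach: the loop variable is an implicit alias for each item in the list being looped over, so modifying it modifies the array. The short sketch below isolates that effect. The `@words` array (a stand-in for @ARGV), the two-pass loop, and the `$normalizations` counter are all hypothetical, added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for @ARGV; any real array behaves the same way.
my @words = ('The', 'green', 'consume.');
my $normalizations = 0;

# Loop over the same array twice, as the secrets loop does ten times.
for my $pass (1 .. 2) {
    foreach my $word (@words) {
        # $word is an alias for the current element of @words, not a copy.
        if ($word =~ m/[A-Z!-\/?]/) {
            $normalizations++;
            $word = lc($word);        # this assignment rewrites @words itself
            $word =~ s/[!-\/?]//g;
        }
    }
    print "After pass $pass: @words\n";
}
print "Normalized $normalizations times\n";
```

Run as-is, it reports two normalizations (“The” and “consume.”), and both passes print the same already-normalized array, because the second pass finds nothing left to change. If the original values need to survive the loop, copy the array first (`my @copy = @words;`) and loop over the copy.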

I’m sure I have lots of code that pre-optimizes arrays by creating new ones with standard values, because it simply never occurred to me that Perl might do this automatically without any work on my part. This is probably the strangest example of why premature optimization is wrong that I’ve seen in a long time.

The focus, especially for weekend scripting, should always be writing code that works, that performs its function well.

After a script is working, optimizing it for speed, memory usage, or even length can be fun, and might even be necessary. It depends on what your needs are and what you enjoy. It is rarely necessary to optimize scripts for length nowadays; scripts are text and take up almost no space on your hard drive. But when I wrote 42 Astounding Scripts I did optimize some of the scripts for length, because I wanted to make them easier to type. Different use, different optimization needs.

I often don’t even know what a script is really going to be used for until after it’s written and I discover all sorts of new use cases for it. The script that I use to create the memes at the top of most of my posts started out as a script to turn tables into images.

Optimizing ahead of time isn’t just often a waste of time; it can also create unoptimized code. You don’t really know what optimization even means until the script is finished. What looks obvious in an unfinished script may turn out to block a different, more important, optimization later. Even worse, it might block a more important use later.
