Perls Before Swine: Arrays and functions

  1. Smarter scripts
  2. Perls Before Swine
  3. Custom search

We’ve done a little bit with an array already: the list of arguments to the script is a simple array. We’ve only ever referenced the first item in that array, shifting that first item out so that the next item is now first. We can do quite a bit more with arrays in Perl.

Besides simple arrays, there are also associative arrays. An associative array is one which, instead of using numbers to reference the values in the array, uses keys. It associates a key with a value. So instead of asking for the first, second, or third item in the list, you can ask for the value that corresponds to “The Band”, or the value that corresponds to “Jane Jensen”.

For example, we might want to create a third format, one that summarizes songs by artist, showing how many songs each artist has in the matches.

If we’re going to have a bunch of formats, it will be easier to keep a list of them. Add the following lines just above the “strip off the command-line switches” section:

#options for the --format switch
@validFormats = ("raw", "simple", "summary");
$validFormats = join(", ", @validFormats);

The first line (below the comment) assigns a simple array of three items: raw, simple, and summary. I mentioned it in passing earlier, but all simple arrays begin with the @ symbol.

The second line assigns the result of the “join” function to a scalar variable called $validFormats. The “join” function combines an array into a scalar, using the first argument as its glue. Here, we specify a comma and a space as the “glue”, so $validFormats will be “raw, simple, summary”.
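If you want to see what join produces on its own, here is a minimal standalone sketch. It is not part of the show script, and the “my” declarations are only there so that it runs cleanly under “use strict”.

```perl
use strict;
use warnings;

# The same list of formats used in the script.
my @validFormats = ("raw", "simple", "summary");

# join glues the array items together, putting ", " between them.
my $validFormats = join(", ", @validFormats);

print "$validFormats\n";    # prints: raw, simple, summary
```

The glue only goes between items, so there is no trailing comma to clean up.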

Functions are like subroutines, but they are built into Perl.

Don’t get confused by the fact that the scalar variable $validFormats and the simple array @validFormats have the same text for their name. They are not the same variable, and as far as Perl is concerned they are completely unrelated.
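One way to convince yourself that the two variables are unrelated is to change one and look at the other. This is a small standalone sketch; the “csv” format is made up for illustration and is not something the script supports.

```perl
use strict;
use warnings;

my @validFormats = ("raw", "simple", "summary");
my $validFormats = join(", ", @validFormats);

# Add an item to the array after the scalar has been created.
# "csv" is a hypothetical format, used only for this demonstration.
push(@validFormats, "csv");

# The scalar still holds the text it was given earlier.
print "Scalar: $validFormats\n";
print "Array now has ", scalar(@validFormats), " items\n";
```

The scalar keeps its original three formats even though the array now holds four, because join copied the text at the moment it was called.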

Now, inside the switches area, change the “if ($format ne…” line and the print following it to:

if (!grep(/^$format$/, @validFormats)) {
print "\nFormat must be $validFormats.\n\n";

The second line is simple enough: instead of us typing the valid formats, we’re using the automatically-created variable that holds them as a piece of text.

The first line uses the grep function to check whether or not $format exists in the array @validFormats. Like join, grep takes two arguments, and the second one is a list. The first one, however, is a regular expression. So in that line, grep is checking to see if any of the items in @validFormats begins and ends with $format: the caret anchors $format to the beginning, and the dollar sign anchors it to the end.
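To see why the anchors matter, here is a small standalone sketch that runs the same test against a few sample values. The sample values are made up for illustration. In a condition, grep gives the number of items that matched, so zero matches counts as false.

```perl
use strict;
use warnings;

my @validFormats = ("raw", "simple", "summary");

# Try a few made-up values for $format, including a partial one.
foreach my $format ("raw", "sum", "summary", "unknown") {
    if (!grep(/^$format$/, @validFormats)) {
        print "$format: not a valid format\n";
    } else {
        print "$format: ok\n";
    }
}
```

Without the caret and the dollar sign, “sum” would be accepted, because it matches part of “summary”. Also note that $format is dropped into the pattern as-is, so characters such as a period or an asterisk typed on the command line are treated as regular expression syntax. Writing the pattern as /^\Q$format\E$/ is one way to have them matched literally.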

Go ahead and try a few options and see how they work. Both ‘simple’ and ‘summary’ will currently do the same thing, since we haven’t added any code for ‘summary’.

./show --format unknown girl aerosmith.txt

./show --format raw girl aerosmith.txt

./show --format summary girl aerosmith.txt

So the next step is to handle the summary format. Where the script prints out the song information, between the raw and simple format, add:

} elsif ($format eq "summary") {
    $artists{$artist}++;

That section should now be:

if ($format eq "raw") {
    print;
} elsif ($format eq "summary") {
    $artists{$artist}++;
} else {
    print "$song ($album, by $artist)\n";
}

Go ahead and try for a summary:

./show --format summary girl songs.txt

Nothing should happen. When we ask for a summary, we are no longer printing anything, but only keeping track by incrementing… what are we incrementing?

$artists{$artist}++;

The “++” we’ve already met: it increments the variable to the left of it. The variable to the left looks vaguely like a value from a simple array, except that instead of using square brackets we’re using curly brackets. That’s how you tell the difference between a simple array and an associative array. Simple arrays use square brackets to get at their individual values, and associative arrays use curly brackets to get at their individual values.

If $artist contains “Eurythmics”, this will add one to the value of $artists{"Eurythmics"}. If that value didn’t previously exist, it is assumed to be 0 and now is 1. If it was 1, it is now 2, and so on.
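Here is a minimal standalone sketch of that counting, using a made-up list of artists in place of the matching lines from the song file.

```perl
use strict;
use warnings;

my %artists;

# Pretend these are the artists of five matching songs (made-up data).
foreach my $artist ("Eurythmics", "The Band", "Eurythmics", "Jane Jensen", "Eurythmics") {
    # A key that does not exist yet is treated as 0, so it becomes 1.
    $artists{$artist}++;
}

print "Eurythmics: ", $artists{"Eurythmics"}, "\n";    # prints: Eurythmics: 3
print "The Band: ", $artists{"The Band"}, "\n";        # prints: The Band: 1
```

We never had to create the entries ahead of time: the first increment for each artist creates the key.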

Finally, just outside of the end of the while block that loops through the song information, we can print out the summary:

if (%artists) {
    @artists = keys %artists;
    @artists = sort @artists;
    foreach $artist (@artists) {
        $artistCount = $artists{$artist};
        print "$artist: $artistCount\n";
    }
}

If the associative array %artists has anything in it (that is, if we’ve been keeping track of how many songs each artist has), we’ll perform the rest of this if block. An empty associative array counts as false.

The first line inside the block gets the keys out of the %artists associative array. The keys are a simple list, so they go into @artists.

The second line sorts @artists, and then assigns the sorted @artists back to itself.

The next block is a foreach block. Very much like a while block, it loops through its lines for as long as it has something to loop through. The difference is that foreach gets its things to loop through from a simple array, in this case @artists. On each pass, foreach places the next item from the array into the variable named just after the word foreach, in this case the scalar variable $artist.

So if there are three matching artists, Pink Floyd, Warren Zevon, and Stillwater, the first time through $artist will contain “Pink Floyd”, the second time through “Stillwater”, and the third time through “Warren Zevon”, because @artists has been sorted alphabetically.

Inside the foreach block, the first line assigns the artist’s total songs to the variable $artistCount, and the second line prints out the artist’s name and count.
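The whole keys, sort, and foreach sequence can be tried on its own with a small standalone sketch. The counts here are made up for illustration, and the two assignments to @artists are combined into one line, which does the same thing.

```perl
use strict;
use warnings;

# Made-up counts, listed as key, value, key, value.
my %artists = ("Warren Zevon", 2, "Pink Floyd", 5, "Stillwater", 1);

# keys hands back the artist names in no particular order;
# sort puts them in alphabetical order.
my @artists = sort(keys %artists);

foreach my $artist (@artists) {
    my $artistCount = $artists{$artist};
    print "$artist: $artistCount\n";
}
```

This prints Pink Floyd first, then Stillwater, then Warren Zevon, whatever order the artists were added in.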

./show --format summary stand songs.txt

You should get several lines, including that Bing Crosby has 23 songs, Taco 11, and William S. Burroughs 1 matching “stand”.

Change the help subroutine to reflect the new format:

print "\t--format <$validFormats>: choose format for results\n";
