The basic Perl filter: Splitting and printing

  1. Basic regular expressions
  2. The basic Perl filter
  3. Comments

Our search is working nicely, but it is returning a lot more information than we really need. The data file contains quite a bit of information about our songs. It’d be nice to display it in a more useful form. To do this, we first need to break it up into its pieces.

Looking at the data, each line appears to consist of a song title, a duration, an artist name, an album name, a year, a number from 0 to 100, a timestamp, the track’s position on the album, and a genre. These items are separated from each other by tab characters. Replace the print line with the following two lines:

($song, $duration, $artist, $album) = split("\t");
print "$song ($album, by $artist)\n" if /$searchFor/i;

Both of these new lines illustrate some important features of Perl, so let’s take them piece by piece:

($song, $duration, $artist, $album) = split("\t");

The “=” character means that we’re doing an assignment here. Whatever happens on the right is going to get assigned to the stuff we have on the left. The right is the split function. This function splits a piece of text into pieces, based on another piece of text. The text we’re splitting on is the tab character. In Unix, the tab is often specified using backslash-t, or “\t”. The text we’re splitting is the current line, because we didn’t tell Perl what text to split. So, if the current line has eight tabs, this is going to split it into nine pieces.
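To check the tabs-versus-pieces arithmetic, here is a minimal sketch. The sample line is made up for illustration (it is not taken from songs.txt), and the text to split is passed to split explicitly as a second argument rather than relying on the current line.

```perl
use strict;
use warnings;

# A made-up sample line with nine tab-separated fields (hypothetical data).
my $line = join("\t", "Song Title", "3:45", "Some Artist", "Some Album",
                "1993", "80", "2004-01-01 12:00:00", "7", "Folk");

# Count the tabs: tr/// with an empty replacement just counts matches.
my $tabs = ($line =~ tr/\t//);

# Split on the tab character; the second argument is the text to split.
my @pieces = split("\t", $line);

print "tabs: $tabs, pieces: ", scalar(@pieces), "\n";   # tabs: 8, pieces: 9
```

Eight separators always leave nine pieces here, because every field in the sample is non-empty.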

On the left, we have a list of variables. They all begin with dollar signs, so we know that each of them is expecting a single piece. Perl is going to take the pieces we generated on the right and assign them, in order, to the variables we’ve specified on the left. If there are more pieces on the right than variables on the left, the extra pieces will be ignored.

So when this line is done, we should have a variable containing the song title, duration of the song, the artist’s name, and the album’s name. We’ll ignore everything else for now.
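A small sketch of that assignment, assuming a made-up line with six fields. The `my` declarations and the explicit second argument to split are additions for a self-contained example; the tutorial's own line uses neither.

```perl
use strict;
use warnings;

# Hypothetical line: title, duration, artist, album, year, rating.
my $line = "Song Title\t3:45\tSome Artist\tSome Album\t1993\t80";

# Four variables on the left, six pieces available on the right:
# the pieces are assigned in order, and the last two are ignored.
my ($song, $duration, $artist, $album) = split("\t", $line);

print "song:     $song\n";
print "duration: $duration\n";
print "artist:   $artist\n";
print "album:    $album\n";
```

The year and the rating never reach a variable, which is exactly the "ignore everything else" behavior described above.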

print "$song ($album, by $artist)\n" if /$searchFor/i;

Up until now, we’ve been letting Perl print out the current line as soon as it comes through. Our new print command actually has something to print. We’re telling print to print a piece of text consisting of the variable $song, a space, an opening parenthesis, the variable $album, a comma, a space, the word “by”, another space, the variable $artist, a closing parenthesis, and then a “new line” character. Just as we saw for tabs, the new-line character has a special code beginning with a backslash: “\n”.
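The two lines can be tried together without a data file. This is only a sketch under stated assumptions: the three lines, the value of $searchFor, the for loop that makes each one the current line, and the $matches counter are all made up for illustration and are not part of the show script.

```perl
use strict;
use warnings;

my $searchFor = "voices";   # stands in for the search text given to show
my $matches   = 0;          # counts printed lines, only to make checking easy

# Three made-up lines in title, duration, artist, album order.
my @lines = (
    "Other Voices\t3:10\tArtist One\tAlbum One",
    "Quiet Song\t2:55\tArtist Two\tAlbum Two",
    "VOICES CARRY\t4:20\tArtist Three\tAlbum Three",
);

for (@lines) {   # each element becomes the current line in turn
    my ($song, $duration, $artist, $album) = split("\t");
    print "$song ($album, by $artist)\n" if /$searchFor/i;
    $matches++ if /$searchFor/i;
}
```

This should print the first and third lines in the new format and skip the second. The third line matches despite being in capitals because of the “i” after the search pattern.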

In Perl, most text, whether you are printing it or assigning it to a variable, will be surrounded by quotes. If you surround text with double-quotes, any variables inside that text will be “interpreted” and replaced with their values. If you use single-quotes, Perl will not search the text for variables.
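A quick way to see the difference is to put the same text in both kinds of quotes. The variable value below is chosen only for illustration.

```perl
use strict;
use warnings;

my $artist = "Nanci Griffith";   # value chosen for illustration

my $double = "by $artist";   # double quotes: $artist is replaced by its value
my $single = 'by $artist';   # single quotes: the text is kept exactly as typed

print $double, "\n";   # by Nanci Griffith
print $single, "\n";   # by $artist
```

The same rule applies to backslash codes such as “\n” and “\t”: they only turn into a new line or a tab inside double-quotes.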

So, go ahead and try the new version of show:

./show Voices songs.txt

You should see, in a much more readable form, a skit by George Carlin, songs from Nanci Griffith’s Other Voices, Other Rooms, and several songs that contain the word “voices”.
