The basic Perl filter: Basic regular expressions

  1. Indentation
  2. The basic Perl filter
  3. Splitting and printing

Our current script is a filter. It takes some raw data on one end, filters it, and produces modified data on the other end. You can also think of a filter as a sausage grinder. Our filter, however, is a very leaky sieve: it currently lets everything through.

One of the things that Perl does very well is to filter what it gets according to a regular expression. A regular expression is a very versatile form of searching. Change the print line to:

print if /Mellow/;

And then type:

./show songs.txt

You should see all of the songs in Donovan’s Mellow Yellow album as well as a few songs by the Mellow Men.

In Perl, everything between two slashes, like that, is a regular expression. Since we didn’t tell Perl what text we want the regular expression to apply to, it assumes we want to apply it to the current line.

Any command or function followed by “if expression” will be performed only if that expression returns something. If it returns nothing (as it will in this case when the line does not contain Mellow), that line does not get performed. In this case, that line does not get printed.

It would be annoying to have to edit our script every time we wanted to look for some text in our file. So we can modify the script to look on the command line for the text we want. Add one line to the script, so that it now reads:

#!/usr/bin/perl
$searchFor = shift;
while (<>) {
print if /$searchFor/;
}

Let’s look for all mentions of Yellow in the song listing:

./show Yellow songs.txt

You should see a bunch of songs from Elton John’s Goodbye Yellow Brick Road and Donovan’s Mellow Yellow, and a few versions of Joni Mitchell’s Big Yellow Taxi.

We’ll talk more specifically about grabbing stuff from the command line in a bit, but the shift command grabs the first item off of a list of items. If you don’t specify a list of items, it assumes you want the list of items that were on the command line.

On the left of shift, we have “$searchFor =”. In Perl, as in many scripting languages, the “=” is used to assign values to variables. Variables that can contain individual values (such as the word “Yellow” or the phrase “Voices in my head”) are called scalars in Perl, and they always begin with a dollar sign. So this line takes the first item on the command line, shifts it out of the list of items on the command line (so that it is no longer in that list) and assigns it to the variable called $searchFor.

Let’s talk a little more about regular expressions first, though, because they are so useful. Do a search for “Voices in my head”:

./show "Voices in my head" songs.txt

On the command line, “arguments”—the things you pass to your scripts—are separated by spaces. If you want those spaces to be part of your argument, you need to surround the argument with quotes. If we didn’t have the quotes around “Voices in my head”, Perl would think we wanted to search for Voices in four files: “in”, “my”, “head”, and “songs.txt”.

However, the show script still doesn’t show anything. Try:

./show Voices songs.txt

There are several songs, and one of them is the one we’re looking for. However, it is slightly different from what we typed: it has upper case letters where we typed lower case letters.

Rather than have to remember the exact case for every song title, we can tell Perl to ignore the case of what we’re looking for. We can tell it to be case insensitive. It’s very simple: just add an “i” after the final slash on the print line:

print if /$searchFor/i;

Once you’ve done that, repeat the “Voices in my head” search, and you will see the one song that has that phrase in its title.

That’s the form for regular expressions: a slash, something to search for, another slash, and single-letter options to modify the search. Regular expressions will get a lot more complicated than that as you learn more Perl, but that’s the basic form that they will take.

  1. Indentation
  2. The basic Perl filter
  3. Splitting and printing