Make sure you’ve downloaded the sample data, and then create the following text file:
#!/usr/bin/perl
while (<>) {
	print;
}
Save this file with the filename show. Once you’ve saved it, make sure that it is executable by you. Go to your command line, make sure that you are in the correct folder, and type:
chmod u+x show
On Mac OS X, you can ensure that you are in the correct folder by typing “cd” in the terminal, a space, and then dragging the folder onto your terminal window. Press return, and you will change directory into that folder.
Now, type:
./show show
The script should show you itself. Make sure that songs.txt is in the same directory as your script, and type:
./show songs.txt
The show script should show you all 7,006 lines in the songs.txt file.
This is about as simple a Perl script as you can get. It doesn’t do much yet, but it is a shell around which you can build quite a few useful scripts.
#!/usr/bin/perl
The first line is not Perl. The first line tells the operating system what language this script is written in. More specifically, it tells the operating system which program can interpret this script.
Most scripting languages, Perl included, use the pound character (“#”) for comments: the language ignores the pound character and everything after it on that line. So Perl ignores this line because it begins with a pound character, but the operating system sees the “#!” at the very start of the file and knows to send this script off to the program called /usr/bin/perl.
If your computer didn’t come with Perl pre-installed, you may have it installed in /usr/local/bin/perl instead; if so, change the first line to match. Typing “which perl” at the command line shows you where Perl is. Nowadays, however, most operating systems come with Perl pre-installed.
while (<>) {
}
This is a while block. The part between the parentheses is an expression, and the part between the two curly brackets is performed over and over for as long as that expression is true. In Perl, anything other than zero, an empty string, or an undefined value counts as true, so, basically, the loop keeps going as long as the expression returns something. We can put as much as we want between those curly brackets; Perl will repeat, or loop through, all of it for as long as that expression gives it something to work on.
The expression we have here is “<>”. This tells Perl that we want, line by line, everything in the files that we mentioned on the command line. If we don’t mention any files on the command line, Perl takes the standard input and gives that to us line by line instead.
For example, type:
./show
When you press return, you won’t get the command line back. Because we didn’t specify any filenames on the command line, Perl is waiting for standard input, and since nothing is being piped in, it waits for you to type. Type a few lines, pressing return after each one, and you’ll see the script echo each line back to you.
Type Control-D to exit. Then type:
echo "Now is the time for all good muskrats to come to the aid of their country" | ./show
In Unix, the vertical bar is the pipe character: whatever the command on its left outputs is piped through as input to the command on its right. Normally echo prints its text to the screen, but in the above command its output goes to the show script instead, which then prints it.
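Behind the scenes, Perl treats “while (<>)” specially: it puts each line it reads into a variable named $_, and it keeps looping as long as a line was read at all, even if that line is just “0”, which would otherwise count as false. The sketch below shows this with a made-up three-line string read through an in-memory filehandle named IN; both are assumptions so that the example runs without songs.txt or a command line.

```perl
#!/usr/bin/perl
# Sketch: "while (<IN>)" keeps going until the input runs out, even past a
# line that is just "0". The sample text is invented; IN reads it from memory
# instead of from files named on the command line.
$input = "Big Yellow Taxi\nMellow Yellow\n0";
open(IN, '<', \$input) or die "Cannot read sample text: $!";

$count = 0;
while (<IN>) {    # Perl treats this as: while (defined($_ = <IN>))
	$count++;
}
close(IN);

print "Read $count lines\n";    # prints "Read 3 lines"
```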
Finally:
print;
This command is perhaps the most common one in Perl. You use it to output something, either to the screen or to a file.
What is it printing? Perl often makes assumptions about what you want. When we don’t give print anything to print, Perl assumes we want to print the current line from the while loop.
So what this script does is go through every line it gets and print it to the screen. If you’re familiar with Unix, we’ve just reinvented the cat command.
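Written out longhand, the loop says exactly what Perl assumes: each line goes into $_, and a bare “print;” means “print $_;”. This sketch copies a made-up string line by line into a second string, $copy, the way show copies its input to the screen. The in-memory filehandles IN and OUT are assumptions made only so the example is self-contained.

```perl
#!/usr/bin/perl
# Sketch: a longhand version of the show loop. Input comes from an invented
# string held in memory, and output goes to the string $copy, not the screen.
$input = "Goodbye Yellow Brick Road\nVoices In My Head\n";
$copy  = "";
open(IN,  '<', \$input) or die "Cannot read sample text: $!";
open(OUT, '>', \$copy)  or die "Cannot write to memory: $!";

while (defined($_ = <IN>)) {	# longhand for "while (<IN>)"
	print OUT $_;				# a bare "print;" would mean "print $_;" to the screen
}
close(IN);
close(OUT);

print $copy;    # an exact copy of the input, just as cat would produce
```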
In the above script, the print command is indented by one tab. Indentation makes your scripts much easier to read, because you can see at a glance where loops and other blocks begin and end. Perl itself does not care about indentation, but for all practical purposes you should treat it as required: indent the contents of every block, and if you have blocks inside of blocks, indent those further.
Our current script is a filter. It takes some raw data on one end, filters it, and produces modified data on the other end. You can also think of a filter as a sausage grinder. Our filter, however, is a very leaky sieve: it currently lets everything through.
One of the things that Perl does very well is to filter what it gets according to a regular expression. A regular expression is a very versatile form of searching. Change the print line to:
print if /Mellow/;
And then type:
./show songs.txt
You should see all of the songs in Donovan’s Mellow Yellow album as well as a few songs by the Mellow Men.
In Perl, everything between two slashes, like that, is a regular expression. Since we didn’t tell Perl what text we want the regular expression to apply to, it assumes we want to apply it to the current line.
Any command or function followed by “if expression” is performed only if that expression is true. If the expression is false (as it is in this case when the line does not contain Mellow), the command is skipped; here, that means the line does not get printed.
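The statement “print if /Mellow/;” is shorthand. Spelled out, it tests the current line, $_, against the regular expression with the “=~” operator and acts only on a match. In this sketch a foreach loop over a few invented lines stands in for the while loop over songs.txt, and the counter $matches exists only so the long form has something to do.

```perl
#!/usr/bin/perl
# Sketch: the short "if" form next to an equivalent if block.
# The lines are invented stand-ins for songs.txt.
@lines = ("Mellow Yellow\n", "Sunshine Superman\n", "Mellow Mood\n");

$matches = 0;
foreach (@lines) {				# each element becomes $_ in turn
	print if /Mellow/;			# short form, as in show

	if ($_ =~ /Mellow/) {		# long form: test $_ explicitly
		$matches++;
	}
}
print "$matches lines matched\n";    # prints "2 lines matched"
```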
It would be annoying to have to edit our script every time we wanted to look for some text in our file. So we can modify the script to look on the command line for the text we want. Add one line to the script, so that it now reads:
#!/usr/bin/perl
$searchFor = shift;
while (<>) {
	print if /$searchFor/;
}
Let’s look for all mentions of Yellow in the song listing:
./show Yellow songs.txt
You should see a bunch of songs from Elton John’s Goodbye Yellow Brick Road and Donovan’s Mellow Yellow, and a few versions of Joni Mitchell’s Big Yellow Taxi.
We’ll talk more specifically about grabbing stuff from the command line in a bit, but the shift function removes the first item from a list of items and hands it back. If you don’t specify a list of items, it assumes you want the list of items that were on the command line.
On the left of shift, we have “$searchFor =”. In Perl, as in many scripting languages, “=” is used to assign values to variables. Variables that contain a single value (such as the word “Yellow” or the phrase “Voices in my head”) are called scalars in Perl, and their names always begin with a dollar sign. So this line takes the first item on the command line, shifts it out of the list of items on the command line (so that it is no longer in that list), and assigns it to the variable called $searchFor. That is also why “<>” reads only songs.txt: by the time the while loop starts, “Yellow” is no longer in the list of files to read.
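You can watch shift at work by filling in the list of command-line items, which Perl keeps in an array called @ARGV, by hand. In this sketch the values simulate “./show Yellow songs.txt”; setting @ARGV inside the script is only a stand-in for a real command line.

```perl
#!/usr/bin/perl
# Sketch: simulate "./show Yellow songs.txt" by setting @ARGV directly.
@ARGV = ("Yellow", "songs.txt");

$searchFor = shift;    # outside a subroutine, shift takes from @ARGV

print "Searching for: $searchFor\n";    # Searching for: Yellow
print "Files left for <>: @ARGV\n";     # Files left for <>: songs.txt
```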
Let’s talk a little more about regular expressions first, though, because they are so useful. Do a search for “Voices in my head”:
./show "Voices in my head" songs.txt
On the command line, “arguments” (the things you pass to your scripts) are separated by spaces. If you want spaces to be part of an argument, you need to surround that argument with quotes. Without the quotes around “Voices in my head”, Perl would think we wanted to search for Voices in four files: “in”, “my”, “head”, and “songs.txt”.
Even with the quotes, though, the show script doesn’t show anything. Try:
./show Voices songs.txt
There are several songs, and one of them is the one we’re looking for. However, it is slightly different from what we typed: it has upper case letters where we typed lower case letters.
Rather than have to remember the exact case for every song title, we can tell Perl to ignore the case of what we’re looking for. We can tell it to be case insensitive. It’s very simple: just add an “i” after the final slash on the print line:
print if /$searchFor/i;
Once you’ve done that, repeat the “Voices in my head” search, and you will see the one song that has that phrase in its title.
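The effect of the “i” option is easy to see side by side. This sketch runs the same search text against two invented titles, once as typed and once with “i”; the capital letters in the second title are what defeat the plain search.

```perl
#!/usr/bin/perl
# Sketch: the same search with and without the "i" option.
# The titles are invented; note the capital letters in the second one.
$searchFor = "Voices in my head";
@lines = ("Other Voices\n", "Voices In My Head\n");

$exact   = 0;
$anyCase = 0;
foreach (@lines) {
	$exact++   if /$searchFor/;		# case must match exactly
	$anyCase++ if /$searchFor/i;	# upper and lower case treated alike
}
print "Exact: $exact, ignoring case: $anyCase\n";    # Exact: 0, ignoring case: 1
```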
That’s the form for regular expressions: a slash, something to search for, another slash, and single-letter options to modify the search. Regular expressions will get a lot more complicated than that as you learn more Perl, but that’s the basic form that they will take.
Our search is working nicely, but it is returning a lot more information than we really need. The data file contains quite a bit of information about our songs. It’d be nice to display it in a more useful form. To do this, we first need to break it up into its pieces.
Looking at the data, each line appears to consist of a song title, a duration, an artist name, an album name, a year, some number from 0 to 100, a timestamp, its position in the tracks on the album, and a genre. These items are separated by tab characters. Replace the print line with the following two lines:
($song, $duration, $artist, $album) = split("\t");
print "$song ($album, by $artist)\n" if /$searchFor/i;
Both of these new lines illustrate some important features of Perl, so let’s take them piece by piece:
($song, $duration, $artist, $album) = split("\t");
The “=” character means that we’re doing an assignment here: whatever happens on the right gets assigned to what we have on the left. On the right is the split function, which splits a piece of text into pieces wherever another piece of text appears. The text we’re splitting on is the tab character, which in Perl (as in many Unix tools) is written as backslash-t, or “\t”. The text we’re splitting is the current line, because we didn’t tell Perl what text to split. So, if the current line has eight tabs, split breaks it into nine pieces.
On the left, we have a list of variables. They all begin with dollar signs, so we know that each one holds a single piece. Perl takes the pieces generated on the right and assigns them, in order, to the variables on the left. If there are more pieces on the right than variables on the left, the extra pieces are ignored; if there are fewer, the leftover variables are left undefined.
So when this line is done, we should have a variable containing the song title, duration of the song, the artist’s name, and the album’s name. We’ll ignore everything else for now.
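Here is split at work on a single line laid out like songs.txt. Every field value below is invented for illustration; only the order of the fields follows the description above. The first split keeps all the pieces so you can count them; the second is the same assignment show uses.

```perl
#!/usr/bin/perl
# Sketch: split one tab-separated line laid out like songs.txt.
# All of the field values are invented for illustration.
$_ = "Example Song\t3:45\tSome Artist\tSome Album\t1999\t50\t2005-03-01 10:15\t4\tRock\n";

@pieces = split("\t");                # splits $_, the current line
print scalar(@pieces), " pieces\n";   # prints "9 pieces" (eight tabs)

($song, $duration, $artist, $album) = split("\t");
print "$song / $duration / $artist / $album\n";    # the other five pieces are ignored
```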
print "$song ($album, by $artist)\n" if /$searchFor/i;
Up until now, we’ve been letting Perl print out the current line as soon as it comes through. Our new print command actually has something to print: a piece of text consisting of the variable $song, a space, an opening parenthesis, the variable $album, a comma, a space, the word “by”, another space, the variable $artist, a closing parenthesis, and then a “new line” character. Just as we saw for tabs, the new-line character has a special code beginning with a backslash: “\n”.
In Perl, most text, whether you are printing it or assigning it to a variable, is surrounded by quotes. If you surround text with double quotes, any variables inside that text are “interpolated”, that is, replaced with their values, and codes such as “\n” and “\t” become new lines and tabs. If you use single quotes, Perl does not search the text for variables or special codes; it uses the text exactly as typed.
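A quick way to see the difference is to put the same text in both kinds of quotes. In this sketch the artist name is just a sample value.

```perl
#!/usr/bin/perl
# Sketch: the same text in double and single quotes.
$artist = "Donovan";

$double = "by $artist\n";    # $artist is replaced, and \n becomes a new line
$single = 'by $artist\n';    # kept exactly as typed: $artist and \n stay literal

print $double;               # prints: by Donovan
print $single, "\n";         # prints: by $artist\n
```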
So, go ahead and try the new version of show:
./show Voices songs.txt
You should see, in a much more readable form, a skit by George Carlin, songs from Nanci Griffith’s Other Voices, Other Rooms, and several songs that contain the word “voices”.
I mentioned earlier that Perl ignores a line beginning with a pound sign. More precisely, outside of quotes, Perl ignores a pound sign and everything after it on the same line. This makes the pound sign a useful way of adding comments to your scripts. Comments are very important: they help you remember what you meant by a snippet of script when you look at it again several months or even years later.
This script, for example, might be commented as follows:
#!/usr/bin/perl
#Search for songs in a file of the following tab-separated data:
# title, duration, artist, album, year, rating, rip date, track position, genre
#the first item on the command line is what we're searching for
$searchFor = shift;
while (<>) {
	#split out the song, duration, artist, and album from the current line
	($song, $duration, $artist, $album) = split("\t");
	#print song information if this line contains our search text
	print "$song ($album, by $artist)\n" if /$searchFor/i;
}
You don’t need to comment every line, but it is a good idea to comment every section. You’ll usually want to put a comment in front of any while block, or other large block of Perl lines.