Remember that search for “stand” that topped the list with a bunch of older artists? Try that search again without asking for a summary and most of them don’t have “stand” anywhere in the song, artist, or album.
The genre for those songs is Standards. Ask for raw format and you’ll see that. Our search is searching through the entire line, both the stuff we can see and the stuff we can’t.
Currently, our script searches everything for the text we specify. It would be nice to be able to focus our search on just the artist, just the album, or just the song. This way, we can search for songs about Yellow without picking up albums that mention Yellow.
If we want to search for all songs that mention “yellow” by an artist whose name contains “joni”, we might use:
./show --artist Joni --song yellow songs.txt
The first step to doing this is to add artist, song, and album to the list of switches.
First, make a list of valid fields to search in:
#options for fields to search in
@validFields = ("artist", "album", "song");
$validFields = englishJoin(", ", "and", @validFields);
Second, add another elsif to the switches area:
} elsif (grep(/^$switch$/, @validFields)) {
    if ($searchText = shift) {
        $searches{$switch} = $searchText;
    } else {
        print "\nSearching in $switch requires text to search on.\n\n";
        help();
        exit;
    }
We’re storing the search text in an associative array whose key is the field we want to search on.
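To see what ends up in that associative array, here is a minimal standalone sketch. It is not part of the show script: the @args list is a made-up stand-in for the real command line, and the loop only handles field switches, ignoring everything else the script does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real command line.
my @args = ("--artist", "Joni", "--song", "yellow", "songs.txt");
my @validFields = ("artist", "album", "song");
my %searches;

# Peel switches off the front of the list, keeping the field searches.
while (@args && $args[0] =~ /^--(.+)/) {
    my $switch = $1;
    shift @args;
    if (grep { $_ eq $switch } @validFields) {
        # the key is the field name, the value is the text to look for
        $searches{$switch} = shift @args;
    }
}

foreach my $field (sort keys %searches) {
    print "$field => $searches{$field}\n";
}
# Whatever is left over is the list of song files.
print "files: @args\n";
```

Each field switch becomes one key in %searches, so asking for two fields leaves two entries behind and the file name is still waiting in the list.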
Because we are now going to be doing multiple searches, we’re going to want a subroutine to do the search. Otherwise, we’ll have to duplicate the “if ($sensitive)” lines for each field we want to search on:
sub match {
    my($searchIn) = shift;
    my($searchFor) = shift;
    my($matched) = 0;
    if ($sensitive) {
        $matched = $searchIn =~ /$searchFor/;
    } else {
        $matched = $searchIn =~ /$searchFor/i;
    }
    return $matched;
}
Change “if ($searchFor = shift) {“ to:
if (%searches) {
Instead of expecting some search text, we’re now checking to see if at least one of the searches has been specified. The if block will only be performed if the associative array called “searches” exists and isn’t empty.
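A quick way to convince yourself of how that test behaves is a tiny sketch like the following. The $before and $after variables are only there for the demonstration.

```perl
use strict;
use warnings;

my %searches;

# An empty associative array counts as false...
my $before = %searches ? "search" : "help";

# ...and one with at least one key counts as true.
$searches{"artist"} = "joni";
my $after = %searches ? "search" : "help";

print "$before, then $after\n";
```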
And finally, replace the “if ($sensitive)” blocks with:
$matched = 1;
foreach $searchField (keys %searches) {
    $needle = $searches{$searchField};
    $haystack = $$searchField;
    $matched = match($haystack, $needle);
    last if !$matched;
}
Go to the command line and type:
./show --album yellow --song girl songs.txt
You should get back three songs. The albums “Mellow Yellow” and “Goodbye Yellow Brick Road” both contain at least one song whose title contains “girl”.
First, we assign the number ‘1’ to the variable $matched. By default, we’re assuming that we found a match.
Next, we loop through each field for which we want to search for text. For each such field:
1. We pull the text we’re looking for back out of the “searches” associative array, and assign that text to the variable $needle.
2. We grab the haystack, which is the text of the current field that we want to search through, using a little trick called dereferencing a symbolic reference. Imagine that we are searching for an artist. The %searches array contains “artist” as the key and “some text” as the value. So, $searchField will be “artist”. Now, look up above and see that we have a variable called $artist. If $searchField is “artist”, then $$searchField is the same as $artist. So when we say $haystack = $$searchField, this is the same as saying $haystack = $artist.
3. We set $matched to whether or not $needle can be found in $haystack. If the needle can’t be found, $matched will be false.
4. If $matched is false, there is no need to go any further, so the “last if !$matched” line exits the loop.
5. At the end of this loop, $matched is either true or false. If it is true, this track matched our search. Otherwise it did not. It failed at least one of the searches requested on the command line.
If $matched can go through all three checks without becoming zero, that means that this song matches our search. Remember that some checks will be skipped, and thus not affect $matched.
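The “every requested search must match” logic can be tried on its own with a sketch like this one. To keep it self-contained it departs from the script in two ways, both assumptions made for the example: the track’s fields live in an associative array (called %track here) instead of in separate variables reached through a symbolic reference, and the matching is always case-insensitive. The subroutine name matchesAll is likewise made up.

```perl
use strict;
use warnings;

# Returns 1 if every requested search matches its field, 0 otherwise.
sub matchesAll {
    my($track, $searches) = @_;
    my($matched) = 1;    # assume a match until one search fails
    foreach my $searchField (keys %$searches) {
        my $needle = $$searches{$searchField};
        my $haystack = $$track{$searchField};
        $matched = $haystack =~ /$needle/i;
        last if !$matched;    # one failure is enough; stop looking
    }
    return $matched ? 1 : 0;
}

my %track = (
    "song"   => "Goodbye Yellow Brick Road",
    "album"  => "Goodbye Yellow Brick Road",
    "artist" => "Elton John",
);

my $both   = matchesAll(\%track, { "album" => "yellow", "artist" => "elton" });
my $failed = matchesAll(\%track, { "album" => "yellow", "song" => "girl" });
print "album+artist: $both, album+song: $failed\n";
```

The second search fails because the song title has no “girl” in it, even though the album search on its own would have matched.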
Go ahead and play around with some searches. You can find all of the Elton John songs about girls on albums about yellow, with:
./show --album yellow --song girl --artist "Elton John" songs.txt
All of the Elton John songs about girls can be found with:
./show --song girl --artist "Elton John" songs.txt
And, of course, don’t forget to add a line to the help for this item! You’ll need to change the top item:
print "Syntax: show [options] [song files]\n";
And add a few lines to the bottom:
print "\t--$validFields <searchtext>: search in the $validFields field\n";
print "At least one of the search requests must be specified.\n";
That’s it!
Symbolic references can be taken to any level. If $key contains “artist”, $artist contains “Baez”, and $Baez contains “Joan”, then $$key is the same as $artist which is the same as “Baez”. $$$key is the same as $$artist which is the same as $Baez which is the same as “Joan”. Symbolic references are a powerful tool, but can easily make your script confusing. Use them carefully. Note that they only reach global variables such as the ones this script uses: they cannot see variables declared with “my”, and Perl refuses them outright when “use strict” is in effect.
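Here is the Baez example as a runnable sketch. Because it turns on “use strict”, it has to declare its globals with “our” and switch off the check on references; the show script itself does neither because it does not use strict. The names $once and $twice are just for the demonstration.

```perl
use strict;
use warnings;
no strict 'refs';    # symbolic references are an error under strict refs

# "our" declares global (package) variables, the only kind that
# symbolic references can reach.
our $key    = "artist";
our $artist = "Baez";
our $Baez   = "Joan";

my $once  = $$key;     # same as $artist
my $twice = $$$key;    # same as $$artist, which is $Baez
print "$once $twice\n";
```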
Now that we have an array of valid fields to search through, it’s easy enough to add new ones. Go ahead and add “genre” to the list of valid fields:
@validFields = ("artist", "album", "song", "genre");
At the moment, the script doesn’t know about genre, so let’s tell the script about all of the fields the file has. Change the “split” lines to:
#split out the song information
($song, $duration, $artist, $album, $year, $rating, $ripdate, $track, $genre) = split("\t");
That’s it. Our script can now limit searches on genre as well as on artist, album, or song:
./show --artist linda --genre standard songs.txt
./show --genre video songs.txt
./show --genre spoken --song senator songs.txt
If you don’t get dereferencing, go back and take another look at it, because we’re going to do a different kind of dereferencing here. Arrays can have multiple dimensions. So far, all of the arrays we’ve used have had a single dimension: our simple arrays have been a list of single items, and our associative arrays have been simple sets of keys and values. But arrays can have rows and columns much like a spreadsheet; they can even mix simple arrays in one column with associative arrays in others.
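As a small taste of the spreadsheet idea, the sketch below builds a two-row, two-column array. The name @grid and its contents are made up for illustration; the show script does not use them.

```perl
use strict;
use warnings;

# Each item of @grid is itself a list: first the row, then the column.
my @grid;
$grid[0][0] = "song";
$grid[0][1] = "artist";
$grid[1][0] = "Sleeping Bag";
$grid[1][1] = "ZZ Top";

my $rows = scalar(@grid);
my $cell = $grid[1][1];
print "$rows rows, cell: $cell\n";
```

Perl creates the inner lists the moment we assign to them, the same way it creates simple variables on first use.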
Adding a sort switch is pretty easy. We’ll want to be able to sort on any valid field, so we can re-use @validFields for this purpose.
} elsif ($switch eq "sort") {
    $sortby = shift;
    if (!grep(/^$sortby$/, @validFields)) {
        print "\nI can only sort by $validFields.\n\n";
        help();
        exit;
    }
} else {
Because we want to sort the results, we can’t just print out each line as soon as we reach it. We’ll need to save it for later. Replace the print for raw format with:
$text = $_;
Replace the print for html format with:
$text = "<tr><td>$song</td><td>$album</td><td>$artist</td></tr>\n";
Replace the print for simple format with:
$text = "$song ($album, by $artist)\n";
As a test, you might run the script now; you should see nothing, because we aren’t printing anything anymore.
After the section that sets the text (and that used to print the text), add:
#store or print the display text and the sort text
if ($sortby) {
$matches[$#matches+1]{'text'} = $text;
$matches[$#matches]{'sort'} = $$sortby;
} else {
print $text;
}
So, if $sortby exists and has something in it we store the $text we just set for later sorting. If we aren’t going to be sorting it, we just print it out now. The interesting part is how we remember this text. We have to remember not only the text we want to display, but also the text we want to sort on.
$matches[$#matches+1]{'text'} = $text;
$matches[$#matches]{'sort'} = $$sortby;
The first line remembers the text. We’re setting up an @matches array that will contain this information. This will be a simple array: it will simply be a list of items that goes from 0 on up to however many we find. For a simple array, recall that $#arrayname is the index of the current top item. This means that $#arrayname+1 is the index of the next empty slot. That’s what we’re setting right here: the next empty slot in @matches is getting a new item.
That new item is, rather than a scalar variable, an associative array. The first association in that array will be between the word “text” and the display text we want to remember.
The second line remembers what we’re sorting by. Here, we only use $#matches, because the topmost item is the one we want: the previous line added a new item to @matches, and we want to add a new association to the associative array we put there.
We associate the word “sort” with the value of the field we want to sort by. This, again, is a symbolic dereference. If $sortby contains “genre”, $$sortby will be $genre.
So if the first matching song is “Sleeping Bag” by “ZZ Top” from the album “Afterburner”, and we are sorting by song, the first item in @matches ($matches[0]) will be an associative array associating “text” with “Sleeping Bag (Afterburner, by ZZ Top)” and associating “sort” with “Sleeping Bag”.
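The “Sleeping Bag” example can be checked with a standalone sketch. It stores the sort text directly from $song instead of going through $$sortby, and it also shows an equivalent one-step form using push, which the script does not use; that second form is only there as a comparison.

```perl
use strict;
use warnings;

my @matches;
my $song = "Sleeping Bag";
my $album = "Afterburner";
my $artist = "ZZ Top";
my $text = "$song ($album, by $artist)\n";

# The two-step form: create the next item, then add to that same item.
$matches[$#matches+1]{'text'} = $text;
$matches[$#matches]{'sort'} = $song;

# An equivalent one-step form: push a ready-made associative array.
push @matches, { 'text' => $text, 'sort' => $song };

print scalar(@matches), " items; first sorts on: $matches[0]{'sort'}\n";
```

Both forms leave an item holding the same two associations, so the list ends up with two identical entries.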
We’re almost ready to try it. Just to make sure we’re on the same page, here is the entire “if ($matched)” section:
if ($matched) {
    $matches++;
    if ($format eq "raw") {
        $text = $_;
    } elsif ($format eq "html") {
        $text = "<tr><td>$song</td><td>$album</td><td>$artist</td></tr>\n";
    } elsif ($format eq "summary") {
        $artists{$artist}++;
    } else {
        $text = "$song ($album, by $artist)\n";
    }
    #store or print the display text and the sort text
    if ($sortby) {
        $matches[$#matches+1]{'text'} = $text;
        $matches[$#matches]{'sort'} = $$sortby;
    } else {
        print $text;
    }
}
All that’s left is to sort and display the matches. But in order to sort the matches, we need a subroutine that we can hand to sort, that knows how to sort matches.
sub byCustom {
    return $$a{'sort'} cmp $$b{'sort'};
}
When sort calls a subroutine, it does not pass it any arguments. Instead, it places the two items being compared into the variables $a and $b. If an item is an array, what ends up in $a or $b is a hard reference to that array. Just as with symbolic references, we need to dereference a hard reference in order to get at its value.
Here, $a and $b are going to be hard references to an associative array, because each item in @matches is an associative array, and we want to sort @matches. Since we want to sort on the text that is associated with the word ‘sort’ in the associative array, we dereference each array and then ask for the value associated with “sort” in that array.
Remember that “cmp” is the text equivalent of “<=>”.
We can also dereference such a reference and get an associative array back by using %$a or %$b. For example, “%leftside = %$a” would make %leftside be a normal associative array that we could get keys from or pull values from as normal.
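Both kinds of dereferencing can be seen side by side in a short sketch. The two stored items are placeholder data, and %first and @fields are names invented for the example.

```perl
use strict;
use warnings;

# Placeholder data: two stored matches, deliberately out of order.
my @matches = (
    { 'text' => "second line\n", 'sort' => "pear" },
    { 'text' => "first line\n",  'sort' => "apple" },
);

sub byCustom {
    # $a and $b are hard references; $$a{'sort'} looks inside them
    return $$a{'sort'} cmp $$b{'sort'};
}

my @sorted = sort byCustom @matches;

# Dereference the whole item to get a normal associative array back.
my %first = %{$sorted[0]};
my @fields = sort keys %first;
print "first sort key: $first{'sort'}; fields: @fields\n";
```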
So, now we have our sort routine. We can finally sort and display our matches.
We already have a place outside of the while that is displaying stored information: there’s an “if (%artists)” block. At the end of that block, add an “elsif” block:
} elsif (@matches) {
    @matches = sort byCustom @matches;
    foreach $match (@matches) {
        print $$match{'text'};
    }
}
We sort @matches, assign the sorted array back to @matches, and then go through @matches for each item it contains. Arrays in Perl do not really contain arrays. They contain hard references to arrays. So we have to dereference that hard reference in order to get the value associated with “text” that we want to display.
./show --song shoes songs.txt
./show --song shoes --sort song songs.txt
./show --song shoes --sort artist songs.txt
The first one should show about ten songs that mention “shoes” in the title. The second one should show the same songs, but sorted by song title. The third shows the same songs sorted by artist name.
Try this:
./show --artist "Elton John" --sort song songs.txt
Looks like we’re not quite done yet. This is sorted by song, but it’s putting the upper-case songs first, and the lower-case songs second. First it sorts through A to Z and then a to z.
This is easy enough to fix. We need to make the comparison in the byCustom subroutine not care about upper or lower case. The easiest way to do this is to make the text be all lower case (or all upper case). There is a function for this: lc(“text”) will convert that text to all lower case. Change the byCustom subroutine to:
sub byCustom {
    if ($sensitive) {
        return $$a{'sort'} cmp $$b{'sort'};
    } else {
        return lc($$a{'sort'}) cmp lc($$b{'sort'});
    }
}
Now, by default sorts will not care about case, but if we specify --case sorts will be case sensitive:
./show --artist "Elton John" --sort song songs.txt
./show --artist "Elton John" --sort song --case songs.txt
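The difference between the two orderings is easy to see in isolation. In this sketch the titles are placeholders whose capitalization was chosen to show the effect; they are not taken from the song file.

```perl
use strict;
use warnings;

# Placeholder titles with mixed capitalization.
my @titles = ("daniel", "Your Song", "Bennie and the Jets", "amoreena");

# Plain cmp puts every upper-case letter ahead of every lower-case one.
my @sensitive = sort { $a cmp $b } @titles;

# Lower-casing both sides first makes the comparison ignore case.
my @insensitive = sort { lc($a) cmp lc($b) } @titles;

print join(", ", @sensitive), "\n";
print join(", ", @insensitive), "\n";
```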
And add this to the help:
print "\t--sort <$validFields>: sort by specified field\n";
Go ahead and look up songs from the album “4”:
./show --album 4 songs.txt
You’ll end up getting about 100 songs from all albums that include the number “4” in the album title. Currently, our searches look for the search text anywhere in the album name, song title, or artist name. What if we specifically want only the albums with that exact name? Let’s add a switch called “exact”:
} elsif ($switch eq "exact") {
    $exact = 1;
We can implement this immediately after “if (%searches) {“:
#the first item on the command line is what we're searching for
#if we're looking for exact matches, set them up ahead of time
if ($exact) {
    foreach $search (keys %searches) {
        $searchText = $searches{$search};
        $searches{$search} = "^$searchText\$";
    }
}
We’ve done all of this before except for the “\$”. The loop just goes through the keys of %searches, and adds “^” to the beginning of the search text and “$” to the end. But within double-quoted Perl text the dollar sign means something special. It means replace this with the variable whose name follows. It doesn’t matter that the variable that follows doesn’t exist, because Perl brings variables into existence the moment they’re used.
So, what we do is “backslash” the dollar sign; this is often called escaping it. A backslash in front of a special character tells Perl not to interpret the special character, but rather to leave it as is. You can even backslash backslashes: "\\n" will not be a new line, it will be a backslash and the letter “n”.
You’ll do the same if you need to put a double-quote inside of double-quoted text. Backslash the “"” character. Instead of ending the text, Perl will insert the “"” into the text at that point:
$mobster = "Johnny \"Ratface\" Martin";
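A short sketch shows what the escaped dollar sign buys us. The second album title, “Volume 4”, is a made-up example, and the variables other than $searchText are names invented for the demonstration.

```perl
use strict;
use warnings;

my $searchText = "4";

# The backslash keeps the final dollar sign as a plain character,
# so the pattern ends up as the three characters ^, 4, and $.
my $pattern = "^$searchText\$";

my $exact   = ("4" =~ /$pattern/) ? 1 : 0;
my $partial = ("Volume 4" =~ /$pattern/) ? 1 : 0;

# A backslashed backslash: two characters, not a new line.
my $slashN = "\\n";

print "$pattern $exact $partial ", length($slashN), "\n";
```

If the search text itself might contain characters that mean something special in a pattern, Perl’s built-in quotemeta function can escape them before the “^” and “$” are added.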
Don’t forget to add it to the help:
print "\t--exact: the search text must match exactly\n";
As an exercise, you might consider adding a “beginswith” and an “endswith” switch, to match albums, songs, and artists that begin or end with a specific text.
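If you get stuck, one possible starting point is sketched below. It is only one way to approach the exercise, not the script’s own code: the subroutine name anchor and the $mode values are assumptions, and you would still need to add the switches and remember which one was given.

```perl
use strict;
use warnings;

# Build the pattern for one search text.
# $mode is a hypothetical value remembered from the command-line switches.
sub anchor {
    my($searchText, $mode) = @_;
    return "^$searchText\$" if $mode eq "exact";
    return "^$searchText"   if $mode eq "beginswith";
    return "$searchText\$"  if $mode eq "endswith";
    return $searchText;
}

my $beginsPattern = anchor("yellow", "beginswith");
my $endsPattern   = anchor("yellow", "endswith");

my $album = "Mellow Yellow";
my $begins = ($album =~ /$beginsPattern/i) ? "yes" : "no";
my $ends   = ($album =~ /$endsPattern/i) ? "yes" : "no";
print "begins with yellow: $begins, ends with yellow: $ends\n";
```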
This script is beginning to be useful. You should start thinking about the data you work with on a regular basis, and how these techniques could automate what you have to do to this data. Scripts like this can easily be set to run automatically through the use of cron or similar tools.
Anyway, here is the script so far:
#!/usr/bin/perl
#Search for songs in a file of the following tab-separated data:
# title, duration, artist, album, year, rating, rip date, track position, genre

#options for the --format switch
@validFormats = ("raw", "simple", "html", "summary");
$validFormats = englishJoin(", ", "or", @validFormats);

#options for fields to search in
@validFields = ("artist", "album", "song", "genre");
$validFields = englishJoin(", ", "and", @validFields);

#strip off the command-line switches and act on or remember them
while ($ARGV[0] =~ /^--(.+)/) {
    $switch = $1;
    #pull this switch off of the front of the list
    shift;
    #if they ask for help, do it and exit
    if ($switch eq "help") {
        help();
        exit;
    } elsif ($switch eq "case") {
        $sensitive = 1;
    } elsif ($switch eq "reverse") {
        $reverse = 1;
    } elsif ($switch eq "limit") {
        $limit = shift;
        if ($limit !~ /^[1-9][0-9]*$/) {
            print "\nYou must limit to a number, such as '33' or '2'.\n\n";
            help();
            exit;
        }
    } elsif ($switch eq "format") {
        $format = shift;
        if (!grep(/^$format$/, @validFormats)) {
            print "\nFormat must be $validFormats.\n\n";
            help();
            exit;
        }
    } elsif (grep(/^$switch$/, @validFields)) {
        if ($searchText = shift) {
            $searches{$switch} = $searchText;
        } else {
            print "\nSearching in $switch requires text to search on.\n\n";
            help();
            exit;
        }
    } elsif ($switch eq "sort") {
        $sortby = shift;
        if (!grep(/^$sortby$/, @validFields)) {
            print "\nI can only sort by $validFields.\n\n";
            help();
            exit;
        }
    } elsif ($switch eq "exact") {
        $exact = 1;
    } else {
        print "\nI do not understand the option '$switch'.\n\n";
        help();
        exit;
    }
}

#only search if at least one field search was requested
if (%searches) {
    #if we're looking for exact matches, set them up ahead of time
    if ($exact) {
        foreach $search (keys %searches) {
            $searchText = $searches{$search};
            $searches{$search} = "^$searchText\$";
        }
    }
    while (<>) {
        #split out the song information
        ($song, $duration, $artist, $album, $year, $rating, $ripdate, $track, $genre) = split("\t");
        #assume a match until one of the requested searches fails
        $matched = 1;
        foreach $searchField (keys %searches) {
            $needle = $searches{$searchField};
            $haystack = $$searchField;
            $matched = match($haystack, $needle);
            last if !$matched;
        }
        #reverse the match if we want non-matching lines
        if ($reverse) {
            $matched = !$matched;
        }
        #print the information if this line is one we want
        if ($matched) {
            $matches++;
            if ($format eq "raw") {
                $text = $_;
            } elsif ($format eq "html") {
                $text = "<tr><td>$song</td><td>$album</td><td>$artist</td></tr>\n";
            } elsif ($format eq "summary") {
                $artists{$artist}++;
            } else {
                $text = "$song ($album, by $artist)\n";
            }
            #store or print the display text and the sort text
            if ($sortby) {
                $matches[$#matches+1]{'text'} = $text;
                $matches[$#matches]{'sort'} = $$sortby;
            } else {
                print $text;
            }
        }
        last if $limit && $matches >= $limit;
    }
    if (%artists) {
        @artists = keys %artists;
        @artists = sort byArtistCount @artists;
        foreach $artist (@artists) {
            $artistCount = $artists{$artist};
            print "$artist: $artistCount\n";
        }
    } elsif (@matches) {
        @matches = sort byCustom @matches;
        foreach $match (@matches) {
            print $$match{'text'};
        }
    }
} else {
    help();
}

#describe how this script is used
sub help {
    print "Syntax: show [options] [song files]\n";
    print "\tSearch for some text in the song file. If no song file is specified\n";
    print "\t'show' will expect it on standard input.\n";
    print "\tA song file is a tab-delimited file with:\n";
    print "\ttitle, duration, artist, album, year, rating, rip date, track position, genre\n";
    print "\t--help: print this help text\n";
    print "\t--case: be sensitive to upper and lower case\n";
    print "\t--reverse: filter out songs that contain the search text\n";
    print "\t--limit x: limit to x results\n";
    print "\t--format <$validFormats>: choose format for results\n";
    print "\t--$validFields <searchtext>: search in the $validFields field\n";
    print "\t--sort <$validFields>: sort by specified field\n";
    print "\t--exact: the search text must match exactly\n";
    print "At least one of the $validFields search requests must be specified.\n";
}

sub byArtistCount {
    return $artists{$b} <=> $artists{$a};
}

sub englishJoin {
    my($punctuation) = shift;
    my($conjunction) = shift;
    my(@items) = @_;
    my($joined, $finalItem);
    if ($#items == -1) {
        $joined = "";
    } elsif ($#items == 0) {
        $joined = $items[0];
    } elsif ($#items == 1) {
        $joined = "$items[0] $conjunction $items[1]";
    } else {
        $finalItem = pop(@items);
        $joined = join($punctuation, @items) . "$punctuation$conjunction $finalItem";
    }
    return $joined;
}

sub match {
    my($searchIn) = shift;
    my($searchFor) = shift;
    my($matched) = 0;
    if ($sensitive) {
        $matched = $searchIn =~ /$searchFor/;
    } else {
        $matched = $searchIn =~ /$searchFor/i;
    }
    return $matched;
}

sub byCustom {
    if ($sensitive) {
        return $$a{'sort'} cmp $$b{'sort'};
    } else {
        return lc($$a{'sort'}) cmp lc($$b{'sort'});
    }
}