Creating files: Timestamps

  1. Try to break it
  2. Creating files
  3. The current script

Some data is time-sensitive. The file came in at a specific time, and you want the exported files to keep that timestamp. Under Unix, you can see a file’s last modified time using “ls -l”. If you look at songs.txt you’ll probably see that it was last modified on April 25, 2005. If you look at the export files you’ve created, their last modified time is today, or the day you exported them.

First, add the switch:

} elsif ($switch eq "keep-time") {
    $keepTime = 1;

and then the help:

print "\t--keep-time: keep the input file's timestamp on any exported files\n";

If we’re going to stamp the files we create so that they have the same timestamp as the file the data came from, we need to get the timestamp of that file. So far we haven’t cared what file that is. In fact, our script is designed to allow multiple files to be specified on the command line. We might imagine exporting raw artist files of all Rock songs, for example, and then searching through the files for multiple artists.

So the first step is purely on our part, with no coding. If more than one file is specified, what is the correct timestamp? Do we want the most recent one? The oldest one? Some sort of average? I’m going to assume that we want the most recent one.

The second problem is that in order to get the timestamp for a file, we need to know the file’s name. So far we haven’t cared. We’ve let Perl handle the file input for us. Fortunately, there’s no need to change that. Perl can also tell us the name of the current file. When a script loops through file input, Perl puts the current filename in a special variable called $ARGV.
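To see $ARGV change as Perl moves from one input file to the next, here is a minimal, self-contained sketch. The throwaway input files and the names @inputs and @seen are assumptions made for illustration; they are not part of our script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway input files so the sketch is self-contained.
my @inputs;
for my $text ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close($fh);
    push @inputs, $name;
}

# Pretend the files were given on the command line.
@ARGV = @inputs;

my @seen;           # filenames, in the order Perl read from them
my $lastFile = '';
while (my $line = <>) {
    # $ARGV names the file the current line came from.
    if ($ARGV ne $lastFile) {
        push @seen, $ARGV;
        print "Now reading: $ARGV\n";
        $lastFile = $ARGV;
    }
}
```

Each filename is printed once, the first time a line arrives from that file.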

Below the “if ($matched) {” line, add:

if ($keepTime) {
    @fileInfo = stat($ARGV);
    $fileMod = $fileInfo[9];
    $lastModified = $fileMod if $fileMod > $lastModified;
}

Simple enough. If $keepTime has something in it, we grab the information for the file called $ARGV. The stat() function returns a list of information about a file; we want the element at index 9 (the tenth item, since Perl counts from zero). That’s the last modified time of the file.

Then, we set $lastModified to be this file’s modification time if $fileMod is larger than (more recent than) the current $lastModified. The first time around, $lastModified has nothing in it, and Perl treats an empty value as zero in a numeric comparison, so any real timestamp will be greater than it. After that, $lastModified only gets changed if the current file is newer than the previous newest file.
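The same “keep the newest” logic can be tried on its own. The sketch below is only an illustration: the helper name newest_mtime and the two throwaway files are hypothetical, and utime is used here solely to give those files known, different timestamps.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the newest modification time among the given files.
sub newest_mtime {
    my @filenames = @_;
    my $newest = 0;
    for my $filename (@filenames) {
        my @fileInfo = stat($filename);
        next unless @fileInfo;         # stat returns an empty list on failure
        my $fileMod = $fileInfo[9];    # index 9 is the last modified time
        $newest = $fileMod if $fileMod > $newest;
    }
    return $newest;
}

# Two throwaway files with known, different modification times.
my ($fh1, $older) = tempfile(UNLINK => 1);
my ($fh2, $newer) = tempfile(UNLINK => 1);
close($fh1);
close($fh2);
utime(1000000000, 1000000000, $older);
utime(1100000000, 1100000000, $newer);

my $lastModified = newest_mtime($older, $newer);
print "Newest timestamp: ", scalar(localtime($lastModified)), "\n";
```

The order of the filenames does not matter; the larger timestamp always wins.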

One minor problem with this is that it checks the current file every time we go through the loop. File system access is usually very fast, but if we’re exporting thousands of records from a handful of files, that’s thousands of stat calls we don’t really need. What we can do is keep track of the filename, and only get the last modified time when $ARGV no longer matches the previously current filename:

if ($keepTime && $lastFile ne $ARGV) {
    @fileInfo = stat($ARGV);
    $fileMod = $fileInfo[9];
    $lastModified = $fileMod if $fileMod > $lastModified;
    $lastFile = $ARGV;
}

Now that we have the timestamp we need, we just need to set each exported file to have that timestamp. The easiest place to handle this is after we close each file. The script already goes through each file one by one to close it. We can set the last modified time during that loop. Change the entire “#close all open files” section to:

#close all open files
foreach $filename (keys %files) {
    $filehandle = $files{$filename};
    close($filehandle);
    utime(time(), $lastModified, $filename);
}

Instead of just grabbing the values (file handles) out of %files, we need the keys as well. The keys are the filenames. So, we grab the keys and then grab the values using the key as normal. We close $filehandle just as we always did, and then we run utime on $filename. Each file has two times that are commonly used: the last accessed time and the last modified time. The utime function requires both of them, in that order. We’ll set the first one (the last accessed time) to the current time, since that’s when the file was last accessed. We’ll set the second (the last modified time) to the saved $lastModified from the input file(s).
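The argument order of utime is easy to mix up: Perl’s utime takes the access time first and the modification time second. A quick way to check is to stamp a throwaway file and read the times back with stat. In this sketch the temporary file and the fixed timestamp are stand-ins chosen for illustration, not values from our script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file standing in for one of our exported files.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "exported data\n";
close($fh);

my $lastModified = 1100000000;   # stand-in for the input file's timestamp
my $now = time();

# utime takes the access time first, then the modification time.
utime($now, $lastModified, $filename)
    or warn "Could not set times on $filename: $!";

# Read the times back: index 8 is last accessed, index 9 is last modified.
my @fileInfo = stat($filename);
print "accessed: ", scalar(localtime($fileInfo[8])), "\n";
print "modified: ", scalar(localtime($fileInfo[9])), "\n";
```

utime returns the number of files it changed, so a result of zero means the timestamps were not set.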
