Welcome to
The DFN Weekly

Perl Pipelining

By
Old Tom
OT Scripts Industrial Strength®

Have you ever had to read someone else's Perl CGI script, only to find that the programmer got so darn tricky you couldn't follow what was happening? Perhaps I can help! Let me teach you a technique I call pipelining. Once you understand the concept, you'll find it much easier to figure out what a given Perl script is doing. And if you write Perl scripts yourself, you just might learn a new trick or two!

As an example, let's look at a script that finds all of the folders in your domain. Suppose your domain includes:

http://www.domain.com/banners/
http://www.domain.com/images/
http://www.domain.com/members/
http://www.domain.com/members/paid/
http://www.domain.com/members/leeches/
http://www.domain.com/bbw/
http://www.domain.com/mature/
http://www.domain.com/mature/red/
http://www.domain.com/mature/white/
http://www.domain.com/mature/blue/
http://www.domain.com/mature/blue/nekkid/
http://www.domain.com/skanks/

Your domain root contains six folders: banners, images, members, bbw, mature, and skanks. The members folder, in turn, contains paid and leeches, and mature has its own subfolders and sub-sub-folders. Our goal is to construct a list of all the URLs that are folders.

One perfectly reasonable way to do this is to handle everything one item at a time: scan through the server, checking every file and folder in turn. If an item is a folder, figure out its URL and add it to the list. Then go on to the next item on the server, and the next, and so on.
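For comparison, here is one way to write that item-at-a-time approach. It is only a sketch, and it uses a different technique from the one this article teaches: Perl's standard File::Find module, which hands each file and folder to a "wanted" routine one at a time. The name folders_one_at_a_time is hypothetical, and $root is assumed to be the domain's folder, given without a trailing slash.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: File::Find visits every item under $root one at
# a time, and we keep each one that turns out to be a visible folder.
# Assumes $root has no trailing slash.
sub folders_one_at_a_time {
  my $root = shift;
  my @found;
  find({
    no_chdir => 1,    # $File::Find::name holds the full path
    wanted   => sub {
      my $path = $File::Find::name;
      return if $path eq $root;               # the root itself isn't a result
      my ($name) = $path =~ m{([^/]+)\z};     # last part of the path
      if ($name =~ /^\./) {                   # hidden file or folder
        $File::Find::prune = 1 if -d $path;   # don't look inside hidden folders
        return;
      }
      return unless -d $path;                 # keep folders only
      push @found, substr($path, length($root) + 1);   # e.g. "members/paid"
    },
  }, $root);
  return @found;
}
```

Called as sort folders_one_at_a_time('.') from the domain root, it should return the same folder/subfolder paths as the pipelined script. File::Find doesn't promise any particular visiting order, which is why the sort is there.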

Instead, we'll do things a bit differently: we'll slurp up the entire contents of a folder at once. Here's the procedure:

  1. Slurp up the entire contents of the folder. This becomes the start of our pipeline. Remember, we have slurped up both files (such as image.jpg) and folders (such as paid).
  2. Drop all the hidden items (anything whose name begins with '.') out of the pipeline. That includes the '.' and '..' entries that every folder contains, which refer to the folder itself and its parent. If your server is configured correctly, hidden files should never be valid URLs anyway.
  3. Narrow the list down to just folders. Anything that isn't a folder (such as a regular file) gets dropped out of the pipeline.
  4. Transform each folder name into folder/subfolder form. For example, paid becomes members/paid, and nekkid becomes mature/blue/nekkid.
  5. Transform the pipeline into sorted order.
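Here is a minimal sketch of those five steps written out one at a time, with each stage stored in its own array so you can print or inspect it. The slurp_stepwise name and the lexical directory handle $dh are assumptions for illustration; the article's own script chains the same stages into a single pipeline.

```perl
use strict;
use warnings;

# Hypothetical helper: the five slurp steps, one stage per array.
sub slurp_stepwise {
  my $dir = shift;
  opendir(my $dh, $dir) or die "Cannot read dir $dir ($!)";
  my @everything = readdir $dh;                          # 1. slurp the folder
  closedir $dh;
  my @visible = grep { !/^\./ } @everything;             # 2. drop hidden items
  my @folders = grep { -d "$dir/$_" } @visible;          # 3. folders only
  my @paths   = map  { $dir eq '.' ? $_ : "$dir/$_" }    # 4. folder/subfolder form
                  @folders;                              #    ('.' left off the front)
  my @sorted  = sort @paths;                             # 5. sort
  return @sorted;
}
```

Run from the domain root of the example, slurp_stepwise('members') would return members/leeches and members/paid.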

Okay, we know how to slurp a folder through the pipeline, but we're far from done. The problem is that each slurp handles only one folder. It takes in the entire folder (that was the point of the slurp), but still only one folder at a time. Here is a procedure for slurping all of the folders in turn:

  1. Start our list of folders to be slurped with '.', the current folder.
  2. Take the next folder to be slurped off the front of the list.
  3. Add this folder to our list of final results (unless it's '.', the domain root itself, which isn't one of the folders we're looking for).
  4. Slurp up the contents of that folder, adding the pipeline results to the end of the list.
  5. Continue taking the next folder to slurp off the front of the list, and adding the pipelined results to the end of the list, until the list is empty. When the list is empty, we have found all the folders. (Pretty tricky!)
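The procedure above can be sketched in Perl without touching a real server. This is a hypothetical stand-in: the %subfolders hash plays the part of the slurp pipeline, mapping each folder of the example domain to the sorted folder/subfolder list the slurp would return. A list used this way, taking from the front and adding to the back, is what programmers call a queue.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the server: each folder maps to what the
# slurp pipeline would return for it (sorted, folder/subfolder form).
my %subfolders = (
  '.'           => [qw(banners bbw images mature members skanks)],
  'mature'      => [qw(mature/blue mature/red mature/white)],
  'members'     => [qw(members/leeches members/paid)],
  'mature/blue' => [qw(mature/blue/nekkid)],
);

sub walk_folders {
  my @to_slurp = ('.');                 # 1. start with the current folder
  my @results;
  while (@to_slurp) {                   # 5. repeat until the list is empty
    my $dir = shift @to_slurp;          # 2. take the next folder off the front
    push @results, $dir unless $dir eq '.';          # 3. record it (but not '.')
    push @to_slurp, @{ $subfolders{$dir} || [] };    # 4. add its subfolders to the end
  }
  return @results;
}

print "$_\n" for walk_folders();
```

It prints the folders in the same order as the results listed later in this article: all six top-level folders first, then their subfolders, then mature/blue/nekkid last.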

Now that we have found everything, we have one pipeline to go. We need to print the results as URLs, one per line:

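  1. The entire final results list becomes our pipeline. It is not in strict alphabetical order: each folder's subfolders arrive sorted, but the list fills up one level at a time (top-level folders first, then their subfolders, and so on).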
  2. Add 'http://www.domain.com/' to the front of each item on the list.
  3. Add '/' to the end of the URL, and '\n' to tell Perl to start a new output line.
  4. Print the pipelined result.

The result looks like this:
http://www.domain.com/banners/
http://www.domain.com/bbw/
http://www.domain.com/images/
http://www.domain.com/mature/
http://www.domain.com/members/
http://www.domain.com/skanks/
http://www.domain.com/mature/blue/
http://www.domain.com/mature/red/
http://www.domain.com/mature/white/
http://www.domain.com/members/leeches/
http://www.domain.com/members/paid/
http://www.domain.com/mature/blue/nekkid/
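The printing pipeline from steps 1 to 4 can be tried on its own. In this sketch @found is a hypothetical, shortened results list; in the real script it comes from the folder walk.

```perl
use strict;
use warnings;

# Hypothetical sample of the final results list.
my @found = qw(banners members members/paid);

# Right to left: take each item, put the domain on the front and
# "/\n" on the end, then hand the finished lines to print.
my @lines = map { 'http://www.domain.com/' . $_ . "/\n" } @found;
print @lines;
```

It prints http://www.domain.com/banners/, http://www.domain.com/members/ and http://www.domain.com/members/paid/, one per line.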

Here's what the slurp pipeline looks like in Perl.

  1. readdir D
     Read the folder contents.
  2. grep !/^\./,
     Remove the hidden items from the pipeline, i.e., everything beginning with '.'.
  3. grep { -d "$dir/$_" }
     Narrow the pipeline down to just the items which are folders. The '-d' asks, "Is it a directory?" ("directory" is the Unix/Linux term for "folder"). The variable "$dir" is the folder we are slurping, and "$_" is the item inside the folder that we are considering. "$_" is Perl's special way of saying "the current item in the pipeline".
  4. map { "$dir/$_" }
     We need to know both the folder and the subfolder in order to create the URL, so transform each item into folder/subfolder form.
  5. sort
     Sort the result.

That whole pipeline becomes one single line of Perl:

  sort map { "$dir/$_" } grep { -d "$dir/$_" } grep !/^\./, readdir D;

Reading left to right, that says "Sort the result of combining the folder/subfolder of each item that's a folder of each item that's not hidden of the entire directory contents." If you just try to grind through it that way, it sounds obscure and convoluted, to say the least!

But if you understand that it's a pipeline, you'll be in good shape. Each portion of the pipeline either narrows things down or transforms the items as they pass through the pipe. Note that the pipe flows from right to left: readdir D runs first, and sort runs last. You could read that line as "Slurp up the folder contents, then weed out the hidden items, then narrow it down to just subfolders, then write them out as folder/subfolder, and sort the result." It's still a mouthful, but that's to be expected since you're doing so much in one single pipeline.
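To see the right-to-left flow for yourself, here is a tracing sketch. It is a hypothetical variation, not part of the article's script: a made-up list in @names stands in for readdir D, and a made-up %is_folder lookup stands in for -d, so it runs without touching the disk. Each grep or map block records in @trace which item it was handed.

```perl
use strict;
use warnings;

# Hypothetical inputs: @names replaces readdir D, %is_folder replaces -d.
my $dir   = 'members';
my @names = ('.', '..', '.htaccess', 'paid', 'index.html', 'leeches');
my %is_folder = map { $_ => 1 } qw(. .. paid leeches);

my @trace;    # which stage saw which item, in the order it happened
my @result =
  sort
    map  { push @trace, "map:$_";     "$dir/$_" }
      grep { push @trace, "folder?:$_"; $is_folder{$_} }
        grep { push @trace, "hidden?:$_"; !/^\./ }
          @names;

print "$_\n" for @trace;
print "result: @result\n";
```

The trace shows all six names going through the hidden-item test first, then the three survivors going through the folder test, then the two folders going through map, and finally sort, which prints "result: members/leeches members/paid". Perl finishes each stage over the whole list before handing that list to the stage on its left.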

Here is the complete Perl script. The listing below has the pipeline spanning several lines, but that's just so it fits within the column on this page:

#! /usr/bin/perl -w
use strict;

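# Slurp one folder: return its visible subfolders, sorted, in
# folder/subfolder form (no './' in front for the domain root).
sub dirs {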
  my $dir = shift;
  opendir(D,$dir) or
    die "Cannot read dir $dir ($!)";
  sort
    map { $dir eq '.' ?
	    $_ : "$dir/$_" }
      grep { -d "$dir/$_" }
	grep !/^\./, readdir D;
}

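my @dirs = qw( . );  # folders still to be slurped
my @found = ();      # final results

# Loop on the list itself rather than on the shifted value:
# a folder named "0" is false in Perl and would end the loop early.
while (@dirs) {
  my $dir = shift @dirs;
  push @found, $dir unless $dir eq '.';
  push @dirs, dirs($dir);
}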

# Final pipeline: turn each folder into a URL, one per line.
print map { 'http://www.domain.com/'
	      . $_ . "/\n" } @found;


Old Tom

The DFN Weekly Staff
Magus ... Chief Editor
VNWR Staff
Voltar ... President - Old Tom ... Vice President
Jojasa ... Vice President - LadyB ... Vice President


© 2001-2002 EA Ventures. All rights reserved.