Welcome to
The DFN Weekly
Perl Pipelining
By Old Tom
OT Scripts Industrial Strength®
Have you ever had to read someone else's Perl CGI script, but the programmer got so darn tricky that you can't follow what's happening? Perhaps I can help! Let me teach you a technique that I call pipelining. Once you understand the concept, you'll find it much easier to understand what a given Perl script is doing. And, if you write Perl scripts yourself, you just might learn a new trick or two!
As an example, let's look at a script that finds all of the folders in your domain. Your domain might include:
http://www.domain.com/banners/
http://www.domain.com/images/
http://www.domain.com/members/
http://www.domain.com/members/paid/
http://www.domain.com/members/leeches/
http://www.domain.com/bbw/
http://www.domain.com/mature/
http://www.domain.com/mature/red/
http://www.domain.com/mature/white/
http://www.domain.com/mature/blue/
http://www.domain.com/mature/blue/nekkid/
http://www.domain.com/skanks/
Your domain root contains six folders: banners, images, members, bbw, mature, and skanks. The members folder, in turn, contains paid and leeches, and mature has its own subfolders and sub-sub-folders. Our goal is to build a list of the URLs of all of these folders.
One perfectly reasonable way to do this is to handle everything one item at a time. Scan through the server, checking each file and folder in turn. If the item is a folder, figure out that folder's URL and add it to the list. Then go to the next item on the server, and the next, and so on.
Instead, we'll do things a bit differently. We slurp up the entire contents of the folder at once. Here's the procedure:
- Slurp up the entire contents of the folder. This becomes the start of our pipeline. Remember, we have slurped up both files (such as image.jpg) and folders (such as paid).
- Drop all the hidden files (i.e., anything beginning with '.') out of the pipeline. If your server is configured correctly, those should never be valid URLs anyway.
- Narrow the list down to just folders. Anything that isn't a folder (namely a regular file) gets dropped out of the pipeline.
- Transform each folder name into folder/subfolder form. For example, paid becomes members/paid, and nekkid becomes mature/blue/nekkid.
- Transform the pipeline into sorted order.
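Here's one way the five steps above might look in Perl. Treat it as a sketch under stated assumptions, not the actual script: the sub name slurp_folders is made up for illustration, and the special case for '.' is an assumption so that top-level folders come out as members rather than ./members. Notice that a chain of Perl list operators runs from the bottom up: readdir feeds the first grep, which feeds the second grep, then map, then sort.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is ours): return the subfolders of $dir,
# in folder/subfolder form and in sorted order.
sub slurp_folders {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # Read this chain from the bottom line up to the top.
    my @found =
        sort { $a cmp $b }                     # 5. put the pipeline in sorted order
        map  { $dir eq '.' ? $_ : "$dir/$_" }  # 4. folder/subfolder form (assumed '.' special case)
        grep { -d "$dir/$_" }                  # 3. keep only folders (-d follows symlinks)
        grep { !/^\./ }                        # 2. drop anything beginning with '.'
        readdir($dh);                          # 1. slurp the entire folder
    closedir($dh);
    return @found;
}
```

With the example domain, slurp_folders('members') should hand back members/leeches and members/paid, in that order.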
Okay, we know how to slurp a pipeline, but we're far from done. The problem is that we can only slurp one folder at a time. We can slurp the entire folder - that was the point of the slurp - but we can still do only one entire folder at a time. Here is a procedure for slurping all of the folders in turn:
- Start our list of folders to be slurped with '.', the current folder.
- Take the next folder to be slurped off the front of the list.
- Add this folder to our list of final results.
- Slurp up the contents of that folder, adding the pipeline results to the end of the list.
- Continue taking the next folder to slurp off the front of the list, and adding the pipelined results to the end of the list, until the list is empty. When the list is empty, we have found all the folders. (Pretty tricky!)
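The loop described above can be sketched with shift and push. This is a minimal illustration, not the script itself: find_folders and slurp_folders are hypothetical names, slurp_folders is a compact stand-in for the slurp pipeline described earlier, and the code assumes there are no symlink loops under the starting folder.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurp pipeline: sorted subfolders of $dir
# in folder/subfolder form, hidden entries dropped.
sub slurp_folders {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @found = sort { $a cmp $b }
                map  { $dir eq '.' ? $_ : "$dir/$_" }
                grep { -d "$dir/$_" }
                grep { !/^\./ }
                readdir($dh);
    closedir($dh);
    return @found;
}

# Hypothetical driver: find every folder at or below the current folder.
sub find_folders {
    my @to_slurp = ('.');   # start the list with the current folder
    my @results;
    while (@to_slurp) {                           # until the list is empty
        my $folder = shift @to_slurp;             # take the next folder off the front
        push @results, $folder;                   # add it to the final results
        push @to_slurp, slurp_folders($folder);   # add its subfolders to the end
    }
    return @results;
}
```

Because new folders go on the end of the list while work comes off the front, the folders come out level by level: the top-level folders first, then their subfolders, and so on. To try it, change into your document root and call find_folders().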
Now that we have found everything, we have one pipeline to go. We need to print the results as URLs, one per line:
- The entire final results list (each folder's entries were already sorted when we slurped them) becomes our pipeline.
- Add 'http://www.domain.com/' to the front of each item on the list.
- Add '/' to the end of the URL, and '\n' to tell Perl to start a new output line.
- Print the pipelined result.
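As a rough sketch, the printing pipeline might look like this. The sub name as_url_lines and the sample @results list are invented for illustration; only the 'http://www.domain.com/' prefix comes from the example above. As before, read the chain from right to left: the inner map adds the prefix, and the outer map adds the trailing '/' and the newline.

```perl
use strict;
use warnings;

# Hypothetical helper: turn folder names into printable URL lines.
sub as_url_lines {
    my ($prefix, @folders) = @_;
    return map { "$_/\n" }         # add '/' and a newline to the end
           map { "$prefix$_" }     # add the prefix to the front
           @folders;               # the final results list starts the pipeline
}

# Sample data standing in for the real results list (assumed).
my @results = ('banners', 'members', 'members/paid');

print as_url_lines('http://www.domain.com/', @results);
```

Since print accepts a whole list, there is no need for a loop at the end: the pipelined list goes straight to the output.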



