To Catch a Thief

by

Pasta

Pasta's Pissers

Controlling The Thief

I believe it was the beginning of August that I posted an htaccess file for site rippers (a Cool DFN Link of the Week). It was a learning thread for me. People raised a fair concern: did I really want to block browsers and spiders from my site? I did, but I also took what people mentioned into consideration and did my own investigating. For starters, I visited the search engines to find out what their bots are named. I took the working htaccess and saved a copy as htaccess.old, which made it easier to do live edits on my htaccess file when I wasn't home.

Basically, you just add rewrite conditions to your htaccess that block a particular user agent. Simple enough? It's not! First and foremost, it takes some up-front research. Go through your stats for the current month and the previous month and look for a trend in occurrences. Show of hands: do you want something called Siphon visiting your site? Search on Siphon and read about it, and you would block it too, wouldn't you? The reason I went on this little mission to gather information is that I didn't want these proggies in my stats, period. Here is an example of the rewrite condition lines you will be adding to your htaccess file once you have done your research:

RewriteCond %{HTTP_USER_AGENT} ^.*Siphon.*$ [OR]
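A condition on its own does nothing; it has to sit in a chain that ends in a RewriteRule. Here is a minimal sketch of a complete block, assuming Apache with mod_rewrite and a host that allows rewrite directives in htaccess files. Siphon comes from this article; WebZIP and HTTrack are only illustrative examples of other rippers, not a vetted blocklist, so check your own stats before blocking anything. Every condition except the last ends in [OR]; leaving [OR] on the final condition is a well-known mistake that can make the rule apply to every request, which blocks everybody.

```apache
# Minimal sketch, assuming Apache with mod_rewrite enabled for htaccess files.
RewriteEngine On

# Siphon is from the article; WebZIP and HTTrack are illustrative examples.
# [NC] = case-insensitive match, [OR] = "or the next condition".
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
# Last condition in the chain: no [OR] here.
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]

# Any URL (.*), no substitution (-), answer 403 Forbidden ([F]).
RewriteRule .* - [F]
```

A blocked ripper then gets a 403 Forbidden instead of your pages, while every other visitor falls through to the site as usual.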

Pasta's Cool DFN Link Of The Week

How You Doin?
Working Site Ripper Htaccess

"He is happiest, be he king or peasant, who finds peace in his home."
Johann von Goethe

How To Go About It

Take a look at the rewrite condition above. After the user agent variable comes a caret, period, asterisk, the agent's name, period, asterisk (and a closing dollar sign). In plain English: "match any user agent with Siphon anywhere in it." I started with Billy's htaccess and took its rewrite condition, and it didn't work; it locked my site down, hehehe. As a matter of fact, every rewrite condition I tried locked my site down. I was frustrated as hell, and my site was being downloaded more than once a day, until I found the Cool DFN Link of the Week. It works, I will attest to that. Getting a working file is half the battle, and that half is already done.
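To make the pattern concrete, here is a sketch of the same Siphon test written a few ways, assuming Apache with mod_rewrite. In a regular expression, ^ anchors the start of the string, .* matches any run of characters (including none), and $ anchors the end, so ^.*Siphon.*$ matches exactly the same user agents as plain Siphon. The [NC] (no case) flag in the last variant is an addition, not part of the original line. This is a fragment: whichever line you keep belongs inside a chain that ends in a RewriteRule.

```apache
# Sketch: equivalent ways to write the Siphon test. Keep ONE active line;
# the alternatives are commented out so this fragment stays harmless.

# As written in the article: ^ start, .* anything, Siphon, .* anything, $ end
RewriteCond %{HTTP_USER_AGENT} ^.*Siphon.*$

# Same matches, shorter: an unanchored pattern already matches anywhere
#RewriteCond %{HTTP_USER_AGENT} Siphon

# Stricter: only agents that START with Siphon, in any letter case
#RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC]
```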

You need to be able to interpret the information in your stats. What I did was find a list of known good spiders, cut and paste that list into a text file, and put the file on my server for easy access from anywhere. You have to monitor your stats, in particular the user agents. Until I had reasonable control over the situation I checked at least twice a day, mid-morning before work and mid-afternoon during lunch, to minimize exposure during the day. When some evil-looking user agent appeared, I checked it out: I searched for the proggie and read what it had to say. Don't like it? Open your htaccess file and the text file and search for it; if it's not there, add another rewrite condition.

I know this file is stopping the rippers I have researched and blocked. My dialer minutes have gone up, and each blocked proggie's count stays at "1" in my stats; that's it. Before I used my modified htaccess, the counts were 10 or more for the malicious-looking rippers. I never let this get out of hand, because I addressed it the moment I became aware of it. By all means, there are other ways to prevent site leeching; for the time being, this one has worked for me. I am satisfied with my results... are you?
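When your stats turn up a new agent you don't like, the edit itself is mechanical: add one line to the chain and keep the [OR] bookkeeping straight. A sketch, again assuming Apache with mod_rewrite; "EvilRipper" is a hypothetical name standing in for whatever you find in your own stats.

```apache
# BEFORE (shown commented out): Siphon is the only agent, so no [OR]
#RewriteCond %{HTTP_USER_AGENT} Siphon [NC]
#RewriteRule .* - [F]

# AFTER: the old last condition gains [OR], and the new condition
# (hypothetical "EvilRipper") becomes the last one, with no [OR]
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EvilRipper [NC]
RewriteRule .* - [F]
```

If an edit like this misbehaves while you're away, restoring the htaccess.old copy gets the site back while you sort out which line went wrong.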

Hasta Pasta


©2001 VNWR. All rights reserved.