From 3449950949b16c90fafd51727ba7a2b2b3b360bc Mon Sep 17 00:00:00 2001
From: Rich Bowen
+In this recipe, we discuss how to block persistent requests from
+a particular robot, or user agent.
+
+The standard for robot exclusion defines a file,
+/robots.txt, that specifies those portions of your
+website where you wish to exclude robots. However, some robots
+do not honor these files.
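+
+For illustration, a minimal /robots.txt asking compliant robots to
+stay away from the directory used later in this recipe might look
+like this (NameOfBadRobot is the placeholder name used throughout
+this recipe, not a real robot):
+
+User-agent: NameOfBadRobot
+Disallow: /secret/files/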
+
+Note that there are methods of accomplishing this which do
+not use mod_rewrite. Note also that any technique that relies on
+the client's USER_AGENT string can be circumvented
+very easily, since that string can be changed.
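+
+For example, assuming a command-line client such as curl, the
+User-Agent header can be set to an arbitrary value with a single
+option (the URL is a placeholder):
+
+curl -A "SomeOtherAgent/1.0" http://www.example.com/secret/files/
+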
+We use a ruleset that specifies the directory to be
+protected, and the client USER_AGENT that
+identifies the malicious or persistent robot.
+In this example, we are blocking a robot called
+NameOfBadRobot from a location
+/secret/files. You may also specify an IP address
+range, if you are trying to block that user agent only from the
+particular source.
+
+RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot
+RewriteCond %{REMOTE_ADDR}     ^123\.45\.67\.[89]$
+RewriteRule ^/secret/files/ - [F]
+
+Rather than using mod_rewrite for this, you can accomplish the
+same end using alternate means, as illustrated here:
+
+SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway
+<Location /secret/files>
+    Order allow,deny
+    Allow from all
+    Deny from env=goaway
+</Location>
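+
+Note that Order, Allow, and Deny are the httpd 2.2 access control
+directives, available in 2.4 only via mod_access_compat. A sketch of
+an equivalent configuration for httpd 2.4, assuming mod_authz_core
+and mod_setenvif are loaded, might be:
+
+SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway
+<Location /secret/files>
+    <RequireAll>
+        Require all granted
+        Require not env goaway
+    </RequireAll>
+</Location>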
+
+As noted above, this technique is trivial to circumvent by simply
+modifying the USER_AGENT request header. If you
+are experiencing a sustained attack, you should consider blocking
+it at a higher level, such as at your firewall.
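+
+For example, on a Linux host using iptables, and assuming the attack
+comes from the example addresses used in the ruleset above, the
+traffic could be dropped before it ever reaches httpd:
+
+# Drop web traffic from 123.45.67.8 and 123.45.67.9; a /31
+# covers exactly those two addresses.
+iptables -A INPUT -p tcp --dport 80 -s 123.45.67.8/31 -j DROP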
+