From 3449950949b16c90fafd51727ba7a2b2b3b360bc Mon Sep 17 00:00:00 2001 From: Rich Bowen Date: Mon, 2 Nov 2009 22:57:44 +0000 Subject: [PATCH] Removes the 'block evil robots' rule from rewrite_guide, moves it to access, and makes it not suck. git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@832175 13f79535-47bb-0310-9956-ffa450edef68 --- docs/manual/rewrite/access.html.en | 73 ++++++++++++++++++++++- docs/manual/rewrite/access.xml | 70 ++++++++++++++++++++++ docs/manual/rewrite/rewrite_guide.html.en | 39 ------------ docs/manual/rewrite/rewrite_guide.xml | 38 ------------ 4 files changed, 142 insertions(+), 78 deletions(-) diff --git a/docs/manual/rewrite/access.html.en b/docs/manual/rewrite/access.html.en index d04cf74058..cc0f1743a8 100644 --- a/docs/manual/rewrite/access.html.en +++ b/docs/manual/rewrite/access.html.en @@ -36,7 +36,78 @@ configuration.

See also

- +
top
+
+

Blocking of Robots

+ + + +
+
Description:
+ +
+
+
+      In this recipe, we discuss how to block persistent requests from
+      a particular robot or user agent.
+
+ +
+
+      The standard for robot exclusion defines a file,
+      /robots.txt, that specifies those portions of your
+      website from which you wish to exclude robots. However, some
+      robots do not honor that file.
+
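+ To illustrate the exclusion standard, here is a minimal
+ robots.txt sketch, reusing this recipe's placeholder robot
+ name and path, that asks compliant robots to stay out of
+ /secret/files/:

```apache
# Placed at the site root as /robots.txt.
# "NameOfBadRobot" and the path are this recipe's placeholders.
User-agent: NameOfBadRobot
Disallow: /secret/files/
```

+ A compliant robot fetches /robots.txt before crawling; the
+ rewrite-based approach in this recipe exists precisely because
+ not every robot does.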
+ +
+
+      Note that there are methods of accomplishing this which do
+      not use mod_rewrite. Note also that any technique that relies on
+      the client's USER_AGENT string can be circumvented
+      very easily, since that string can be changed.
+
+
+ +
Solution:
+ +
+
+
+      We use a ruleset that specifies the directory to be
+      protected and the client USER_AGENT that
+      identifies the malicious or persistent robot.
+
+ +
+
+      In this example, we are blocking a robot called
+      NameOfBadRobot from the location
+      /secret/files. You may also specify an IP address
+      range, if you wish to block that user agent only when its
+      requests come from a particular source.
+
+ +
+RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot
+RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
+RewriteRule ^/secret/files/   -   [F]
+
+
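+ As a sketch only: the same rule can also be written in
+ per-directory context, assuming mod_rewrite is loaded and
+ AllowOverride FileInfo is permitted for the directory; the
+ .htaccess path here is hypothetical:

```apache
# Hypothetical /secret/files/.htaccess (per-directory context).
# In per-directory context the matched path is relative, so the
# rule forbids any request reaching this directory from the robot.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot
RewriteRule ^ - [F]
```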
+ +
Discussion
+ +
+
+
+      Rather than using mod_rewrite, you can accomplish the
+      same end by alternative means, as illustrated here:
+
+

+      SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway
+      <Location /secret/files>
+        Order allow,deny
+        Allow from all
+        Deny from env=goaway
+      </Location>

+
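+ For later httpd versions, where mod_authz_core's Require
+ directives replace Order/Allow/Deny, the equivalent would look
+ like this (a sketch, using the same hypothetical robot name):

```apache
# httpd 2.4-style authorization (mod_setenvif + mod_authz_core).
SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway
<Location /secret/files>
    <RequireAll>
        # Admit everyone except requests carrying the goaway mark.
        Require all granted
        Require not env goaway
    </RequireAll>
</Location>
```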
+
+      As noted above, this technique is trivial to circumvent simply
+      by modifying the User-Agent request header. If you
+      are experiencing a sustained attack, you should consider blocking
+      it at a higher level, such as at your firewall.
+
+ +
+ +
+ +