From: Randy Terbush
Date: Sun, 1 Dec 1996 17:07:17 +0000 (+0000)
Subject: Relocated missing file.
X-Git-Tag: APACHE_1_2b1~12
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=5e0a0770e5ca48908b1b237ed39d0e9bbc414882;p=apache

Relocated missing file.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@77120 13f79535-47bb-0310-9956-ffa450edef68
---

diff --git a/docs/manual/misc/howto.html b/docs/manual/misc/howto.html
new file mode 100644
index 0000000000..72c2d09ed5
--- /dev/null
+++ b/docs/manual/misc/howto.html
@@ -0,0 +1,124 @@

Apache HOWTO documentation

How to:

  * redirect an entire server or directory
  * reset your log files
  * stop robots

How to redirect an entire server or directory

One way to redirect all requests for an entire server is to set up a Redirect to a CGI script which outputs a 301 or 302 status and the location of the other server.

By using a CGI script you can intercept various requests and treat them specially. For example, you might want to intercept POST requests so that the client isn't redirected to a script on the other server which expects POST information (a redirect will lose the POST information).

Here's how to redirect all requests to a script. In the server configuration file, add:

ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script
and here's a simple Perl script to do the redirect:
#!/usr/local/bin/perl

print "Status: 302 Moved Temporarily\r\n";
print "Location: http://www.some.where.else.com/\r\n\r\n";
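As noted above, you may not want POST requests redirected blindly. Here's a sketch of a redirect script that checks the request method first; it relies only on the standard CGI REQUEST_METHOD environment variable, and the target URL is the same placeholder as above:

#!/usr/local/bin/perl

# A sketch only: answer POST requests with an explanation (their
# POST data would be lost across a redirect) and redirect all
# other requests to the placeholder URL.
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    print "Content-type: text/plain\r\n\r\n";
    print "This resource has moved to http://www.some.where.else.com/\n";
    print "Please resubmit your request to the new server.\n";
} else {
    print "Status: 302 Moved Temporarily\r\n";
    print "Location: http://www.some.where.else.com/\r\n\r\n";
}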



How to reset your log files

Sooner or later, you'll want to reset your log files (access_log and error_log) because they are too big, or full of old information you don't need.

access_log typically grows by 1MB for every 10,000 requests.

Most people's first attempt at replacing the logfile is to simply move it or remove it. This doesn't work.

Apache will continue writing to the logfile at the same offset as before the logfile was moved. This results in a new logfile being created which is just as big as the old one, but which now contains thousands (or millions) of null characters.

The correct procedure is to move the logfile, then signal Apache to tell it to reopen the logfiles.

Apache is signalled using the SIGHUP (-1) signal, e.g.

mv access_log access_log.old ; kill -1 `cat httpd.pid`
Note: httpd.pid is a file containing the process id of the Apache httpd daemon; Apache saves this in the same directory as the log files.

Many people use this method to replace (and back up) their logfiles on a nightly basis.
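For example, nightly rotation can be automated with a crontab entry along these lines (a sketch only: the paths assume the logs and httpd.pid live in /usr/local/httpd/logs, so adjust them to your installation):

# Hypothetical crontab entry: at midnight, move the logfile aside
# and HUP Apache so it reopens its logs.
0 0 * * * cd /usr/local/httpd/logs && mv access_log access_log.old && kill -1 `cat httpd.pid`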



How to stop robots

Ever wondered why so many clients are interested in a file called robots.txt which you don't have, and never did have?

These clients are called robots: special automated clients which wander around the web looking for interesting resources.

Most robots are used to generate some kind of web index which is then used by a search engine to help locate information.

robots.txt provides a means to request that robots limit their activities at the site, or, more often than not, leave the site alone.

When the first robots were developed, they had a bad reputation for sending hundreds of requests to each site, often resulting in the site being overloaded. Things have improved dramatically since then, thanks to Guidelines for Robot Writers, but even so, some robots may exhibit unfriendly behaviour which the webmaster isn't willing to tolerate.

Another reason some webmasters want to block access to robots results from the way in which the information collected by the robots is subsequently indexed. There are currently no widely used systems for annotating documents such that they can be indexed by wandering robots. Hence, the index writer will often revert to unsatisfactory algorithms to determine what gets indexed.

Typically, indexes are built around text which appears in document titles (<TITLE>) or main headings (<H1>), and more often than not, the words indexed are completely irrelevant or misleading for the document's subject. The worst index is one based on every word in the document. This inevitably leads to the search engines offering poor suggestions which waste both the user's and the server's valuable time.

So if you decide to exclude robots completely, or just limit the areas in which they can roam, set up a robots.txt file and refer to the robot exclusion documentation.
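For example, a minimal robots.txt asking all robots to keep out of the entire site looks like this (User-agent and Disallow are the fields defined by the robot exclusion standard):

# Ask all robots to stay away from the whole site.
User-agent: *
Disallow: /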

Much better systems exist to both index your site and publicise its resources, e.g. ALIWEB, which uses site-defined index files.
