Relocated missing file.

author Randy Terbush <randy@apache.org>

Sun, 1 Dec 1996 17:07:17 +0000 (17:07 +0000)

committer Randy Terbush <randy@apache.org>

Sun, 1 Dec 1996 17:07:17 +0000 (17:07 +0000)
diff --git a/docs/manual/misc/howto.html b/docs/manual/misc/howto.html

new file mode 100644 (file)

index 0000000..72c2d09
--- /dev/null
+++ b/docs/manual/misc/howto.html
@@ -0,0 +1,124 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
+<HTML>
+<HEAD>
+<TITLE>Apache HOWTO documentation</TITLE>
+</HEAD>
+
+<BODY>
+<!--#include virtual="header.html" -->
+<H1>Apache HOWTO documentation</h1>
+
+How to:
+<ul>
+<li><A HREF="#redirect">redirect an entire server or directory</A>
+<li><A HREF="#logreset">reset your log files</A>
+<li><A HREF="#stoprob">stop robots</A>
+</ul>
+
+<hr>
+<H2><A name="redirect">How to redirect an entire server or directory</A></H2>
+
+One way to redirect all requests for an entire server is to set up a
+<CODE>Redirect</Code> to a <B>cgi script</B> which outputs a 301 or 302 status
+and the location of the other server.<P>
+
+By using a <B>cgi-script</B> you can intercept various requests and treat them
+specially, e.g. you might want to intercept <B>POST</B> requests, so that the
+client isn't redirected to a script on the other server which expects POST
+information (a redirect will lose the POST information.)<P>
+
+Here's how to redirect all requests to a script... In the server configuration
+file,
+<blockquote><code>ScriptAlias /
+/usr/local/httpd/cgi-bin/redirect_script</code></blockquote>
+
+and here's a simple Perl script to perform the redirect:
+
+<blockquote><code>
+#!/usr/local/bin/perl <br>
+<br>
+print "Status: 302 Moved Temporarily\r <br>
+Location: http://www.some.where.else.com/\r\n\r\n"; <br>
+<br>
+</code></blockquote><p><hr>
+
+<H2><A name="logreset">How to reset your log files</A></H2>
+
+Sooner or later, you'll want to reset your log files (access_log and
+error_log) because they are too big, or full of old information you don't
+need.<p>
+
+<CODE>access_log</CODE> typically grows by 1 MB for each 10,000 requests.<p>
+
+Most people's first attempt at replacing the logfile is to just move the
+logfile or remove the logfile. This doesn't work.<p>
+
+Apache will continue writing to the logfile at the same offset as before the
+logfile was moved. This results in a new logfile being created which is just
+as big as the old one, but it now contains thousands (or millions) of null
+characters.<p>
+
+The correct procedure is to move the logfile, then signal Apache to tell it to
+reopen the logfiles.<p>
+
+Apache is signalled using the <B>SIGHUP</B> (-1) signal, e.g.
+<blockquote><code>
+mv access_log access_log.old ; kill -1 `cat httpd.pid`
+</code></blockquote>
+
+Note: <code>httpd.pid</code> is a file containing the <B>p</B>rocess <B>id</B>
+of the Apache httpd daemon; Apache saves this in the same directory as the log
+files.<P>
+
+Many people use this method to replace (and back up) their logfiles on a
+nightly basis.<p><hr>
+
+<H2><A name="stoprob">How to stop robots</A></H2>
+
+Ever wondered why so many clients are interested in a file called
+<code>robots.txt</code> which you don't have, and never did have?<p>
+
+These clients are called <B>robots</B> - special automated clients which
+wander around the web looking for interesting resources.<p>
+
+Most robots are used to generate some kind of <em>web index</em> which
+is then used by a <em>search engine</em> to help locate information.<P>
+
+<code>robots.txt</code> provides a means to request that robots limit their
+activities at the site, or more often than not, to leave the site alone.<P>
+
+When the first robots were developed, they had a bad reputation for
+sending hundreds of requests to each site, often resulting in the site
+being overloaded. Things have improved dramatically since then, thanks
+to <A HREF="http://web.nexor.co.uk/mak/doc/robots/guidelines.html"> Guidelines
+for Robot Writers</A>, but even so, some robots may exhibit unfriendly
+behaviour which the webmaster isn't willing to tolerate.<P>
+
+Another reason some webmasters want to block access to robots results
+from the way in which the information collected by the robots is subsequently
+indexed. <B>There are currently no widely used systems to annotate documents
+such that they can be indexed by wandering robots.</B> Hence, the index
+writer will often revert to unsatisfactory algorithms to determine what gets
+indexed.<p>
+
+Typically, indexes are built around text which appears in
+document titles (&lt;TITLE&gt;), or main headings (&lt;H1&gt;), and more
+often than not, the words they index on are completely irrelevant or
+misleading for the document subject. The worst index is one based on
+every word in the document. This inevitably leads to the search engines
+offering poor suggestions which waste both the user's and the server's
+valuable time.<P>
+
+So if you decide to exclude robots completely, or just limit the areas
+in which they can roam, set up a <CODE>robots.txt</CODE> file, and refer
+to the <A HREF="http://web.nexor.co.uk/mak/doc/robots/norobots.html">robot
+exclusion documentation</A>.<p>
+
+Much better systems exist to both index your site and publicise its
+resources, e.g.
+<A HREF="http://web.nexor.co.uk/public/aliweb/aliweb.html">ALIWEB</A>, which
+uses site-defined index files.<p>
+
+<!--#include virtual="footer.html" -->
+</BODY>
+</HTML>
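
Note on the "redirect" section of howto.html: the howto says a CGI script lets you treat POST requests specially, but the Perl example redirects everything. A minimal sketch of that idea in POSIX sh follows. The function name handle_request and the wording of the POST reply are hypothetical, not part of the commit; REQUEST_METHOD is the standard CGI environment variable, and the target URL is the placeholder from the howto.

```shell
#!/bin/sh
# Sketch only: a redirecting CGI script that does not bounce POST requests.
# handle_request is a hypothetical helper name, not something Apache provides.

# handle_request TARGET_URL
handle_request() {
    target=$1
    if [ "${REQUEST_METHOD:-GET}" = "POST" ]; then
        # A redirect would lose the POST data, so answer with a short notice.
        printf 'Status: 200 OK\r\n'
        printf 'Content-Type: text/plain\r\n'
        printf '\r\n'
        printf 'This service has moved to %s; please resubmit your form there.\n' "$target"
    else
        # Every other method gets the temporary redirect from the howto.
        printf 'Status: 302 Moved Temporarily\r\n'
        printf 'Location: %s\r\n' "$target"
        printf '\r\n'
    fi
}

# When installed as the ScriptAlias target, the script would end with:
# handle_request "http://www.some.where.else.com/"
```

Keeping the logic in a function makes it possible to exercise both branches from a shell before wiring the script into the server configuration.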
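
Note on the "logreset" section of howto.html: the one-line mv/kill command can be wrapped in a small function that checks its inputs first. This is a sketch under assumptions: rotate_log is a hypothetical name, and it assumes httpd.pid sits in the same directory as the logs, as the howto states.

```shell
#!/bin/sh
# Sketch only: rename a log file, then send SIGHUP so httpd reopens its logs.
# rotate_log is a hypothetical helper; the directory layout is an assumption.

# rotate_log LOGDIR LOGNAME
rotate_log() {
    logdir=$1
    logname=$2
    pidfile="$logdir/httpd.pid"

    # Refuse to continue if the log or the pid file is missing.
    [ -f "$logdir/$logname" ] || { echo "no such log: $logdir/$logname" >&2; return 1; }
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }

    # Rename first; httpd keeps writing to the renamed file until signalled.
    mv "$logdir/$logname" "$logdir/$logname.old" || return 1

    # SIGHUP (signal 1) tells httpd to reopen its log files.
    kill -HUP "$(cat "$pidfile")"
}

# Example call (path is illustrative):
# rotate_log /usr/local/httpd/logs access_log
```

A nightly cron entry could call the function once per log file; note that each run overwrites the previous .old copy, so archive it first if older logs matter.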