From f88b50dec40878abeccce0e1b9cfc6780a4b1465 Mon Sep 17 00:00:00 2001
From: Rich Bowen
Using XSSI and ErrorDocument to configure customized international server error responses

Introduction

This document describes an easy way to provide your Apache WWW server with a set of customized error messages which take advantage of Content Negotiation and eXtended Server Side Includes (XSSI) to return error messages generated by the server in the client's native language.
By using XSSI, all customized messages can share a homogeneous and consistent style and layout, and maintenance work (changing images, changing links) is kept to a minimum because all layout information can be kept in a single file. Error documents can be shared across different servers, or even hosts, because all varying information is inserted at the time the error document is returned on behalf of a failed request.

Content Negotiation then selects the appropriate language version of a particular error message text, honoring the language preferences passed in the client's request. (Users usually select their favorite languages in the preferences menu of today's browsers.) When an error document in the client's primary language version is unavailable, the secondary languages are tried or a default (fallback) version is used.

You have full flexibility in designing your error documents to your personal taste (or your company's conventions). For demonstration purposes, we present a simple generic error document scheme. For this hypothetical server, we assume that all error messages...
An example of a "document not found" message for a German client might look like this:

All links in the document, as well as links to the server administrator's mail address and even the name and port of the serving virtual host, are inserted in the error document at "run-time", i.e., when the error actually occurs.
For each of the error codes to be handled (see src/main/http_protocol.c if you wish to see Apache's standard messages), an ErrorDocument in the aliased /errordocs directory is defined. Note that we only define the basename of the document here because the MultiViews option will select the best candidate based on the language suffixes and the client's preferences. Any error situation with an error code not handled by a custom document will be dealt with by the server in the standard way (i.e., a plain error message in English).
   LanguagePriority en fr de

   Alias /errordocs /usr/local/apache/errordocs
   <Directory /usr/local/apache/errordocs>
      ...
      # (without mod_include, the contained directives must be omitted.)
   </Directory>

   ErrorDocument 404 /errordocs/404
   # "500 Internal Server Error",
   ErrorDocument 500 /errordocs/500

The directory for the error messages (here: /usr/local/apache/errordocs/) must then be created with the appropriate permissions (readable and executable by the server uid or gid, only writable for the administrator).
The names of the individual error documents are now determined like this (I'm using 403 as an example; think of it as a placeholder for any of the configured error documents):
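Following the head.shtml/foot.shtml naming convention used below, the 403 documents would presumably be named with the language suffixes that MultiViews negotiates on, along the lines of:

   errordocs/403.shtml.en
   errordocs/403.shtml.fr
   errordocs/403.shtml.de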
The common layout lives in two layout files: a header and a footer. One of these layout files defines the HTML document header and a configurable list of paths to the icons to be shown in the resulting error document. These paths are exported as a set of XSSI environment variables and are later evaluated by the "footer" special file. The title of the current error (which is put into the TITLE tag and an H1 header) is simply passed in from the main error document in a variable called title.

By changing this file, the layout of all generated error messages can be changed in a second. (By exploiting the features of XSSI, you can easily define different layouts based on the current virtual host, or even based on the client's domain name.)
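For instance, the head.shtml listing below exports the corporate logo path this way (the URL is built from the serving virtual host; the variable name is taken from that listing):

   <!--#set var="IMG_CorpLogo"
        value="http://$SERVER_NAME:$SERVER_PORT/errordocs/CorpLogo.gif"
   -->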
The second layout file describes the footer to be displayed at the bottom of every error message. In this example, it shows an Apache logo, the current server time, the server version string and adds a mail reference to the site's webmaster.

For simplicity, the header file is simply called head.shtml because it contains server-parsed content but no language-specific information. The footer file exists once for each language translation, plus a symlink for the default language. Example: for English, French and German versions (default English):

   foot.shtml.en
   foot.shtml.fr
   foot.shtml.de
   foot.shtml (symlink to foot.shtml.en)

Both files are included into the error document by using the directives <!--#include virtual="head" --> and <!--#include virtual="foot" --> respectively: the rest of the magic occurs in mod_negotiation and in mod_include.
See the listings below to see an actual HTML implementation of the discussed example.

Each error document then consists of just a few lines:

   <!--#set var="title" value="error description title" -->
   <!--#include virtual="head" -->
      explanatory error text
   <!--#include virtual="foot" -->
In the listings section, you can see an example of a [400 Bad Request] error document. Documents as simple as that certainly cause no problems to translate or expand.
Well, the LanguagePriority directive is for the case where the client does not express any language priority at all. But what happens in the situation where the client wants one of the languages we do not have, and none of those we do have?

Without doing anything, the Apache server will usually return a [406 no acceptable variant] error, listing the choices from which the client may select. But we're in an error message already, and important error information might get lost when the client had to choose a language representation first.

So, in this situation it appears to be easier to define a fallback language (by copying or linking, e.g., the English version to a language-less version). Because the negotiation algorithm prefers "more specialized" variants over "more generic" variants, these generic alternatives will only be chosen when the normal negotiation did not succeed.

A simple shell script to do it (execute within the errordocs/ directory):
   for f in *.shtml.en
   do
       ln -s $f `basename $f .en`
   done
As of Apache 1.3, it is possible to use the ErrorDocument mechanism for proxy error messages as well (previous versions always returned fixed predefined error messages).

Most proxy errors return an error code of [500 Internal Server Error]. To find out whether a particular error document was invoked on behalf of a proxy error or because of some other server error, and what the reason for the failure was, you can check the contents of the new ERROR_NOTES CGI environment variable: if invoked for a proxy error, this variable will contain the actual proxy error message text in HTML form.

The following excerpt demonstrates how to exploit the ERROR_NOTES variable within an error document:
   <!--#if expr="$REDIRECT_ERROR_NOTES = ''" -->
      <p>
      The server encountered an unexpected condition
      ...
   <!--#else -->
      <!--#echo var="REDIRECT_ERROR_NOTES" -->
   <!--#endif -->
++HTML listing of the + discussed example
+ So, to summarize our example, here's the complete listing of + the 400.shtml.en document. You will notice that it + contains almost nothing but the error text (with conditional + additions). Starting with this example, you will find it easy + to add more error documents, or to translate the error + documents to different languages. +
   <!--#set var="title" value="Bad Request" -->
   <!--#include virtual="head" -->
   <P>
      Your browser sent a request that this server could not understand:
      ...
   <!--#endif -->
   </P>
   <!--#include virtual="foot" -->
Here is the complete head.shtml file (the funny line breaks avoid empty lines in the document after XSSI processing). Note the configuration section at the top. That's where you configure the images and logos as well as the Apache documentation directory. Look how this file displays two different logos depending on the content of the virtual host name ($SERVER_NAME), and that an animated Apache logo is shown if the browser appears to support it (the latter requires server configuration lines of the form

   BrowserMatch "^Mozilla/[2-4]" anigif

for browser types which support animated GIFs).
   <!--#if expr="$SERVER_NAME = /.*\.mycompany\.com/" -->
      <!--#set var="IMG_CorpLogo"
           value="http://$SERVER_NAME:$SERVER_PORT/errordocs/CorpLogo.gif"
      -->
      ...
   </H1>
   <HR><!-- ======================================================== -->
   <DIV>
and this is the foot.shtml.en file:
   </DIV>
   <HR>
   <DIV ALIGN="right"><SMALL><SUP>Local Server time:
   ...
   </ADDRESS>
   </UL></BODY>
   </HTML>
More welcome!

If you have tips to contribute, send mail to martin@apache.org
Descriptors and Apache

A descriptor, also commonly called a file handle, is an object that a program uses to read or write an open file, or open network socket, or a variety of other devices. It is represented by an integer, and you may be familiar with stdin, stdout, and stderr, which are descriptors 0, 1, and 2 respectively. Apache needs a descriptor for each log file, plus one for each network socket that it listens on, plus a handful of others. Libraries that Apache uses may also require descriptors. Normal programs don't open up many descriptors at all, and so there are some latent problems that you may experience should you start running Apache with many descriptors (i.e., with many virtual hosts).
The operating system enforces a limit on the number of descriptors that a program can have open at a time. There are typically three limits involved here. One is a kernel limitation: depending on your operating system, you will either be able to tune the number of descriptors available to higher numbers (this is frequently called FD_SETSIZE), or you may be stuck with a (relatively) low amount. The second limit is called the hard resource limit, and it is sometimes set by root in an obscure operating system file, but frequently is the same as the kernel limit. The third limit is called the soft resource limit. The soft limit is always less than or equal to the hard limit. For example, the hard limit may be 1024, but the soft limit only 64. Any user can raise their soft limit up to the hard limit. Root can raise the hard limit up to the system maximum limit. The soft limit is the actual limit that is used when enforcing the maximum number of files a process can have open.

To summarize:
+ + + + + + +Descriptors and Apache + + + + + + +Descriptors and Apache
+ +A descriptor, also commonly called a file + handle is an object that a program uses to read or write + an open file, or open network socket, or a variety of other + devices. It is represented by an integer, and you may be + familiar with
+ +stdin
,stdout
, and +stderr
which are descriptors 0, 1, and 2 + respectively. Apache needs a descriptor for each log file, plus + one for each network socket that it listens on, plus a handful + of others. Libraries that Apache uses may also require + descriptors. Normal programs don't open up many descriptors at + all, and so there are some latent problems that you may + experience should you start running Apache with many + descriptors (i.e., with many virtual hosts).The operating system enforces a limit on the number of + descriptors that a program can have open at a time. There are + typically three limits involved here. One is a kernel + limitation, depending on your operating system you will either + be able to tune the number of descriptors available to higher + numbers (this is frequently called FD_SETSIZE). Or you + may be stuck with a (relatively) low amount. The second limit + is called the hard resource limit, and it is sometimes + set by root in an obscure operating system file, but frequently + is the same as the kernel limit. The third limit is called the + soft resource limit. The soft limit is always less + than or equal to the hard limit. For example, the hard limit + may be 1024, but the soft limit only 64. Any user can raise + their soft limit up to the hard limit. Root can raise the hard + limit up to the system maximum limit. The soft limit is the + actual limit that is used when enforcing the maximum number of + files a process can have open.
+ +To summarize:
+ ++ - -#open files <= soft limit <= hard limit <= kernel limit -You control the hard and soft limits using the
limit
(csh) -orulimit
(sh) directives. See the respective man pages -for more information. For example you can probably use -ulimit -n unlimited
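A minimal sketch of such a startup script (the httpd path is an assumption; adjust it, and the limit value, for your installation):

   #!/bin/sh
   # raise the soft descriptor limit up to the hard limit,
   # then start the server in this shell's place
   ulimit -n unlimited
   exec /usr/local/apache/src/httpd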
to raise your soft limit up to the -hard limit. You should include this command in a shell script which -starts your webserver. - -Unfortunately, it's not always this simple. As mentioned above, -you will probably run into some system limitations that will need to be -worked around somehow. Work was done in version 1.2.1 to improve the -situation somewhat. Here is a partial list of systems and workarounds -(assuming you are using 1.2.1 or later): - -
  - Under BSDI 2.0 you can build Apache with -DFD_SETSIZE=nnn added to EXTRA_CFLAGS (where nnn is the number of descriptors you wish to support; keep it less than the hard limit). But it will run into trouble if more than approximately 240 Listen directives are used. This may be cured by rebuilding your kernel with a higher FD_SETSIZE.

  - On some other systems you can edit FD_SETSIZE and rebuild, but the extra Listen limitation doesn't exist.

  - Under Solaris, Apache should be built with -DHIGH_SLACK_LINE=256 added to EXTRA_CFLAGS. You will be limited to approximately 240 error logs if you do this.

  - Under SCO, edit the /etc/conf/cf.d/stune file or use /etc/conf/cf.d/configure choice 7 (User and Group configuration) and modify the NOFILES kernel parameter to a suitably higher value. SCO recommends a number between 60 and 11000; the default is 110. Relink and reboot, and the new number of descriptors will be available.

  - Raise open_max_soft and open_max_hard to 4096 in the proc subsystem. Do a man on sysconfig, sysconfigdb, and sysconfigtab.

  - Raise max-vnodes to a large number which is greater than the number of Apache processes * 4096 (setting it to 250,000 should be good for most people). Do a man on sysconfig, sysconfigdb, and sysconfigtab.

  - Define NO_SLACK to work around a bug in the OS: CFLAGS="-DNO_SLACK" ./configure
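For the workarounds above that rebuild the server, the change is a single line in Apache's Configuration file before rerunning Configure (the value 1024 here is only an illustration; keep it below your hard limit):

   EXTRA_CFLAGS= -DFD_SETSIZE=1024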
In addition to the problems described above there are problems with many libraries that Apache uses. The most common example is the bind DNS resolver library that is used by pretty much every unix, which fails if it ends up with a descriptor above 256. We suspect there are other libraries that have similar limitations. So the code as of 1.2.1 takes a defensive stance and tries to save descriptors less than 16 for use while processing each request. This is called the low slack line.

Note that this shouldn't waste descriptors. If you really are pushing the limits and Apache can't get a descriptor above 16 when it wants it, it will settle for one below 16.

In extreme situations you may want to lower the low slack line, but you shouldn't ever need to. For example, lowering it can increase the limits of 240 described above under Solaris and BSDI 2.0. But you'll play a delicate balancing game with the descriptors needed to serve a request. Should you want to play this game, the compile-time parameter is LOW_SLACK_LINE and there's a tiny bit of documentation in the header file httpd.h.

Finally, if you suspect that all this slack stuff is causing you problems, you can disable it. Add -DNO_SLACK to EXTRA_CFLAGS and rebuild. But please report it to our Bug Report Page so that we can investigate.
Connections in the FIN_WAIT_2 state and Apache

Starting with the Apache 1.2 betas, people are reporting many more connections in the FIN_WAIT_2 state (as reported by netstat) than they saw using older versions. When the server closes a TCP connection, it sends a packet with the FIN bit set to the client, which then responds with a packet with the ACK bit set. The client then sends a packet with the FIN bit set to the server, which responds with an ACK and the connection is closed. The state that the connection is in during the period between when the server gets the ACK from the client and the server gets the FIN from the client is known as FIN_WAIT_2. See the TCP RFC for the technical details of the state transitions.

The FIN_WAIT_2 state is somewhat unusual in that there is no timeout defined in the standard for it. This means that on many operating systems, a connection in the FIN_WAIT_2 state will stay around until the system is rebooted. If the system does not have a timeout and too many FIN_WAIT_2 connections build up, it can fill up the space allocated for storing information about the connections and crash the kernel. The connections in FIN_WAIT_2 do not tie up an httpd process.
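A sketch of the close sequence described above, from the server's point of view:

   server                                client
     | ----- FIN ---------------------->  |
     | <---------------------- ACK -----  |   server is now in FIN_WAIT_2
     | <---------------------- FIN -----  |
     | ----- ACK ---------------------->  |   connection closed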
If you are lucky, this means that the buggy client will fully close the connection and release the resources on your server. However, there are some cases where the socket is never fully closed, such as a dialup client disconnecting from their provider before closing the client. In addition, a client might sit idle for days without making another connection, and thus may hold its end of the socket open for days even though it has no further use for it. This is a bug in the browser or in its operating system's TCP implementation.

The clients on which this problem has been verified to exist:

This does not appear to be a problem on:

It is expected that many other clients have the same problem. What a client should do is periodically check its open socket(s) to see if they have been closed by the server, and close their side of the connection if the server has closed. This check need only occur once every few seconds, and may even be detected by an OS signal on some systems (e.g., Win95 and NT clients have this capability, but they seem to be ignoring it).

Apache cannot avoid these FIN_WAIT_2 states unless it disables persistent connections for the buggy clients, just like we recommend doing for Navigator 2.x clients due to other bugs. However, non-persistent connections increase the total number of connections needed per client and slow retrieval of an image-laden web page. Since non-persistent connections have their own resource consumptions and a short waiting period after each closure, a busy server may need persistence in order to best serve its clients.

As far as we know, the client-caused FIN_WAIT_2 problem is present for all servers that support persistent connections, including Apache 1.1.x and 1.2.

Apache includes a function called lingering_close() which was added between 1.1 and 1.2. This function is necessary for the proper handling of persistent connections and any request which includes content in the message body (e.g., PUTs and POSTs). What it does is read any data sent by the client for a certain time after the server closes the connection. The exact reasons for doing this are somewhat complicated, but involve what happens if the client is making a request at the same time the server sends a response and closes the connection. Without lingering, the client might be forced to reset its TCP input buffer before it has a chance to read the server's response, and thus understand why the connection has closed. See the appendix for more details.

The code in lingering_close() appears to cause problems for a number of factors, including the change in traffic patterns that it causes. The code has been thoroughly reviewed and we are not aware of any bugs in it. It is possible that there is some problem in the BSD TCP stack, aside from the lack of a timeout for the FIN_WAIT_2 state, exposed by the lingering_close code that causes the observed problems.
Some operating systems provide a tunable timeout for the FIN_WAIT_2 state:

  - Solaris: you can use ndd to modify tcp_fin_wait_2_flush_interval, but the default should be appropriate for most servers and improper tuning can have negative impacts.

  - HP-UX: some revisions apply their timeout only when the SO_LINGER socket option is enabled, which Apache does. This parameter can be adjusted by using nettune to modify parameters such as tcp_keepstart and tcp_keepstop. In later revisions, there is an explicit timer for connections in FIN_WAIT_2 that can be modified; contact HP support for details.

The following systems are known to not have a timeout:

There is a patch available for adding a timeout to the FIN_WAIT_2 state; it was originally intended for BSD/OS, but should be adaptable to most systems using BSD networking code. You need kernel source code to be able to use it. If you do adapt it to work for any other systems, please drop me a note at marc@apache.org.
Compile without lingering_close()

It is possible to compile Apache without the lingering_close() function. This will result in that section of code being similar to that which was in 1.1. If you do this, be aware that it can cause problems with PUTs, POSTs and persistent connections, especially if the client uses pipelining. That said, it is no worse than on 1.1, and we understand that keeping your server running is quite important.

To compile without the lingering_close() function, add -DNO_LINGCLOSE to the end of the EXTRA_CFLAGS line in your Configuration file, rerun Configure and rebuild the server.
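For example, the Configuration line might end up looking like this (any flags already on your line stay in place):

   EXTRA_CFLAGS= -DNO_LINGCLOSE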
Use SO_LINGER as an alternative to lingering_close()

On most systems, there is an option called SO_LINGER that can be set with setsockopt(2). It does something very similar to lingering_close(), except that it is broken on many systems so that it causes far more problems than lingering_close. On some systems, it could possibly work better so it may be worth a try if you have no other alternatives.

To try it, add -DUSE_SO_LINGER -DNO_LINGCLOSE to the end of the EXTRA_CFLAGS line in your Configuration file, rerun Configure and rebuild the server.

NOTE: Attempting to use SO_LINGER and lingering_close() at the same time is very likely to do very bad things, so don't.
Increase the amount of memory used for storing connection state

The exact way to increase it may depend on your OS; look for some reference to the number of "mbufs" or "mbuf clusters". On many systems, this can be done by adding the line NMBCLUSTERS="n", where n is the number of mbuf clusters you want, to your kernel config file and rebuilding your kernel.
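A sketch of that kernel config line (the value 4096 is an arbitrary example; pick one suited to your traffic):

   NMBCLUSTERS="4096"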
Disable KeepAlive

If you are unable to do any of the above then you should, as a last resort, disable KeepAlive. Edit your httpd.conf and change "KeepAlive On" to "KeepAlive Off".
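That is, the directive becomes:

   KeepAlive Off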
Appendix

Below is a message from Roy Fielding, one of the authors of HTTP/1.1.

If a server closes the input side of the connection while the client is sending data (or is planning to send data), then the server's TCP stack will signal an RST (reset) back to the client. Upon receipt of the RST, the client will flush its own incoming TCP buffer back to the un-ACKed packet indicated by the RST packet argument. If the server has sent a message, usually an error response, to the client just before the close, and the client receives the RST packet before its application code has read the error message from its incoming TCP buffer and before the server has received the ACK sent by the client upon receipt of that buffer, then the RST will flush the error message before the client application has a chance to see it. The result is that the client is left thinking that the connection failed for no apparent reason.

There are two conditions under which this is likely to occur:

The solution in all cases is to send the response, close only the write half of the connection (what shutdown is supposed to do), and continue reading on the socket until it is either closed by the client (signifying it has finally read the response) or a timeout occurs. That is what the kernel is supposed to do if SO_LINGER is set. Unfortunately, SO_LINGER has no effect on some systems; on some other systems, it does not have its own timeout and thus the TCP memory segments just pile up until the next reboot (planned or not).

Please note that simply removing the linger code will not solve the problem -- it only moves it to a different and much harder one to detect.
+- Below is a list of additional documentation pages that apply to the - Apache web server development project. -
-Below is a list of additional documentation pages that apply + to the Apache web server development project.
Known Problems in Clients

Over time the Apache Group has discovered or been notified of problems with various clients which we have had to work around, or explain. This document describes these problems and the workarounds available. It's not arranged in any particular order. Some familiarity with the standards is assumed, but not necessary.

For brevity, Navigator will refer to Netscape's Navigator product (which in later versions was renamed "Communicator" and various other names), and MSIE will refer to Microsoft's Internet Explorer product. All trademarks and copyrights belong to their respective companies. We welcome input from the various client authors to correct inconsistencies in this paper, or to provide us with exact version numbers where things are broken/fixed.

For reference, RFC1945 defines HTTP/1.0, and RFC2068 defines HTTP/1.1. Apache as of version 1.2 is an HTTP/1.1 server (with an optional HTTP/1.0 proxy).

Various of these workarounds are triggered by environment variables. The admin typically controls which are set, and for which clients, by using mod_browser. Unless otherwise noted all of these workarounds exist in versions 1.2 and later.
Trailing CRLF on POSTs

This is a legacy issue. The CERN webserver required POST data to have an extra CRLF following it. Thus many clients send an extra CRLF that is not included in the Content-Length of the request. Apache works around this problem by eating any empty lines which appear before a request.
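As an illustration (the path and form data here are hypothetical), such a client sends a blank line after the body that is not counted in Content-Length:

   POST /cgi-bin/example HTTP/1.0
   Content-Type: application/x-www-form-urlencoded
   Content-Length: 7

   a=1&b=2
                <-- extra CRLF sent by the client, not counted above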
Broken keepalive

Various clients have had broken implementations of keepalive (persistent connections). In particular the Windows versions of Navigator 2.0 get very confused when the server times out an idle connection. The workaround is present in the default config files:

   BrowserMatch Mozilla/2 nokeepalive

Note that this matches some earlier versions of MSIE, which began the practice of calling themselves Mozilla in their user-agent strings just like Navigator.

MSIE 4.0b2, which claims to support HTTP/1.1, does not properly support keepalive when it is used on 301 or 302 (redirect) responses. Unfortunately Apache's nokeepalive code prior to 1.2.2 would not work with HTTP/1.1 clients. You must apply this patch to version 1.2.1. Then add this to your config:

   BrowserMatch "MSIE 4\.0b2;" nokeepalive
Incorrect interpretation of HTTP/1.1 in response

To quote from section 3.1 of RFC1945:

   HTTP uses a "<MAJOR>.<MINOR>" numbering scheme to indicate versions
   of the protocol. The protocol versioning policy is intended to allow
   the sender to indicate the format of a message and its capacity for
   understanding further HTTP communication, rather than the features
   obtained via that communication.

Since Apache is an HTTP/1.1 server, it indicates so as part of its response. Many client authors mistakenly treat this part of the response as an indication of the protocol that the response is in, and then refuse to accept the response.

The first major indication of this problem was with AOL's proxy servers. When Apache 1.2 went into beta it was the first wide-spread HTTP/1.1 server. After some discussion, AOL fixed their proxies. In anticipation of similar problems, the force-response-1.0 environment variable was added to Apache. When present Apache will indicate "HTTP/1.0" in response to an HTTP/1.0 client, but will not in any other way change the response.
The pre-1.1 Java Development Kit (JDK) that is used in many clients (including Navigator 3.x and MSIE 3.x) exhibits this problem, as do some of the early pre-releases of the 1.1 JDK. We think it is fixed in the 1.1 JDK release. In any event the workaround:

   BrowserMatch Java/1.0 force-response-1.0
   BrowserMatch JDK/1.0 force-response-1.0
RealPlayer 4.0 from Progressive Networks also exhibits this problem. They fixed it in version 4.01 of the player, but version 4.01 uses the same User-Agent as version 4.0. The workaround is still:

   BrowserMatch "RealPlayer 4.0" force-response-1.0

Requests use HTTP/1.1 but responses must be in HTTP/1.0

MSIE 4.0b2 has this problem. Its Java VM makes requests in HTTP/1.1 format but the responses must be in HTTP/1.0 format (in particular, it does not understand chunked responses). The workaround is to fool Apache into believing the request came in HTTP/1.0 format.

   BrowserMatch "MSIE 4\.0b2;" downgrade-1.0 force-response-1.0

This workaround is available in 1.2.2, and in a patch against 1.2.1.
Boundary problems with header parsing

All versions of Navigator from 2.0 through 4.0b2 (and possibly later) have a problem if the trailing CRLF of the response header starts at offset 256, 257 or 258 of the response. A BrowserMatch for this would match on nearly every hit, so the workaround is enabled automatically on all responses. The workaround implemented detects when this condition would occur in a response and adds extra padding to the header to push the trailing CRLF past offset 258 of the response.
Multipart responses and quoted boundary strings

On multipart responses some clients will not accept quotes (") around the boundary string. The MIME standard recommends that such quotes be used. But the clients were probably written based on one of the examples in RFC2068, which does not include quotes. Apache does not include quotes on its boundary strings to work around this problem.
Byterange requests

A byterange request is used when the client wishes to retrieve a portion of an object, not necessarily the entire object. There was a very old draft which included these byteranges in the URL. Old clients such as Navigator 2.0b1 and MSIE 3.0 for the Mac exhibit this behaviour, and it will appear in the server's access logs as (failed) attempts to retrieve a URL with a trailing ";xxx-yyy". Apache does not attempt to implement this at all.
A subsequent draft of this standard defines a header Request-Range, and a response type multipart/x-byteranges. The HTTP/1.1 standard includes this draft with a few fixes, and it defines the header Range and type multipart/byteranges.
Navigator (versions 2 and 3) sends both Range and Request-Range headers (with the same value), but does not accept a multipart/byteranges response. The response must be multipart/x-byteranges. As a workaround, if Apache receives a Request-Range header it considers it "higher priority" than a Range header and in response uses multipart/x-byteranges.
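To illustrate the requests involved (the URL and range values are invented for this sketch), Navigator 2 and 3 send both headers with the same value:

   GET /docs/manual.pdf HTTP/1.0
   Range: bytes=0-499
   Request-Range: bytes=0-499

Because the non-standard Request-Range header is present, Apache replies with the multipart/x-byteranges type as described above.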
The Adobe Acrobat Reader plugin makes extensive use of byteranges and prior to version 3.01 supports only the multipart/x-byteranges response. Unfortunately there is no clue that it is the plugin making the request. If the plugin is used with Navigator, the above workaround works fine. But if the plugin is used with MSIE 3 (on Windows) the workaround won't work because MSIE 3 doesn't give the Request-Range clue that Navigator does. To work around this, Apache special-cases "MSIE 3" in the User-Agent and serves multipart/x-byteranges. Note that the necessity for this with MSIE 3 is actually due to the Acrobat plugin, not due to the browser.
Netscape Communicator appears to not issue the non-standard Request-Range header. When an Acrobat plugin prior to version 3.01 is used with it, it will not properly understand byteranges. The user must upgrade their Acrobat reader to 3.01.
Set-Cookie header is unmergeable

The HTTP specifications say that it is legal to merge headers with duplicate names into one (separated by commas). Some browsers that support cookies don't like merged headers and prefer that each Set-Cookie header is sent separately. When parsing the headers returned by a CGI, Apache will explicitly avoid merging any Set-Cookie headers.
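For example (cookie names and values invented), Apache passes these CGI-generated headers through as two separate lines rather than merging them into one comma-separated Set-Cookie header:

   Set-Cookie: session=abc123; path=/
   Set-Cookie: lang=en; path=/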
Expires headers and GIF89A animations

Navigator versions 2 through 4 will erroneously re-request GIF89A animations on each loop of the animation if the first response included an Expires header. This happens regardless of how far in the future the expiry time is set. There is no workaround supplied with Apache, however there are hacks for 1.2 and for 1.3.
POST without Content-Length

In certain situations Navigator 3.01 through 3.03 appear to incorrectly issue a POST without the request body. There is no known workaround. It has been fixed in Navigator 3.04; Netscape provides some information. There's also some information about the actual problem.
JDK 1.2 betas lose parts of responses

The http client in the JDK1.2beta2 and beta3 will throw away the first part of the response body when both the headers and the first part of the body are sent in the same network packet AND keep-alives are being used. If either condition is not met then it works fine.

See also Bug-IDs 4124329 and 4125538 at the Java developer connection.

If you are seeing this bug yourself, you can add the following BrowserMatch directive to work around it:

   BrowserMatch "Java1\.2beta[23]" nokeepalive

We don't advocate this though, since bending over backwards for beta software is usually not a good idea; ideally it gets fixed, new betas or a final release comes out, and no one uses the broken old software anymore. In theory.
Content-Type change is not noticed after reload

Navigator (all versions?) will cache the content-type for an object "forever". Using reload or shift-reload will not cause Navigator to notice a content-type change. The only work-around is for the user to flush their caches (memory and disk). By way of an example, some folks may be using an old mime.types file which does not map .htm to text/html; in this case Apache will default to sending text/plain, and the user will see the page served with that type. After the admin fixes the server, the user will have to flush their caches before the object will be shown with the correct text/html type.
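The server-side fix in that example is a mime.types entry along these lines (shown as a sketch; the exact layout depends on your file):

   text/html   html htm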
MSIE cookie problem with expiry dates in the year 2000

MSIE versions 3.00 and 3.02 (without the Y2K patch) do not handle cookie expiry dates in the year 2000 properly. Years after 2000 and before 2000 work fine. This is fixed in IE4.01 service pack 1, and in the Y2K patch for IE3.02. Users should avoid using expiry dates in the year 2000.
Lynx incorrectly asking for transparent content negotiation

The Lynx browser versions 2.7 and 2.8 send a "negotiate: trans" header in their requests, which is an indication that the browser supports transparent content negotiation (TCN). However the browser does not support TCN. As of version 1.3.4, Apache supports TCN, and this causes problems with these versions of Lynx. As a workaround, future versions of Apache will ignore this header when sent by the Lynx client.
MSIE 4.0 mishandles the Vary response header

MSIE 4.0 does not handle a Vary header properly. The Vary header is generated by mod_rewrite in Apache 1.3. The result is an error from MSIE saying it cannot download the requested file. There are more details in PR#4118.

A workaround is to add the following to your server's configuration files:
-+ + + + + + +- - - - +Apache HTTP Server Project + + + + + + +Known Problems in Clients
+ +Over time the Apache Group has discovered or been notified + of problems with various clients which we have had to work + around, or explain. This document describes these problems and + the workarounds available. It's not arranged in any particular + order. Some familiarity with the standards is assumed, but not + necessary.
+ +For brevity, Navigator will refer to Netscape's + Navigator product (which in later versions was renamed + "Communicator" and various other names), and MSIE will + refer to Microsoft's Internet Explorer product. All trademarks + and copyrights belong to their respective companies. We welcome + input from the various client authors to correct + inconsistencies in this paper, or to provide us with exact + version numbers where things are broken/fixed.
+ +For reference, RFC1945 + defines HTTP/1.0, and RFC2068 + defines HTTP/1.1. Apache as of version 1.2 is an HTTP/1.1 + server (with an optional HTTP/1.0 proxy).
+ +Various of these workarounds are triggered by environment + variables. The admin typically controls which are set, and for + which clients, by using mod_browser. Unless + otherwise noted all of these workarounds exist in versions 1.2 + and later.
+ +Trailing CRLF on + POSTs
+ +This is a legacy issue. The CERN webserver required +
+ +POST
data to have an extraCRLF
+ following it. Thus many clients send an extraCRLF
+ that is not included in theContent-Length
of the + request. Apache works around this problem by eating any empty + lines which appear before a request.Broken + keepalive
+ +Various clients have had broken implementations of + keepalive (persistent connections). In particular the + Windows versions of Navigator 2.0 get very confused when the + server times out an idle connection. The workaround is present + in the default config files:
+ +++ Note that this matches some earlier versions of MSIE, which + began the practice of calling themselves Mozilla in + their user-agent strings just like Navigator. + +BrowserMatch Mozilla/2 nokeepalive
+MSIE 4.0b2, which claims to support HTTP/1.1, does not + properly support keepalive when it is used on 301 or 302 + (redirect) responses. Unfortunately Apache's +
+ +nokeepalive
code prior to 1.2.2 would not work + with HTTP/1.1 clients. You must apply + this patch to version 1.2.1. Then add this to your + config:++ +BrowserMatch "MSIE 4\.0b2;" nokeepalive
Incorrect interpretation of HTTP/1.1 in response

To quote from section 3.1 of RFC1945:

    HTTP uses a "<MAJOR>.<MINOR>" numbering scheme to indicate versions of the protocol. The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further HTTP communication, rather than the features obtained via that communication.

Since Apache is an HTTP/1.1 server, it indicates so as part of its response. Many client authors mistakenly treat this part of the response as an indication of the protocol that the response is in, and then refuse to accept the response.

The first major indication of this problem was with AOL's proxy servers. When Apache 1.2 went into beta it was the first widespread HTTP/1.1 server. After some discussion, AOL fixed their proxies. In anticipation of similar problems, the force-response-1.0 environment variable was added to Apache. When present, Apache will indicate "HTTP/1.0" in response to an HTTP/1.0 client, but will not in any other way change the response.

The pre-1.1 Java Development Kit (JDK) used in many clients (including Navigator 3.x and MSIE 3.x) exhibits this problem, as do some of the early pre-releases of the 1.1 JDK. We think it is fixed in the 1.1 JDK release. In any event the workaround:
    BrowserMatch Java/1.0 force-response-1.0
    BrowserMatch JDK/1.0 force-response-1.0

RealPlayer 4.0 from Progressive Networks also exhibits this problem. They have fixed it in version 4.01 of the player, but version 4.01 uses the same User-Agent as version 4.0. The workaround is still:

    BrowserMatch "RealPlayer 4.0" force-response-1.0
Requests use HTTP/1.1 but responses must be in HTTP/1.0

MSIE 4.0b2 has this problem. Its Java VM makes requests in HTTP/1.1 format but the responses must be in HTTP/1.0 format (in particular, it does not understand chunked responses). The workaround is to fool Apache into believing the request came in HTTP/1.0 format.

    BrowserMatch "MSIE 4\.0b2;" downgrade-1.0 force-response-1.0

This workaround is available in 1.2.2, and in a patch against 1.2.1.
Boundary problems with header parsing

All versions of Navigator from 2.0 through 4.0b2 (and possibly later) have a problem if the trailing CRLF of the response header starts at offset 256, 257 or 258 of the response. A BrowserMatch for this would match on nearly every hit, so the workaround is enabled automatically on all responses. The workaround implemented detects when this condition would occur in a response and adds extra padding to the header to push the trailing CRLF past offset 258 of the response.
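The detection itself is simple arithmetic. Here is a minimal sketch of the idea (not Apache's actual code; in practice the padding can be added as a harmless extra header):

    /* Hypothetical sketch of the Navigator workaround described above:
     * if the final CRLF of the header block would begin at offset 256,
     * 257 or 258 of the response, return how many filler bytes to add
     * so that it lands past offset 258. */
    static unsigned padding_needed(unsigned long header_len)
    {
        /* the trailing CRLF starts at offset header_len */
        if (header_len >= 256 && header_len <= 258)
            return 259 - (unsigned) header_len;
        return 0;
    }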
Multipart responses and Quoted Boundary Strings

On multipart responses some clients will not accept quotes (") around the boundary string. The MIME standard recommends that such quotes be used. But the clients were probably written based on one of the examples in RFC2068, which does not include quotes. Apache does not include quotes on its boundary strings to work around this problem.
Byterange requests

A byterange request is used when the client wishes to retrieve a portion of an object, not necessarily the entire object. There was a very old draft which included these byteranges in the URL. Old clients such as Navigator 2.0b1 and MSIE 3.0 for the Mac exhibit this behaviour, and it will appear in the servers' access logs as (failed) attempts to retrieve a URL with a trailing ";xxx-yyy". Apache does not attempt to implement this at all.
A subsequent draft of this standard defines a header Request-Range, and a response type multipart/x-byteranges. The HTTP/1.1 standard includes this draft with a few fixes, and it defines the header Range and type multipart/byteranges.

Navigator (versions 2 and 3) sends both Range and Request-Range headers (with the same value), but does not accept a multipart/byteranges response. The response must be multipart/x-byteranges. As a workaround, if Apache receives a Request-Range header it considers it "higher priority" than a Range header and in response uses multipart/x-byteranges.

The Adobe Acrobat Reader plugin makes extensive use of byteranges and prior to version 3.01 supports only the multipart/x-byteranges response. Unfortunately there is no clue that it is the plugin making the request. If the plugin is used with Navigator, the above workaround works fine. But if the plugin is used with MSIE 3 (on Windows) the workaround won't work because MSIE 3 doesn't give the Request-Range clue that Navigator does. To work around this, Apache special-cases "MSIE 3" in the User-Agent and serves multipart/x-byteranges. Note that the necessity for this with MSIE 3 is actually due to the Acrobat plugin, not due to the browser.

Netscape Communicator appears not to issue the non-standard Request-Range header. When an Acrobat plugin prior to version 3.01 is used with it, it will not properly understand byteranges. The user must upgrade their Acrobat Reader to 3.01.
Set-Cookie header is unmergeable

The HTTP specifications say that it is legal to merge headers with duplicate names into one (separated by commas). Some browsers that support cookies don't like merged headers and prefer that each Set-Cookie header be sent separately. When parsing the headers returned by a CGI, Apache will explicitly avoid merging any Set-Cookie headers.
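A minimal sketch of that special case might look like this (hypothetical; Apache's real CGI header pass is more involved):

    #include <strings.h>

    /* Hypothetical predicate used while folding duplicate CGI headers:
     * everything may be merged into one comma-separated header except
     * Set-Cookie, which must be emitted once per occurrence. */
    static int header_is_mergeable(const char *name)
    {
        return strcasecmp(name, "Set-Cookie") != 0;
    }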
Expires headers and GIF89A animations

Navigator versions 2 through 4 will erroneously re-request GIF89A animations on each loop of the animation if the first response included an Expires header. This happens regardless of how far in the future the expiry time is set. There is no workaround supplied with Apache; however, there are hacks for 1.2 and for 1.3.
POST without Content-Length

In certain situations Navigator 3.01 through 3.03 appear to incorrectly issue a POST without the request body. There is no known workaround. It has been fixed in Navigator 3.04; Netscape provides some information. There's also some information about the actual problem.
JDK 1.2 betas lose parts of responses

The HTTP client in JDK 1.2beta2 and beta3 will throw away the first part of the response body when both the headers and the first part of the body are sent in the same network packet AND keep-alives are being used. If either condition is not met then it works fine.
See also Bug-IDs 4124329 and 4125538 at the Java Developer Connection.
If you are seeing this bug yourself, you can add the following BrowserMatch directive to work around it:

    BrowserMatch "Java1\.2beta[23]" nokeepalive
We don't advocate this, though, since bending over backwards for beta software is usually not a good idea; ideally it gets fixed, new betas or a final release come out, and no one uses the broken old software anymore. In theory.
Content-Type change is not noticed after reload

Navigator (all versions?) will cache the content-type for an object "forever". Using reload or shift-reload will not cause Navigator to notice a content-type change. The only workaround is for the user to flush their caches (memory and disk). By way of an example, some folks may be using an old mime.types file which does not map .htm to text/html; in this case Apache will default to sending text/plain, and the user will see the page served as text/plain. After the admin fixes the server, the user will have to flush their caches before the object will be shown with the correct text/html type.

MSIE Cookie problem with expiry date in the year 2000
MSIE versions 3.00 and 3.02 (without the Y2K patch) do not handle cookie expiry dates in the year 2000 properly. Years after 2000 and before 2000 work fine. This is fixed in IE4.01 service pack 1, and in the Y2K patch for IE3.02. Users should avoid using expiry dates in the year 2000.
Lynx incorrectly asking for transparent content negotiation

The Lynx browser versions 2.7 and 2.8 send a "negotiate: trans" header in their requests, which is an indication the browser supports transparent content negotiation (TCN). However the browser does not support TCN. As of version 1.3.4, Apache supports TCN, and this causes problems with these versions of Lynx. As a workaround, future versions of Apache will ignore this header when sent by the Lynx client.
MSIE 4.0 mishandles Vary response header

MSIE 4.0 does not handle a Vary header properly. The Vary header is generated by mod_rewrite in Apache 1.3. The result is an error from MSIE saying it cannot download the requested file. There are more details in PR#4118.

A workaround is to add the following to your server's configuration files:

    BrowserMatch "MSIE 4\.0" force-no-vary

(This workaround is only available with releases after 1.3.6 of the Apache Web server.)
diff --git a/docs/manual/misc/perf-tuning.html b/docs/manual/misc/perf-tuning.html
index 565e1f19df..06eb002e15 100644
--- a/docs/manual/misc/perf-tuning.html
+++ b/docs/manual/misc/perf-tuning.html
Apache Performance Notes

Warning: This document has not been updated to take into account changes made in the 2.0 version of the Apache HTTP Server. Some of the information may still be relevant, but please use it with care.
Author: Dean Gaudet

- Introduction
- Hardware and Operating System Issues
- Run-Time Configuration Issues
- Compile-Time Configuration Issues
- Appendixes
Related Modules: mod_dir, the Multi-Processing module, mod_status

Related Directives: AllowOverride, DirectoryIndex, HostNameLookups, KeepAliveTimeout, MaxSpareServers, MinSpareServers, Options (FollowSymLinks and SymLinksIfOwnerMatch), StartServers
Introduction

Apache is a general webserver, which is designed to be correct first, and fast second. Even so, its performance is quite satisfactory. Most sites have less than 10Mbits of outgoing bandwidth, which Apache can fill using only a low-end Pentium-based webserver. In practice sites with more bandwidth require more than one machine to fill the bandwidth due to other constraints (such as CGI or database transaction overhead). For these reasons the development focus has been mostly on correctness and configurability.

Unfortunately many folks overlook these facts and cite raw performance numbers as if they are some indication of the quality of a web server product. There is a bare minimum performance that is acceptable; beyond that, extra speed only caters to a much smaller segment of the market. But in order to avoid this hurdle to the acceptance of Apache in some markets, effort was put into Apache 1.3 to bring performance up to a point where the difference with other high-end webservers is minimal.

Finally there are the folks who just plain want to see how fast something can go. The author falls into this category. The rest of this document is dedicated to these folks who want to squeeze every last bit of performance out of Apache's current model, and want to understand why it does some things which slow it down.

Note that this is tailored towards Apache 1.3 on Unix. Some of it applies to Apache on NT. Apache on NT has not been tuned for performance yet; in fact it probably performs very poorly because NT performance requires a different programming model.
Hardware and Operating System Issues

The single biggest hardware issue affecting webserver performance is RAM. A webserver should never ever have to swap; swapping increases the latency of each request beyond a point that users consider "fast enough". This causes users to hit stop and reload, further increasing the load. You can, and should, control the MaxClients setting so that your server does not spawn so many children that it starts swapping.

Beyond that the rest is mundane: get a fast enough CPU, a fast enough network card, and fast enough disks, where "fast enough" is something that needs to be determined by experimentation.

Operating system choice is largely a matter of local concerns. But a general guideline is to always apply the latest vendor TCP/IP patches. HTTP serving completely breaks many of the assumptions built into Unix kernels up through 1994 and even 1995. Good choices include recent FreeBSD, and Linux.
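As a rough illustration of the MaxClients advice above (the numbers here are invented and must be sized to your own hardware): if each child needs about 2MB and you can spare about 128MB of RAM for Apache, cap the pool accordingly:

    # hypothetical sizing: ~2MB per child, ~128MB available for Apache
    MaxClients 64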
Run-Time Configuration Issues

HostnameLookups

Prior to Apache 1.3, HostnameLookups defaulted to On. This adds latency to every request because it requires a DNS lookup to complete before the request is finished. In Apache 1.3 this setting defaults to Off. However (1.3 or later), if you use any Allow from domain or Deny from domain directives then you will pay for a double reverse DNS lookup (a reverse, followed by a forward to make sure that the reverse is not being spoofed). So for the highest performance avoid using these directives (it's fine to use IP addresses rather than domain names).

Note that it's possible to scope the directives, such as within a <Location /server-status> section. In this case the DNS lookups are only performed on requests matching the criteria. Here's an example which disables lookups except for .html and .cgi files:

    HostnameLookups off
    <Files ~ "\.(html|cgi)$">
    HostnameLookups on
    </Files>

But even still, if you just need DNS names in some CGIs you could consider doing the
call in the specific CGIs that need it. - -Similarly, if you need to have hostname information in your server -logs in order to generate reports of this information, you can -postprocess your log file with logresolve, so that these lookups can be done without making the -client wait. It is recommended that you do this postprocessing, and any -other statistical analysis of the log file, somewhere other than your -production web server machine, in order that this activity does not -adversely affect server performance.
- -FollowSymLinks and SymLinksIfOwnerMatch
-Wherever in your URL-space you do not have an -
Options FollowSymLinks
, or you do have an -Options SymLinksIfOwnerMatch
Apache will have to -issue extra system calls to check up on symlinks. One extra call per -filename component. For example, if you had: - -+ But even still, if you just need DNS names in some CGIs you + could consider doing the++gethostbyname
call in the + specific CGIs that need it. + +Similarly, if you need to have hostname information in your + server logs in order to generate reports of this information, + you can postprocess your log file with logresolve, so that + these lookups can be done without making the client wait. It is + recommended that you do this postprocessing, and any other + statistical analysis of the log file, somewhere other than your + production web server machine, in order that this activity does + not adversely affect server performance.
+ +FollowSymLinks and SymLinksIfOwnerMatch
+ +Wherever in your URL-space you do not have an
+ +Options + FollowSymLinks
, or you do have anOptions + SymLinksIfOwnerMatch
Apache will have to issue extra + system calls to check up on symlinks. One extra call per + filename component. For example, if you had:+- -and a request is made for the URIDocumentRoot /www/htdocs <Directory /> Options SymLinksIfOwnerMatch </Directory> -/index.html
. -Then Apache will performlstat(2)
on/www
, -/www/htdocs
, and/www/htdocs/index.html
. The -results of theselstats
are never cached, -so they will occur on every single request. If you really desire the -symlinks security checking you can do something like this: - -+ and a request is made for the URI++/index.html
. + Then Apache will performlstat(2)
on +/www
,/www/htdocs
, and +/www/htdocs/index.html
. The results of these +lstats
are never cached, so they will occur on + every single request. If you really desire the symlinks + security checking you can do something like this: + ++- -This at least avoids the extra checks for theDocumentRoot /www/htdocs <Directory /> Options FollowSymLinks @@ -184,440 +220,486 @@ DocumentRoot /www/htdocs <Directory /www/htdocs> Options -FollowSymLinks +SymLinksIfOwnerMatch </Directory> -DocumentRoot
-path. Note that you'll need to add similar sections if you have any -Alias
orRewriteRule
paths outside of your -document root. For highest performance, and no symlink protection, -setFollowSymLinks
everywhere, and never set -SymLinksIfOwnerMatch
. - -AllowOverride
- -Wherever in your URL-space you allow overrides (typically -
.htaccess
files) Apache will attempt to open -.htaccess
for each filename component. For example, - -+ This at least avoids the extra checks for the +++DocumentRoot
path. Note that you'll need to add + similar sections if you have anyAlias
or +RewriteRule
paths outside of your document root. + For highest performance, and no symlink protection, set +FollowSymLinks
everywhere, and never set +SymLinksIfOwnerMatch
. + +AllowOverride
+ +Wherever in your URL-space you allow overrides (typically +
+ +.htaccess
files) Apache will attempt to open +.htaccess
for each filename component. For + example,+- -and a request is made for the URIDocumentRoot /www/htdocs <Directory /> AllowOverride all </Directory> -/index.html
. Then -Apache will attempt to open/.htaccess
, -/www/.htaccess
, and/www/htdocs/.htaccess
. -The solutions are similar to the previous case ofOptions -FollowSymLinks
. For highest performance use -AllowOverride None
everywhere in your filesystem. - -Negotiation
- -If at all possible, avoid content-negotiation if you're really -interested in every last ounce of performance. In practice the -benefits of negotiation outweigh the performance penalties. There's -one case where you can speed up the server. Instead of using -a wildcard such as: - -
+ and a request is made for the URI++/index.html
. + Then Apache will attempt to open/.htaccess
, +/www/.htaccess
, and +/www/htdocs/.htaccess
. The solutions are similar + to the previous case ofOptions FollowSymLinks
. + For highest performance useAllowOverride None
+ everywhere in your filesystem. + +Negotiation
+ +If at all possible, avoid content-negotiation if you're + really interested in every last ounce of performance. In + practice the benefits of negotiation outweigh the performance + penalties. There's one case where you can speed up the server. + Instead of using a wildcard such as:
+ +++ +DirectoryIndex index -
-++- -where you list the most common choice first. - -DirectoryIndex index.cgi index.pl index.shtml index.html -Also note that explicitly creating a
- -type-map
file -provides better performance than usingMultiViews
, as the -necessary information can be determined by reading this single file, -rather than having to scan the directory for files.Process Creation
- -Prior to Apache 1.3 the
MinSpareServers
, -MaxSpareServers
, andStartServers
settings -all had drastic effects on benchmark results. In particular, Apache -required a "ramp-up" period in order to reach a number of children -sufficient to serve the load being applied. After the initial -spawning ofStartServers
children, only one child per -second would be created to satisfy theMinSpareServers
-setting. So a server being accessed by 100 simultaneous clients, -using the defaultStartServers
of 5 would take on -the order 95 seconds to spawn enough children to handle the load. This -works fine in practice on real-life servers, because they aren't restarted -frequently. But does really poorly on benchmarks which might only run -for ten minutes. - -The one-per-second rule was implemented in an effort to avoid -swamping the machine with the startup of new children. If the machine -is busy spawning children it can't service requests. But it has such -a drastic effect on the perceived performance of Apache that it had -to be replaced. As of Apache 1.3, -the code will relax the one-per-second rule. It -will spawn one, wait a second, then spawn two, wait a second, then spawn -four, and it will continue exponentially until it is spawning 32 children -per second. It will stop whenever it satisfies the -
MinSpareServers
setting. - -This appears to be responsive enough that it's -almost unnecessary to twiddle the
MinSpareServers
, -MaxSpareServers
andStartServers
knobs. When -more than 4 children are spawned per second, a message will be emitted -to theErrorLog
. If you see a lot of these errors then -consider tuning these settings. Use themod_status
output -as a guide. - -Related to process creation is process death induced by the -
MaxRequestsPerChild
setting. By default this is 0, which -means that there is no limit to the number of requests handled -per child. If your configuration currently has this set to some -very low number, such as 30, you may want to bump this up significantly. -If you are running SunOS or an old version of Solaris, limit this -to 10000 or so because of memory leaks. - -When keep-alives are in use, children will be kept busy -doing nothing waiting for more requests on the already open -connection. The default
KeepAliveTimeout
of -15 seconds attempts to minimize this effect. The tradeoff -here is between network bandwidth and server resources. -In no event should you raise this above about 60 seconds, as -most of the benefits are lost. - -
- -Compile-Time Configuration Issues
- -mod_status and ExtendedStatus On
- -If you include
mod_status
-and you also setExtendedStatus On
when building and running -Apache, then on every request Apache will perform two calls to -gettimeofday(2)
(ortimes(2)
depending -on your operating system), and (pre-1.3) several extra calls to -time(2)
. This is all done so that the status report -contains timing indications. For highest performance, set -ExtendedStatus off
(which is the default). - -accept Serialization - multiple sockets
- -This discusses a shortcoming in the Unix socket API. -Suppose your -web server uses multiple
Listen
statements to listen on -either multiple ports or multiple addresses. In order to test each -socket to see if a connection is ready Apache usesselect(2)
. -select(2)
indicates that a socket has zero or -at least one connection waiting on it. Apache's model includes -multiple children, and all the idle ones test for new connections at the -same time. A naive implementation looks something like this -(these examples do not match the code, they're contrived for -pedagogical purposes): - -+ where you list the most common choice first. + +++Also note that explicitly creating a
+ +type-map
+ file provides better performance than using +MultiViews
, as the necessary information can be + determined by reading this single file, rather than having to + scan the directory for files.Process Creation
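As a sketch of what such a type map might contain (the file and document names here are hypothetical; the exact syntax is documented with mod_negotiation):

    URI: document.html

    URI: document.html.en
    Content-type: text/html
    Content-language: en

    URI: document.html.de
    Content-type: text/html
    Content-language: de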
+ +Prior to Apache 1.3 the
+ +MinSpareServers
, +MaxSpareServers
, andStartServers
+ settings all had drastic effects on benchmark results. In + particular, Apache required a "ramp-up" period in order to + reach a number of children sufficient to serve the load being + applied. After the initial spawning of +StartServers
children, only one child per second + would be created to satisfy theMinSpareServers
+ setting. So a server being accessed by 100 simultaneous + clients, using the defaultStartServers
of 5 would + take on the order 95 seconds to spawn enough children to handle + the load. This works fine in practice on real-life servers, + because they aren't restarted frequently. But does really + poorly on benchmarks which might only run for ten minutes.The one-per-second rule was implemented in an effort to + avoid swamping the machine with the startup of new children. If + the machine is busy spawning children it can't service + requests. But it has such a drastic effect on the perceived + performance of Apache that it had to be replaced. As of Apache + 1.3, the code will relax the one-per-second rule. It will spawn + one, wait a second, then spawn two, wait a second, then spawn + four, and it will continue exponentially until it is spawning + 32 children per second. It will stop whenever it satisfies the +
+ +MinSpareServers
setting.This appears to be responsive enough that it's almost + unnecessary to twiddle the
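A minimal sketch of that doubling rule (the names here are hypothetical stand-ins; the real maintenance loop in Apache does considerably more bookkeeping):

    /* Hypothetical stand-ins for Apache's internals. */
    extern int  idle_children(void);
    extern void spawn_children(int n);
    extern int  min_spare_servers;

    static int spawn_rate = 1;

    /* Called roughly once per second by the parent. */
    static void idle_server_maintenance(void)
    {
        if (idle_children() >= min_spare_servers) {
            spawn_rate = 1;            /* satisfied: reset the ramp */
            return;
        }
        spawn_children(spawn_rate);
        if (spawn_rate < 32)
            spawn_rate *= 2;           /* 1, 2, 4, ..., 32 per second */
    }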
This appears to be responsive enough that it's almost unnecessary to twiddle the MinSpareServers, MaxSpareServers and StartServers knobs. When more than 4 children are spawned per second, a message will be emitted to the ErrorLog. If you see a lot of these errors then consider tuning these settings. Use the mod_status output as a guide.

Related to process creation is process death induced by the MaxRequestsPerChild setting. By default this is 0, which means that there is no limit to the number of requests handled per child. If your configuration currently has this set to some very low number, such as 30, you may want to bump this up significantly. If you are running SunOS or an old version of Solaris, limit this to 10000 or so because of memory leaks.

When keep-alives are in use, children will be kept busy doing nothing waiting for more requests on the already open connection. The default KeepAliveTimeout of 15 seconds attempts to minimize this effect. The tradeoff here is between network bandwidth and server resources. In no event should you raise this above about 60 seconds, as most of the benefits are lost.
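For example (a sketch; the value shown simply restates the default):

    # 15 seconds is the default; raising it much beyond this mostly
    # ties up children waiting on idle connections
    KeepAliveTimeout 15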
Compile-Time Configuration Issues

mod_status and ExtendedStatus On

If you include mod_status and you also set ExtendedStatus On when building and running Apache, then on every request Apache will perform two calls to gettimeofday(2) (or times(2) depending on your operating system), and (pre-1.3) several extra calls to time(2). This is all done so that the status report contains timing indications. For highest performance, set ExtendedStatus off (which is the default).
+ +This discusses a shortcoming in the Unix socket API. Suppose + your web server uses multiple
+ +Listen
statements to + listen on either multiple ports or multiple addresses. In order + to test each socket to see if a connection is ready Apache uses +select(2)
.select(2)
indicates that a + socket has zero or at least one connection + waiting on it. Apache's model includes multiple children, and + all the idle ones test for new connections at the same time. A + naive implementation looks something like this (these examples + do not match the code, they're contrived for pedagogical + purposes):+- -But this naive implementation has a serious starvation problem. Recall -that multiple children execute this loop at the same time, and so multiple -children will block at+ for (;;) { for (;;) { - for (;;) { - fd_set accept_fds; - - FD_ZERO (&accept_fds); - for (i = first_socket; i <= last_socket; ++i) { - FD_SET (i, &accept_fds); - } - rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); - if (rc < 1) continue; - new_connection = -1; - for (i = first_socket; i <= last_socket; ++i) { - if (FD_ISSET (i, &accept_fds)) { - new_connection = accept (i, NULL, NULL); - if (new_connection != -1) break; - } - } - if (new_connection != -1) break; - } - process the new_connection; + fd_set accept_fds; + + FD_ZERO (&accept_fds); + for (i = first_socket; i <= last_socket; ++i) { + FD_SET (i, &accept_fds); + } + rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); + if (rc < 1) continue; + new_connection = -1; + for (i = first_socket; i <= last_socket; ++i) { + if (FD_ISSET (i, &accept_fds)) { + new_connection = accept (i, NULL, NULL); + if (new_connection != -1) break; + } + } + if (new_connection != -1) break; + } + process the new_connection; } -select
when they are in between -requests. All those blocked children will awaken and return from -select
when a single request appears on any socket -(the number of children which awaken varies depending on the operating -system and timing issues). -They will all then fall down into the loop and try toaccept
-the connection. But only one will succeed (assuming there's still only -one connection ready), the rest will be blocked in -accept
. -This effectively locks those children into serving requests from that -one socket and no other sockets, and they'll be stuck there until enough -new requests appear on that socket to wake them all up. -This starvation problem was first documented in -PR#467. There -are at least two solutions. - -One solution is to make the sockets non-blocking. In this case the -
accept
won't block the children, and they will be allowed -to continue immediately. But this wastes CPU time. Suppose you have -ten idle children inselect
, and one connection arrives. -Then nine of those children will wake up, try toaccept
the -connection, fail, and loop back intoselect
, accomplishing -nothing. Meanwhile none of those children are servicing requests that -occurred on other sockets until they get back up to theselect
-again. Overall this solution does not seem very fruitful unless you -have as many idle CPUs (in a multiprocessor box) as you have idle children, -not a very likely situation. - -Another solution, the one used by Apache, is to serialize entry into -the inner loop. The loop looks like this (differences highlighted): - -
+ But this naive implementation has a serious starvation problem. + Recall that multiple children execute this loop at the same + time, and so multiple children will block at +++select
when they are in between requests. All + those blocked children will awaken and return from +select
when a single request appears on any socket + (the number of children which awaken varies depending on the + operating system and timing issues). They will all then fall + down into the loop and try toaccept
the + connection. But only one will succeed (assuming there's still + only one connection ready), the rest will be blocked + inaccept
. This effectively locks those children + into serving requests from that one socket and no other + sockets, and they'll be stuck there until enough new requests + appear on that socket to wake them all up. This starvation + problem was first documented in PR#467. There + are at least two solutions. + +One solution is to make the sockets non-blocking. In this + case the
+ +accept
won't block the children, and they + will be allowed to continue immediately. But this wastes CPU + time. Suppose you have ten idle children in +select
, and one connection arrives. Then nine of + those children will wake up, try toaccept
the + connection, fail, and loop back intoselect
, + accomplishing nothing. Meanwhile none of those children are + servicing requests that occurred on other sockets until they + get back up to theselect
again. Overall this + solution does not seem very fruitful unless you have as many + idle CPUs (in a multiprocessor box) as you have idle children, + not a very likely situation.Another solution, the one used by Apache, is to serialize + entry into the inner loop. The loop looks like this + (differences highlighted):
+ ++- -The functions -for (;;) { - accept_mutex_on (); - for (;;) { - fd_set accept_fds; - - FD_ZERO (&accept_fds); - for (i = first_socket; i <= last_socket; ++i) { - FD_SET (i, &accept_fds); - } - rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); - if (rc < 1) continue; - new_connection = -1; - for (i = first_socket; i <= last_socket; ++i) { - if (FD_ISSET (i, &accept_fds)) { - new_connection = accept (i, NULL, NULL); - if (new_connection != -1) break; - } - } - if (new_connection != -1) break; - } - accept_mutex_off (); - process the new_connection; + accept_mutex_on (); + for (;;) { + fd_set accept_fds; + + FD_ZERO (&accept_fds); + for (i = first_socket; i <= last_socket; ++i) { + FD_SET (i, &accept_fds); + } + rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); + if (rc < 1) continue; + new_connection = -1; + for (i = first_socket; i <= last_socket; ++i) { + if (FD_ISSET (i, &accept_fds)) { + new_connection = accept (i, NULL, NULL); + if (new_connection != -1) break; + } + } + if (new_connection != -1) break; + } + accept_mutex_off (); + process the new_connection; } -accept_mutex_on
andaccept_mutex_off
-implement a mutual exclusion semaphore. Only one child can have the -mutex at any time. There are several choices for implementing these -mutexes. The choice is defined insrc/conf.h
(pre-1.3) or -src/include/ap_config.h
(1.3 or later). Some architectures -do not have any locking choice made, on these architectures it is unsafe -to use multipleListen
directives. - --
- -USE_FLOCK_SERIALIZED_ACCEPT
-- This method uses the
flock(2)
system call to lock a -lock file (located by theLockFile
directive). - -USE_FCNTL_SERIALIZED_ACCEPT
-- This method uses the
fcntl(2)
system call to lock a -lock file (located by theLockFile
directive). - -USE_SYSVSEM_SERIALIZED_ACCEPT
-- (1.3 or later) This method uses SysV-style semaphores to implement the -mutex. Unfortunately SysV-style semaphores have some bad side-effects. -One is that it's possible Apache will die without cleaning up the semaphore -(see the
ipcs(8)
man page). The other is that the semaphore -API allows for a denial of service attack by any CGIs running under the -same uid as the webserver (i.e., all CGIs, unless you use something -like suexec or cgiwrapper). For these reasons this method is not used -on any architecture except IRIX (where the previous two are prohibitively -expensive on most IRIX boxes). - -USE_USLOCK_SERIALIZED_ACCEPT
-- (1.3 or later) This method is only available on IRIX, and uses -
usconfig(2)
to create a mutex. While this method avoids -the hassles of SysV-style semaphores, it is not the default for IRIX. -This is because on single processor IRIX boxes (5.3 or 6.2) the -uslock code is two orders of magnitude slower than the SysV-semaphore -code. On multi-processor IRIX boxes the uslock code is an order of magnitude -faster than the SysV-semaphore code. Kind of a messed up situation. -So if you're using a multiprocessor IRIX box then you should rebuild your -webserver with-DUSE_USLOCK_SERIALIZED_ACCEPT
on the -EXTRA_CFLAGS
. - -USE_PTHREAD_SERIALIZED_ACCEPT
-- (1.3 or later) This method uses POSIX mutexes and should work on -any architecture implementing the full POSIX threads specification, -however appears to only work on Solaris (2.5 or later), and even then -only in certain configurations. If you experiment with this you should -watch out for your server hanging and not responding. Static content -only servers may work just fine. -
If your system has another method of serialization which isn't in the -above list then it may be worthwhile adding code for it (and submitting -a patch back to Apache). - -
Another solution that has been considered but never implemented is -to partially serialize the loop -- that is, let in a certain number -of processes. This would only be of interest on multiprocessor boxes -where it's possible multiple children could run simultaneously, and the -serialization actually doesn't take advantage of the full bandwidth. -This is a possible area of future investigation, but priority remains -low because highly parallel web servers are not the norm. - -
Ideally you should run servers without multiple
Listen
-statements if you want the highest performance. But read on. - -accept Serialization - single socket
- -The above is fine and dandy for multiple socket servers, but what -about single socket servers? In theory they shouldn't experience -any of these same problems because all children can just block in -
accept(2)
until a connection arrives, and no starvation -results. In practice this hides almost the same "spinning" behaviour -discussed above in the non-blocking solution. The way that most TCP -stacks are implemented, the kernel actually wakes up all processes blocked -inaccept
when a single connection arrives. One of those -processes gets the connection and returns to user-space, the rest spin in -the kernel and go back to sleep when they discover there's no connection -for them. This spinning is hidden from the user-land code, but it's -there nonetheless. This can result in the same load-spiking wasteful -behaviour that a non-blocking solution to the multiple sockets case can. - -For this reason we have found that many architectures behave more -"nicely" if we serialize even the single socket case. So this is -actually the default in almost all cases. Crude experiments under -Linux (2.0.30 on a dual Pentium pro 166 w/128Mb RAM) have shown that -the serialization of the single socket case causes less than a 3% -decrease in requests per second over unserialized single-socket. -But unserialized single-socket showed an extra 100ms latency on -each request. This latency is probably a wash on long haul lines, -and only an issue on LANs. If you want to override the single socket -serialization you can define
SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-and then single-socket servers will not serialize at all. - -Lingering Close
- -As discussed in -draft-ietf-http-connection-00.txt section 8, -in order for an HTTP server to reliably implement the protocol -it needs to shutdown each direction of the communication independently -(recall that a TCP connection is bi-directional, each half is independent -of the other). This fact is often overlooked by other servers, but -is correctly implemented in Apache as of 1.2. - -
When this feature was added to Apache it caused a flurry of -problems on various versions of Unix because of a shortsightedness. -The TCP specification does not state that the FIN_WAIT_2 state has a -timeout, but it doesn't prohibit it. On systems without the timeout, -Apache 1.2 induces many sockets stuck forever in the FIN_WAIT_2 state. -In many cases this can be avoided by simply upgrading to the latest -TCP/IP patches supplied by the vendor. In cases where the vendor has -never released patches (i.e., SunOS4 -- although folks with a source -license can patch it themselves) we have decided to disable this feature. - -
There are two ways of accomplishing this. One is the -socket option
SO_LINGER
. But as fate would have it, -this has never been implemented properly in most TCP/IP stacks. Even -on those stacks with a proper implementation (i.e., Linux 2.0.31) this -method proves to be more expensive (cputime) than the next solution. - -For the most part, Apache implements this in a function called -
lingering_close
(inhttp_main.c
). The -function looks roughly like this: - -+ The functions +++accept_mutex_on
andaccept_mutex_off
+ implement a mutual exclusion semaphore. Only one child can have + the mutex at any time. There are several choices for + implementing these mutexes. The choice is defined in +src/conf.h
(pre-1.3) or +src/include/ap_config.h
(1.3 or later). Some + architectures do not have any locking choice made, on these + architectures it is unsafe to use multipleListen
+ directives. + ++
+ +- + +
USE_FLOCK_SERIALIZED_ACCEPT
- This method uses the
+ +flock(2)
system call to + lock a lock file (located by theLockFile
+ directive).- + +
USE_FCNTL_SERIALIZED_ACCEPT
- This method uses the
+ +fcntl(2)
system call to + lock a lock file (located by theLockFile
+ directive).- + +
USE_SYSVSEM_SERIALIZED_ACCEPT
- (1.3 or later) This method uses SysV-style semaphores to + implement the mutex. Unfortunately SysV-style semaphores have + some bad side-effects. One is that it's possible Apache will + die without cleaning up the semaphore (see the +
+ +ipcs(8)
man page). The other is that the + semaphore API allows for a denial of service attack by any + CGIs running under the same uid as the webserver + (i.e., all CGIs, unless you use something like + suexec or cgiwrapper). For these reasons this method is not + used on any architecture except IRIX (where the previous two + are prohibitively expensive on most IRIX boxes).- + +
USE_USLOCK_SERIALIZED_ACCEPT
- (1.3 or later) This method is only available on IRIX, and + uses
+ +usconfig(2)
to create a mutex. While this + method avoids the hassles of SysV-style semaphores, it is not + the default for IRIX. This is because on single processor + IRIX boxes (5.3 or 6.2) the uslock code is two orders of + magnitude slower than the SysV-semaphore code. On + multi-processor IRIX boxes the uslock code is an order of + magnitude faster than the SysV-semaphore code. Kind of a + messed up situation. So if you're using a multiprocessor IRIX + box then you should rebuild your webserver with +-DUSE_USLOCK_SERIALIZED_ACCEPT
on the +EXTRA_CFLAGS
.- + +
USE_PTHREAD_SERIALIZED_ACCEPT
- (1.3 or later) This method uses POSIX mutexes and should + work on any architecture implementing the full POSIX threads + specification, however appears to only work on Solaris (2.5 + or later), and even then only in certain configurations. If + you experiment with this you should watch out for your server + hanging and not responding. Static content only servers may + work just fine.
+If your system has another method of serialization which + isn't in the above list then it may be worthwhile adding code + for it (and submitting a patch back to Apache).
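The USE_USLOCK_SERIALIZED_ACCEPT note above mentions rebuilding with -DUSE_USLOCK_SERIALIZED_ACCEPT on the EXTRA_CFLAGS. As a sketch, with an Apache 1.3 source tree that amounts to something like this (assuming the src/Configuration build file; adapt to your own build setup):

    # in src/Configuration, before re-running Configure and make:
    EXTRA_CFLAGS=-DUSE_USLOCK_SERIALIZED_ACCEPT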
+ +Another solution that has been considered but never + implemented is to partially serialize the loop -- that is, let + in a certain number of processes. This would only be of + interest on multiprocessor boxes where it's possible multiple + children could run simultaneously, and the serialization + actually doesn't take advantage of the full bandwidth. This is + a possible area of future investigation, but priority remains + low because highly parallel web servers are not the norm.
+ +Ideally you should run servers without multiple +
+ +Listen
statements if you want the highest + performance. But read on.accept Serialization - single socket
+ +The above is fine and dandy for multiple socket servers, but + what about single socket servers? In theory they shouldn't + experience any of these same problems because all children can + just block in
+ +accept(2)
until a connection + arrives, and no starvation results. In practice this hides + almost the same "spinning" behaviour discussed above in the + non-blocking solution. The way that most TCP stacks are + implemented, the kernel actually wakes up all processes blocked + inaccept
when a single connection arrives. One of + those processes gets the connection and returns to user-space, + the rest spin in the kernel and go back to sleep when they + discover there's no connection for them. This spinning is + hidden from the user-land code, but it's there nonetheless. + This can result in the same load-spiking wasteful behaviour + that a non-blocking solution to the multiple sockets case + can.For this reason we have found that many architectures behave + more "nicely" if we serialize even the single socket case. So + this is actually the default in almost all cases. Crude + experiments under Linux (2.0.30 on a dual Pentium pro 166 + w/128Mb RAM) have shown that the serialization of the single + socket case causes less than a 3% decrease in requests per + second over unserialized single-socket. But unserialized + single-socket showed an extra 100ms latency on each request. + This latency is probably a wash on long haul lines, and only an + issue on LANs. If you want to override the single socket + serialization you can define +
+ +SINGLE_LISTEN_UNSERIALIZED_ACCEPT
and then + single-socket servers will not serialize at all.Lingering Close
+ +As discussed in + draft-ietf-http-connection-00.txt section 8, in order for + an HTTP server to reliably implement the + protocol it needs to shutdown each direction of the + communication independently (recall that a TCP connection is + bi-directional, each half is independent of the other). This + fact is often overlooked by other servers, but is correctly + implemented in Apache as of 1.2.
+ +When this feature was added to Apache it caused a flurry of + problems on various versions of Unix because of a + shortsightedness. The TCP specification does not state that the + FIN_WAIT_2 state has a timeout, but it doesn't prohibit it. On + systems without the timeout, Apache 1.2 induces many sockets + stuck forever in the FIN_WAIT_2 state. In many cases this can + be avoided by simply upgrading to the latest TCP/IP patches + supplied by the vendor. In cases where the vendor has never + released patches (i.e., SunOS4 -- although folks with + a source license can patch it themselves) we have decided to + disable this feature.
+ +There are two ways of accomplishing this. One is the socket + option
+ +SO_LINGER
. But as fate would have it, this + has never been implemented properly in most TCP/IP stacks. Even + on those stacks with a proper implementation (i.e., + Linux 2.0.31) this method proves to be more expensive (cputime) + than the next solution.For the most part, Apache implements this in a function + called
+ +lingering_close
(in +http_main.c
). The function looks roughly like this:

    void lingering_close (int s)
    {
        char junk_buffer[2048];

        /* shutdown the sending side */
        shutdown (s, 1);

        signal (SIGALRM, lingering_death);
        alarm (30);

        for (;;) {
            select (s for reading, 2 second timeout);
            if (error) break;
            if (s is ready for reading) {
                if (read (s, junk_buffer, sizeof (junk_buffer)) <= 0) {
                    break;
                }
                /* just toss away whatever is here */
            }
        }

        close (s);
    }

This naturally adds some expense at the end of a connection, but it is required for a reliable implementation. As HTTP/1.1 becomes more prevalent, and all connections are persistent, this expense will be amortized over more requests. If you want to play with fire and disable this feature you can define
, but -this is not recommended at all. In particular, as HTTP/1.1 pipelined -persistent connections come into uselingering_close
-is an absolute necessity (and - -pipelined connections are faster, so you -want to support them). - -Scoreboard File
- -Apache's parent and children communicate with each other through -something called the scoreboard. Ideally this should be implemented -in shared memory. For those operating systems that we either have -access to, or have been given detailed ports for, it typically is -implemented using shared memory. The rest default to using an -on-disk file. The on-disk file is not only slow, but it is unreliable -(and less featured). Peruse the
src/main/conf.h
file -for your architecture and look for eitherUSE_MMAP_SCOREBOARD
or -USE_SHMGET_SCOREBOARD
. Defining one of those two (as -well as their companionsHAVE_MMAP
andHAVE_SHMGET
-respectively) enables the supplied shared memory code. If your system has -another type of shared memory, edit the filesrc/main/http_main.c
-and add the hooks necessary to use it in Apache. (Send us back a patch -too please.) - -Historical note: The Linux port of Apache didn't start to use -shared memory until version 1.2 of Apache. This oversight resulted -in really poor and unreliable behaviour of earlier versions of Apache -on Linux. - -
- -
DYNAMIC_MODULE_LIMIT
If you have no intention of using dynamically loaded modules -(you probably don't if you're reading this and tuning your -server for every last ounce of performance) then you should add -
-DDYNAMIC_MODULE_LIMIT=0
when building your server. -This will save RAM that's allocated only for supporting dynamically -loaded modules. - -
- -Appendix: Detailed Analysis of a Trace
- -Here is a system call trace of Apache 1.3 running on Linux. The run-time -configuration file is essentially the default plus: - -+ This naturally adds some expense at the end of a connection, + but it is required for a reliable implementation. As HTTP/1.1 + becomes more prevalent, and all connections are persistent, + this expense will be amortized over more requests. If you want + to play with fire and disable this feature you can define ++ + close (s); + } ++NO_LINGCLOSE
, but this is not recommended at all. + In particular, as HTTP/1.1 pipelined persistent connections + come into uselingering_close
is an absolute + necessity (and + pipelined connections are faster, so you want to support + them). + +Scoreboard File
+ +Apache's parent and children communicate with each other + through something called the scoreboard. Ideally this should be + implemented in shared memory. For those operating systems that + we either have access to, or have been given detailed ports + for, it typically is implemented using shared memory. The rest + default to using an on-disk file. The on-disk file is not only + slow, but it is unreliable (and less featured). Peruse the +
+ +src/main/conf.h
file for your architecture and + look for eitherUSE_MMAP_SCOREBOARD
or +USE_SHMGET_SCOREBOARD
. Defining one of those two + (as well as their companionsHAVE_MMAP
and +HAVE_SHMGET
respectively) enables the supplied + shared memory code. If your system has another type of shared + memory, edit the filesrc/main/http_main.c
and add + the hooks necessary to use it in Apache. (Send us back a patch + too please.)Historical note: The Linux port of Apache didn't start to + use shared memory until version 1.2 of Apache. This oversight + resulted in really poor and unreliable behaviour of earlier + versions of Apache on Linux.
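For systems with mmap-style shared memory, the essential trick is that the region is mapped once in the parent, before forking. A minimal sketch (the types and layout here are hypothetical and much simpler than Apache's; MAP_ANON is spelled MAP_ANONYMOUS on some systems):

    #include <sys/mman.h>

    /* Hypothetical, simplified scoreboard slot. */
    typedef struct {
        int  status;    /* e.g. idle, busy, dead */
        long requests;  /* requests served by this child */
    } slot;

    /* Map one shared anonymous region before forking, so the parent
     * and all children see the same slots. */
    static slot *create_scoreboard(int max_children)
    {
        void *p = mmap(NULL, sizeof(slot) * max_children,
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANON, -1, 0);
        return (p == MAP_FAILED) ? NULL : (slot *) p;
    }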
DYNAMIC_MODULE_LIMIT

If you have no intention of using dynamically loaded modules (you probably don't if you're reading this and tuning your server for every last ounce of performance) then you should add -DDYNAMIC_MODULE_LIMIT=0 when building your server. This will save RAM that's allocated only for supporting dynamically loaded modules.
+ +Appendix: Detailed Analysis of a + Trace
+ Here is a system call trace of Apache 1.3 running on Linux. The + run-time configuration file is essentially the default plus: + ++- -The file being requested is a static 6K file of no particular content. -Traces of non-static requests or requests with content negotiation -look wildly different (and quite ugly in some cases). First the -entire trace, then we'll examine details. (This was generated by -the<Directory /> AllowOverride none Options FollowSymLinks </Directory> -strace
program, other similar programs include -truss
,ktrace
, andpar
.) - -+ The file being requested is a static 6K file of no particular + content. Traces of non-static requests or requests with content + negotiation look wildly different (and quite ugly in some + cases). First the entire trace, then we'll examine details. + (This was generated by the++strace
program, other + similar programs includetruss
, +ktrace
, andpar
.) + +++ +accept(15, {sin_family=AF_INET, sin_port=htons(22283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3 flock(18, LOCK_UN) = 0 sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0 @@ -643,203 +725,232 @@ close(3) = 0 sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0 munmap(0x400ee000, 6144) = 0 flock(18, LOCK_EX) = 0 -
Notice the accept serialization:

    flock(18, LOCK_UN)                      = 0
    ...
    flock(18, LOCK_EX)                      = 0

These two calls can be removed by defining SINGLE_LISTEN_UNSERIALIZED_ACCEPT as described earlier.
manipulation:
+
Notice the SIGUSR1
manipulation:
+ It is possible to eliminate this call in many situations (such + as when there are no virtual hosts, or when+++- -This is caused by the implementation of graceful restarts. When the -parent receives asigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0 ... sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0 ... sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0 -SIGUSR1
it sends aSIGUSR1
-to all of its children (and it also increments a "generation counter" -in shared memory). Any children that are idle (between connections) -will immediately die -off when they receive the signal. Any children that are in keep-alive -connections, but are in between requests will die off immediately. But -any children that have a connection and are still waiting for the first -request will not die off immediately. - -To see why this is necessary, consider how a browser reacts to a closed -connection. If the connection was a keep-alive connection and the request -being serviced was not the first request then the browser will quietly -reissue the request on a new connection. It has to do this because the -server is always free to close a keep-alive connection in between requests -(i.e., due to a timeout or because of a maximum number of requests). -But, if the connection is closed before the first response has been -received the typical browser will display a "document contains no data" -dialogue (or a broken image icon). This is done on the assumption that -the server is broken in some way (or maybe too overloaded to respond -at all). So Apache tries to avoid ever deliberately closing the connection -before it has sent a single response. This is the cause of those -
SIGUSR1
manipulations. - -Note that it is theoretically possible to eliminate all three of -these calls. But in rough tests the gain proved to be almost unnoticeable. - -
In order to implement virtual hosts, Apache needs to know the -local socket address used to accept the connection: - -
+ This is caused by the implementation of graceful restarts. When + the parent receives a++SIGUSR1
it sends a +SIGUSR1
to all of its children (and it also + increments a "generation counter" in shared memory). Any + children that are idle (between connections) will immediately + die off when they receive the signal. Any children that are in + keep-alive connections, but are in between requests will die + off immediately. But any children that have a connection and + are still waiting for the first request will not die off + immediately. + +To see why this is necessary, consider how a browser reacts + to a closed connection. If the connection was a keep-alive + connection and the request being serviced was not the first + request then the browser will quietly reissue the request on a + new connection. It has to do this because the server is always + free to close a keep-alive connection in between requests + (i.e., due to a timeout or because of a maximum number + of requests). But, if the connection is closed before the first + response has been received the typical browser will display a + "document contains no data" dialogue (or a broken image icon). + This is done on the assumption that the server is broken in + some way (or maybe too overloaded to respond at all). So Apache + tries to avoid ever deliberately closing the connection before + it has sent a single response. This is the cause of those +
+ +SIGUSR1
manipulations.Note that it is theoretically possible to eliminate all + three of these calls. But in rough tests the gain proved to be + almost unnoticeable.
+ +In order to implement virtual hosts, Apache needs to know + the local socket address used to accept the connection:
+ +++getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 -
Listen
+ directives are used which do not have wildcard addresses). But
+ no effort has yet been made to do these optimizations.
-It is possible to eliminate this call in many situations (such as when
-there are no virtual hosts, or when Listen
directives are
-used which do not have wildcard addresses). But no effort has yet been
-made to do these optimizations.
+ Apache turns off the Nagle algorithm:
-Apache turns off the Nagle algorithm: - -
+ because of problems described in a + paper by John Heidemann. -+++- -because of problems described in -a -paper by John Heidemann. +setsockopt(3, IPPROTO_TCP1, [1], 4) = 0 -
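For reference, turning Nagle off is a single setsockopt() call; a
minimal sketch (illustrative, not taken from Apache's source):

/* Sketch: disable the Nagle algorithm on a connected socket.
 * This produces a setsockopt() call like the one in the trace. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int disable_nagle(int sock_fd)
{
    int one = 1;
    return setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY,
                      (void *)&one, sizeof(one));
}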
Notice the two time calls:

time(NULL)                              = 873959960
...
time(NULL)                              = 873959960

One of these occurs at the beginning of the request, and the other
occurs as a result of writing the log. At least one of these is
required to properly implement the HTTP protocol. The second
occurs because the Common Log Format dictates that the log record
include a timestamp of the end of the request. A custom logging
module could eliminate one of the calls. Or you can use a method
which moves the time into shared memory; see the patches section
below.

As described earlier, ExtendedStatus On causes two gettimeofday
calls and a call to times:

gettimeofday({873959960, 404935}, NULL) = 0
...
gettimeofday({873959960, 417742}, NULL) = 0
times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747

These can be removed by setting ExtendedStatus Off (which is the
default).

It might seem odd to call stat:

stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0

This is part of the algorithm which calculates the PATH_INFO for
use by CGIs. In fact if the request had been for the URI
/cgi-bin/printenv/foobar then there would be two calls to stat:
the first for /home/dgaudet/ap/apachen/cgi-bin/printenv/foobar,
which does not exist, and the second for
/home/dgaudet/ap/apachen/cgi-bin/printenv, which does exist.
Regardless, at least one stat call is necessary when serving
static files because the file size and modification times are used
to generate HTTP headers (such as Content-Length, Last-Modified)
and implement protocol features (such as If-Modified-Since). A
somewhat more clever server could avoid the stat when serving
non-static files, however doing so in Apache is very difficult
given the modular structure.
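To make that walk concrete, here is a rough sketch (an
illustration under assumptions, not Apache's implementation; the
function name is hypothetical) of finding the longest existing
prefix of a path with repeated stat() calls; everything after that
prefix becomes PATH_INFO:

/* Sketch: return the length of the longest existing prefix of
 * 'path' (splitting at '/' boundaries), or -1 if nothing exists.
 * The suffix beyond that length is the PATH_INFO. */
#include <string.h>
#include <sys/stat.h>

long find_path_info(const char *path)
{
    struct stat st;
    char buf[1024];
    char *slash;

    if (strlen(path) >= sizeof(buf))
        return -1;
    strcpy(buf, path);

    for (;;) {
        if (stat(buf, &st) == 0)
            return (long)strlen(buf);   /* this prefix exists */
        slash = strrchr(buf, '/');
        if (slash == NULL || slash == buf)
            return -1;                  /* nothing found; give up */
        *slash = '\0';                  /* drop last segment, retry */
    }
}

For /cgi-bin/printenv/foobar this performs exactly the two stat()
calls described above.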
All static files are served using mmap:

mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
...
munmap(0x400ee000, 6144)                = 0
On some architectures it's slower to mmap small files than it is
to simply read them. The define MMAP_THRESHOLD can be set to the
minimum size required before using mmap. By default it's set to 0
(except on SunOS4 where experimentation has shown 8192 to be a
better value). Using a tool such as lmbench you can determine the
optimal setting for your environment.

You may also wish to experiment with MMAP_SEGMENT_SIZE (default
32768) which determines the maximum number of bytes that will be
written at a time from mmap()d files. Apache only resets the
client's Timeout in between write()s. So setting this large may
lock out low bandwidth clients unless you also increase the
Timeout.

It may even be the case that mmap isn't used on your architecture;
if so then defining USE_MMAP_FILES and HAVE_MMAP might work (if it
works then report back to us).
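A minimal sketch of the basic technique (illustrative only; real
code must also handle zero-length files, short writes, and mmap()
failure modes) of serving a file via mmap() rather than a
read()/write() copy loop:

/* Sketch: serve a file by mmap()ing it and writing the mapping
 * to the client, avoiding a userspace copy loop. */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int send_file_mmap(int sock_fd, int file_fd)
{
    struct stat st;
    void *p;

    if (fstat(file_fd, &st) == -1)
        return -1;
    p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (p == MAP_FAILED)
        return -1;
    write(sock_fd, p, st.st_size);   /* kernel copies straight from the mapping */
    munmap(p, st.st_size);
    return 0;
}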
Apache does its best to avoid copying bytes around in memory. The
first write of any request typically is turned into a writev which
combines both the headers and the first hunk of data:

writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
When doing HTTP/1.1 chunked encoding Apache will generate up to
four-element writevs. The goal is to push the byte copying into
the kernel, where it typically has to happen anyhow (to assemble
network packets). On testing, various Unixes (BSDI 2.x, Solaris
2.5, Linux 2.0.31+) properly combine the elements into network
packets. Pre-2.0.31 Linux will not combine, and will create a
packet for each element, so upgrading is a good idea. Defining
NO_WRITEV will disable this combining, but result in very poor
chunked encoding performance.
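A minimal sketch (illustrative; the function name is hypothetical)
of the headers-plus-body combination with writev():

/* Sketch: send headers and the first chunk of the body with one
 * writev(), avoiding an extra copy or an extra small packet. */
#include <string.h>
#include <sys/uio.h>

ssize_t send_headers_and_body(int sock_fd, const char *headers,
                              const void *body, size_t body_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)headers;
    iov[0].iov_len  = strlen(headers);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = body_len;

    /* One system call; the kernel assembles the packets. */
    return writev(sock_fd, iov, 2);
}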
The log write:

write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71

can be deferred by defining BUFFERED_LOGS. In this case up to
PIPE_BUF bytes (a POSIX defined constant) of log entries are
buffered before writing. At no time does it split a log entry
across a PIPE_BUF boundary, because those writes may not be atomic
(i.e., entries from multiple children could become mixed
together). The code does its best to flush this buffer when a
child dies.
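A rough sketch of the buffering idea (illustrative, not Apache's
code): flush before a write would overflow the PIPE_BUF-sized
buffer, and write oversized entries on their own:

/* Sketch: buffer log entries, never splitting an entry across a
 * PIPE_BUF boundary, so entries from different children are not
 * interleaved in the shared log. */
#include <limits.h>
#include <string.h>
#include <unistd.h>

static char log_buf[PIPE_BUF];
static size_t log_len;

void buffered_log(int log_fd, const char *entry, size_t entry_len)
{
    if (entry_len > PIPE_BUF) {          /* cannot be atomic; write alone */
        write(log_fd, entry, entry_len);
        return;
    }
    if (log_len + entry_len > sizeof(log_buf)) {
        write(log_fd, log_buf, log_len); /* flush the full buffer first */
        log_len = 0;
    }
    memcpy(log_buf + log_len, entry, entry_len);
    log_len += entry_len;
}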
The lingering close code causes four system calls:

shutdown(3, 1 /* send */)               = 0
oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
read(3, "", 2048)                       = 0
close(3)                                = 0

which were described earlier.
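A rough sketch of a lingering close in C (illustrative; the
2-second timeout mirrors the trace above, and error handling is
abbreviated):

/* Sketch: stop sending, then drain anything the client is still
 * transmitting (for a bounded time) before closing, so the client
 * does not see a reset before it has read our response. */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void lingering_close(int sock_fd)
{
    char junk[2048];
    fd_set readfds;
    struct timeval tv;

    shutdown(sock_fd, SHUT_WR);          /* send a FIN; keep reading */

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(sock_fd, &readfds);
        tv.tv_sec = 2;                   /* timeout, as in the trace */
        tv.tv_usec = 0;
        if (select(sock_fd + 1, &readfds, NULL, NULL, &tv) <= 0)
            break;                       /* timeout or error */
        if (read(sock_fd, junk, sizeof(junk)) <= 0)
            break;                       /* EOF: client closed too */
    }
    close(sock_fd);
}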
Let's apply some of these optimizations:
-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT -DBUFFERED_LOGS and
ExtendedStatus Off. Here's the final trace:
accept(15, {sin_family=AF_INET, sin_port=htons(22286), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
sigaction(SIGUSR1, {SIG_IGN}, {0x8058c98, [], SA_INTERRUPT}) = 0
getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
@@ -859,85 +970,94 @@
read(3, "", 2048)                       = 0
close(3)                                = 0
sigaction(SIGUSR1, {0x8058c98, [], SA_INTERRUPT}, {SIG_IGN}) = 0
munmap(0x400e3000, 6144)                = 0

That's 19 system calls, of which 4 remain relatively easy to
remove, but don't seem worth the effort.
- -There are - -several performance patches available for 1.3. Although they may -not apply cleanly to the current version, -it shouldn't be difficult for someone with a little C knowledge to -update them. In particular: - --
- -- A -patch to remove all
time(2)
system calls. -- A -patch to remove various system calls from
mod_include
, -these calls are used by few sites but required for backwards compatibility. -- A -patch which integrates the above two plus a few other speedups at the -cost of removing some functionality. -
Appendix: The Pre-Forking Model
- -Apache (on Unix) is a pre-forking model server. The -parent process is responsible only for forking child -processes, it does not serve any requests or service any network -sockets. The child processes actually process connections, they serve -multiple connections (one at a time) before dying. -The parent spawns new or kills off old -children in response to changes in the load on the server (it does so -by monitoring a scoreboard which the children keep up to date). - -
This model for servers offers a robustness that other models do -not. In particular, the parent code is very simple, and with a high -degree of confidence the parent will continue to do its job without -error. The children are complex, and when you add in third party -code via modules, you risk segmentation faults and other forms of -corruption. Even should such a thing happen, it only affects one -connection and the server continues serving requests. The parent -quickly replaces the dead child. - -
Pre-forking is also very portable across dialects of Unix. -Historically this has been an important goal for Apache, and it continues -to remain so. - -
The pre-forking model comes under criticism for various -performance aspects. Of particular concern are the overhead -of forking a process, the overhead of context switches between -processes, and the memory overhead of having multiple processes. -Furthermore it does not offer as many opportunities for data-caching -between requests (such as a pool of
mmapped
files). -Various other models exist and extensive analysis can be found in the - papers -of the JAWS project. In practice all of these costs vary drastically -depending on the operating system. - -Apache's core code is already multithread aware, and Apache version -1.3 is multithreaded on NT. There have been at least two other experimental -implementations of threaded Apache, one using the 1.3 code base on DCE, -and one using a custom user-level threads package and the 1.0 code base; -neither is publicly available. There is also an experimental port of -Apache 1.3 to -Netscape's Portable Run Time, which -is available -(but you're encouraged to join the -new-httpd mailing list -if you intend to use it). -Part of our redesign for version 2.0 -of Apache will include abstractions of the server model so that we -can continue to support the pre-forking model, and also support various -threaded models. - - - - + +
time(2)
system
+ calls.mod_include
, these calls are used by few sites
+ but required for backwards compatibility.Apache (on Unix) is a pre-forking model server. The + parent process is responsible only for forking + child processes, it does not serve any requests or + service any network sockets. The child processes actually + process connections, they serve multiple connections (one at a + time) before dying. The parent spawns new or kills off old + children in response to changes in the load on the server (it + does so by monitoring a scoreboard which the children keep up + to date).
+ +This model for servers offers a robustness that other models + do not. In particular, the parent code is very simple, and with + a high degree of confidence the parent will continue to do its + job without error. The children are complex, and when you add + in third party code via modules, you risk segmentation faults + and other forms of corruption. Even should such a thing happen, + it only affects one connection and the server continues serving + requests. The parent quickly replaces the dead child.
+ +Pre-forking is also very portable across dialects of Unix. + Historically this has been an important goal for Apache, and it + continues to remain so.
+ +The pre-forking model comes under criticism for various
+ performance aspects. Of particular concern are the overhead of
+ forking a process, the overhead of context switches between
+ processes, and the memory overhead of having multiple
+ processes. Furthermore it does not offer as many opportunities
+ for data-caching between requests (such as a pool of
+ mmapped
files). Various other models exist and
+ extensive analysis can be found in the papers
+ of the JAWS project. In practice all of these costs vary
+ drastically depending on the operating system.
Apache's core code is already multithread aware, and Apache + version 1.3 is multithreaded on NT. There have been at least + two other experimental implementations of threaded Apache, one + using the 1.3 code base on DCE, and one using a custom + user-level threads package and the 1.0 code base; neither is + publicly available. There is also an experimental port of + Apache 1.3 to Netscape's + Portable Run Time, which is + available (but you're encouraged to join the new-httpd mailing + list if you intend to use it). Part of our redesign for + version 2.0 of Apache will include abstractions of the server + model so that we can continue to support the pre-forking model, + and also support various threaded models. + +
+ + + diff --git a/docs/manual/misc/rewriteguide.html b/docs/manual/misc/rewriteguide.html index 0f469bd8b0..278e93b247 100644 --- a/docs/manual/misc/rewriteguide.html +++ b/docs/manual/misc/rewriteguide.html @@ -1,114 +1,129 @@ - - -- - -+ + - - - - diff --git a/docs/manual/misc/security_tips.html b/docs/manual/misc/security_tips.html index 964d7d89bf..9ff6a709bb 100644 --- a/docs/manual/misc/security_tips.html +++ b/docs/manual/misc/security_tips.html @@ -1,183 +1,194 @@ - - - -- -- --Apache 1.3
- -Originally written by
-URL Rewriting Guide
-
-Ralf S. Engelschall <rse@apache.org>
-December 1997 - --This document supplements the mod_rewrite reference documentation. It describes -how one can use Apache's mod_rewrite to solve typical URL-based problems -webmasters are usually confronted with in practice. I give detailed -descriptions on how to solve each problem by configuring URL rewriting -rulesets. - -
Introduction to mod_rewrite
- -The Apache module mod_rewrite is a killer one, i.e. it is a really -sophisticated module which provides a powerful way to do URL manipulations. -With it you can nearly do all types of URL manipulations you ever dreamed -about. The price you have to pay is to accept complexity, because -mod_rewrite's major drawback is that it is not easy to understand and use for -the beginner. And even Apache experts sometimes discover new aspects where -mod_rewrite can help. --In other words: With mod_rewrite you either shoot yourself in the foot the -first time and never use it again or love it for the rest of your life because -of its power. This paper tries to give you a few initial success events to -avoid the first case by presenting already invented solutions to you. - -
Practical Solutions
- -Here come a lot of practical solutions I've either invented myself or -collected from other peoples solutions in the past. Feel free to learn the -black magic of URL rewriting from these examples. - --
- -
-ATTENTION: Depending on your server-configuration it can be necessary to -slightly change the examples for your situation, e.g. adding the [PT] flag -when additionally using mod_alias and mod_userdir, etc. Or rewriting a ruleset -to fit in .htaccess
context instead of per-server context. Always try -to understand what a particular ruleset really does before you use it. It -avoid problems. -URL Layout
- --
Canonical URLs
-- -
-
- -- Description: -
- -On some webservers there are more than one URL for a resource. Usually there -are canonical URLs (which should be actually used and distributed) and those -which are just shortcuts, internal ones, etc. Independed which URL the user -supplied with the request he should finally see the canonical one only. - -
-
- Solution: -
- -We do an external HTTP redirect for all non-canonical URLs to fix them in the -location view of the Browser and for all subsequent requests. In the example -ruleset below we replace
/~user
by the canonical/u/user
and -fix a missing trailing slash for/u/user
. - -- -
-RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R] -RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R] --
Canonical Hostnames
-- -
-
+ +- Description: -
- -... - -
-
- Solution: -
- - -
++
+ + + + + + + + +Apache 1.3 URL Rewriting Guide + + + + ++ + +++ +Apache 1.3
+ + + Originally written by
+ URL Rewriting Guide
+
+ Ralf S. Engelschall <rse@apache.org>
+ December 1997 + +This document supplements the mod_rewrite reference documentation. + It describes how one can use Apache's mod_rewrite to solve + typical URL-based problems webmasters are usually confronted + with in practice. I give detailed descriptions on how to + solve each problem by configuring URL rewriting rulesets.
+ +Introduction to + mod_rewrite
+ The Apache module mod_rewrite is a killer one, i.e. it is a + really sophisticated module which provides a powerful way to + do URL manipulations. With it you can nearly do all types of + URL manipulations you ever dreamed about. The price you have + to pay is to accept complexity, because mod_rewrite's major + drawback is that it is not easy to understand and use for the + beginner. And even Apache experts sometimes discover new + aspects where mod_rewrite can help. + +In other words: With mod_rewrite you either shoot yourself + in the foot the first time and never use it again or love it + for the rest of your life because of its power. This paper + tries to give you a few initial success events to avoid the + first case by presenting already invented solutions to + you.
+ +Practical Solutions
+ Here come a lot of practical solutions I've either invented + myself or collected from other peoples solutions in the past. + Feel free to learn the black magic of URL rewriting from + these examples. + ++
+ ++ +ATTENTION: Depending on your server-configuration it + can be necessary to slightly change the examples for your + situation, e.g. adding the [PT] flag when additionally + using mod_alias and mod_userdir, etc. Or rewriting a + ruleset to fit in +.htaccess
context instead + of per-server context. Always try to understand what a + particular ruleset really does before you use it. It + avoid problems.URL Layout
+ +Canonical URLs
+ ++
+ +- Description:
+ +- On some webservers there are more than one URL for a + resource. Usually there are canonical URLs (which should be + actually used and distributed) and those which are just + shortcuts, internal ones, etc. Independed which URL the + user supplied with the request he should finally see the + canonical one only.
+ +- Solution:
+ +- + We do an external HTTP redirect for all non-canonical + URLs to fix them in the location view of the Browser and + for all subsequent requests. In the example ruleset below + we replace
+/~user
by the canonical +/u/user
and fix a missing trailing slash for +/u/user
. + ++
++ ++ ++RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R] +RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R] ++Canonical Hostnames
+ ++
- -- Description:
+ +- ...
+ +- Solution:
+ +- +
+
- -+ + RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC] RewriteCond %{HTTP_HOST} !^$ RewriteCond %{SERVER_PORT} !^80$ @@ -116,228 +131,281 @@ RewriteRule ^/(.*) http://fully.qualified.domain.name:%{SERVER_PORT}/$1 RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC] RewriteCond %{HTTP_HOST} !^$ RewriteRule ^/(.*) http://fully.qualified.domain.name/$1 [L,R] --
+ ++
- -- Description:
+ +- Usually the DocumentRoot of the webserver directly + relates to the URL ``
+ +/
''. But often this data + is not really of top-level priority, it is perhaps just one + entity of a lot of data pools. For instance at our Intranet + sites there are/e/www/
(the homepage for + WWW),/e/sww/
(the homepage for the Intranet) + etc. Now because the data of the DocumentRoot stays at +/e/www/
we had to make sure that all inlined + images and other stuff inside this data pool work for + subsequent requests.- Solution:
+ +- + We just redirect the URL
/
to +/e/www/
. While is seems trivial it is + actually trivial with mod_rewrite, only. Because the + typical old mechanisms of URL Aliases (as + provides by mod_alias and friends) only used + prefix matching. With this you cannot do such a + redirection because the DocumentRoot is a prefix of all + URLs. With mod_rewrite it is really trivial: + ++
- -+ + RewriteEngine on -RewriteRule ^/$ /e/www/ [R] --
+ ++
- -- Description:
+ +- Every webmaster can sing a song about the problem of + the trailing slash on URLs referencing directories. If they + are missing, the server dumps an error, because if you say +
+ +/~quux/foo
instead of/~quux/foo/
+ then the server searches for a file named +foo
. And because this file is a directory it + complains. Actually is tries to fix it themself in most of + the cases, but sometimes this mechanism need to be emulated + by you. For instance after you have done a lot of + complicated URL rewritings to CGI scripts etc.- Solution:
+ +- + The solution to this subtle problem is to let the server + add the trailing slash automatically. To do this + correctly we have to use an external redirect, so the + browser correctly requests subsequent images etc. If we + only did a internal rewrite, this would only work for the + directory page, but would go wrong when any images are + included into this page with relative URLs, because the + browser would request an in-lined object. For instance, a + request for
image.gif
in +/~quux/foo/index.html
would become +/~quux/image.gif
without the external + redirect! + +So, to do this trick we write:
+ ++
- -+ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^foo$ foo/ [R] --The crazy and lazy can even do the following in the top-level -
.htaccess
file of their homedir. But notice that this creates some -processing overhead. - -+ +
+ + +RewriteRule ^foo$ foo/ [R] ++The crazy and lazy can even do the following in the + top-level
+ +.htaccess
file of their homedir. + But notice that this creates some processing + overhead.+
- -+ + RewriteEngine on RewriteBase /~quux/ -RewriteCond %{REQUEST_FILENAME} -d -RewriteRule ^(.+[^/])$ $1/ [R] --
Webcluster through Homogeneous URL Layout
-- -
-
- Description: -
- -We want to create a homogenous and consistent URL layout over all WWW servers -on a Intranet webcluster, i.e. all URLs (per definition server local and thus -server dependent!) become actually server independed! What we want is -to give the WWW namespace a consistent server-independend layout: no URL -should have to include any physically correct target server. The cluster -itself should drive us automatically to the physical target host. - -
-
- Solution: -
- -First, the knowledge of the target servers come from (distributed) external -maps which contain information where our users, groups and entities stay. -The have the form - -
+RewriteCond %{REQUEST_FILENAME} -d +RewriteRule ^(.+[^/])$ $1/ [R] ++Webcluster through Homogeneous URL Layout
+ ++
+ +- Description:
+ +- We want to create a homogenous and consistent URL + layout over all WWW servers on a Intranet webcluster, i.e. + all URLs (per definition server local and thus server + dependent!) become actually server independed! + What we want is to give the WWW namespace a consistent + server-independend layout: no URL should have to include + any physically correct target server. The cluster itself + should drive us automatically to the physical target + host.
+ +- Solution:
+ +- + First, the knowledge of the target servers come from + (distributed) external maps which contain information + where our users, groups and entities stay. The have the + form +
+user1 server_of_user1 user2 server_of_user2 : : -- -We put them into files
map.xxx-to-host
. Second we need to instruct -all servers to redirect URLs of the forms + -+-to - -We put them into files
+map.xxx-to-host
. + Second we need to instruct all servers to redirect URLs + of the forms/u/user/anypath /g/group/anypath /e/entity/anypath -+
+to
+http://physical-host/u/user/anypath http://physical-host/g/group/anypath http://physical-host/e/entity/anypath -- -when the URL is not locally valid to a server. The following ruleset does -this for us by the help of the map files (assuming that server0 is a default -server which will be used if a user has no entry in the map): - -
+
+ + ++ +when the URL is not locally valid to a server. The + following ruleset does this for us by the help of the map + files (assuming that server0 is a default server which + will be used if a user has no entry in the map):
+ ++
- - - -+ + RewriteEngine on RewriteMap user-to-host txt:/path/to/map.user-to-host RewriteMap group-to-host txt:/path/to/map.group-to-host RewriteMap entity-to-host txt:/path/to/map.entity-to-host -RewriteRule ^/u/([^/]+)/?(.*) http://${user-to-host:$1|server0}/u/$1/$2 -RewriteRule ^/g/([^/]+)/?(.*) http://${group-to-host:$1|server0}/g/$1/$2 -RewriteRule ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2 +RewriteRule ^/u/([^/]+)/?(.*) http://${user-to-host:$1|server0}/u/$1/$2 +RewriteRule ^/g/([^/]+)/?(.*) http://${group-to-host:$1|server0}/g/$1/$2 +RewriteRule ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2 RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\ --
Move Homedirs to Different Webserver
-- -
-
+ +- Description: -
- -A lot of webmaster aksed for a solution to the following situation: They -wanted to redirect just all homedirs on a webserver to another webserver. -They usually need such things when establishing a newer webserver which will -replace the old one over time. - -
-
- Solution: -
- -The solution is trivial with mod_rewrite. On the old webserver we just -redirect all
+/~user/anypath
URLs to -http://newserver/~user/anypath
. - -+
+ + ++Move Homedirs to Different Webserver
+ ++
- -- Description:
+ +- A lot of webmaster aksed for a solution to the + following situation: They wanted to redirect just all + homedirs on a webserver to another webserver. They usually + need such things when establishing a newer webserver which + will replace the old one over time.
+ +- Solution:
+ +- + The solution is trivial with mod_rewrite. On the old + webserver we just redirect all +
/~user/anypath
URLs to +http://newserver/~user/anypath
. + ++
- -+ + RewriteEngine on -RewriteRule ^/~(.+) http://newserver/~$1 [R,L] --
Structured Homedirs
-- -
-
+ +- Description: -
- -Some sites with thousend of users usually use a structured homedir layout, -i.e. each homedir is in a subdirectory which begins for instance with the -first character of the username. So,
/~foo/anypath
is -/home/f/foo/.www/anypath
while/~bar/anypath
is -/home/b/bar/.www/anypath
. - --
- Solution: -
- -We use the following ruleset to expand the tilde URLs into exactly the above -layout. - -
++
+ + +RewriteRule ^/~(.+) http://newserver/~$1 [R,L] ++Structured Homedirs
+ ++
- -- Description:
+ +- Some sites with thousend of users usually use a + structured homedir layout, i.e. each homedir is in a + subdirectory which begins for instance with the first + character of the username. So,
+ +/~foo/anypath
+ is/home/f/foo/.www/anypath
+ while/~bar/anypath
is +/home/b/bar/.www/anypath
.- Solution:
+ +- + We use the following ruleset to expand the tilde URLs + into exactly the above layout. + +
+
- -+ + RewriteEngine on -RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2/$1/.www$3 --
Filesystem Reorganisation
-- -
-
- Description: -
- -This really is a hardcore example: a killer application which heavily uses -per-directory
RewriteRules
to get a smooth look and feel on the Web -while its data structure is never touched or adjusted. - -Background: net.sw is my archive of freely available Unix -software packages, which I started to collect in 1992. It is both my hobby and -job to to this, because while I'm studying computer science I have also worked -for many years as a system and network administrator in my spare time. Every -week I need some sort of software so I created a deep hierarchy of -directories where I stored the packages: - -+RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2/$1/.www$3 ++Filesystem Reorganisation
+ ++
- +- Description:
+ +- + This really is a hardcore example: a killer application + which heavily uses per-directory +
RewriteRules
to get a smooth look and feel + on the Web while its data structure is never touched or + adjusted. Background: net.sw is + my archive of freely available Unix software packages, + which I started to collect in 1992. It is both my hobby + and job to to this, because while I'm studying computer + science I have also worked for many years as a system and + network administrator in my spare time. Every week I need + some sort of software so I created a deep hierarchy of + directories where I stored the packages: +drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/ drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/ drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/ @@ -354,24 +422,27 @@ drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/ drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/ drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/ drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/ -- -In July 1996 I decided to make this archive public to the world via a -nice Web interface. "Nice" means that I wanted to -offer an interface where you can browse directly through the archive hierarchy. -And "nice" means that I didn't wanted to change anything inside this hierarchy -- not even by putting some CGI scripts at the top of it. Why? Because the -above structure should be later accessible via FTP as well, and I didn't -want any Web or CGI stuff to be there. - -
-
- Solution: -
- -The solution has two parts: The first is a set of CGI scripts which create all -the pages at all directory levels on-the-fly. I put them under -
+ +/e/netsw/.www/
as follows: - -++ +In July 1996 I decided to make this archive public to + the world via a nice Web interface. "Nice" means that I + wanted to offer an interface where you can browse + directly through the archive hierarchy. And "nice" means + that I didn't wanted to change anything inside this + hierarchy - not even by putting some CGI scripts at the + top of it. Why? Because the above structure should be + later accessible via FTP as well, and I didn't want any + Web or CGI stuff to be there.
+- Solution:
+ +- + The solution has two parts: The first is a set of CGI + scripts which create all the pages at all directory + levels on-the-fly. I put them under +
+/e/netsw/.www/
as follows: +-rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/ -rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE @@ -385,32 +456,45 @@ drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/ -rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi -rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi -rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst -- -The
DATA/
subdirectory holds the above directory structure, i.e. the -real net.sw stuff and gets automatically updated via -rdist
from time to time. - -The second part of the problem remains: how to link these two structures -together into one smooth-looking URL tree? We want to hide theDATA/
-directory from the user while running the appropriate CGI scripts for the -various URLs. - -Here is the solution: first I put the following into the per-directory -configuration file in the Document Root of the server to rewrite the announced -URL/net.sw/
to the internal path/e/netsw
: - -+
+ + ++ +The
+ +DATA/
subdirectory holds the above + directory structure, i.e. the real + net.sw stuff and gets + automatically updated viardist
from time to + time. The second part of the problem remains: how to link + these two structures together into one smooth-looking URL + tree? We want to hide theDATA/
directory + from the user while running the appropriate CGI scripts + for the various URLs. Here is the solution: first I put + the following into the per-directory configuration file + in the Document Root of the server to rewrite the + announced URL/net.sw/
to the internal path +/e/netsw
:+
- -+ + RewriteRule ^net.sw$ net.sw/ [R] RewriteRule ^net.sw/(.*)$ e/netsw/$1 --The first rule is for requests which miss the trailing slash! The second rule -does the real thing. And then comes the killer configuration which stays in -the per-directory config file
/e/netsw/.www/.wwwacl
: - -+ +
+ + ++The first rule is for requests which miss the trailing + slash! The second rule does the real thing. And then + comes the killer configuration which stays in the + per-directory config file +
+ +/e/netsw/.www/.wwwacl
:+
- -+ + Options ExecCGI FollowSymLinks Includes MultiViews RewriteEngine on @@ -439,239 +523,309 @@ RewriteRule ^netsw-img/.*$ - [L] # by another cgi script RewriteRule !^netsw-lsdir\.cgi.* - [C] RewriteRule (.*) netsw-lsdir.cgi/$1 --Some hints for interpretation: -
-
- - - -- Notice the L (last) flag and no substitution field ('-') in the - forth part -
- Notice the ! (not) character and the C (chain) flag - at the first rule in the last part -
- Notice the catch-all pattern in the last rule -
-
NCSA imagemap to Apache mod_imap
-- -
-
+ +- Description: -
- -When switching from the NCSA webserver to the more modern Apache webserver a -lot of people want a smooth transition. So they want pages which use their old -NCSA
imagemap
program to work under Apache with the modern -mod_imap
. The problem is that there are a lot of -hyperlinks around which reference theimagemap
program via -/cgi-bin/imagemap/path/to/page.map
. Under Apache this -has to read just/path/to/page.map
. - --
- Solution: -
- -We use a global rule to remove the prefix on-the-fly for all requests: - -
++ +
+ + ++Some hints for interpretation:
+ ++
+- Notice the L (last) flag and no substitution field + ('-') in the forth part
+ +- Notice the ! (not) character and the C (chain) flag + at the first rule in the last part
+ +- Notice the catch-all pattern in the last rule
+NCSA imagemap to Apache mod_imap
+ ++
- Description:
+ +- When switching from the NCSA webserver to the more + modern Apache webserver a lot of people want a smooth + transition. So they want pages which use their old NCSA +
+ +imagemap
program to work under Apache with the + modernmod_imap
. The problem is that there are + a lot of hyperlinks around which reference the +imagemap
program via +/cgi-bin/imagemap/path/to/page.map
. Under + Apache this has to read just +/path/to/page.map
.- Solution:
+ +- + We use a global rule to remove the prefix on-the-fly for + all requests: + +
+
+ ++ + RewriteEngine on RewriteRule ^/cgi-bin/imagemap(.*) $1 [PT] -Search pages in more than one directory
--
Search pages in more than one directory
-+
+
+- Description:
--
+ +- Description: -
- -Sometimes it is neccessary to let the webserver search for pages in more than -one directory. Here MultiViews or other techniques cannot help. +
- Sometimes it is neccessary to let the webserver search + for pages in more than one directory. Here MultiViews or + other techniques cannot help.
--
- Solution: -
- -We program a explicit ruleset which searches for the files in the directories. +
- Solution:
-+ +
+ + +- + We program a explicit ruleset which searches for the + files in the directories. + +
+
- - - -+ + RewriteEngine on # first try to find it in custom/... # ...and if found stop and be happy: -RewriteCond /your/docroot/dir1/%{REQUEST_FILENAME} -f -RewriteRule ^(.+) /your/docroot/dir1/$1 [L] +RewriteCond /your/docroot/dir1/%{REQUEST_FILENAME} -f +RewriteRule ^(.+) /your/docroot/dir1/$1 [L] # second try to find it in pub/... # ...and if found stop and be happy: -RewriteCond /your/docroot/dir2/%{REQUEST_FILENAME} -f -RewriteRule ^(.+) /your/docroot/dir2/$1 [L] +RewriteCond /your/docroot/dir2/%{REQUEST_FILENAME} -f +RewriteRule ^(.+) /your/docroot/dir2/$1 [L] # else go on for other Alias or ScriptAlias directives, # etc. RewriteRule ^(.+) - [PT] --
Set Environment Variables According To URL Parts
-- -
-
+ +- Description: -
- -Perhaps you want to keep status information between requests and use the URL -to encode it. But you don't want to use a CGI wrapper for all pages just to -strip out this information. - -
-
- Solution: -
- -We use a rewrite rule to strip out the status information and remember it via -an environment variable which can be later dereferenced from within XSSI or -CGI. This way a URL
+/foo/S=java/bar/
gets translated to -/foo/bar/
and the environment variable namedSTATUS
is set -to the value "java". - -+
+ + ++Set Environment Variables According To URL Parts
+ ++
- -- Description:
+ +- Perhaps you want to keep status information between + requests and use the URL to encode it. But you don't want + to use a CGI wrapper for all pages just to strip out this + information.
+ +- Solution:
+ +- + We use a rewrite rule to strip out the status information + and remember it via an environment variable which can be + later dereferenced from within XSSI or CGI. This way a + URL
/foo/S=java/bar/
gets translated to +/foo/bar/
and the environment variable named +STATUS
is set to the value "java". + ++
- -+ + RewriteEngine on -RewriteRule ^(.*)/S=([^/]+)/(.*) $1/$3 [E=STATUS:$2] --
Virtual User Hosts
-- -
-
+ +- Description: -
- -Assume that you want to provide
www.username.host.domain.com
-for the homepage of username via just DNS A records to the same machine and -without any virtualhosts on this machine. - --
- Solution: -
- -For HTTP/1.0 requests there is no solution, but for HTTP/1.1 requests which -contain a Host: HTTP header we can use the following ruleset to rewrite -
+http://www.username.host.com/anypath
internally to -/home/username/anypath
: - -+
+ + +RewriteRule ^(.*)/S=([^/]+)/(.*) $1/$3 [E=STATUS:$2] ++Virtual User Hosts
+ ++
- -- Description:
+ +- Assume that you want to provide +
+ +www.username.host.domain.com
+ for the homepage of username via just DNS A records to the + same machine and without any virtualhosts on this + machine.- Solution:
+ +- + For HTTP/1.0 requests there is no solution, but for + HTTP/1.1 requests which contain a Host: HTTP header we + can use the following ruleset to rewrite +
http://www.username.host.com/anypath
+ internally to/home/username/anypath
: + ++
- -+ + RewriteEngine on -RewriteCond %{HTTP_HOST} ^www\.[^.]+\.host\.com$ +RewriteCond %{HTTP_HOST} ^www\.[^.]+\.host\.com$ RewriteRule ^(.+) %{HTTP_HOST}$1 [C] -RewriteRule ^www\.([^.]+)\.host\.com(.*) /home/$1$2 --
Redirect Homedirs For Foreigners
-- -
-
+ +- Description: -
- -We want to redirect homedir URLs to another webserver -
www.somewhere.com
when the requesting user does not stay in the local -domainourdomain.com
. This is sometimes used in virtual host -contexts. - --
- Solution: -
- -Just a rewrite condition: - -
++
+ + +RewriteRule ^www\.([^.]+)\.host\.com(.*) /home/$1$2 ++Redirect Homedirs For Foreigners
+ ++
- -- Description:
+ +- We want to redirect homedir URLs to another webserver +
+ +www.somewhere.com
when the requesting user + does not stay in the local domain +ourdomain.com
. This is sometimes used in + virtual host contexts.- Solution:
+ +- + Just a rewrite condition: + +
+
- -+ + RewriteEngine on -RewriteCond %{REMOTE_HOST} !^.+\.ourdomain\.com$ +RewriteCond %{REMOTE_HOST} !^.+\.ourdomain\.com$ RewriteRule ^(/~.+) http://www.somewhere.com/$1 [R,L] --
Redirect Failing URLs To Other Webserver
-- -
-
+ +- Description: -
- -A typical FAQ about URL rewriting is how to redirect failing requests on -webserver A to webserver B. Usually this is done via ErrorDocument -CGI-scripts in Perl, but there is also a mod_rewrite solution. But notice that -this is less performant than using a ErrorDocument CGI-script! - -
-
- Solution: -
- -The first solution has the best performance but less flexibility and is less -error safe: - -
++
+ + ++Redirect Failing URLs To Other Webserver
+ ++
- -- Description:
+ +- A typical FAQ about URL rewriting is how to redirect + failing requests on webserver A to webserver B. Usually + this is done via ErrorDocument CGI-scripts in Perl, but + there is also a mod_rewrite solution. But notice that this + is less performant than using a ErrorDocument + CGI-script!
+ +- Solution:
+ +- + The first solution has the best performance but less + flexibility and is less error safe: + +
+
- -+ + RewriteEngine on -RewriteCond /your/docroot/%{REQUEST_FILENAME} !-f -RewriteRule ^(.+) http://webserverB.dom/$1 --The problem here is that this will only work for pages inside the -DocumentRoot. While you can add more Conditions (for instance to also handle -homedirs, etc.) there is better variant: - -
+ +
+ + +RewriteCond /your/docroot/%{REQUEST_FILENAME} !-f +RewriteRule ^(.+) http://webserverB.dom/$1 ++The problem here is that this will only work for pages + inside the DocumentRoot. While you can add more + Conditions (for instance to also handle homedirs, etc.) + there is better variant:
+ ++
- -+ + RewriteEngine on -RewriteCond %{REQUEST_URI} !-U -RewriteRule ^(.+) http://webserverB.dom/$1 --This uses the URL look-ahead feature of mod_rewrite. The result is that this -will work for all types of URLs and is a safe way. But it does a performance -impact on the webserver, because for every request there is one more internal -subrequest. So, if your webserver runs on a powerful CPU, use this one. If it -is a slow machine, use the first approach or better a ErrorDocument -CGI-script. - -
-
Extended Redirection
-- -
-
+ +- Description: -
- -Sometimes we need more control (concerning the character escaping mechanism) -of URLs on redirects. Usually the Apache kernels URL escape function also -escapes anchors, i.e. URLs like "url#anchor". You cannot use this directly on -redirects with mod_rewrite because the uri_escape() function of Apache would -also escape the hash character. How can we redirect to such a URL? - -
-
- Solution: -
- -We have to use a kludge by the use of a NPH-CGI script which does the redirect -itself. Because here no escaping is done (NPH=non-parseable headers). First -we introduce a new URL scheme
+xredirect:
by the following per-server -config-line (should be one of the last rewrite rules): - -+ +
+ + +RewriteCond %{REQUEST_URI} !-U +RewriteRule ^(.+) http://webserverB.dom/$1 ++This uses the URL look-ahead feature of mod_rewrite. + The result is that this will work for all types of URLs + and is a safe way. But it does a performance impact on + the webserver, because for every request there is one + more internal subrequest. So, if your webserver runs on a + powerful CPU, use this one. If it is a slow machine, use + the first approach or better a ErrorDocument + CGI-script.
+Extended Redirection
+ ++
- -- Description:
+ +- Sometimes we need more control (concerning the + character escaping mechanism) of URLs on redirects. Usually + the Apache kernels URL escape function also escapes + anchors, i.e. URLs like "url#anchor". You cannot use this + directly on redirects with mod_rewrite because the + uri_escape() function of Apache would also escape the hash + character. How can we redirect to such a URL?
+ +- Solution:
+ +- + We have to use a kludge by the use of a NPH-CGI script + which does the redirect itself. Because here no escaping + is done (NPH=non-parseable headers). First we introduce a + new URL scheme
xredirect:
by the following + per-server config-line (should be one of the last rewrite + rules): + ++
- -+ + RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \ [T=application/x-httpd-cgi,L] --This forces all URLs prefixed with
xredirect:
to be piped through the -nph-xredirect.cgi
program. And this program just looks like: - -+ +
+ + -++This forces all URLs prefixed with +
+ +xredirect:
to be piped through the +nph-xredirect.cgi
program. And this program + just looks like:+
- -+ + #!/path/to/perl ## ## nph-xredirect.cgi -- NPH/CGI script for extended redirects @@ -697,55 +851,79 @@ print "</body>\n"; print "</html>\n"; ##EOF## ---This provides you with the functionality to do redirects to all URL schemes, -i.e. including the one which are not directly accepted by mod_rewrite. For -instance you can now also redirect to
news:newsgroup
via - -+ +
+ + ++This provides you with the functionality to do + redirects to all URL schemes, i.e. including the one + which are not directly accepted by mod_rewrite. For + instance you can now also redirect to +
+ +news:newsgroup
via+
- -+ + RewriteRule ^anyurl xredirect:news:newsgroup --Notice: You have not to put [R] or [R,L] to the above rule because the -
xredirect:
need to be expanded later by our special "pipe through" -rule above. - --
Archive Access Multiplexer
-- -
-
+ +- Description: -
- -Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? This does a -redirect to one of several FTP servers around the world which carry a CPAN -mirror and is approximately near the location of the requesting client. -Actually this can be called an FTP access multiplexing service. While CPAN -runs via CGI scripts, how can a similar approach implemented via mod_rewrite? - -
-
- Solution: -
- -First we notice that from version 3.0.0 mod_rewrite can also use the "ftp:" -scheme on redirects. And second, the location approximation can be done by a -rewritemap over the top-level domain of the client. With a tricky chained -ruleset we can use this top-level domain as a key to our multiplexing map. - -
++ +
+ + ++Notice: You have not to put [R] or [R,L] to the above + rule because the
+xredirect:
need to be + expanded later by our special "pipe through" rule + above.Archive Access Multiplexer
+ ++
- -- Description:
+ +- Do you know the great CPAN (Comprehensive Perl Archive + Network) under http://www.perl.com/CPAN? + This does a redirect to one of several FTP servers around + the world which carry a CPAN mirror and is approximately + near the location of the requesting client. Actually this + can be called an FTP access multiplexing service. While + CPAN runs via CGI scripts, how can a similar approach + implemented via mod_rewrite?
+ +- Solution:
+ +- + First we notice that from version 3.0.0 mod_rewrite can + also use the "ftp:" scheme on redirects. And second, the + location approximation can be done by a rewritemap over + the top-level domain of the client. With a tricky chained + ruleset we can use this top-level domain as a key to our + multiplexing map. + +
+
- -+ + RewriteEngine on RewriteMap multiplex txt:/path/to/map.cxan RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C] -RewriteRule ^.+\.([a-zA-Z]+)::(.*)$ ${multiplex:$1|ftp.default.dom}$2 [R,L] -+ +
+ + +RewriteRule ^.+\.([a-zA-Z]+)::(.*)$ ${multiplex:$1|ftp.default.dom}$2 [R,L] +++
- -+ + ## ## map.cxan -- Multiplexing Map for CxAN ## @@ -755,62 +933,77 @@ uk ftp://ftp.cxan.uk/CxAN/ com ftp://ftp.cxan.com/CxAN/ : ##EOF## --
Time-Dependend Rewriting
-- -
-
+ +- Description: -
- -When tricks like time-dependend content should happen a lot of webmasters -still use CGI scripts which do for instance redirects to specialized pages. -How can it be done via mod_rewrite? - -
-
- Solution: -
- -There are a lot of variables named
+TIME_xxx
for rewrite conditions. -In conjunction with the special lexicographic comparison patterns <STRING, ->STRING and =STRING we can do time-dependend redirects: - -+
+ + ++Time-Dependend Rewriting
+ ++
- -- Description:
+ +- When tricks like time-dependend content should happen a + lot of webmasters still use CGI scripts which do for + instance redirects to specialized pages. How can it be done + via mod_rewrite?
+ +- Solution:
+ +- + There are a lot of variables named
TIME_xxx
+ for rewrite conditions. In conjunction with the special + lexicographic comparison patterns <STRING, >STRING + and =STRING we can do time-dependend redirects: + ++
- -+ + RewriteEngine on RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700 RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900 RewriteRule ^foo\.html$ foo.day.html RewriteRule ^foo\.html$ foo.night.html --This provides the content of
foo.day.html
under the URL -foo.html
from 07:00-19:00 and at the remaining time the contents of -foo.night.html
. Just a nice feature for a homepage... - --
Backward Compatibility for YYYY to XXXX migration
-- -
-
+ +- Description: -
- -How can we make URLs backward compatible (still existing virtually) after -migrating document.YYYY to document.XXXX, e.g. after translating a bunch of -.html files to .phtml? - -
-
- Solution: -
- -We just rewrite the name to its basename and test for existence of the new -extension. If it exists, we take that name, else we rewrite the URL to its -original state. - -
++ +
+ + ++This provides the content of
+foo.day.html
+ under the URLfoo.html
from 07:00-19:00 and + at the remaining time the contents of +foo.night.html
. Just a nice feature for a + homepage...Backward Compatibility for YYYY to XXXX migration
+ ++
- Description:
+ +- How can we make URLs backward compatible (still + existing virtually) after migrating document.YYYY to + document.XXXX, e.g. after translating a bunch of .html + files to .phtml?
+ +- Solution:
+ +- + We just rewrite the name to its basename and test for + existence of the new extension. If it exists, we take + that name, else we rewrite the URL to its original state. + + +
+
+ ++ + # backward compatibility ruleset for # rewriting document.html to document.phtml # when and only when document.phtml exists @@ -825,237 +1018,307 @@ RewriteRule ^(.*)$ $1.phtml [S=1] # else reverse the previous basename cutout RewriteCond %{ENV:WasHTML} ^yes$ RewriteRule ^(.*)$ $1.html -Content Handling
-From Old to New (intern)
-Content Handling
++
+- Description:
--
From Old to New (intern)
-+
- Assume we have recently renamed the page +
-bar.html
tofoo.html
and now want + to provide the old URL for backward compatibility. Actually + we want that users of the old URL even not recognize that + the pages was renamed.-
-- Description: -
- -Assume we have recently renamed the page
bar.html
to -foo.html
and now want to provide the old URL for backward -compatibility. Actually we want that users of the old URL even not recognize -that the pages was renamed. +- Solution:
--
- Solution: -
- -We rewrite the old URL to the new one internally via the following rule: +
- + We rewrite the old URL to the new one internally via the + following rule: -
++
+ + ++
- - - -+ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^foo\.html$ bar.html --
From Old to New (extern)
-- -
-
+ +- Description: -
- -Assume again that we have recently renamed the page
bar.html
to -foo.html
and now want to provide the old URL for backward -compatibility. But this time we want that the users of the old URL get hinted -to the new one, i.e. their browsers Location field should change, too. - --
- Solution: -
- -We force a HTTP redirect to the new URL which leads to a change of the -browsers and thus the users view: - -
++
+ + +RewriteRule ^foo\.html$ bar.html ++From Old to New (extern)
+ ++
- -- Description:
+ +- Assume again that we have recently renamed the page +
+ +bar.html
tofoo.html
and now want + to provide the old URL for backward compatibility. But this + time we want the users of the old URL to be hinted to + the new one, i.e. their browser's Location field should + change, too.
+ +- + We force a HTTP redirect to the new URL which leads to a + change of the browsers and thus the users view: + +
+
- -+ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^foo\.html$ bar.html [R] --
Browser Dependend Content
-- -
-
- -- Description: -
- -At least for important top-level pages it is sometimes necesarry to provide -the optimum of browser dependend content, i.e. one has to provide a maximum -version for the latest Netscape variants, a minimum version for the Lynx -browsers and a average feature version for all others. - -
-
- Solution: -
- -We cannot use content negotiation because the browsers do not provide their -type in that form. Instead we have to act on the HTTP header "User-Agent". -The following condig does the following: If the HTTP header "User-Agent" -begins with "Mozilla/3", the page
foo.html
is rewritten to -foo.NS.html
and and the rewriting stops. If the browser is "Lynx" or -"Mozilla" of version 1 or 2 the URL becomesfoo.20.html
. All other -browsers receive pagefoo.32.html
. This is done by the following -ruleset: - -- -
-RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.* -RewriteRule ^foo\.html$ foo.NS.html [L] - -RewriteCond %{HTTP_USER_AGENT} ^Lynx/.* [OR] -RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12].* -RewriteRule ^foo\.html$ foo.20.html [L] - -RewriteRule ^foo\.html$ foo.32.html [L] --
Dynamic Mirror
-- -
-
+ +- Description: -
- -Assume there are nice webpages on remote hosts we want to bring into our -namespace. For FTP servers we would use the
mirror
program which -actually maintains an explicit up-to-date copy of the remote data on the local -machine. For a webserver we could use the programwebcopy
which acts -similar via HTTP. But both techniques have one major drawback: The local copy -is always just as up-to-date as often we run the program. It would be much -better if the mirror is not a static one we have to establish explicitly. -Instead we want a dynamic mirror with data which gets updated automatically -when there is need (updated data on the remote host). - --
- Solution: -
- -To provide this feature we map the remote webpage or even the complete remote -webarea to our namespace by the use of the Proxy Throughput feature -(flag [P]): - -
++
+ + +RewriteRule ^foo\.html$ bar.html [R] ++Browser Dependent Content
+ ++
+ +- Description:
+ +- At least for important top-level pages it is sometimes + necessary to provide the optimum of browser-dependent + content, i.e. one has to provide a maximum version for the + latest Netscape variants, a minimum version for the Lynx + browsers and an average feature version for all others.
+ +- Solution:
+ +- + We cannot use content negotiation because the browsers do + not provide their type in that form. Instead we have to + act on the HTTP header "User-Agent". The following config + does the following: if the HTTP header "User-Agent" + begins with "Mozilla/3", the page
+foo.html
+ is rewritten tofoo.NS.html
and the + rewriting stops. If the browser is "Lynx" or "Mozilla" of + version 1 or 2, the URL becomesfoo.20.html
. + All other browsers receive pagefoo.32.html
. + This is done by the following ruleset: + ++
++ ++ ++RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.* +RewriteRule ^foo\.html$ foo.NS.html [L] + +RewriteCond %{HTTP_USER_AGENT} ^Lynx/.* [OR] +RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12].* +RewriteRule ^foo\.html$ foo.20.html [L] + +RewriteRule ^foo\.html$ foo.32.html [L] ++Dynamic Mirror
+ ++
- Description:
+ +- Assume there are nice webpages on remote hosts we want + to bring into our namespace. For FTP servers we would use + the
+ +mirror
program which actually maintains an + explicit up-to-date copy of the remote data on the local + machine. For a webserver we could use the program +webcopy
which acts similarly via HTTP. But both + techniques have one major drawback: the local copy is + only as up-to-date as the last time we ran the program. It + would be much better if the mirror were not a static one we + have to establish explicitly. Instead we want a dynamic + mirror whose data gets updated automatically + when there is need (updated data on the remote host).
+ +- + To provide this feature we map the remote webpage or even + the complete remote webarea to our namespace by the use + of the Proxy Throughput feature (flag [P]): + +
+
- -+ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^hotsheet/(.*)$ http://www.tstimpreso.com/hotsheet/$1 [P] -+ +
+ + +RewriteRule ^hotsheet/(.*)$ http://www.tstimpreso.com/hotsheet/$1 [P] +++
+RewriteRule ^usa-news\.html$ http://www.quux-corp.com/news/index.html [P] + ++ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^usa-news\.html$ http://www.quux-corp.com/news/index.html [P] -Reverse Dynamic Mirror
--
Reverse Dynamic Mirror
-+
+
+- Description:
--
+ +- Description: -
- -... +
- ...
--
- Solution: -
- +
- Solution:
-+ +
+ + +- +
+
- - - -+ + RewriteEngine on RewriteCond /mirror/of/remotesite/$1 -U RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1 --
Retrieve Missing Data from Intranet
-- -
-
+ +- Description: -
- -This is a tricky way of virtually running a corporates (external) Internet -webserver (
www.quux-corp.dom
), while actually keeping and maintaining -its data on a (internal) Intranet webserver -(www2.quux-corp.dom
) which is protected by a firewall. The -trick is that on the external webserver we retrieve the requested data -on-the-fly from the internal one. - --
- Solution: -
- -First, we have to make sure that our firewall still protects the internal -webserver and that only the external webserver is allowed to retrieve data -from it. For a packet-filtering firewall we could for instance configure a -firewall ruleset like the following: - -
+- -
-ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80 -DENY Host * Port * --> Host www2.quux-corp.dom Port 80 --Just adjust it to your actual configuration syntax. Now we can establish the -mod_rewrite rules which request the missing data in the background through the -proxy throughput feature: - -
+
+ + ++Retrieve Missing Data from Intranet
+ ++
- -- Description:
+ +- This is a tricky way of virtually running a corporates + (external) Internet webserver + (
+ +www.quux-corp.dom
), while actually keeping + and maintaining its data on a (internal) Intranet webserver + (www2.quux-corp.dom
) which is protected by a + firewall. The trick is that on the external webserver we + retrieve the requested data on-the-fly from the internal + one.- Solution:
+ +- + First, we have to make sure that our firewall still + protects the internal webserver and that only the + external webserver is allowed to retrieve data from it. + For a packet-filtering firewall we could for instance + configure a firewall ruleset like the following: + +
+
+ ++ ++ ++ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80 +DENY Host * Port * --> Host www2.quux-corp.dom Port 80 ++Just adjust it to your actual configuration syntax. + Now we can establish the mod_rewrite rules which request + the missing data in the background through the proxy + throughput feature:
+ ++
- -+ + RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2 -RewriteCond %{REQUEST_FILENAME} !-f -RewriteCond %{REQUEST_FILENAME} !-d -RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P] --
Load Balancing
-- -
-
+ +- Description: -
- -Suppose we want to load balance the traffic to
www.foo.com
over -www[0-5].foo.com
(a total of 6 servers). How can this be done? - --
- Solution: -
- -There are a lot of possible solutions for this problem. We will discuss first -a commonly known DNS-based variant and then the special one with mod_rewrite: - -
+-
- DNS Round-Robin - -
-The simplest method for load-balancing is to use the DNS round-robin feature -of BIND. Here you just configure
www[0-9].foo.com
as usual in your -DNS with A(address) records, e.g. - -+
+ + +RewriteCond %{REQUEST_FILENAME} !-f +RewriteCond %{REQUEST_FILENAME} !-d +RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P] ++Load Balancing
+ ++
- -- Description:
+ +- Suppose we want to load balance the traffic to +
+ +www.foo.com
overwww[0-5].foo.com
+ (a total of 6 servers). How can this be done?- Solution:
+ +- + There are a lot of possible solutions for this problem. + We will first discuss a commonly known DNS-based variant + and then a special one using mod_rewrite: + +
+
- -- + DNS Round-Robin + +
The simplest method for load-balancing is to use + the DNS round-robin feature of BIND. Here you just + configure
+ +www[0-9].foo.com
as usual in + your DNS with A(address) records, e.g.+
- -+ + www0 IN A 1.2.3.1 www1 IN A 1.2.3.2 www2 IN A 1.2.3.3 www3 IN A 1.2.3.4 www4 IN A 1.2.3.5 www5 IN A 1.2.3.6 --Then you additionally add the following entry: - -
+ +
+ + ++Then you additionally add the following entry:
+ ++
- -+ + www IN CNAME www0.foo.com. IN CNAME www1.foo.com. IN CNAME www2.foo.com. @@ -1063,60 +1326,89 @@ www IN CNAME www0.foo.com. IN CNAME www4.foo.com. IN CNAME www5.foo.com. IN CNAME www6.foo.com. --Notice that this seems wrong, but is actually an intended feature of BIND and -can be used in this way. However, now when
www.foo.com
gets resolved, -BIND gives outwww0-www6
- but in a slightly permutated/rotated order -every time. This way the clients are spread over the various servers. - -But notice that this not a perfect load balancing scheme, because DNS resolve -information gets cached by the other nameservers on the net, so once a client -has resolvedwww.foo.com
to a particularwwwN.foo.com
, all -subsequent requests also go to this particular namewwwN.foo.com
. But -the final result is ok, because the total sum of the requests are really -spread over the various webservers. - --
- DNS Load-Balancing - -
-A sophisticated DNS-based method for load-balancing is to use the program -
lbnamed
which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. -It is a Perl 5 program in conjunction with auxilliary tools which provides a -real load-balancing for DNS. - --
- Proxy Throughput Round-Robin - -
+ +-In this variant we use mod_rewrite and its proxy throughput feature. First we -dedicate
www0.foo.com
to be actuallywww.foo.com
by using a -single - -+ +
+ + ++Notice that this seems wrong, but is actually an + intended feature of BIND and can be used in this way. + However, now when
+www.foo.com
gets + resolved, BIND gives outwww0-www6
- but + in a slightly permutated/rotated order every time. + This way the clients are spread over the various + servers. But notice that this not a perfect load + balancing scheme, because DNS resolve information + gets cached by the other nameservers on the net, so + once a client has resolvedwww.foo.com
+ to a particularwwwN.foo.com
, all + subsequent requests also go to this particular name +wwwN.foo.com
. But the final result is + ok, because the total sum of the requests are really + spread over the various webservers.- + DNS Load-Balancing + +
+ +A sophisticated DNS-based method for + load-balancing is to use the program +
+lbnamed
which can be found at + http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. + It is a Perl 5 program in conjunction with auxilliary + tools which provides a real load-balancing for + DNS.- + Proxy Throughput Round-Robin + +
In this variant we use mod_rewrite and its proxy + throughput feature. First we dedicate +
+ +www0.foo.com
to be actually +www.foo.com
by using a single+
- -+ + www IN CNAME www0.foo.com. --entry in the DNS. Then we convert
www0.foo.com
to a proxy-only -server, i.e. we configure this machine so all arriving URLs are just pushed -through the internal proxy to one of the 5 other servers (www1-www5
). -To accomplish this we first establish a ruleset which contacts a load -balancing scriptlb.pl
for all URLs. - -+ +
+ + ++entry in the DNS. Then we convert +
+ +www0.foo.com
to a proxy-only server, + i.e. we configure this machine so all arriving URLs + are just pushed through the internal proxy to one of + the 5 other servers (www1-www5
). To + accomplish this we first establish a ruleset which + contacts a load balancing scriptlb.pl
+ for all URLs.+
- -+ + RewriteEngine on RewriteMap lb prg:/path/to/lb.pl RewriteRule ^/(.+)$ ${lb:$1} [P,L] --Then we write
lb.pl
: - -+ +
+ + ++Then we write
+ +lb.pl
:+
- -+ + #!/path/to/perl ## ## lb.pl -- load balancing script @@ -1137,41 +1429,48 @@ while (<STDIN>) { } ##EOF## --A last notice: Why is this useful? Seems like
www0.foo.com
still is -overloaded? The answer is yes, it is overloaded, but with plain proxy -throughput requests, only! All SSI, CGI, ePerl, etc. processing is completely -done on the other machines. This is the essential point. - --
- Hardware/TCP Round-Robin - -
-There is a hardware solution available, too. Cisco has a beast called -LocalDirector which does a load balancing at the TCP/IP level. Actually this -is some sort of a circuit level gateway in front of a webcluster. If you have -enough money and really need a solution with high performance, use this one. - -
-
Reverse Proxy
-- -
-
+ +- Description: -
- -... - -
-
- Solution: -
- - -
++ +
+ + ++A last notice: Why is this useful? Seems like +
+ + +www0.foo.com
still is overloaded? The + answer is yes, it is overloaded, but with plain proxy + throughput requests, only! All SSI, CGI, ePerl, etc. + processing is completely done on the other machines. + This is the essential point.- + Hardware/TCP Round-Robin + +
+ +There is a hardware solution available, too. Cisco + has a beast called LocalDirector which does a load + balancing at the TCP/IP level. Actually this is some + sort of a circuit level gateway in front of a + webcluster. If you have enough money and really need + a solution with high performance, use this one.
+Reverse Proxy
+ ++
- -- Description:
+ +- ...
+ +- Solution:
+ +- +
+
- -+ + ## ## apache-rproxy.conf -- Apache configuration for Reverse Proxy Usage ## @@ -1256,9 +1555,16 @@ ProxyPassReverse / http://www3.foo.dom/ ProxyPassReverse / http://www4.foo.dom/ ProxyPassReverse / http://www5.foo.dom/ ProxyPassReverse / http://www6.foo.dom/ -+ +
+ + +++
- -+ + ## ## apache-rproxy.conf-servers -- Apache/mod_rewrite selection table ## @@ -1270,182 +1576,227 @@ static www1.foo.dom|www2.foo.dom|www3.foo.dom|www4.foo.dom # list of backend servers which serve dynamically # generated page (CGI programs or mod_perl scripts) dynamic www5.foo.dom|www6.foo.dom --
New MIME-type, New Service
-- -
-
- Description: -
- -On the net there are a lot of nifty CGI programs. But their usage is usually -boring, so a lot of webmaster don't use them. Even Apache's Action handler -feature for MIME-types is only appropriate when the CGI programs don't need -special URLs (actually PATH_INFO and QUERY_STRINGS) as their input. - -First, let us configure a new file type with extension
.scgi
-(for secure CGI) which will be processed by the popularcgiwrap
-program. The problem here is that for instance we use a Homogeneous URL Layout -(see above) a file inside the user homedirs has the URL -/u/user/foo/bar.scgi
. Butcgiwrap
needs the URL in the form -/~user/foo/bar.scgi/
. The following rule solves the problem: - -- -
-RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ... -... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3 [NS,T=application/x-http-cgi] --Or assume we have some more nifty programs: -
wwwlog
(which displays theaccess.log
for a URL subtree and -wwwidx
(which runs Glimpse on a URL subtree). We have to -provide the URL area to these programs so they know on which area -they have to act on. But usually this ugly, because they are all the -times still requested from that areas, i.e. typically we would run -theswwidx
program from within/u/user/foo/
via -hyperlink to - -++New MIME-type, New Service
+ ++
- -- Description:
+ +- + On the net there are a lot of nifty CGI programs. But + their usage is usually boring, so a lot of webmaster + don't use them. Even Apache's Action handler feature for + MIME-types is only appropriate when the CGI programs + don't need special URLs (actually PATH_INFO and + QUERY_STRINGS) as their input. First, let us configure a + new file type with extension
.scgi
(for + secure CGI) which will be processed by the popular +cgiwrap
program. The problem here is that + for instance we use a Homogeneous URL Layout (see above) + a file inside the user homedirs has the URL +/u/user/foo/bar.scgi
. But +cgiwrap
needs the URL in the form +/~user/foo/bar.scgi/
. The following rule + solves the problem: + ++
+ ++ ++ ++RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ... +... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3 [NS,T=application/x-http-cgi] ++Or assume we have some more nifty programs: +
+wwwlog
(which displays the +access.log
for a URL subtree and +wwwidx
(which runs Glimpse on a URL + subtree). We have to provide the URL area to these + programs so they know on which area they have to act on. + But usually this ugly, because they are all the times + still requested from that areas, i.e. typically we would + run theswwidx
program from within +/u/user/foo/
via hyperlink to/internal/cgi/user/swwidx?i=/u/user/foo/ -- -which is ugly. Because we have to hard-code both the location of the -area and the location of the CGI inside the hyperlink. When we have to -reorganise or area, we spend a lot of time changing the various hyperlinks. - -
-
- Solution: -
- -The solution here is to provide a special new URL format which automatically -leads to the proper CGI invocation. We configure the following: - -
+ +
+ + ++ +which is ugly, because we have to hard-code + both the location of the area + and the location of the CGI inside the + hyperlink. When we have to reorganise our area, we spend a + lot of time changing the various hyperlinks.
+ + +- Solution:
+ +- + The solution here is to provide a special new URL format + which automatically leads to the proper CGI invocation. + We configure the following: + +
+
- -+ + RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/ RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3 --Now the hyperlink to search at
/u/user/foo/
reads only - -++Now the hyperlink to search at +
+/u/user/foo/
reads onlyHREF="*" -- -which internally gets automatically transformed to + -
+which internally gets automatically transformed to
+/internal/cgi/user/wwwidx?i=/u/user/foo/ -- -The same approach leads to an invocation for the access log CGI -program when the hyperlink
:log
gets used. - --
From Static to Dynamic
-- -
-
+ +- Description: -
- -How can we transform a static page
foo.html
into a dynamic variant -foo.cgi
in a seemless way, i.e. without notice by the browser/user. - --
- Solution: -
- -We just rewrite the URL to the CGI-script and force the correct MIME-type so -it gets really run as a CGI-script. This way a request to -
+/~quux/foo.html
internally leads to the invokation of -/~quux/foo.cgi
. - --
+ + ++ +The same approach leads to an invocation for the + access log CGI program when the hyperlink +
+ + + +:log
gets used.From Static to Dynamic
+ ++
- -- Description:
+ +- How can we transform a static page +
+ +foo.html
into a dynamic variant +foo.cgi
in a seemless way, i.e. without notice + by the browser/user.- Solution:
+ +- + We just rewrite the URL to the CGI-script and force the + correct MIME-type so it gets really run as a CGI-script. + This way a request to
/~quux/foo.html
+ internally leads to the invocation of +/~quux/foo.cgi
. + ++
- -+ + RewriteEngine on RewriteBase /~quux/ -RewriteRule ^foo\.html$ foo.cgi [T=application/x-httpd-cgi] --
On-the-fly Content-Regeneration
-- -
-
- -- Description: -
- -Here comes a really esoteric feature: Dynamically generated but statically -served pages, i.e. pages should be delivered as pure static pages (read from -the filesystem and just passed through), but they have to be generated -dynamically by the webserver if missing. This way you can have CGI-generated -pages which are statically served unless one (or a cronjob) removes the static -contents. Then the contents gets refreshed. - -
-
- Solution: -
- -This is done via the following ruleset: - -
- -
-RewriteCond %{REQUEST_FILENAME} !-s -RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L] --Here a request to
page.html
leads to a internal run of a -correspondingpage.cgi
ifpage.html
is still missing or has -filesize null. The trick here is thatpage.cgi
is a usual CGI script -which (additionally to its STDOUT) writes its output to the file -page.html
. Once it was run, the server sends out the data of -page.html
. When the webmaster wants to force a refresh the contents, -he just removespage.html
(usually done by a cronjob). - --
Document With Autorefresh
-- -
-
+ +- Description: -
- -Wouldn't it be nice while creating a complex webpage if the webbrowser would -automatically refresh the page every time we write a new version from within -our editor? Impossible? - -
-
- Solution: -
- -No! We just combine the MIME multipart feature, the webserver NPH feature and -the URL manipulation power of mod_rewrite. First, we establish a new URL -feature: Adding just
+:refresh
to any URL causes this to be refreshed -every time it gets updated on the filesystem. - -+
+ + +RewriteRule ^foo\.html$ foo.cgi [T=application/x-httpd-cgi] ++On-the-fly Content-Regeneration
+ ++
+ +- Description:
+ +- Here comes a really esoteric feature: dynamically + generated but statically served pages, i.e. pages should be + delivered as pure static pages (read from the filesystem + and just passed through), but they have to be generated + dynamically by the webserver if missing. This way you can + have CGI-generated pages which are statically served until + someone (or a cronjob) removes the static contents. Then the + contents get refreshed.
+ +- Solution:
+ +- + This is done via the following ruleset: + +
++
+ ++ ++ ++RewriteCond %{REQUEST_FILENAME} !-s +RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L] ++Here a request to
+page.html
leads to an + internal run of a correspondingpage.cgi
if +page.html
is still missing or has filesize + null. The trick here is thatpage.cgi
is a + usual CGI script which (additionally to its STDOUT) + writes its output to the filepage.html
. + Once it was run, the server sends out the data of +page.html
. When the webmaster wants to force + a refresh of the contents, he just removes +page.html
(usually done by a cronjob).Document With Autorefresh
+ ++
- Description:
+ +- Wouldn't it be nice while creating a complex webpage if + the webbrowser would automatically refresh the page every + time we write a new version from within our editor? + Impossible?
+ +- Solution:
+ +- + No! We just combine the MIME multipart feature, the + webserver NPH feature and the URL manipulation power of + mod_rewrite. First, we establish a new URL feature: + Adding just
:refresh
to any URL causes this + to be refreshed every time it gets updated on the + filesystem. + ++
+ ++ + RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1 --Now when we reference the URL - -
+-Now when we reference the URL
+/u/foo/bar/page.html:refresh -- -this leads to the internal invocation of the URL +
+-The only missing part is the NPH-CGI script. Although one would usually say -"left as an exercise to the reader" ;-) I will provide this, too. - -this leads to the internal invocation of the URL
+/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html -+
++The only missing part is the NPH-CGI script. Although + one would usually say "left as an exercise to the reader" + ;-) I will provide this, too.
+#!/sw/bin/perl ## ## nph-refresh -- NPH/CGI script for auto refreshing pages @@ -1547,29 +1898,33 @@ for ($n = 0; $n < $QS_n; $n++) { exit(0); ##EOF## -+Mass Virtual Hosting
-+
+- Description:
--
Mass Virtual Hosting
-+
- The
-<VirtualHost>
feature of Apache + is nice and works great when you just have a few dozen + virtual hosts. But when you are an ISP and have hundreds of + virtual hosts to provide, this feature is not the best + choice.
-- Description: -
- -The
<VirtualHost>
feature of Apache is nice and works great -when you just have a few dozens virtual hosts. But when you are an ISP and -have hundreds of virtual hosts to provide this feature is not the best choice. +- Solution:
--
- Solution: -
- -To provide this feature we map the remote webpage or even the complete remote -webarea to our namespace by the use of the Proxy Throughput feature -(flag [P]): +
- + To provide this feature we map each virtual hostname to + the document root of that host via a rewrite map and a + corresponding ruleset in the server configuration: -
++
+ + ++
- -+ + ## ## vhost.map ## @@ -1577,9 +1932,16 @@ www.vhost1.dom:80 /path/to/docroot/vhost1 www.vhost2.dom:80 /path/to/docroot/vhost2 : www.vhostN.dom:80 /path/to/docroot/vhostN -+ +
+ + +++
- - - -+ + ## ## httpd.conf ## @@ -1627,101 +1989,135 @@ RewriteCond ${vhost:%1} ^(/.*)$ # and remember the virtual host for logging puposes RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}] : -Access Restriction
- --
Blocking of Robots
-- -
-
- -- Description: -
- -How can we block a really annoying robot from retrieving pages of a specific -webarea? A
/robots.txt
file containing entries of the "Robot -Exclusion Protocol" is typically not enough to get rid of such a robot. - --
- Solution: -
- -We use a ruleset which forbids the URLs of the webarea -
/~quux/foo/arc/
(perhaps a very deep directory indexed area where the -robot traversal would create big server load). We have to make sure that we -forbid access only to the particular robot, i.e. just forbidding the host -where the robot runs is not enough. This would block users from this host, -too. We accomplish this by also matching the User-Agent HTTP header -information. - -- -
-RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.* -RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$ -RewriteRule ^/~quux/foo/arc/.+ - [F] --
Blocked Inline-Images
-- -
-
+ +- Description: -
- -Assume we have under http://www.quux-corp.de/~quux/ some pages with inlined -GIF graphics. These graphics are nice, so others directly incorporate them via -hyperlinks to their pages. We don't like this practice because it adds useless -traffic to our server. - -
-
- Solution: -
- -While we cannot 100% protect the images from inclusion, we -can at least restrict the cases where the browser sends -a HTTP Referer header. - -
++
+ + -RewriteCond %{HTTP_REFERER} !^$ ++Access Restriction
+ +Blocking of Robots
+ ++
+ +- Description:
+ +- How can we block a really annoying robot from + retrieving pages of a specific webarea? A +
+ +/robots.txt
file containing entries of the + "Robot Exclusion Protocol" is typically not enough to get + rid of such a robot.- Solution:
+ +- + We use a ruleset which forbids the URLs of the webarea +
+/~quux/foo/arc/
(perhaps a very deep + directory indexed area where the robot traversal would + create big server load). We have to make sure that we + forbid access only to the particular robot, i.e. just + forbidding the host where the robot runs is not enough. + This would block users from this host, too. We accomplish + this by also matching the User-Agent HTTP header + information. + ++
++ ++ ++RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.* +RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$ +RewriteRule ^/~quux/foo/arc/.+ - [F] ++Blocked Inline-Images
+ ++
- Description:
+ +- Assume we have under http://www.quux-corp.de/~quux/ + some pages with inlined GIF graphics. These graphics are + nice, so others directly incorporate them via hyperlinks to + their pages. We don't like this practice because it adds + useless traffic to our server.
+ +- Solution:
+ +- + While we cannot 100% protect the images from inclusion, + we can at least restrict the cases where the browser + sends a HTTP Referer header. + +
+
- -+ + +RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC] -RewriteRule .*\.gif$ - [F] -+ +
+ + +RewriteRule .*\.gif$ - [F] +++
+RewriteRule ^inlined-in-foo\.gif$ - [F] + ++ + RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !.*/foo-with-gif\.html$ -RewriteRule ^inlined-in-foo\.gif$ - [F] -Host Deny
--
Host Deny
-+
+
+- Description:
--
-- Description: -
- -How can we forbid a list of externally configured hosts from using our server? +
- How can we forbid a list of externally configured hosts + from using our server?
--
- Solution: -
- +
- Solution:
-For Apache >= 1.3b6: +- + For Apache >= 1.3b6: -
++
+ + ++
+ + RewriteEngine on RewriteMap hosts-deny txt:/path/to/hosts.deny RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR] RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND RewriteRule ^/.* - [F] -- -For Apache <= 1.3b6: - -
+ +
+ + ++For Apache <= 1.3b6:
+ ++
- -+ + RewriteEngine on RewriteMap hosts-deny txt:/path/to/hosts.deny RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1 @@ -1729,9 +2125,16 @@ RewriteRule !^NOT-FOUND/.* - [F] RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1 RewriteRule !^NOT-FOUND/.* - [F] RewriteRule ^NOT-FOUND/(.*)$ /$1 -+ +
+ + +++
- - - -+ + ## ## hosts.deny ## @@ -1743,84 +2146,110 @@ RewriteRule ^NOT-FOUND/(.*)$ /$1 193.102.180.41 - bsdti1.sdm.de - 192.76.162.40 - --
Proxy Deny
-- -
-
+ +- Description: -
- -How can we forbid a certain host or even a user of a special host from using -the Apache proxy? - -
-
- Solution: -
- -We first have to make sure mod_rewrite is below(!) mod_proxy in the -
+Configuration
file when compiling the Apache webserver. This way it -gets called _before_ mod_proxy. Then we configure the following for a -host-dependend deny... - -+
+ + -RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$ ++Proxy Deny
+ ++
- -- Description:
+ +- How can we forbid a certain host or even a user of a + special host from using the Apache proxy?
+ +- Solution:
+ +- + We first have to make sure mod_rewrite is below(!) + mod_proxy in the
Configuration
file when + compiling the Apache webserver. This way it gets called + _before_ mod_proxy. Then we configure the following for a + host-dependent deny... + +
- -+ + +RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$ RewriteRule !^http://[^/.]\.mydomain.com.* - [F] -...and this one for a user@host-dependend deny: - -
+ +
+ + -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} ^badguy@badhost\.mydomain\.com$ ++...and this one for a user@host-dependend deny:
+ ++
- -+ + +RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} ^badguy@badhost\.mydomain\.com$ RewriteRule !^http://[^/.]\.mydomain.com.* - [F] --
Special Authentication Variant
-- -
-
+ +- Description: -
- -Sometimes a very special authentication is needed, for instance a -authentication which checks for a set of explicitly configured users. Only -these should receive access and without explicit prompting (which would occur -when using the Basic Auth via mod_access). - -
-
- Solution: -
- -We use a list of rewrite conditions to exclude all except our friends: - -
++
+ + -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1.quux-corp\.com$ -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2.quux-corp\.com$ -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3.quux-corp\.com$ ++Special Authentication Variant
+ ++
- Description:
+ +- Sometimes a very special authentication is needed, for + instance an authentication which checks for a set of + explicitly configured users. Only these should receive + access, and without explicit prompting (which would occur + when using the Basic Auth via mod_access).
+ +- Solution:
+ +- + We use a list of rewrite conditions to exclude all except + our friends: + +
+
+ ++ + +RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1.quux-corp\.com$ +RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2.quux-corp\.com$ +RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3.quux-corp\.com$ RewriteRule ^/~quux/only-for-friends/ - [F] -Referer-based Deflector
--
Referer-based Deflector
-+
+
- Description:
--
+ +- Description: -
- -How can we program a flexible URL Deflector which acts on the "Referer" HTTP -header and can be configured with as many referring pages as we like? +
- How can we program a flexible URL Deflector which acts + on the "Referer" HTTP header and can be configured with as + many referring pages as we like?
--
- Solution: -
- -Use the following really tricky ruleset... +
- Solution:
-+ +
+ + +- + Use the following really tricky ruleset... + +
+
- -+ + RewriteMap deflector txt:/path/to/deflector.map RewriteCond %{HTTP_REFERER} !="" @@ -1830,12 +2259,19 @@ RewriteRule ^.* %{HTTP_REFERER} [R,L] RewriteCond %{HTTP_REFERER} !="" RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L] -... -in conjunction with a corresponding rewrite map: - -
+ +
+ + ++... in conjunction with a corresponding rewrite + map:
+ ++
- -+ + ## ## deflector.map ## @@ -1843,41 +2279,55 @@ in conjunction with a corresponding rewrite map: http://www.badguys.com/bad/index.html - http://www.badguys.com/bad/index2.html - http://www.badguys.com/bad/index3.html http://somewhere.com/ --This automatically redirects the request back to the referring page (when "-" -is used as the value in the map) or to a specific URL (when an URL is -specified in the map as the second argument). - - - -
Other
- --
External Rewriting Engine
-- -
-
+ +- Description: -
- -A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems no solution -by the use of mod_rewrite... - -
-
- Solution: -
- -Use an external rewrite map, i.e. a program which acts like a rewrite map. It -is run once on startup of Apache receives the requested URLs on STDIN and has -to put the resulting (usually rewritten) URL on STDOUT (same order!). - -
++ +
+ + ++This automatically redirects the request back to the + referring page (when "-" is used as the value in the map) + or to a specific URL (when an URL is specified in the map + as the second argument).
+Other
+ +External Rewriting Engine
+ ++
+ +- Description:
+ +- A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? + There seems to be no solution by the use of mod_rewrite...
+ +- Solution:
+ +- + Use an external rewrite map, i.e. a program which acts + like a rewrite map. It is run once on startup of Apache, + receives the requested URLs on STDIN and has to put the + resulting (usually rewritten) URL on STDOUT (same + order!). + +
+
- -+ + RewriteEngine on -RewriteMap quux-map prg:/path/to/map.quux.pl -RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1} -+ +
+ + +RewriteMap quux-map prg:/path/to/map.quux.pl +RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1} +++
- -+ + #!/path/to/perl # disable buffered I/O which would lead @@ -1890,17 +2340,21 @@ while (<>) { s|^foo/|bar/|; print $_; } --This is a demonstration-only example and just rewrites all URLs -
/~quux/foo/...
to/~quux/bar/...
. Actually you can program -whatever you like. But notice that while such maps can be used also by -an average user, only the system administrator can define it. - -This is a demonstration-only example and just rewrites + all URLs
+ +/~quux/foo/...
to +/~quux/bar/...
. Actually you can program + whatever you like. But notice that while such maps can be + used also by an average user, only the + system administrator can define it.
Some hints and tips on security issues in setting up a web server. Some of -the suggestions will be general, others specific to Apache. - -
In typical operation, Apache is started by the root
-user, and it switches to the user defined by the User directive to serve hits.
-As is the case with any command that root executes, you must take care
-that it is protected from modification by non-root users. Not only
-must the files themselves be writeable only by root, but so must the
-directories, and parents of all directories. For example, if you
-choose to place ServerRoot in /usr/local/apache
then it is
-suggested that you create that directory as root, with commands
-like these:
-
-
+ ++ + + + + + +Apache HTTP Server: Security Tips + + + + + + +Security Tips for Server Configuration
+
+ +Some hints and tips on security issues in setting up a web + server. Some of the suggestions will be general, others + specific to Apache.
+
+ +Permissions on + ServerRoot Directories
+ +In typical operation, Apache is started by the root user, + and it switches to the user defined by the User + directive to serve hits. As is the case with any command that + root executes, you must take care that it is protected from + modification by non-root users. Not only must the files + themselves be writeable only by root, but so must the + directories, and parents of all directories. For example, if + you choose to place ServerRoot in +
+ +/usr/local/apache
then it is suggested that you + create that directory as root, with commands like these:+- -It is assumed that /, /usr, and /usr/local are only modifiable by root. -When you install the httpd executable, you should ensure that it is -similarly protected: - -mkdir /usr/local/apache cd /usr/local/apache mkdir bin conf logs chown 0 . bin conf logs chgrp 0 . bin conf logs chmod 755 . bin conf logs -+ It is assumed that /, /usr, and /usr/local are only modifiable + by root. When you install the httpd executable, you should + ensure that it is similarly protected: + ++++- -cp httpd /usr/local/apache/bin chown 0 /usr/local/apache/bin/httpd chgrp 0 /usr/local/apache/bin/httpd chmod 511 /usr/local/apache/bin/httpd -You can create an htdocs subdirectory which is modifiable by other -users -- since root never executes any files out of there, and shouldn't -be creating files in there. - -
If you allow non-root users to modify any files that root either -executes or writes on then you open your system to root compromises. -For example, someone could replace the httpd binary so that the next -time you start it, it will execute some arbitrary code. If the logs -directory is writeable (by a non-root user), someone -could replace a log file with a symlink to some other system file, -and then root might overwrite that file with arbitrary data. If the -log files themselves are writeable (by a non-root user), then someone -may be able to overwrite the log itself with bogus data. -
-
-Server Side Includes
-Server side includes (SSI) can be configured so that users can execute -arbitrary programs on the server. That thought alone should send a shiver -down the spine of any sys-admin.
- -One solution is to disable that part of SSI. To do that you use the -IncludesNOEXEC option to the Options -directive.
- -
- -Non Script Aliased CGI
-Allowing users to execute CGI scripts in any directory -should only -be considered if; -
-
- You trust your users not to write scripts which will deliberately or -accidentally expose your system to an attack. -
- You consider security at your site to be so feeble in other areas, as to -make one more potential hole irrelevant. -
- You have no users, and nobody ever visits your server. -
-
- -Script Alias'ed CGI
-Limiting CGI to special directories gives the admin -control over -what goes into those directories. This is inevitably more secure than -non script aliased CGI, but only if users with write access to the -directories are trusted or the admin is willing to test each new CGI -script/program for potential security holes.
- -Most sites choose this option over the non script aliased CGI approach.
- -
-CGI in general
-Always remember that you must trust the writers of the CGI script/programs -or your ability to spot potential security holes in CGI, whether they were -deliberate or accidental.
- -All the CGI scripts will run as the same user, so they have potential to -conflict (accidentally or deliberately) with other scripts e.g. -User A hates User B, so he writes a script to trash User B's CGI -database. One program which can be used to allow scripts to run -as different users is suEXEC which is -included with Apache as of 1.2 and is called from special hooks in -the Apache server code. Another popular way of doing this is with -CGIWrap.
- -
- - -Stopping users overriding system wide settings...
-To run a really tight ship, you'll want to stop users from setting -up
.htaccess
files which can override security features -you've configured. Here's one way to do it...- -In the server configuration file, put -
- -Then setup for specific directories-<Directory />
-AllowOverride None
-Options None
-Allow from all
-</Directory>
-- -This stops all overrides, Includes and accesses in all directories apart -from those named.
-
-- Protect server files by default -
--One aspect of Apache which is occasionally misunderstood is the feature -of default access. That is, unless you take steps to change it, if the -server can find its way to a file through normal URL mapping rules, it -can serve it to clients. -
--For instance, consider the following example: -
--
-- # cd /; ln -s / public_html -
-- Accessing http://localhost/~root/ -
--This would allow clients to walk through the entire filesystem. To work -around this, add the following block to your server's configuration: -
-++
You can create an htdocs subdirectory which is modifiable by + other users -- since root never executes any files out of + there, and shouldn't be creating files in there.
+ +If you allow non-root users to modify any files that root + either executes or writes on then you open your system to root + compromises. For example, someone could replace the httpd + binary so that the next time you start it, it will execute some + arbitrary code. If the logs directory is writeable (by a + non-root user), someone could replace a log file with a symlink + to some other system file, and then root might overwrite that + file with arbitrary data. If the log files themselves are + writeable (by a non-root user), then someone may be able to + overwrite the log itself with bogus data.
+Server side includes (SSI) can be configured so that users + can execute arbitrary programs on the server. That thought + alone should send a shiver down the spine of any sys-admin.
+ +One solution is to disable that part of SSI. To do that you + use the IncludesNOEXEC option to the Options directive.
+Allowing users to execute CGI scripts in + any directory should only be considered if;
+ +Limiting CGI to special directories gives + the admin control over what goes into those directories. This + is inevitably more secure than non script aliased CGI, but + only if users with write access to the directories are + trusted or the admin is willing to test each new CGI + script/program for potential security holes.
+ +Most sites choose this option over the non script aliased + CGI approach.
+Always remember that you must trust the writers of the CGI + script/programs or your ability to spot potential security + holes in CGI, whether they were deliberate or accidental.
+ +All the CGI scripts will run as the same user, so they have + potential to conflict (accidentally or deliberately) with other + scripts e.g. User A hates User B, so he writes a + script to trash User B's CGI database. One program which can be + used to allow scripts to run as different users is suEXEC which is included with Apache + as of 1.2 and is called from special hooks in the Apache server + code. Another popular way of doing this is with CGIWrap.
+To run a really tight ship, you'll want to stop users from
+ setting up .htaccess
files which can override
+ security features you've configured. Here's one way to do
+ it...
In the server configuration file, put
+ +
+ <Directory />
+ AllowOverride None
+ Options None
+ Allow from all
+ </Directory>
+
+
+ Then setup for specific directories
+
+ This stops all overrides, Includes and accesses in all + directories apart from those named.
+One aspect of Apache which is occasionally misunderstood is + the feature of default access. That is, unless you take steps + to change it, if the server can find its way to a file through + normal URL mapping rules, it can serve it to clients.
+ +For instance, consider the following example:
+ +This would allow clients to walk through the entire + filesystem. To work around this, add the following block to + your server's configuration:
+<Directory /> Order Deny,Allow Deny from all </Directory> --
-This will forbid default access to filesystem locations. Add -appropriate -<Directory> -blocks to allow access only -in those areas you wish. For example, -
-++ +
This will forbid default access to filesystem locations. Add + appropriate <Directory> + blocks to allow access only in those areas you wish. For + example,
+<Directory /usr/users/*/public_html> Order Deny,Allow Allow from all @@ -186,46 +197,39 @@ in those areas you wish. For example, Order Deny,Allow Allow from all </Directory> --
-Pay particular attention to the interactions of -<Location> -and -<Directory> -directives; for instance, even if <Directory /> -denies access, a <Location /> directive might -overturn it. -
--Also be wary of playing games with the -UserDir -directive; setting it to something like "./" -would have the same effect, for root, as the first example above. -If you are using Apache 1.3 or above, we strongly recommend that you -include the following line in your server configuration files: -
-Please send any other useful security tips to The Apache Group -by filling out a -problem report. -If you are confident you have found a security bug in the Apache -source code itself, please let us -know. - -
- - - - + + +
Pay particular attention to the interactions of <Location> + and <Directory> + directives; for instance, even if <Directory + /> denies access, a <Location /> + directive might overturn it.
+ +Also be wary of playing games with the UserDir directive; + setting it to something like "./" would have the + same effect, for root, as the first example above. If you are + using Apache 1.3 or above, we strongly recommend that you + include the following line in your server configuration + files:
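UserDir disabled root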
+ +Please send any other useful security tips to The Apache + Group by filling out a problem + report. If you are confident you have found a security bug + in the Apache source code itself, please let us + know.
+ ++
+ + + diff --git a/docs/manual/misc/tutorials.html b/docs/manual/misc/tutorials.html index 90bcdb2d15..c31d614bdd 100644 --- a/docs/manual/misc/tutorials.html +++ b/docs/manual/misc/tutorials.html @@ -1,209 +1,213 @@ - - - -+ Warning: This document has not been updated + to take into account changes made in the 2.0 version of the + Apache HTTP Server. Some of the information may still be + relevant, but please use it with care. ++ +
The following documents give you step-by-step instructions + on how to accomplish common tasks with the Apache http server. + Many of these documents are located at external sites and are + not the work of the Apache Software Foundation. Copyright to + documents on external sites is owned by the authors or their + assignees. Please consult the official Apache + Server documentation to verify what you read on external + sites.
If you have a pointer to an accurate and well-written + tutorial not included here, please let us know by submitting it + to the Apache Bug + Database. +
+ + - - - - -Warning: -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -- - -
The following documents give you step-by-step instructions on how -to accomplish common tasks with the Apache http server. Many of these -documents are located at external sites and are not the work of the -Apache Software Foundation. Copyright to documents on external sites -is owned by the authors or their assignees. Please consult the official Apache Server documentation to verify what you -read on external sites. - - -
If you have a pointer to a an accurate and well-written tutorial -not included here, please let us know by submitting it to the -Apache Bug Database. - - - - diff --git a/docs/manual/mod/directives.html b/docs/manual/mod/directives.html index f9d49da12f..81e719ecb9 100644 --- a/docs/manual/mod/directives.html +++ b/docs/manual/mod/directives.html @@ -57,7 +57,8 @@
-Below is a list of all of the modules that come as part of the Apache -distribution. See also the list of modules sorted alphabetically and the complete -alphabetical list of all Apache -directives. -
- -Below is a list of all of the modules that come as part of + the Apache distribution. See also the list of modules sorted alphabetically and the complete + alphabetical list of all Apache + directives.
+ +-Below is a list of all of the modules that come as part of the Apache -distribution. See also the list of modules sorted by type and the complete -alphabetical list of all Apache -directives. - -
- -Below is a list of all of the modules that come as part of + the Apache distribution. See also the list of modules sorted by type and the complete + alphabetical list of all Apache + directives.
+ +Directive3 is described here, and so on. - -
- - --This module provides access control based on client hostname, IP -address, or other characteristics of the client request. -
- -Status: Base
-
-Source File: mod_access.c
-
-Module Identifier: access_module
-
The directives provided by mod_access are used in <Directory>, <Files>,
and <Location>
sections as
-well as .htaccess
files
-to control access to particular parts of the server. Access
-can be controlled based on the client hostname, IP address,
-or other characteristics of the client request, as captured
-in environment variables. The
-Allow
and Deny
directives are used
-to specify which clients are or are not allowed access to the
-server, while the Order
directive sets the
-default access state, and configures how the Allow
-and Deny
directives interact with each other.
Both host-based access restrictions and password-based -authentication may be implemented simultaneously. In -that case, the Satisfy directive -is used to determine how the two sets of restrictions -interact.
- -In general, access restriction directives apply to all access
-methods (GET
, PUT
, POST
, etc).
-This is the desired behavior in most cases. However, it is possible
-to restrict some methods, while leaving other methods unrestricted, by
-enclosing the directives in a <Limit> section.
See also Satisfy - and Require. - -
-
-Syntax: Allow from
- all|host|env=env-variable
- [host|env=env-variable] ...
-Context: directory, .htaccess
-Override: Limit
-Status: Base
-Module: mod_access
-
-The Allow
directive affects which hosts can access an
-area of the server. Access can be controlled by hostname, IP Address,
-IP Address range, or by other characteristics of the client
-request captured in environment variables.
The first argument to this directive is always from
.
-The subsequent arguments can take three different forms. If
-Allow from all
is specified, then all hosts are allowed
-access, subject to the configuration of the Deny
and
-Order
directives as discussed below. To allow only
-particular hosts or groups of hosts to access the server, the
-host can be specified in any of the following formats:
Allow from
-apache.org
foo.apache.org
but it will not
-match fooapache.org
. This configuration will cause the
-server to perform a reverse DNS lookup on the client IP address,
-regardless of the setting of the HostNameLookups directive.Allow from 10.1.2.3
Allow from 10.1
Allow from 10.1.0.0/255.255.0.0
Allow
-from 10.1.0.0/16
Note that the last three examples above match exactly the -same set of hosts.
- -The third format of the arguments to the Allow
-directive allows access to the server to be controlled based on the
-existence of an environment variable. When
-Allow from env=
env-variable is specified, then
-the request is allowed access if the environment variable
-env-variable exists. The server provides the ability to set
-environment variables in a flexible way based on characteristics of
-the client request using the directives provided by mod_setenvif. Therefore, this directive
-can be used to allow access based on such factors as the clients
-User-Agent
(browser type), Referer
, or other
-HTTP request header fields.
-Example: -
-+ ++ + + + + + ++ + +Apache module mod_access + + + + + + +Module mod_access
+ +This module provides access control based on client + hostname, IP address, or other characteristics of the client + request.
+ +Status: Base
+ +
+ Source File: mod_access.c
+ Module Identifier: + access_moduleSummary
+ +The directives provided by mod_access are used in
+ +<Directory>, <Files>,
and<Location>
sections + as well as.htaccess
files to + control access to particular parts of the server. Access can be + controlled based on the client hostname, IP address, or other + characteristics of the client request, as captured in environment variables. The +Allow
andDeny
directives are used to + specify which clients are or are not allowed access to the + server, while theOrder
directive sets the default + access state, and configures how theAllow
and +Deny
directives interact with each other.Both host-based access restrictions and password-based + authentication may be implemented simultaneously. In that case, + the Satisfy directive is used + to determine how the two sets of restrictions interact.
+ +In general, access restriction directives apply to all + access methods (
+ +GET
,PUT
, +POST
, etc). This is the desired behavior in most + cases. However, it is possible to restrict some methods, while + leaving other methods unrestricted, by enclosing the directives + in a <Limit> section.Directives
+ +
+ +Allow directive
+ ++ Syntax: Allow from + all|host|env=env-variable + [host|env=env-variable] ...
+ +
+ Context: directory, + .htaccess
+ Override: Limit
+ Status: Base
+ Module: mod_accessThe
+ +Allow
directive affects which hosts can + access an area of the server. Access can be controlled by + hostname, IP Address, IP Address range, or by other + characteristics of the client request captured in environment + variables.The first argument to this directive is always +
+ +from
. The subsequent arguments can take three + different forms. IfAllow from all
is specified, + then all hosts are allowed access, subject to the configuration + of theDeny
andOrder
directives as + discussed below. To allow only particular hosts or groups of + hosts to access the server, the host can be specified + in any of the following formats:+
+ +- A (partial) domain-name
+ +- Example:
+ +Allow from apache.org
+ Hosts whose names match, or end in, this string are allowed + access. Only complete components are matched, so the above + example will matchfoo.apache.org
but it will + not matchfooapache.org
. This configuration will + cause the server to perform a reverse DNS lookup on the + client IP address, regardless of the setting of the HostNameLookups + directive.- A full IP address
+ +- Example:
+ +Allow from 10.1.2.3
+ An IP address of a host allowed access- A partial IP address
+ +- Example:
+ +Allow from 10.1
+ The first 1 to 3 bytes of an IP address, for subnet + restriction.- A network/netmask pair
+ +- Example:
+ +Allow from + 10.1.0.0/255.255.0.0
+ A network a.b.c.d, and a netmask w.x.y.z. For more + fine-grained subnet restriction.- A network/nnn CIDR specification
+ +- Example:
+Allow from 10.1.0.0/16
+ Similar to the previous case, except the netmask consists of + nnn high-order 1 bits.Note that the last three examples above match exactly the + same set of hosts.
+ +The third format of the arguments to the
+ +Allow
+ directive allows access to the server to be controlled based on + the existence of an environment + variable. WhenAllow from + env=
env-variable is specified, then the request + is allowed access if the environment variable + env-variable exists. The server provides the ability + to set environment variables in a flexible way based on + characteristics of the client request using the directives + provided by mod_setenvif. + Therefore, this directive can be used to allow access based on + such factors as the clientsUser-Agent
(browser + type),Referer
, or other HTTP request header + fields.Example:
+ ++- -SetEnvIf User-Agent ^KnockKnock/2.0 let_me_in <Directory /docroot> Order Deny,Allow Deny from all Allow from env=let_me_in </Directory> -In this case, browsers with a user-agent string beginning with -KnockKnock/2.0 will be allowed access, and all others will be -denied.
--See also Deny, Order -and SetEnvIf. -
-
- -Deny directive
-- -Syntax: Deny from - all|host|env=env-variable - [host|env=env-variable] ...
- -
-Context: directory, .htaccess
-Override: Limit
-Status: Base
-Module: mod_access -This directive allows access to the server to be restricted based -on hostname, IP address, or environment variables. The arguments for -the
- -Deny
directive are identical to the arguments for the -Allow directive.See also Allow, Order -and SetEnvIf.
-
- -Order directive
-- -Syntax: Order ordering
-
-Default:Order Deny,Allow
-Context: directory, .htaccess
-Override: Limit
-Status: Base
-Module: mod_access --The
-Order
directive controls the default access state and -the order in which Allow and Deny directives are evaluated. Ordering is -one of --
- -- Deny,Allow
- The
- -Deny
directives are evaluated -before theAllow
directives. Access is allowed -by default. Any client which does not match aDeny
-directive or does match anAllow
directive will be -allowed access to the server.- Allow,Deny
- The
- -Allow
directives are -evaluated before theDeny
directives. Access is -denied by default. Any client which does not match -anAllow
directive or does match aDeny
-directive will be denied access to the server.- Mutual-failure
- Only those hosts which appear on the -
-Allow
list and do not appear on theDeny
-list are granted access. This ordering has the same effect as -Order Allow,Deny
and is deprecated in favor of that -configuration.Keywords may only be separated by a comma; no whitespace is allowed -between them. Note that in all cases every
- -Allow
-andDeny
statement is evaluated.In the following example, all hosts in the apache.org domain are -allowed access; all other hosts are denied access. -
- -- -- Order Deny,Allow
- Deny from all
- Allow from apache.org
-In the next example, all hosts in the apache.org domain are allowed -access, except for the hosts which are in the foo.apache.org -subdomain, who are denied access. All hosts not in the apache.org -domain are denied access because the default state is to deny access -to the server. -
- -- -- Order Allow,Deny
- Allow from apache.org
- Deny from foo.apache.org
-On the other hand, if the
- -Order
in the last example is -changed toDeny,Allow
, all hosts will be allowed access. -This happens because, regardless of the actual ordering of the -directives in the configuration file, theAllow from -apache.org
will be evaluated last and will override the -Deny from foo.apache.org
. All hosts not in the -apache.org
domain will also be allowed access because the -default state will change to allow.The presence of an
- -Order
directive can -affect access to a part of the server even in the absence -of accompanyingAllow
andDeny
-directives because of its effect on the default access state. -For example,- --<Directory /www>
- Order Allow,Deny
-</Directory> -will deny all access to the
- -/www
directory because -the default access state will be set to deny.The
- -Order
directive controls the order of access -directive processing only within each phase of the server's -configuration processing. This implies, for example, that an -Allow
orDeny
directive occurring -in a <Location> section will always be evaluated after -anAllow
orDeny
directive occurring -in a <Directory> section or.htaccess
file, -regardless of the setting of theOrder
directive. -For details on the merging of configuration sections, -see the documentation on How Directory, -Location and Files sections work.
In this case, browsers with a user-agent string beginning + with KnockKnock/2.0 will be allowed access, and all + others will be denied.
+ +See also Deny, Order and SetEnvIf.
+
+ Syntax: Deny from
+ all|host|env=env-variable
+ [host|env=env-variable] ...
+ Context: directory,
+ .htaccess
+ Override: Limit
+ Status: Base
+ Module: mod_access
This directive allows access to the server to be restricted
+ based on hostname, IP address, or environment variables. The
+ arguments for the Deny
directive are identical to
+ the arguments for the Allow directive.
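+
+ For example, the following sketch (the address and domain are
+ hypothetical) admits all clients except one misbehaving host
+ and one spammers' domain:
+
+ Order Allow,Deny
+ Allow from all
+ Deny from 10.1.2.3
+ Deny from spammers.example.com
+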
See also Allow, Order and SetEnvIf.
+
+ Syntax: Order
+ ordering
+ Default: Order
+ Deny,Allow
+ Context: directory,
+ .htaccess
+ Override: Limit
+ Status: Base
+ Module: mod_access
The Order
directive controls the default access
+ state and the order in which Allow and Deny directives are evaluated.
+ Ordering is one of
Deny
directives are evaluated before the
+ Allow
directives. Access is allowed by default.
+ Any client which does not match a Deny
directive
+ or does match an Allow
directive will be allowed
+ access to the server.Allow
directives are evaluated before
+ the Deny
directives. Access is denied by
+ default. Any client which does not match an
+ Allow
directive or does match a
+ Deny
directive will be denied access to the
+ server.Allow
+ list and do not appear on the Deny
list are
+ granted access. This ordering has the same effect as
+ Order Allow,Deny
and is deprecated in favor of
+ that configuration.Keywords may only be separated by a comma; no whitespace is
+ allowed between them. Note that in all cases every
+ Allow
and Deny
statement is
+ evaluated.
In the following example, all hosts in the apache.org domain + are allowed access; all other hosts are denied access.
+ +
+ Order Deny,Allow
+ Deny from all
+ Allow from apache.org
+
+
+
+ In the next example, all hosts in the apache.org domain are
+ allowed access, except for the hosts in the
+ foo.apache.org subdomain, which are denied access. All hosts not
+ in the apache.org domain are denied access because the default
+ state is to deny access to the server.
+ +
+ Order Allow,Deny
+ Allow from apache.org
+ Deny from foo.apache.org
+
+
+
+ On the other hand, if the Order
in the last
+ example is changed to Deny,Allow
, all hosts will
+ be allowed access. This happens because, regardless of the
+ actual ordering of the directives in the configuration file,
+ the Allow from apache.org
will be evaluated last
+ and will override the Deny from foo.apache.org
.
+ All hosts not in the apache.org
domain will also
+ be allowed access because the default state will change to
+ allow.
The presence of an Order
directive can affect
+ access to a part of the server even in the absence of
+ accompanying Allow
and Deny
+ directives because of its effect on the default access state.
+ For example,
+ <Directory /www>
+ Order Allow,Deny
+ </Directory>
+
+
+ will deny all access to the /www
directory
+ because the default access state will be set to
+ deny.
The Order
directive controls the order of
+ access directive processing only within each phase of the
+ server's configuration processing. This implies, for example,
+ that an Allow
or Deny
directive
+ occurring in a <Location> section will always be
+ evaluated after an Allow
or Deny
+ directive occurring in a <Directory> section or
+ .htaccess
file, regardless of the setting of the
+ Order
directive. For details on the merging of
+ configuration sections, see the documentation on How Directory, Location and Files
+ sections work.
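+
+ As a sketch of this merging behavior (the paths are hypothetical,
+ and /secret is assumed to map into /web/secret), the
+ <Location> section below is evaluated after the
+ <Directory> section, so requests for /secret end up being
+ allowed despite the Deny:
+
+ <Directory /web/secret>
+ Order Allow,Deny
+ Deny from all
+ </Directory>
+
+ <Location /secret>
+ Order Deny,Allow
+ Allow from all
+ </Location>
+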
diff --git a/docs/manual/mod/mod_access.html b/docs/manual/mod/mod_access.html index f6cef3b3b8..938ab6ece0 100644 --- a/docs/manual/mod/mod_access.html +++ b/docs/manual/mod/mod_access.html @@ -1,346 +1,336 @@ - - -
-
- - - -
+