<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><!--
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
This file is generated from xml source: DO NOT EDIT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-->
8 <title>URL Rewriting Guide - Advanced topics - Apache HTTP Server</title>
9 <link href="../style/css/manual.css" rel="stylesheet" media="all" type="text/css" title="Main stylesheet" />
10 <link href="../style/css/manual-loose-100pc.css" rel="alternate stylesheet" media="all" type="text/css" title="No Sidebar - Default font size" />
11 <link href="../style/css/manual-print.css" rel="stylesheet" media="print" type="text/css" />
12 <link href="../images/favicon.ico" rel="shortcut icon" /></head>
13 <body id="manual-page"><div id="page-header">
14 <p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="../faq/">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p>
15 <p class="apache">Apache HTTP Server Version 2.3</p>
16 <img alt="" src="../images/feather.gif" /></div>
<div class="up"><a href="./"><img title="&lt;-" alt="&lt;-" src="../images/left.gif" /></a></div>
</div><div id="page-content"><div id="preamble"><h1>URL Rewriting Guide - Advanced topics</h1>
21 <p><span>Available Languages: </span><a href="../en/rewrite/rewrite_guide_advanced.html" title="English"> en </a></p>
<p>This document supplements the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
<a href="../mod/mod_rewrite.html">reference documentation</a>.
It describes how to use Apache's <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
to solve typical URL-based problems that webmasters
commonly face. Each problem comes with a detailed description of
how to solve it by configuring URL rewriting rulesets.</p>
<div class="warning">ATTENTION: Depending on your server configuration
it may be necessary to slightly change the examples for your
situation, e.g. adding the <code>[PT]</code> flag when
additionally using <code class="module"><a href="../mod/mod_alias.html">mod_alias</a></code> and
<code class="module"><a href="../mod/mod_userdir.html">mod_userdir</a></code>, or rewriting a ruleset
to fit in <code>.htaccess</code> context instead
of per-server context. Always try to understand what a
particular ruleset really does before you use it. This
avoids many problems.</div>
43 <div id="quickview"><ul id="toc"><li><img alt="" src="../images/down.gif" /> <a href="#cluster">Webcluster through Homogeneous URL Layout</a></li>
44 <li><img alt="" src="../images/down.gif" /> <a href="#structuredhomedirs">Structured Homedirs</a></li>
45 <li><img alt="" src="../images/down.gif" /> <a href="#filereorg">Filesystem Reorganization</a></li>
46 <li><img alt="" src="../images/down.gif" /> <a href="#redirect404">Redirect Failing URLs To Other Webserver</a></li>
47 <li><img alt="" src="../images/down.gif" /> <a href="#archive-access-multiplexer">Archive Access Multiplexer</a></li>
48 <li><img alt="" src="../images/down.gif" /> <a href="#browser-dependent-content">Browser Dependent Content</a></li>
49 <li><img alt="" src="../images/down.gif" /> <a href="#dynamic-mirror">Dynamic Mirror</a></li>
50 <li><img alt="" src="../images/down.gif" /> <a href="#reverse-dynamic-mirror">Reverse Dynamic Mirror</a></li>
51 <li><img alt="" src="../images/down.gif" /> <a href="#retrieve-missing-data">Retrieve Missing Data from Intranet</a></li>
52 <li><img alt="" src="../images/down.gif" /> <a href="#load-balancing">Load Balancing</a></li>
53 <li><img alt="" src="../images/down.gif" /> <a href="#new-mime-type">New MIME-type, New Service</a></li>
54 <li><img alt="" src="../images/down.gif" /> <a href="#on-the-fly-content">On-the-fly Content-Regeneration</a></li>
55 <li><img alt="" src="../images/down.gif" /> <a href="#autorefresh">Document With Autorefresh</a></li>
56 <li><img alt="" src="../images/down.gif" /> <a href="#mass-virtual-hosting">Mass Virtual Hosting</a></li>
57 <li><img alt="" src="../images/down.gif" /> <a href="#host-deny">Host Deny</a></li>
58 <li><img alt="" src="../images/down.gif" /> <a href="#proxy-deny">Proxy Deny</a></li>
59 <li><img alt="" src="../images/down.gif" /> <a href="#special-authentication">Special Authentication Variant</a></li>
60 <li><img alt="" src="../images/down.gif" /> <a href="#referer-deflector">Referer-based Deflector</a></li>
61 </ul><h3>See also</h3><ul class="seealso"><li><a href="../mod/mod_rewrite.html">Module
62 documentation</a></li><li><a href="rewrite_intro.html">mod_rewrite
63 introduction</a></li><li><a href="rewrite_guide.html">Rewrite Guide - useful
64 examples</a></li><li><a href="rewrite_tech.html">Technical details</a></li></ul></div>
65 <div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
67 <h2><a name="cluster" id="cluster">Webcluster through Homogeneous URL Layout</a></h2>
<p>We want to create a homogeneous and consistent URL
layout over all WWW servers on an Intranet webcluster, i.e.
all URLs (by definition server-local and thus server
dependent!) actually become server <em>independent</em>!
What we want is to give the WWW namespace a consistent
server-independent layout: no URL should have to include
the physically correct target server. The cluster itself
should drive us automatically to the physical target
<p>First, the knowledge of the target servers comes from
(distributed) external maps which contain information about
where our users, groups and entities reside. They have the
94 <div class="example"><pre>
100 <p>We put them into files <code>map.xxx-to-host</code>.
101 Second we need to instruct all servers to redirect URLs
104 <div class="example"><pre>
112 <div class="example"><pre>
113 http://physical-host/u/user/anypath
114 http://physical-host/g/group/anypath
115 http://physical-host/e/entity/anypath
<p>when the URL is not locally valid on a server. The
following ruleset does this for us with the help of the map
files (assuming that server0 is a default server which
will be used if a user has no entry in the map):</p>
<div class="example"><pre>
RewriteEngine on

RewriteMap user-to-host txt:/path/to/map.user-to-host
RewriteMap group-to-host txt:/path/to/map.group-to-host
RewriteMap entity-to-host txt:/path/to/map.entity-to-host

RewriteRule ^/u/<strong>([^/]+)</strong>/?(.*) http://<strong>${user-to-host:$1|server0}</strong>/u/$1/$2
RewriteRule ^/g/<strong>([^/]+)</strong>/?(.*) http://<strong>${group-to-host:$1|server0}</strong>/g/$1/$2
RewriteRule ^/e/<strong>([^/]+)</strong>/?(.*) http://<strong>${entity-to-host:$1|server0}</strong>/e/$1/$2

RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/
RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3
</pre></div>
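<p>The lookup-with-default behaviour of the map rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not how <code>mod_rewrite</code> is implemented: the <code>MAPS</code> dictionary, its entries (<code>alice</code>, <code>server1</code> and so on) and the function name <code>redirect_target</code> are assumptions standing in for the <code>map.xxx-to-host</code> files.</p>

```python
import re

# Hypothetical in-memory stand-ins for the map.xxx-to-host files.
MAPS = {
    "u": {"alice": "server1"},
    "g": {"staff": "server2"},
    "e": {"netsw": "server3"},
}
DEFAULT_HOST = "server0"  # fallback, as in ${user-to-host:$1|server0}


def redirect_target(path):
    """Return the redirect URL for /u/, /g/ or /e/ paths, else None."""
    m = re.match(r"^/([uge])/([^/]+)/?(.*)", path)
    if m is None:
        return None
    kind, name, rest = m.groups()
    # Unknown names fall back to the default server.
    host = MAPS[kind].get(name, DEFAULT_HOST)
    return "http://%s/%s/%s/%s" % (host, kind, name, rest)
```

<p>A name without a map entry, such as <code>/u/bob</code> here, ends up on <code>server0</code>, which is the role the default value plays in the ruleset.</p>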
140 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
141 <div class="section">
142 <h2><a name="structuredhomedirs" id="structuredhomedirs">Structured Homedirs</a></h2>
147 <dt>Description:</dt>
<p>Sites with thousands of users often use a
structured homedir layout, i.e. each homedir is in a
subdirectory named, for instance, after the first
character of the username. So, <code>/~foo/anypath</code>
is <code>/home/<strong>f</strong>/foo/.www/anypath</code>
while <code>/~bar/anypath</code> is
<code>/home/<strong>b</strong>/bar/.www/anypath</code>.</p>
162 <p>We use the following ruleset to expand the tilde URLs
163 into exactly the above layout.</p>
<div class="example"><pre>
RewriteEngine on
RewriteRule ^/~(<strong>([a-z])</strong>[a-z0-9]+)(.*) /home/<strong>$2</strong>/$1/.www$3
</pre></div>
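<p>To see how the nested groups in this pattern line up with <code>$1</code>, <code>$2</code> and <code>$3</code>, the same regular expression can be tried out in Python. This is a sketch for experimenting with the pattern only; the function name <code>expand_tilde</code> is an assumption and the code says nothing about how Apache applies the rule internally.</p>

```python
import re


def expand_tilde(url):
    """Map /~user/path to /home/<first letter>/user/.www/path."""
    # Group 1 is the whole username, group 2 its first character,
    # group 3 the remainder of the path.
    m = re.match(r"^/~(([a-z])[a-z0-9]+)(.*)", url)
    if m is None:
        return url  # not a tilde URL, leave it untouched
    return "/home/%s/%s/.www%s" % (m.group(2), m.group(1), m.group(3))
```

<p>Note that the pattern requires at least two characters in the username, so a one-letter user such as <code>/~x/</code> would not be rewritten.</p>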
172 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
173 <div class="section">
174 <h2><a name="filereorg" id="filereorg">Filesystem Reorganization</a></h2>
179 <dt>Description:</dt>
<p>This really is a hardcore example: a killer application
which heavily uses per-directory
<code>RewriteRules</code> to get a smooth look and feel
on the Web while its data structure is never touched or
adjusted. Background: <strong><em>net.sw</em></strong> is
my archive of freely available Unix software packages,
which I started to collect in 1992. It is both my hobby
and job to do this, because while I'm studying computer
science I have also worked for many years as a system and
network administrator in my spare time. Every week I need
some sort of software, so I created a deep hierarchy of
directories where I stored the packages:</p>
195 <div class="example"><pre>
196 drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/
197 drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/
198 drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/
199 drwxrwxr-x 5 netsw users 512 Jul 9 00:41 Database/
200 drwxrwxr-x 4 netsw users 512 Jul 30 19:25 Dicts/
201 drwxrwxr-x 10 netsw users 512 Jul 9 01:54 Graphic/
202 drwxrwxr-x 5 netsw users 512 Jul 9 01:58 Hackers/
203 drwxrwxr-x 8 netsw users 512 Jul 9 03:19 InfoSys/
204 drwxrwxr-x 3 netsw users 512 Jul 9 03:21 Math/
205 drwxrwxr-x 3 netsw users 512 Jul 9 03:24 Misc/
206 drwxrwxr-x 9 netsw users 512 Aug 1 16:33 Network/
207 drwxrwxr-x 2 netsw users 512 Jul 9 05:53 Office/
208 drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/
209 drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/
210 drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/
211 drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
<p>In July 1996 I decided to make this archive public to
the world via a nice Web interface. "Nice" means that I
wanted to offer an interface where you can browse
directly through the archive hierarchy. And "nice" means
that I didn't want to change anything inside this
hierarchy - not even by putting some CGI scripts at the
top of it. Why? Because the above structure should
later be accessible via FTP as well, and I didn't want any
Web or CGI stuff to be there.</p>
228 <p>The solution has two parts: The first is a set of CGI
229 scripts which create all the pages at all directory
230 levels on-the-fly. I put them under
231 <code>/e/netsw/.www/</code> as follows:</p>
233 <div class="example"><pre>
234 -rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl
235 drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/
236 -rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE
237 -rw-r--r-- 1 netsw users 659 Aug 4 09:27 TODO
238 -rw-r--r-- 1 netsw users 5697 Aug 1 18:01 netsw-about.html
239 -rwxr-xr-x 1 netsw users 579 Aug 2 10:33 netsw-access.pl
240 -rwxr-xr-x 1 netsw users 1532 Aug 1 17:35 netsw-changes.cgi
241 -rwxr-xr-x 1 netsw users 2866 Aug 5 14:49 netsw-home.cgi
242 drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
243 -rwxr-xr-x 1 netsw users 24050 Aug 5 15:49 netsw-lsdir.cgi
244 -rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi
245 -rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi
246 -rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst
249 <p>The <code>DATA/</code> subdirectory holds the above
250 directory structure, i.e. the real
251 <strong><em>net.sw</em></strong> stuff and gets
252 automatically updated via <code>rdist</code> from time to
253 time. The second part of the problem remains: how to link
254 these two structures together into one smooth-looking URL
255 tree? We want to hide the <code>DATA/</code> directory
256 from the user while running the appropriate CGI scripts
257 for the various URLs. Here is the solution: first I put
258 the following into the per-directory configuration file
259 in the <code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>
260 of the server to rewrite the announced URL
261 <code>/net.sw/</code> to the internal path
262 <code>/e/netsw</code>:</p>
264 <div class="example"><pre>
265 RewriteRule ^net.sw$ net.sw/ [R]
266 RewriteRule ^net.sw/(.*)$ e/netsw/$1
269 <p>The first rule is for requests which miss the trailing
270 slash! The second rule does the real thing. And then
271 comes the killer configuration which stays in the
272 per-directory config file
273 <code>/e/netsw/.www/.wwwacl</code>:</p>
275 <div class="example"><pre>
276 Options ExecCGI FollowSymLinks Includes MultiViews
280 # we are reached via /net.sw/ prefix
283 # first we rewrite the root dir to
284 # the handling cgi script
285 RewriteRule ^$ netsw-home.cgi [L]
286 RewriteRule ^index\.html$ netsw-home.cgi [L]
288 # strip out the subdirs when
289 # the browser requests us from perdir pages
290 RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L]
292 # and now break the rewriting for local files
293 RewriteRule ^netsw-home\.cgi.* - [L]
294 RewriteRule ^netsw-changes\.cgi.* - [L]
295 RewriteRule ^netsw-search\.cgi.* - [L]
296 RewriteRule ^netsw-tree\.cgi$ - [L]
297 RewriteRule ^netsw-about\.html$ - [L]
298 RewriteRule ^netsw-img/.*$ - [L]
300 # anything else is a subdir which gets handled
301 # by another cgi script
302 RewriteRule !^netsw-lsdir\.cgi.* - [C]
303 RewriteRule (.*) netsw-lsdir.cgi/$1
306 <p>Some hints for interpretation:</p>
<li>Notice the <code>L</code> (last) flag and no
substitution field ('<code>-</code>') in the fourth part</li>
312 <li>Notice the <code>!</code> (not) character and
313 the <code>C</code> (chain) flag at the first rule
314 in the last part</li>
316 <li>Notice the catch-all pattern in the last rule</li>
321 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
322 <div class="section">
323 <h2><a name="redirect404" id="redirect404">Redirect Failing URLs To Other Webserver</a></h2>
328 <dt>Description:</dt>
331 <p>A typical FAQ about URL rewriting is how to redirect
332 failing requests on webserver A to webserver B. Usually
333 this is done via <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI-scripts in Perl, but
334 there is also a <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> solution.
335 But notice that this performs more poorly than using an
336 <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code>
<p>The first solution has the best performance but is less
flexible and less robust:</p>
346 <div class="example"><pre>
348 RewriteCond /your/docroot/%{REQUEST_FILENAME} <strong>!-f</strong>
349 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
<p>The problem here is that this will only work for pages
inside the <code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>. While you can add more
conditions (for instance to also handle homedirs, etc.)
there is a better variant:</p>
357 <div class="example"><pre>
359 RewriteCond %{REQUEST_URI} <strong>!-U</strong>
360 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
<p>This uses the URL look-ahead feature of <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>.
The result is that this works for all types of URLs
and is safe. However, it has a performance impact on
the webserver, because every request causes one
more internal subrequest. So, if your webserver runs on a
powerful CPU, use this one. If it is a slow machine, use
the first approach or, better, an <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI-script.</p>
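<p>The decision made by the first, file-test variant above can be pictured as follows. This Python sketch is an illustration under assumptions: the function name <code>resolve</code> and its return shape are hypothetical, and it deliberately ignores everything a real server also handles (aliases, homedirs, path normalisation), which is exactly the limitation described for that variant.</p>

```python
import os


def resolve(docroot, url_path, fallback="http://webserverB.dom"):
    """Serve locally if the file exists under docroot, else redirect."""
    relative = url_path.lstrip("/")
    local = os.path.join(docroot, relative)
    # Corresponds to the "!-f" condition: only regular files count.
    if os.path.isfile(local):
        return ("local", local)
    return ("redirect", fallback + "/" + relative)
```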
373 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
374 <div class="section">
375 <h2><a name="archive-access-multiplexer" id="archive-access-multiplexer">Archive Access Multiplexer</a></h2>
380 <dt>Description:</dt>
<p>Do you know the great CPAN (Comprehensive Perl Archive
Network) at <a href="http://www.perl.com/CPAN">http://www.perl.com/CPAN</a>?
It redirects to one of several FTP servers around
the world which carry a CPAN mirror, choosing one that is
reasonably near the location of the requesting client. Actually this
can be called an FTP access multiplexing service. While
CPAN runs via CGI scripts, how can a similar approach
be implemented via <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>?</p>
396 <p>First we notice that from version 3.0.0
397 <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> can
398 also use the "<code>ftp:</code>" scheme on redirects.
399 And second, the location approximation can be done by a
400 <code class="directive"><a href="../mod/mod_rewrite.html#rewritemap">RewriteMap</a></code>
401 over the top-level domain of the client.
402 With a tricky chained ruleset we can use this top-level
403 domain as a key to our multiplexing map.</p>
<div class="example"><pre>
RewriteMap multiplex txt:/path/to/map.cxan
RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C]
RewriteRule ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$ ${multiplex:<strong>$1</strong>|ftp.default.dom}$2 [R,L]
</pre></div>

<div class="example"><pre>
## map.cxan -- Multiplexing Map for CxAN

de ftp://ftp.cxan.de/CxAN/
uk ftp://ftp.cxan.uk/CxAN/
com ftp://ftp.cxan.com/CxAN/
</pre></div>
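<p>The two chained rules can be hard to follow at first: the first one glues the client hostname and the requested path together with <code>::</code>, the second one cuts the top-level domain back out and uses it as the map key. The Python sketch below walks through the same two steps. It is an illustration only; the function name <code>multiplex</code> is an assumption, and the <code>DEFAULT</code> value is written as a full URL for readability, whereas the ruleset above falls back to the bare name <code>ftp.default.dom</code>.</p>

```python
import re

# Contents of map.cxan as shown above.
MULTIPLEX = {
    "de": "ftp://ftp.cxan.de/CxAN/",
    "uk": "ftp://ftp.cxan.uk/CxAN/",
    "com": "ftp://ftp.cxan.com/CxAN/",
}
# Assumption: a complete fallback URL instead of the bare hostname.
DEFAULT = "ftp://ftp.default.dom/CxAN/"


def multiplex(remote_host, path):
    """Return the FTP redirect target, or None if the rules do not apply."""
    first = re.match(r"^/CxAN/(.*)", path)
    if first is None:
        return None
    # Step 1: build "<client host>::<rest of path>".
    combined = "%s::%s" % (remote_host, first.group(1))
    # Step 2: extract the top-level domain in front of the "::".
    second = re.match(r"^.+\.([a-zA-Z]+)::(.*)$", combined)
    if second is None:
        return None  # e.g. client known only by IP address
    tld, rest = second.groups()
    return MULTIPLEX.get(tld, DEFAULT) + rest
```

<p>The second pattern only matches alphabetic top-level domains, so a client whose hostname could not be resolved (and therefore appears as an IP address) is not redirected by this ruleset.</p>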
426 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
427 <div class="section">
428 <h2><a name="browser-dependent-content" id="browser-dependent-content">Browser Dependent Content</a></h2>
433 <dt>Description:</dt>
<p>At least for important top-level pages it is sometimes
necessary to provide the optimum of browser dependent
content, i.e. one has to provide a maximum version for the
latest Netscape variants, a minimum version for the Lynx
browsers and an average feature version for all others.</p>
<p>We cannot use content negotiation because the browsers do
not provide their type in that form. Instead we have to
act on the HTTP header "User-Agent". The ruleset
works as follows: if the HTTP header "User-Agent"
begins with "Mozilla/3", the page <code>foo.html</code>
is rewritten to <code>foo.NS.html</code> and the
rewriting stops. If the browser is "Lynx" or "Mozilla" of
version 1 or 2, the URL becomes <code>foo.20.html</code>.
All other browsers receive page <code>foo.32.html</code>.
This is done by the following ruleset:</p>
<div class="example"><pre>
RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/3</strong>.*
RewriteRule ^foo\.html$ foo.<strong>NS</strong>.html [<strong>L</strong>]

RewriteCond %{HTTP_USER_AGENT} ^<strong>Lynx/</strong>.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/[12]</strong>.*
RewriteRule ^foo\.html$ foo.<strong>20</strong>.html [<strong>L</strong>]

RewriteRule ^foo\.html$ foo.<strong>32</strong>.html [<strong>L</strong>]
</pre></div>
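<p>Because each rule carries the <code>L</code> flag, the ruleset behaves like an if/elif/else chain in which the first matching branch wins. The following Python sketch spells that chain out; the function name <code>variant_for</code> is an assumption made for illustration, and the User-Agent strings in the usage note are examples, not values taken from this guide.</p>

```python
import re


def variant_for(user_agent):
    """Pick the page variant for foo.html from the User-Agent header."""
    if re.match(r"^Mozilla/3", user_agent):
        return "foo.NS.html"   # maximum version
    # The [OR] flag joins the two conditions of the second rule.
    if re.match(r"^Lynx/", user_agent) or re.match(r"^Mozilla/[12]", user_agent):
        return "foo.20.html"   # minimum version
    return "foo.32.html"       # average version for everyone else
```

<p>For instance, a header starting with <code>Lynx/2.8</code> selects <code>foo.20.html</code>, while any agent not covered by the first two branches falls through to <code>foo.32.html</code>.</p>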
470 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
471 <div class="section">
472 <h2><a name="dynamic-mirror" id="dynamic-mirror">Dynamic Mirror</a></h2>
477 <dt>Description:</dt>
<p>Assume there are nice webpages on remote hosts we want
to bring into our namespace. For FTP servers we would use
the <code>mirror</code> program which actually maintains an
explicit up-to-date copy of the remote data on the local
machine. For a webserver we could use the program
<code>webcopy</code> which acts similarly via HTTP. But both
techniques have one major drawback: the local copy is
only as up-to-date as the last time we ran the program. It
would be much better if the mirror were not a static one we
have to establish explicitly. Instead we want a dynamic
mirror with data which gets updated automatically when
there is need (updated data on the remote host).</p>
497 <p>To provide this feature we map the remote webpage or even
498 the complete remote webarea to our namespace by the use
499 of the <dfn>Proxy Throughput</dfn> feature
500 (flag <code>[P]</code>):</p>
502 <div class="example"><pre>
505 RewriteRule ^<strong>hotsheet/</strong>(.*)$ <strong>http://www.tstimpreso.com/hotsheet/</strong>$1 [<strong>P</strong>]
508 <div class="example"><pre>
511 RewriteRule ^<strong>usa-news\.html</strong>$ <strong>http://www.quux-corp.com/news/index.html</strong> [<strong>P</strong>]
516 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
517 <div class="section">
518 <h2><a name="reverse-dynamic-mirror" id="reverse-dynamic-mirror">Reverse Dynamic Mirror</a></h2>
523 <dt>Description:</dt>
530 <div class="example"><pre>
532 RewriteCond /mirror/of/remotesite/$1 -U
533 RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
538 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
539 <div class="section">
540 <h2><a name="retrieve-missing-data" id="retrieve-missing-data">Retrieve Missing Data from Intranet</a></h2>
545 <dt>Description:</dt>
<p>This is a tricky way of virtually running a corporate
(external) Internet webserver
(<code>www.quux-corp.dom</code>), while actually keeping
and maintaining its data on an (internal) Intranet webserver
(<code>www2.quux-corp.dom</code>) which is protected by a
firewall. The trick is that on the external webserver we
retrieve the requested data on-the-fly from the internal
561 <p>First, we have to make sure that our firewall still
562 protects the internal webserver and that only the
563 external webserver is allowed to retrieve data from it.
564 For a packet-filtering firewall we could for instance
565 configure a firewall ruleset like the following:</p>
<div class="example"><pre>
<strong>ALLOW</strong> Host www.quux-corp.dom Port &gt;1024 --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong> Host * Port * --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
</pre></div>
572 <p>Just adjust it to your actual configuration syntax.
573 Now we can establish the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
574 rules which request the missing data in the background
575 through the proxy throughput feature:</p>
577 <div class="example"><pre>
578 RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2
579 RewriteCond %{REQUEST_FILENAME} <strong>!-f</strong>
580 RewriteCond %{REQUEST_FILENAME} <strong>!-d</strong>
581 RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
586 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
587 <div class="section">
588 <h2><a name="load-balancing" id="load-balancing">Load Balancing</a></h2>
593 <dt>Description:</dt>
596 <p>Suppose we want to load balance the traffic to
597 <code>www.foo.com</code> over <code>www[0-5].foo.com</code>
598 (a total of 6 servers). How can this be done?</p>
604 <p>There are a lot of possible solutions for this problem.
605 We will discuss first a commonly known DNS-based variant
606 and then the special one with <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>:</p>
610 <strong>DNS Round-Robin</strong>
<p>The simplest method for load-balancing is to use
the DNS round-robin feature of <code>BIND</code>.
Here you just configure <code>www[0-5].foo.com</code>
as usual in your DNS with A (address) records, e.g.</p>
617 <div class="example"><pre>
626 <p>Then you additionally add the following entry:</p>
628 <div class="example"><pre>
<p>Now when <code>www.foo.com</code> gets
resolved, <code>BIND</code> gives out <code>www0-www5</code>
- but in a slightly permuted/rotated order every time.
This way the clients are spread over the various
servers. But notice that this is not a perfect load
balancing scheme, because DNS resolution information
gets cached by the other nameservers on the net, so
once a client has resolved <code>www.foo.com</code>
to a particular <code>wwwN.foo.com</code>, all
subsequent requests also go to this particular name
<code>wwwN.foo.com</code>. But the final result is
ok, because the total sum of the requests is really
spread over the various webservers.</p>
652 <strong>DNS Load-Balancing</strong>
<p>A sophisticated DNS-based method for
load-balancing is to use the program
<code>lbnamed</code> which can be found at <a href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
It is a Perl 5 program in conjunction with auxiliary
tools which provides a real load-balancing for
664 <strong>Proxy Throughput Round-Robin</strong>
666 <p>In this variant we use <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
667 and its proxy throughput feature. First we dedicate
668 <code>www0.foo.com</code> to be actually
669 <code>www.foo.com</code> by using a single</p>
671 <div class="example"><pre>
672 www IN CNAME www0.foo.com.
675 <p>entry in the DNS. Then we convert
676 <code>www0.foo.com</code> to a proxy-only server,
677 i.e. we configure this machine so all arriving URLs
678 are just pushed through the internal proxy to one of
679 the 5 other servers (<code>www1-www5</code>). To
680 accomplish this we first establish a ruleset which
681 contacts a load balancing script <code>lb.pl</code>
684 <div class="example"><pre>
686 RewriteMap lb prg:/path/to/lb.pl
687 RewriteRule ^/(.+)$ ${lb:$1} [P,L]
690 <p>Then we write <code>lb.pl</code>:</p>
<div class="example"><pre>
#!/path/to/perl
##
## lb.pl -- load balancing script
##

$| = 1;              # unbuffered output, so the prg: map cannot deadlock

$name = "www";       # the hostname base
$first = 1;          # the first server (not 0 here, because 0 is myself)
$last = 5;           # the last server in the round-robin
$domain = "foo.com"; # the domainname

$cnt = 0;            # offset of the most recently used server
while (&lt;STDIN&gt;) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}
</pre></div>
<div class="note">A final note: why is this useful? It seems as if
<code>www0.foo.com</code> is still overloaded. The
answer is yes, it is overloaded, but only with plain proxy
throughput requests! All SSI, CGI, ePerl, etc.
processing is done entirely on the other machines.
This is the essential point.</div>
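<p>The rotation performed by <code>lb.pl</code> is a plain modulo counter. The Python sketch below reproduces just that arithmetic so the order of the chosen servers can be checked. It is an illustration only, not a replacement for the rewrite map program: the function name <code>make_balancer</code> is an assumption, and the domain <code>foo.com</code> is taken from the prose of this section.</p>

```python
def make_balancer(name="www", first=1, last=5, domain="foo.com"):
    """Return a function that maps a path to the next backend URL."""
    cnt = 0  # offset of the most recently used server

    def pick(path):
        nonlocal cnt
        # Advance the counter and wrap around after the last server.
        cnt = (cnt + 1) % (last + 1 - first)
        return "http://%s%d.%s/%s" % (name, cnt + first, domain, path)

    return pick
```

<p>Because the counter is incremented before it is used, the first request after start-up goes to <code>www2</code>, and <code>www1</code> is reached only when the counter wraps around.</p>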
724 <strong>Hardware/TCP Round-Robin</strong>
726 <p>There is a hardware solution available, too. Cisco
727 has a beast called LocalDirector which does a load
728 balancing at the TCP/IP level. Actually this is some
729 sort of a circuit level gateway in front of a
730 webcluster. If you have enough money and really need
731 a solution with high performance, use this one.</p>
737 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
738 <div class="section">
739 <h2><a name="new-mime-type" id="new-mime-type">New MIME-type, New Service</a></h2>
744 <dt>Description:</dt>
<p>On the net there are a lot of nifty CGI programs. But
their usage is usually tedious, so a lot of webmasters
don't use them. Even Apache's Action handler feature for
MIME-types is only appropriate when the CGI programs
don't need special URLs (actually <code>PATH_INFO</code>
and <code>QUERY_STRING</code>) as their input. First,
let us configure a new file type with extension
<code>.scgi</code> (for secure CGI) which will be processed
by the popular <code>cgiwrap</code> program. The problem
here is that if, for instance, we use a Homogeneous URL Layout
(see above), a file inside the user homedirs has the URL
<code>/u/user/foo/bar.scgi</code>. But
<code>cgiwrap</code> needs the URL in the form
<code>/~user/foo/bar.scgi/</code>. The following rule
solves the problem:</p>
<div class="example"><pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3 [NS,<strong>T=application/x-httpd-cgi</strong>]
</pre></div>
<p>Or assume we have some more nifty programs:
<code>wwwlog</code> (which displays the
<code>access.log</code> for a URL subtree) and
<code>wwwidx</code> (which runs Glimpse on a URL
subtree). We have to provide the URL area to these
programs so they know which area to act on.
But usually this is ugly, because they are always
requested from within those areas, i.e. typically we would
run the <code>wwwidx</code> program from within
<code>/u/user/foo/</code> via a hyperlink to</p>

<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>
<p>This is ugly because we have to hard-code
<strong>both</strong> the location of the area
<strong>and</strong> the location of the CGI inside the
hyperlink. When we have to reorganize the area, we spend a
lot of time changing the various hyperlinks.</p>
793 <p>The solution here is to provide a special new URL format
794 which automatically leads to the proper CGI invocation.
795 We configure the following:</p>
797 <div class="example"><pre>
798 RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
799 RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
802 <p>Now the hyperlink to search at
803 <code>/u/user/foo/</code> reads only</p>
805 <div class="example"><pre>
809 <p>which internally gets automatically transformed to</p>
811 <div class="example"><pre>
812 /internal/cgi/user/wwwidx?i=/u/user/foo/
815 <p>The same approach leads to an invocation for the
816 access log CGI program when the hyperlink
817 <code>:log</code> gets used.</p>
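<p>How the two special URL suffixes are turned into CGI invocations can be checked by running the same patterns in Python. This is a sketch for experimenting with the regular expressions only; the <code>RULES</code> list and the function name <code>rewrite</code> are assumptions, and the first-match-wins loop is a simplification of how a ruleset is processed.</p>

```python
import re

# (pattern, substitution) pairs mirroring the two RewriteRules above.
RULES = [
    (re.compile(r"^/([uge])/([^/]+)(/?.*)/\*"),
     r"/internal/cgi/user/wwwidx?i=/\1/\2\3/"),
    (re.compile(r"^/([uge])/([^/]+)(/?.*):log"),
     r"/internal/cgi/user/wwwlog?f=/\1/\2\3"),
]


def rewrite(url):
    """Apply the first matching rule; return the URL unchanged otherwise."""
    for pattern, template in RULES:
        m = pattern.match(url)
        if m:
            return m.expand(template)
    return url
```

<p>A relative hyperlink <code>*</code> placed in a page under <code>/u/user/foo/</code> resolves to <code>/u/user/foo/*</code>, which is the form the first pattern expects.</p>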
821 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
822 <div class="section">
823 <h2><a name="on-the-fly-content" id="on-the-fly-content">On-the-fly Content-Regeneration</a></h2>
828 <dt>Description:</dt>
<p>Here comes a really esoteric feature: dynamically
generated but statically served pages, i.e. pages should be
delivered as pure static pages (read from the filesystem
and just passed through), but they have to be generated
dynamically by the webserver if missing. This way you can
have CGI-generated pages which are statically served unless
someone (or a cronjob) removes the static contents. The
contents are then regenerated on the next request.</p>
844 This is done via the following ruleset:
846 <div class="example"><pre>
847 RewriteCond %{REQUEST_FILENAME} <strong>!-s</strong>
848 RewriteRule ^page\.<strong>html</strong>$ page.<strong>cgi</strong> [T=application/x-httpd-cgi,L]
<p>Here a request to <code>page.html</code> leads to an
internal run of a corresponding <code>page.cgi</code> if
<code>page.html</code> is still missing or has a file size
of zero. The trick here is that <code>page.cgi</code> is an
ordinary CGI script which (in addition to its <code>STDOUT</code>)
writes its output to the file <code>page.html</code>.
Once it has run, the server sends out the data of
<code>page.html</code>. When the webmaster wants to force
a refresh of the contents, they just remove
<code>page.html</code> (usually done by a cronjob).</p>
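<p>The regenerate-if-missing idea can be summarised in a small Python sketch. It stands in for the combination of the <code>!-s</code> condition and a <code>page.cgi</code> that writes its own output to disk; the function name <code>serve_page</code> and the <code>generate</code> callback are assumptions made for illustration, not part of the setup described above.</p>

```python
import os


def serve_page(html_path, generate):
    """Return the page content, regenerating the file if needed."""
    # Mirrors the "!-s" test: true when the file is absent or empty.
    if not os.path.isfile(html_path) or os.path.getsize(html_path) == 0:
        content = generate()
        # Like page.cgi, keep a static copy for later requests.
        with open(html_path, "w") as fh:
            fh.write(content)
        return content
    # Static case: just pass the stored file through.
    with open(html_path) as fh:
        return fh.read()
```

<p>Deleting the file is all it takes to force a fresh run of the generator on the next request.</p>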
864 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
865 <div class="section">
866 <h2><a name="autorefresh" id="autorefresh">Document With Autorefresh</a></h2>
871 <dt>Description:</dt>
874 <p>Wouldn't it be nice while creating a complex webpage if
875 the webbrowser would automatically refresh the page every
876 time we write a new version from within our editor?
<p>No! We just combine the MIME multipart feature, the
webserver NPH feature, and the URL manipulation power of
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. First, we establish a new
URL feature: appending <code>:refresh</code> to any
URL causes the page to be refreshed every time it is
updated on the filesystem.</p>
890 <div class="example"><pre>
891 RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
894 <p>Now when we reference the URL</p>
896 <div class="example"><pre>
897 /u/foo/bar/page.html:refresh
900 <p>this leads to the internal invocation of the URL</p>
902 <div class="example"><pre>
903 /internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
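<p>The mapping performed by the <code>:refresh</code> rule can be checked
outside the server. The following Python sketch reuses the pattern and the
substitution from the rule; <code>rewrite_refresh</code> is a hypothetical
helper name, and the sketch assumes that Python's <code>re</code> module
treats this particular pattern the same way the server's regex engine does.</p>

```python
import re

# Pattern and substitution copied from the RewriteRule shown earlier.
REFRESH_RULE = re.compile(r"^(/[uge]/[^/]+/?.*):refresh")
REFRESH_TARGET = r"/internal/cgi/apache/nph-refresh?f=\1"


def rewrite_refresh(url_path):
    """Return the internal URL for a ':refresh' request, else the input."""
    match = REFRESH_RULE.search(url_path)
    if match is None:
        return url_path
    # On a match the whole URL is replaced by the substitution string,
    # with \1 standing in for mod_rewrite's $1.
    return match.expand(REFRESH_TARGET)
```

<p>Calling <code>rewrite_refresh("/u/foo/bar/page.html:refresh")</code>
yields the internal URL shown above, while paths without the
<code>:refresh</code> suffix are returned unchanged.</p>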
906 <p>The only missing part is the NPH-CGI script. Although
907 one would usually say "left as an exercise to the reader"
908 ;-) I will provide this, too.</p>
910 <div class="example"><pre>
913 ## nph-refresh -- NPH/CGI script for auto refreshing pages
914 ## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
918 # split the QUERY_STRING variable
919 @pairs = split(/&/, $ENV{'QUERY_STRING'});
920 foreach $pair (@pairs) {
921 ($name, $value) = split(/=/, $pair);
922 $name =~ tr/A-Z/a-z/;
923 $name = 'QS_' . $name;
924 $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
925 eval "\$$name = \"$value\"";
927 $QS_s = 1 if ($QS_s eq '');
928 $QS_n = 3600 if ($QS_n eq '');
930 print "HTTP/1.0 200 OK\n";
931 print "Content-type: text/html\n\n";
932 print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
936 print "HTTP/1.0 200 OK\n";
937 print "Content-type: text/html\n\n";
938 print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
942 sub print_http_headers_multipart_begin {
943 print "HTTP/1.0 200 OK\n";
944 $bound = "ThisRandomString12345";
945 print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
946 &print_http_headers_multipart_next;
949 sub print_http_headers_multipart_next {
950 print "\n--$bound\n";
953 sub print_http_headers_multipart_end {
954 print "\n--$bound--\n";
959 $len = length($buffer);
960 print "Content-type: text/html\n";
961 print "Content-length: $len\n\n";
967 local(*FP, $size, $buffer, $bytes);
968 ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
969 $size = sprintf("%d", $size);
970 open(FP, "&lt;$file");
971 $bytes = sysread(FP, $buffer, $size);
976 $buffer = &readfile($QS_f);
977 &print_http_headers_multipart_begin;
978 &displayhtml($buffer);
981 local($file) = $_[0];
984 ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
988 $mtimeL = &mystat($QS_f);
990 for ($n = 0; $n &lt; $QS_n; $n++) {
992 $mtime = &mystat($QS_f);
993 if ($mtime ne $mtimeL) {
996 $buffer = &readfile($QS_f);
997 &print_http_headers_multipart_next;
998 &displayhtml($buffer);
1000 $mtimeL = &mystat($QS_f);
1007 &print_http_headers_multipart_end;
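<p>The core of the script is the <code>multipart/x-mixed-replace</code>
framing combined with a polling loop on the file's modification time. The
following Python sketch restates that idea in compact form. It is not a
translation of the script: the function names, the generator structure and
the reading of <code>QS_n</code> as the number of polls and <code>QS_s</code>
as the pause between them are assumptions made for this illustration.</p>

```python
import os
import time

BOUNDARY = "ThisRandomString12345"  # boundary string used by the script above


def stream_start():
    """NPH status line and multipart header, then the first boundary."""
    return ("HTTP/1.0 200 OK\n"
            "Content-type: multipart/x-mixed-replace;boundary=" + BOUNDARY + "\n"
            "\n--" + BOUNDARY + "\n")


def html_part(body):
    """One replaceable document part with its own headers."""
    size = len(body.encode("utf-8"))  # length in bytes, not characters
    return ("Content-type: text/html\n"
            "Content-length: " + str(size) + "\n\n" + body)


def next_boundary():
    return "\n--" + BOUNDARY + "\n"


def stream_end():
    return "\n--" + BOUNDARY + "--\n"


def refresh_stream(path, polls=3600, interval=1, sleep=time.sleep):
    """Yield response chunks: the page now, then again after each change."""
    def read():
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    last_mtime = os.stat(path).st_mtime
    yield stream_start() + html_part(read())
    for _ in range(polls):
        sleep(interval)
        mtime = os.stat(path).st_mtime
        if mtime != last_mtime:
            last_mtime = mtime
            yield next_boundary() + html_part(read())
    yield stream_end()
```

<p>A real NPH script would write each chunk to standard output and flush it
immediately, because the browser replaces the displayed page only when the
next part actually arrives.</p>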
1016 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1017 <div class="section">
1018 <h2><a name="mass-virtual-hosting" id="mass-virtual-hosting">Mass Virtual Hosting</a></h2>
1023 <dt>Description:</dt>
<p>The <code class="directive"><a href="../mod/core.html#virtualhost">&lt;VirtualHost&gt;</a></code> feature of Apache is nice
and works great when you just have a few dozen
1028 virtual hosts. But when you are an ISP and have hundreds of
1029 virtual hosts to provide this feature is not the best
<p>To provide this feature we map each incoming request to
the document root of its virtual host, which is looked up in
a rewrite map keyed by the <code>Host</code> header:</p>
1040 <div class="example"><pre>
1044 www.vhost1.dom:80 /path/to/docroot/vhost1
1045 www.vhost2.dom:80 /path/to/docroot/vhost2
1047 www.vhostN.dom:80 /path/to/docroot/vhostN
1050 <div class="example"><pre>
1055 # use the canonical hostname on redirects, etc.
1059 # add the virtual host in front of the CLF-format
1060 CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
1063 # enable the rewriting engine in the main server
1066 # define two maps: one for fixing the URL and one which defines
1067 # the available virtual hosts with their corresponding
1069 RewriteMap lowercase int:tolower
1070 RewriteMap vhost txt:/path/to/vhost.map
1072 # Now do the actual virtual host mapping
1073 # via a huge and complicated single rule:
1075 # 1. make sure we don't map for common locations
1076 RewriteCond %{REQUEST_URI} !^/commonurl1/.*
1077 RewriteCond %{REQUEST_URI} !^/commonurl2/.*
1079 RewriteCond %{REQUEST_URI} !^/commonurlN/.*
1081 # 2. make sure we have a Host header, because
1082 # currently our approach only supports
1083 # virtual hosting through this header
1084 RewriteCond %{HTTP_HOST} !^$
1086 # 3. lowercase the hostname
1087 RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
1089 # 4. lookup this hostname in vhost.map and
1090 # remember it only when it is a path
1091 # (and not "NONE" from above)
1092 RewriteCond ${vhost:%1} ^(/.*)$
1094 # 5. finally we can map the URL to its docroot location
1095 # and remember the virtual host for logging purposes
1096 RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
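<p>The five numbered steps of this rule are easier to follow when written out
as ordinary code. The Python sketch below mirrors them one by one. The helper
names <code>parse_map</code> and <code>map_vhost</code> are hypothetical, and
the map parser only covers the simple "key, whitespace, value" layout with
<code>#</code> comment lines used in this example.</p>

```python
def parse_map(text):
    """Parse the content of a txt: RewriteMap file into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def map_vhost(host, uri, vhosts, common_prefixes=()):
    """Return (filesystem_path, vhost), or None if the rule does not apply."""
    # 1. leave the common locations alone
    if any(uri.startswith(prefix) for prefix in common_prefixes):
        return None
    # 2. a Host header is required
    if not host:
        return None
    # 3. lowercase the hostname
    name = host.lower()
    # 4. look it up; only a value that is an absolute path counts as a hit
    docroot = vhosts.get(name, "")
    if not docroot.startswith("/"):
        return None
    # 5. the rule pattern ^/(.*)$ needs a leading slash; prepend the docroot
    #    and remember the lowercased hostname for logging
    if not uri.startswith("/"):
        return None
    return docroot + "/" + uri[1:], name
```

<p>Note that the lookup is an exact string match on the lowercased
<code>Host</code> header, so the keys in the map have to be written exactly
as clients send them.</p>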
1102 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1103 <div class="section">
1104 <h2><a name="host-deny" id="host-deny">Host Deny</a></h2>
1109 <dt>Description:</dt>
1112 <p>How can we forbid a list of externally configured hosts
1113 from using our server?</p>
1119 <p>For Apache >= 1.3b6:</p>
1121 <div class="example"><pre>
1123 RewriteMap hosts-deny txt:/path/to/hosts.deny
1124 RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
1125 RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
1126 RewriteRule ^/.* - [F]
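<p>The logic of the two conditions can be summarised in a few lines. In this
Python sketch the deny map is an ordinary dictionary and
<code>is_denied</code> is a hypothetical helper name; the hostnames used to
try it out are placeholders, not entries from a real map.</p>

```python
def is_denied(remote_host, remote_addr, deny_map):
    """True if the client's hostname or its address is a key in the map."""
    # ${hosts-deny:KEY|NOT-FOUND} yields NOT-FOUND for keys not in the map.
    by_host = deny_map.get(remote_host, "NOT-FOUND")
    by_addr = deny_map.get(remote_addr, "NOT-FOUND")
    # The two RewriteCond lines are joined with [OR]: one hit is enough.
    return by_host != "NOT-FOUND" or by_addr != "NOT-FOUND"
```

<p>With a map such as <code>{"bad.example.com": "-"}</code>, a client whose
hostname resolves to <code>bad.example.com</code> is refused whatever its
address is, and every client not listed passes through untouched.</p>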
<p>For Apache &lt;= 1.3b6:</p>
1131 <div class="example"><pre>
1133 RewriteMap hosts-deny txt:/path/to/hosts.deny
1134 RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
1135 RewriteRule !^NOT-FOUND/.* - [F]
1136 RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
1137 RewriteRule !^NOT-FOUND/.* - [F]
1138 RewriteRule ^NOT-FOUND/(.*)$ /$1
1141 <div class="example"><pre>
1145 ## ATTENTION! This is a map, not a list, even when we treat it as such.
1146 ## mod_rewrite parses it for key/value pairs, so at least a
1147 ## dummy value "-" must be present for each entry.
1157 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1158 <div class="section">
1159 <h2><a name="proxy-deny" id="proxy-deny">Proxy Deny</a></h2>
1164 <dt>Description:</dt>
1167 <p>How can we forbid a certain host or even a user of a
1168 special host from using the Apache proxy?</p>
1174 <p>We first have to make sure <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
1175 is below(!) <code class="module"><a href="../mod/mod_proxy.html">mod_proxy</a></code> in the Configuration
1176 file when compiling the Apache webserver. This way it gets
1177 called <em>before</em> <code class="module"><a href="../mod/mod_proxy.html">mod_proxy</a></code>. Then we
1178 configure the following for a host-dependent deny...</p>
1180 <div class="example"><pre>
1181 RewriteCond %{REMOTE_HOST} <strong>^badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1185 <p>...and this one for a user@host-dependent deny:</p>
1187 <div class="example"><pre>
1188 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>^badguy@badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1194 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1195 <div class="section">
1196 <h2><a name="special-authentication" id="special-authentication">Special Authentication Variant</a></h2>
1201 <dt>Description:</dt>
<p>Sometimes a very special form of authentication is needed, for
instance one which checks for a set of
explicitly configured users. Only these should receive
access, and without explicit prompting (which would occur
when using Basic Auth via <code class="module"><a href="../mod/mod_auth.html">mod_auth</a></code>).</p>
1214 <p>We use a list of rewrite conditions to exclude all except
1217 <div class="example"><pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend1@client1\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend2@client2\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend3@client3\.quux-corp\.com$</strong>
1221 RewriteRule ^/~quux/only-for-friends/ - [F]
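<p>Because consecutive <code>RewriteCond</code> lines are combined with a
logical AND, the rule fires only when <em>none</em> of the friend patterns
matches. The following Python sketch spells that out; the helper name
<code>is_forbidden</code> is hypothetical, and the dots in the hostnames are
escaped so that they match literally.</p>

```python
import re

# One pattern per RewriteCond, without the leading "!" negation.
FRIENDS = [
    re.compile(r"^friend1@client1\.quux-corp\.com$"),
    re.compile(r"^friend2@client2\.quux-corp\.com$"),
    re.compile(r"^friend3@client3\.quux-corp\.com$"),
]


def is_forbidden(remote_ident, remote_host, url_path):
    """True if the request would be answered with 403 Forbidden."""
    # The rule only covers the protected area.
    if not url_path.startswith("/~quux/only-for-friends/"):
        return False
    who = remote_ident + "@" + remote_host
    # All negated conditions must hold, i.e. no friend pattern may match.
    return all(pattern.search(who) is None for pattern in FRIENDS)
```

<p>A request from <code>friend2@client2.quux-corp.com</code> fails the second
condition, so the rule is skipped and the page is delivered; anyone else is
refused.</p>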
1226 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1227 <div class="section">
1228 <h2><a name="referer-deflector" id="referer-deflector">Referer-based Deflector</a></h2>
1233 <dt>Description:</dt>
1236 <p>How can we program a flexible URL Deflector which acts
1237 on the "Referer" HTTP header and can be configured with as
1238 many referring pages as we like?</p>
1244 <p>Use the following really tricky ruleset...</p>
1246 <div class="example"><pre>
1247 RewriteMap deflector txt:/path/to/deflector.map
1249 RewriteCond %{HTTP_REFERER} !=""
1250 RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
1251 RewriteRule ^.* %{HTTP_REFERER} [R,L]
1253 RewriteCond %{HTTP_REFERER} !=""
1254 RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
1255 RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
1258 <p>... in conjunction with a corresponding rewrite
1261 <div class="example"><pre>
1266 http://www.badguys.com/bad/index.html -
1267 http://www.badguys.com/bad/index2.html -
1268 http://www.badguys.com/bad/index3.html http://somewhere.com/
<p>This automatically redirects the request back to the
referring page (when "<code>-</code>" is used as the value
in the map) or to a specific URL (when a URL is given
as the value in the map).</p>
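<p>The decision made by the two rule blocks can be sketched in Python as
follows. The map is modelled as a plain dictionary holding the three example
entries, and <code>deflect</code> is a hypothetical helper name; the function
returns the redirect target, or <code>None</code> when the request is served
normally.</p>

```python
DEFLECTOR_MAP = {
    "http://www.badguys.com/bad/index.html": "-",
    "http://www.badguys.com/bad/index2.html": "-",
    "http://www.badguys.com/bad/index3.html": "http://somewhere.com/",
}


def deflect(referer, deflector_map=DEFLECTOR_MAP):
    """Return the URL to redirect to, or None to serve the request."""
    if not referer:            # RewriteCond %{HTTP_REFERER} !=""
        return None
    target = deflector_map.get(referer)
    if target is None:         # referring page is not listed in the map
        return None
    if target == "-":          # "-" sends the client back where it came from
        return referer
    return target              # otherwise redirect to the mapped URL
```

<p>Requests without a <code>Referer</code> header, or with one that is not
listed in the map, fall through both rule blocks and are handled as
usual.</p>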
1279 <div class="bottomlang">
1281 </div><div id="footer">
1282 <p class="apache">Copyright 2008 The Apache Software Foundation.<br />Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p>
</div>