From 27b409c78b1a3aa7f32eeee3baa31d7a035f3a82 Mon Sep 17 00:00:00 2001
From: Chris Pepper
Date: Thu, 17 Apr 2008 04:13:22 +0000
Subject: [PATCH] General cleanup of rewrite guide.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@648946 13f79535-47bb-0310-9956-ffa450edef68
---
 .../manual/rewrite/rewrite_guide_advanced.xml | 128 +++++++++---
 1 file changed, 64 insertions(+), 64 deletions(-)

diff --git a/docs/manual/rewrite/rewrite_guide_advanced.xml b/docs/manual/rewrite/rewrite_guide_advanced.xml
index a3169a8d14..1ba639d8e5 100644
--- a/docs/manual/rewrite/rewrite_guide_advanced.xml
+++ b/docs/manual/rewrite/rewrite_guide_advanced.xml
@@ -35,13 +35,13 @@ solve each problem by configuring URL rewriting rulesets.

 ATTENTION: Depending on your server configuration
-it may be necessary to slightly change the examples for your
-situation, e.g. adding the [PT] flag when
-additionally using mod_alias and
+it may be necessary to adjust the examples for your
+situation, e.g., adding the [PT] flag if
+using mod_alias and
 mod_userdir, etc. Or rewriting a ruleset
-to fit in .htaccess context instead
+to work in .htaccess context instead
 of per-server context. Always try to understand what a
-particular ruleset really does before you use it. This
+particular ruleset really does before you use it; this
 avoids many problems.

@@ -56,30 +56,30 @@ examples
-      Webcluster through Homogeneous URL Layout
+      Web Cluster with Consistent URL Space
Description:

 We want to create a homogeneous and consistent URL
-layout over all WWW servers on a Intranet webcluster, i.e.
-all URLs (per definition server local and thus server
-dependent!) become actually server independent!
-What we want is to give the WWW namespace a consistent
-server-independent layout: no URL should have to include
-any physically correct target server. The cluster itself
-should drive us automatically to the physical target
-host.
+layout across all WWW servers on an Intranet web cluster, i.e.,
+all URLs (by definition server-local and thus
+server-dependent!) become server independent!
+What we want is to give the WWW namespace a single consistent
+layout: no URL should refer to
+any particular target server. The cluster itself
+should connect users automatically to a physical target
+host as needed, invisibly.

Solution:
-First, the knowledge of the target servers come from
-(distributed) external maps which contain information
-where our users, groups and entities stay. They have the
-form
+First, the knowledge of the target servers comes from
+(distributed) external maps which contain information on
+where our users, groups, and entities reside. They have the
+form:

 user1  server_of_user1
@@ -89,7 +89,7 @@ user2  server_of_user2
 
           

 We put them into files map.xxx-to-host.
 Second we need to instruct all servers to redirect URLs
-of the forms
+of the forms:

 /u/user/anypath
@@ -105,8 +105,8 @@ http://physical-host/g/group/anypath
 http://physical-host/e/entity/anypath
 
-when the URL is not locally valid to a server. The
-following ruleset does this for us by the help of the map
+when any URL path need not be valid on every server. The
+following ruleset does this for us with the help of the map
 files (assuming that server0 is a default server which
 will be used if a user has no entry in the map):
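The ruleset this hunk revises is only partly visible in the patch (the hunk header below shows one of its rules). A sketch of the map-driven redirect the text describes might look like this — the map names, file paths, and exact rule shapes are illustrative assumptions, not taken from the patch:

```apache
RewriteEngine on

# external maps translating a user/group/entity name to its home server;
# file names follow the map.xxx-to-host convention from the text,
# the directory is illustrative
RewriteMap  user-to-host    txt:/path/to/map.user-to-host
RewriteMap  group-to-host   txt:/path/to/map.group-to-host
RewriteMap  entity-to-host  txt:/path/to/map.entity-to-host

# redirect to the physical host; server0 is the default for
# names with no map entry, as the text specifies
RewriteRule  ^/u/([^/]+)/?(.*)  http://${user-to-host:$1|server0}/u/$1/$2    [R,L]
RewriteRule  ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2   [R,L]
RewriteRule  ^/e/([^/]+)/?(.*)  http://${entity-to-host:$1|server0}/e/$1/$2  [R,L]
```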

@@ -137,9 +137,9 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\
Description:
-Some sites with thousands of users usually use a
-structured homedir layout, i.e. each homedir is in a
-subdirectory which begins for instance with the first
+Some sites with thousands of users use a
+structured homedir layout, i.e. each homedir is in a
+subdirectory which begins (for instance) with the first
 character of the username. So, /~foo/anypath
 is /home/f/foo/.www/anypath while
 /~bar/anypath is
@@ -150,7 +150,7 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\

 We use the following ruleset to expand the tilde URLs
-into exactly the above layout.
+into the above layout.

 RewriteEngine on
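The hunk header below shows the tilde-expansion rule only up to /home/$2. Completed in the spirit of the layout described above ($1 is the full username, $2 its first character), the ruleset would read roughly like this — the continuation of the substitution is an assumption:

```apache
RewriteEngine on
# /~foo/anypath -> /home/f/foo/.www/anypath
RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
```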
@@ -176,7 +176,7 @@ RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2
 net.sw is
           my archive of freely available Unix software packages,
           which I started to collect in 1992. It is both my hobby
-          and job to to this, because while I'm studying computer
+          and job to do this, because while I'm studying computer
           science I have also worked for many years as a system and
           network administrator in my spare time. Every week I need
           some sort of software so I created a deep hierarchy of
@@ -205,11 +205,11 @@ drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
           the world via a nice Web interface. "Nice" means that I
           wanted to offer an interface where you can browse
           directly through the archive hierarchy. And "nice" means
-          that I didn't wanted to change anything inside this
+          that I didn't want to change anything inside this
           hierarchy - not even by putting some CGI scripts at the
-          top of it. Why? Because the above structure should be
-          later accessible via FTP as well, and I didn't want any
-          Web or CGI stuff to be there.

+          top of it. Why? Because the above structure should later be
+          accessible via FTP as well, and I didn't want any
+          Web or CGI stuff mixed in there.

Solution:
@@ -237,8 +237,8 @@ drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/

 The DATA/ subdirectory holds the above
-directory structure, i.e. the real
-net.sw stuff and gets
+directory structure, i.e. the real
+net.sw stuff, and gets
 automatically updated via rdist from time to
 time. The second part of the problem remains: how to link
 these two structures together into one smooth-looking URL
@@ -247,7 +247,7 @@
 for the various URLs. Here is the solution: first I put
 the following into the per-directory configuration file
 in the DocumentRoot
-of the server to rewrite the announced URL
+of the server to rewrite the public URL
 path /net.sw/ to the internal path
 /e/netsw:
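For context, a per-directory configuration of the kind referred to here — rewriting the public /net.sw/ path onto the internal /e/netsw tree — could look like the following sketch (the patch does not show the actual rules):

```apache
RewriteEngine on
# add a missing trailing slash via external redirect, then map
# the public path onto the internal directory
RewriteRule  ^net.sw$       net.sw/     [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1
```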

@@ -297,7 +297,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
 1. Notice the L (last) flag and no
-    substitution field ('-') in the forth part
+    substitution field ('-') in the fourth part
 2.
 3. Notice the ! (not) character and
    the C (chain) flag at the first rule
@@ -312,7 +312,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
-      Redirect Failing URLs To Other Webserver
+      Redirect Failing URLs to Another Webserver
    Description:
    @@ -321,18 +321,18 @@ RewriteRule (.*) netsw-lsdir.cgi/$1

 A typical FAQ about URL rewriting is how to redirect
 failing requests on webserver A to webserver B. Usually
 this is done via ErrorDocument
-CGI-scripts in Perl, but
+CGI scripts in Perl, but
 there is also a mod_rewrite solution.
-But notice that this performs more poorly than using an
+But note that this performs more poorly than using an
 ErrorDocument
-CGI-script!
+CGI script!

Solution:

 The first solution has the best performance but less
-flexibility, and is less error safe:
+flexibility, and is less safe:

 RewriteEngine on
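The ruleset quoted here is truncated by the extraction; only its first line and part of the final rule survive. Judging from the prose ("this will only work for pages inside the DocumentRoot"), the missing middle is a filesystem existence check, roughly as follows — the condition and the completed hostname are assumptions:

```apache
RewriteEngine on
# if the requested file does not exist locally,
# hand the request over to webserver B
RewriteCond  %{DOCUMENT_ROOT}/%{REQUEST_URI}  !-f
RewriteRule  ^(.+)  http://webserverB.dom/$1
```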
@@ -343,7 +343,7 @@ RewriteRule   ^(.+)                             http://webserverB
 The problem here is that this will only work for pages
           inside the DocumentRoot. While you can add more
           Conditions (for instance to also handle homedirs, etc.)
-          there is better variant:

+ there is a better variant:

 RewriteEngine on
@@ -353,12 +353,12 @@ RewriteRule   ^(.+)          http://webserverB.dom/$1
 
           

 This uses the URL look-ahead feature of mod_rewrite.
 The result is that this will work for all types of URLs
-and is a safe way. But it does a performance impact on
-the webserver, because for every request there is one
+and is safe. But it does have a performance impact on
+the web server, because for every request there is one
 more internal subrequest. So, if your webserver runs on a
 powerful CPU, use this one. If it is a slow machine, use
-the first approach or better a ErrorDocument CGI-script.
+the first approach or better an ErrorDocument CGI script.
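The look-ahead variant discussed above can be sketched with mod_rewrite's -U test, which issues an internal subrequest to ask whether the URL is valid locally (hence the per-request cost the text mentions); the hostname is the illustrative one from the ruleset above:

```apache
RewriteEngine on
# -U: URL look-ahead via an internal subrequest
RewriteCond  %{REQUEST_URI}  !-U
RewriteRule  ^(.+)  http://webserverB.dom/$1
```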

@@ -376,17 +376,17 @@ RewriteRule ^(.+) http://webserverB.dom/$1
 Network) under http://www.perl.com/CPAN?
 This does a redirect to one of several FTP servers around
-the world which carry a CPAN mirror and is approximately
-near the location of the requesting client. Actually this
-can be called an FTP access multiplexing service. While
-CPAN runs via CGI scripts, how can a similar approach
+the world which each carry a CPAN mirror and (theoretically)
+near the requesting client. Actually this
+can be called an FTP access multiplexing service.
+CPAN runs via CGI scripts, but how could a similar approach
 be implemented via mod_rewrite?

Solution:
-First we notice that from version 3.0.0
+First we notice that as of version 3.0.0,
 mod_rewrite can
 also use the "ftp:" scheme on redirects. And second, the
 location approximation can be done by a
@@ -428,9 +428,9 @@ com           ftp://ftp.cxan.com/CxAN/
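A sketch of such an FTP multiplexer, keyed on the client's top-level domain as the map entry in the hunk header above ("com  ftp://ftp.cxan.com/CxAN/") suggests; the map file path, the chained-rule shape, and the default target are illustrative assumptions:

```apache
RewriteEngine on
RewriteMap   multiplex  txt:/path/to/map.cxan
# prefix the request with the client's hostname, then redirect by
# its top-level domain, falling back to a default mirror
RewriteRule  ^/CxAN/(.*)               %{REMOTE_HOST}::$1                  [C]
RewriteRule  ^.+\.([a-zA-Z]+)::(.*)$   ${multiplex:$1|ftp.default.dom}/$2  [R,L]
```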

 At least for important top-level pages it is sometimes
 necessary to provide the optimum of browser dependent
-content, i.e. one has to provide a maximum version for the
-latest Netscape variants, a minimum version for the Lynx
-browsers and a average feature version for all others.
+content, i.e., one has to provide one version for
+current browsers, a different version for the Lynx and text-mode
+browsers, and another for other browsers.

Solution:
@@ -443,9 +443,9 @@ com           ftp://ftp.cxan.com/CxAN/
 begins with "Mozilla/3", the page foo.html
 is rewritten to foo.NS.html and the
 rewriting stops. If the browser is "Lynx" or "Mozilla" of
-version 1 or 2 the URL becomes foo.20.html.
+version 1 or 2, the URL becomes foo.20.html.
 All other browsers receive page foo.32.html.
-This is done by the following ruleset:
+This is done with the following ruleset:

 RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
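Only the first condition and the final rule of this ruleset survive in the patch. Reconstructed from those fragments plus the prose above (Mozilla/3 → foo.NS.html; Lynx or Mozilla 1/2 → foo.20.html; everything else → foo.32.html), the full ruleset would be along these lines:

```apache
RewriteCond  %{HTTP_USER_AGENT}  ^Mozilla/3.*
RewriteRule  ^foo\.html$         foo.NS.html  [L]

RewriteCond  %{HTTP_USER_AGENT}  ^Lynx/.*  [OR]
RewriteCond  %{HTTP_USER_AGENT}  ^Mozilla/[12].*
RewriteRule  ^foo\.html$         foo.20.html  [L]

RewriteRule  ^foo\.html$         foo.32.html  [L]
```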
@@ -475,13 +475,13 @@ RewriteRule ^foo\.html$         foo.32.html          [L
           the mirror program which actually maintains an
           explicit up-to-date copy of the remote data on the local
           machine. For a webserver we could use the program
-          webcopy which acts similar via HTTP. But both
+          webcopy which runs via HTTP. But both
           techniques have one major drawback: The local copy is
-          always just as up-to-date as often we run the program. It
+          always just as up-to-date as the last time we ran the program. It
           would be much better if the mirror is not a static one we
           have to establish explicitly. Instead we want a dynamic
           mirror with data which gets updated automatically when
-          there is need (updated data on the remote host).

+ there is need (updated on the remote host).

Solution:
@@ -605,7 +605,7 @@ RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom

 The simplest method for load-balancing is to use the DNS
 round-robin feature of BIND.
 Here you just configure www[0-9].foo.com
-as usual in your DNS with A(address) records, e.g.
+as usual in your DNS with A(address) records, e.g.,

 www0   IN  A       1.2.3.1
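The zone fragment is truncated here; a full round-robin setup of the kind described would presumably list one A record per server plus repeated A records for the shared name, along these lines (addresses follow the 1.2.3.x pattern shown and are illustrative):

```zone
www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www    IN  A       1.2.3.1
www    IN  A       1.2.3.2
www    IN  A       1.2.3.3
www    IN  A       1.2.3.4
www    IN  A       1.2.3.5
```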
@@ -631,13 +631,13 @@ www   IN  A       1.2.3.5
               - but in a slightly permutated/rotated order every time.
               This way the clients are spread over the various
               servers. But notice that this is not a perfect load
-              balancing scheme, because DNS resolve information
+              balancing scheme, because DNS resolution information
               gets cached by the other nameservers on the net, so
               once a client has resolved www.foo.com
-              to a particular wwwN.foo.com, all
+              to a particular wwwN.foo.com, all its
               subsequent requests also go to this particular name
               wwwN.foo.com. But the final result is
-              ok, because the total sum of the requests are really
+              okay, because the requests are collectively
               spread over the various webservers.

@@ -668,7 +668,7 @@ www IN CNAME www0.foo.com.

 entry in the DNS. Then we convert
 www0.foo.com to a proxy-only server,
-i.e. we configure this machine so all arriving URLs
+i.e., we configure this machine so all arriving URLs
 are just pushed through the internal proxy to one of
 the 5 other servers (www1-www5).
 To accomplish this we first establish a ruleset which
@@ -766,7 +766,7 @@ RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
 subtree). We have to provide the URL area to these
 programs so they know on which area they have to act on.
 But usually this is ugly, because they are all the times
-still requested from that areas, i.e. typically we would
+still requested from that areas, i.e., typically we would
 run the swwidx program from within
 /u/user/foo/ via hyperlink to

@@ -823,7 +823,7 @@ HREF="*"

 Here comes a really esoteric feature: Dynamically
-generated but statically served pages, i.e. pages should be
+generated but statically served pages, i.e., pages should be
 delivered as pure static pages (read from the filesystem
 and just passed through), but they have to be generated
 dynamically by the webserver if missing. This way you can
-- 
2.40.0