From 27b409c78b1a3aa7f32eeee3baa31d7a035f3a82 Mon Sep 17 00:00:00 2001
From: Chris Pepper
- [PT] flag when
- additionally using
+ [PT] flag if
+ using
- .htaccess context instead
+ to work in .htaccess context instead
of per-server context. Always try to understand what a
- particular ruleset really does before you use it. This
+ particular ruleset really does before you use it; this
avoids many problems.
We want to create a homogeneous and consistent URL
- layout over all WWW servers on a Intranet webcluster, i.e.
- all URLs (per definition server local and thus server
- dependent!) become actually server independent!
- What we want is to give the WWW namespace a consistent
- server-independent layout: no URL should have to include
- any physically correct target server. The cluster itself
- should drive us automatically to the physical target
- host.
+ layout across all WWW servers on an Intranet web cluster, i.e.,
+ all URLs (by definition server-local and thus
+ server-dependent!) become server independent!
+ What we want is to give the WWW namespace a single consistent
+ layout: no URL should refer to
+ any particular target server. The cluster itself
+ should connect users automatically to a physical target
+ host as needed, invisibly.
- First, the knowledge of the target servers come from
- (distributed) external maps which contain information
- where our users, groups and entities stay. They have the
- form
+ First, the knowledge of the target servers comes from
+ (distributed) external maps which contain information on
+ where our users, groups, and entities reside. They have the
+ form:
user1 server_of_user1
@@ -89,7 +89,7 @@ user2 server_of_user2
We put them into files map.xxx-to-host.
Second we need to instruct all servers to redirect URLs
- of the forms
+ of the forms:
/u/user/anypath
@@ -105,8 +105,8 @@ http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath
- when the URL is not locally valid to a server. The
- following ruleset does this for us by the help of the map
+ when any URL path need not be valid on every server. The
+ following ruleset does this for us with the help of the map
files (assuming that server0 is a default server which
will be used if a user has no entry in the map):
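The map-driven ruleset itself is unchanged context and therefore does not appear
in these hunks. A minimal sketch of what it describes, assuming map files named
map.user-to-host and friends at placeholder paths (the internal /.www/ rules
referenced in the hunk header below would follow it):

    RewriteEngine on

    # external maps; the file locations are placeholders
    RewriteMap  user-to-host    txt:/path/to/map.user-to-host
    RewriteMap  group-to-host   txt:/path/to/map.group-to-host
    RewriteMap  entity-to-host  txt:/path/to/map.entity-to-host

    # send /u/user/..., /g/group/..., /e/entity/... to the physical host,
    # falling back to server0 when there is no map entry; a substitution
    # pointing at another host becomes an external redirect, while one
    # pointing at the local host is served locally
    RewriteRule  ^/u/([^/]+)/?(.*)  http://${user-to-host:$1|server0}/u/$1/$2
    RewriteRule  ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2
    RewriteRule  ^/e/([^/]+)/?(.*)  http://${entity-to-host:$1|server0}/e/$1/$2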
@@ -137,9 +137,9 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\
- Some sites with thousands of users usually use a
- structured homedir layout, i.e. each homedir is in a
- subdirectory which begins for instance with the first
+ Some sites with thousands of users use a
+ structured homedir layout, i.e. each homedir is in a
+ subdirectory which begins (for instance) with the first
character of the username. So, /~foo/anypath
is /home/f/foo/.www/anypath
while /~bar/anypath
is
@@ -150,7 +150,7 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\
We use the following ruleset to expand the tilde URLs
- into exactly the above layout.
+ into the above layout.
RewriteEngine on
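The expansion rule itself is unchanged context and only appears, truncated, in
the hunk header below. A sketch of it, with the substitution inferred from the
/home/f/foo/.www/anypath layout described above:

    # /~foo/anypath  ->  /home/f/foo/.www/anypath
    # ($2 is the first character of the username, $1 the whole username)
    RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)   /home/$2/$1/.www$3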
@@ -176,7 +176,7 @@ RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2
net.sw is my archive of freely available Unix software
packages, which I started to collect in 1992. It is both
my hobby
- and job to to this, because while I'm studying computer
+ and job to do this, because while I'm studying computer
science I have also worked for many years as a system and
network administrator in my spare time. Every week I need
some sort of software so I created a deep hierarchy of
@@ -205,11 +205,11 @@ drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
the world via a nice Web interface. "Nice" means that I
wanted to offer an interface where you can browse
directly through the archive hierarchy. And "nice" means
- that I didn't wanted to change anything inside this
+ that I didn't want to change anything inside this
hierarchy - not even by putting some CGI scripts at the
- top of it. Why? Because the above structure should be
- later accessible via FTP as well, and I didn't want any
- Web or CGI stuff to be there.
+ top of it. Why? Because the above structure should later be
+ accessible via FTP as well, and I didn't want any
+ Web or CGI stuff mixed in there.
The DATA/
subdirectory holds the above
- directory structure, i.e. the real
- net.sw stuff and gets
+ directory structure, i.e. the real
+ net.sw stuff, and gets
automatically updated via rdist
from time to
time. The second part of the problem remains: how to link
these two structures together into one smooth-looking URL
@@ -247,7 +247,7 @@ drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
for the various URLs. Here is the solution: first I put
the following into the per-directory configuration file
in the /net.sw/
to the internal path
/e/netsw
:
L
(last) flag and no
- substitution field ('-') in the forth part
+ substitution field ('-') in the fourth part
!
(not) character and
the C
(chain) flag at the first rule
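The per-directory file these remarks refer to is unchanged context and not shown
in the hunks. As an illustration of the two idioms just mentioned, using only the
names visible in the hunk headers above and below (the selection of rules is
abbreviated and partly an assumption):

    # no substitution ('-') plus [L]: stop rewriting for local content
    RewriteRule   ^netsw-img/.*$         -                    [L]

    # a negated pattern ('!') chained with [C] into the catch-all below:
    # everything that is not the listing script itself is handed to it
    RewriteRule   !^netsw-lsdir\.cgi.*   -                    [C]
    RewriteRule   (.*)                   netsw-lsdir.cgi/$1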
@@ -312,7 +312,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
A typical FAQ about URL rewriting is how to redirect
failing requests on webserver A to webserver B. Usually
this is done via
The first solution has the best performance but less
- flexibility, and is less error safe:
+ flexibility, and is less safe:
RewriteEngine on
@@ -343,7 +343,7 @@ RewriteRule ^(.+) http://webserverB
The problem here is that this will only work for pages
inside the DocumentRoot. While you can add more
Conditions (for instance to also handle homedirs, etc.)
- there is better variant:
+ there is a better variant:
RewriteEngine on
@@ -353,12 +353,12 @@ RewriteRule ^(.+) http://webserverB.dom/$1
This uses the URL look-ahead feature of mod_rewrite.
The result is that this will work for all types of URLs
- and is a safe way. But it does a performance impact on
- the webserver, because for every request there is one
+ and is safe. But it does have a performance impact on
+ the web server, because for every request there is one
more internal subrequest. So, if your webserver runs on a
powerful CPU, use this one. If it is a slow machine, use
- the first approach or better a ErrorDocument CGI-script.
+ the first approach or better an ErrorDocument CGI script.
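Both rulesets are unchanged context and not shown in the hunks; as a reminder of
their shape (the docroot path is a placeholder, the target host is the one named
in the hunk headers):

    # variant 1: hard-coded filesystem check (fast, but tied to one docroot)
    RewriteEngine on
    RewriteCond   /your/docroot/%{REQUEST_FILENAME}  !-f
    RewriteRule   ^(.+)                              http://webserverB.dom/$1

    # variant 2: URL look-ahead via an internal subrequest (-U)
    RewriteEngine on
    RewriteCond   %{REQUEST_URI}  !-U
    RewriteRule   ^(.+)           http://webserverB.dom/$1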
- First we notice that from version 3.0.0
+ First we notice that as of version 3.0.0,
ftp:
" scheme on redirects.
And second, the location approximation can be done by a
@@ -428,9 +428,9 @@ com ftp://ftp.cxan.com/CxAN/
At least for important top-level pages it is sometimes
necessary to provide the optimum of browser dependent
- content, i.e. one has to provide a maximum version for the
- latest Netscape variants, a minimum version for the Lynx
- browsers and a average feature version for all others.
+ content, i.e., one has to provide one version for
+ current browsers, a different version for the Lynx and text-mode
+ browsers, and another for other browsers.
foo.html
is rewritten to foo.NS.html
and the
rewriting stops. If the browser is "Lynx" or "Mozilla" of
- version 1 or 2 the URL becomes foo.20.html
.
+ version 1 or 2, the URL becomes foo.20.html
.
All other browsers receive page foo.32.html
.
- This is done by the following ruleset:
+ This is done with the following ruleset:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.*
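Only the first condition of this ruleset shows up as context here; the full
three-way dispatch described in the text looks roughly like this:

    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
    RewriteRule ^foo\.html$         foo.NS.html    [L]

    RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*       [OR]
    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
    RewriteRule ^foo\.html$         foo.20.html    [L]

    RewriteRule ^foo\.html$         foo.32.html    [L]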
@@ -475,13 +475,13 @@ RewriteRule ^foo\.html$ foo.32.html [L
the mirror program which actually maintains an explicit
up-to-date copy of the remote data on the local machine.
For a webserver we could use the program
- webcopy which acts similar via HTTP. But both
+ webcopy which runs via HTTP. But both
techniques have one major drawback: The local copy is
- always just as up-to-date as often we run the program. It
+ always just as up-to-date as the last time we ran the program. It
would be much better if the mirror is not a static one we
have to establish explicitly. Instead we want a dynamic
mirror with data which gets updated automatically when
- there is need (updated data on the remote host).
+ there is need (updated on the remote host).
Solution:
@@ -605,7 +605,7 @@ RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom
The simplest method for load-balancing is to use the DNS
round-robin feature of
BIND. Here you just configure www[0-9].foo.com
- as usual in your DNS with A(address) records, e.g.
+ as usual in your DNS with A(address) records, e.g.,
www0 IN A 1.2.3.1
@@ -631,13 +631,13 @@ www IN A 1.2.3.5
but in a slightly permutated/rotated order every time.
This way the clients are spread over the various
servers. But notice that this is not a perfect load
- balancing scheme, because DNS resolve information
+ balancing scheme, because DNS resolution information
gets cached by the other nameservers on the net, so once
a client has resolved www.foo.com
- to a particular wwwN.foo.com, all
+ to a particular wwwN.foo.com, all
its subsequent requests also go to this particular name
wwwN.foo.com. But the final result is
- ok, because the total sum of the requests are really
+ okay, because the requests are collectively
spread over the various webservers.
@@ -668,7 +668,7 @@ www IN CNAME www0.foo.com.
entry in the DNS. Then we convert
www0.foo.com
to a proxy-only server,
- i.e. we configure this machine so all arriving URLs
+ i.e., we configure this machine so all arriving URLs
are just pushed through the internal proxy to one of
the 5 other servers (www1-www5).
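The proxy ruleset itself is not part of the visible hunks. One way to sketch such
a proxy throughput round-robin, here with an rnd: map purely for illustration
(the map name, file path, and the use of rnd: instead of the guide's own
selection method are assumptions; mod_proxy must be available for [P]):

    RewriteEngine on

    # pick one of the five backends per request; the map file would
    # contain a line like:  www  www1|www2|www3|www4|www5
    RewriteMap  lb   rnd:/path/to/loadbalance.map

    # push the request through the internal proxy to the chosen backend
    RewriteRule ^/(.*)$  http://${lb:www|www1}.foo.com/$1  [P,L]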
To accomplish this we first establish a ruleset which
@@ -766,7 +766,7 @@ RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
subtree). We have to provide the URL area to these
programs so they know on which area they have to act
on. But usually this is ugly, because they are all the times
- still requested from that areas, i.e. typically we would
+ still requested from that areas, i.e., typically we would
run the swwidx
program from within /u/user/foo/
via hyperlink to
@@ -823,7 +823,7 @@ HREF="*"
Here comes a really esoteric feature: Dynamically
- generated but statically served pages, i.e. pages should be
+ generated but statically served pages, i.e., pages should be
delivered as pure static pages (read from the filesystem
and just passed through), but they have to be generated
dynamically by the webserver if missing. This way you can
--
2.40.0