From 66399d03be39a4010ca9abc62fee7a1ecab4982b Mon Sep 17 00:00:00 2001
From: Vincent Bray
We want to create a homogeneous and consistent URL
- layout across all WWW servers on an Intranet web cluster, i.e.,
+ layout across all WWW servers on an Intranet web cluster, i.e.,
all URLs (by definition server-local and thus
server-dependent!) become server independent!
What we want is to give the WWW namespace a single consistent
server-independent layout.
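As a rough sketch of the kind of map-driven ruleset this implies (the map
file path and the fallback host server0.dom below are illustrative
assumptions, not part of this patch), a txt: map listing one user-to-host
pair per line can make every URL server-independent:

RewriteEngine on
RewriteMap  user-to-host   txt:/path/to/map.user-to-host
RewriteRule ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0.dom}/u/$1/$2   [R,L]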
@@ -320,7 +320,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
Do you know the great CPAN (Comprehensive Perl Archive
Network) under http://www.perl.com/CPAN?
- This does a redirect to one of several FTP servers around
- the world which each carry a CPAN mirror and (theoretically)
- near the requesting client. Actually this
- can be called an FTP access multiplexing service.
+ CPAN automatically redirects browsers to one of many FTP
+ servers around the world (generally one near the requesting
+ client); each server carries a full CPAN mirror. This is
+ effectively an FTP access multiplexing service.
CPAN runs via CGI scripts, but how could a similar approach
be implemented via mod_rewrite?
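One way such a multiplexer can be sketched with mod_rewrite is a two-step
rule chain that keys a txt: map on the client's top-level domain. The
/CxAN/ prefix, map path, and fallback mirror below are illustrative, and
the trick assumes hostname lookups are enabled so %{REMOTE_HOST} is a name
rather than an address:

RewriteEngine on
RewriteMap  mirrors   txt:/path/to/map.mirrors
RewriteRule ^/CxAN/(.*)               %{REMOTE_HOST}::$1                             [C]
RewriteRule ^.+\.([a-zA-Z]+)::(.*)$   ${mirrors:$1|ftp://ftp.default.dom/CxAN/}$2    [R,L]

The map file would simply pair top-level domains with nearby mirrors, e.g.
"de  ftp://ftp.cxan.de/CxAN/".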
At least for important top-level pages it is sometimes
necessary to provide the optimum of browser dependent
- content, i.e., one has to provide one version for
+ content, i.e., one has to provide one version for
current browsers, a different version for the Lynx and text-mode
browsers, and another for other browsers.
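In per-directory context, such a User-Agent dispatch might look roughly
like this (the page names and browser patterns are only illustrative):

RewriteCond %{HTTP_USER_AGENT}   ^Mozilla/[5-9].*
RewriteRule ^foo\.html$          foo.modern.html   [L]
RewriteCond %{HTTP_USER_AGENT}   ^Lynx/.*
RewriteRule ^foo\.html$          foo.lynx.html     [L]
RewriteRule ^foo\.html$          foo.plain.html    [L]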
- Assume there are nice webpages on remote hosts we want
+ Assume there are nice web pages on remote hosts we want
to bring into our namespace. For FTP servers we would use
the
- To provide this feature we map the remote webpage or even
- the complete remote webarea to our namespace by the use
+ To provide this feature we map the remote web page or even
+ the complete remote web area to our namespace by the use
of the Proxy Throughput feature
(flag [P]):
This is a tricky way of virtually running a corporate
- (external) Internet webserver
+ (external) Internet web server
(
- First, we have to make sure that our firewall still
- protects the internal webserver and that only the
- external webserver is allowed to retrieve data from it.
- For a packet-filtering firewall we could for instance
+ First, we must make sure that our firewall still
+ protects the internal web server and only the
+ external web server is allowed to retrieve data from it.
+ On a packet-filtering firewall, for instance, we could
configure a firewall ruleset like the following:
- There are a lot of possible solutions for this problem.
- We will discuss first a commonly known DNS-based variant
- and then the special one with mod_rewrite.
+ There are many possible solutions for this problem.
+ We will first discuss a common DNS-based method,
+ and then one based on mod_rewrite.
The simplest method for load-balancing is to use
- the DNS round-robin feature of BIND.
- Then you additionally add the following entry:
+ Then you additionally add the following entries:
Now when
entry in the DNS. Then we convert
- There is a hardware solution available, too. Cisco
- has a beast called LocalDirector which does a load
- balancing at the TCP/IP level. Actually this is some
- sort of a circuit level gateway in front of a
- webcluster. If you have enough money and really need
- a solution with high performance, use this one.
+ There are more sophisticated solutions, as well. Cisco,
+ F5, and several other companies sell hardware load
+ balancers (typically used in pairs for redundancy), which
+ offer sophisticated load balancing and auto-failover
+ features. There are software packages which offer similar
+ features on commodity hardware, as well. If you have
+ enough money or need, check these out. The lb-l mailing list is a
+ good place to research.
- On the net there are a lot of nifty CGI programs. But
- their usage is usually boring, so a lot of webmaster
+ On the net there are many nifty CGI programs. But
+ their usage is usually boring, so a lot of webmasters
don't use them. Even Apache's Action handler feature for
MIME-types is only appropriate when the CGI programs
don't need special URLs (actually
- which is ugly. Because we have to hard-code
+ which is ugly, because we have to hard-code
both the location of the area
and the location of the CGI inside the
- hyperlink. When we have to reorganize the area, we spend a
+ hyperlink. When we have to reorganize, we spend a
lot of time changing the various hyperlinks.
Here comes a really esoteric feature: Dynamically
- generated but statically served pages, i.e., pages should be
+ generated but statically served pages, i.e., pages should be
delivered as pure static pages (read from the filesystem
and just passed through), but they have to be generated
- dynamically by the webserver if missing. This way you can
- have CGI-generated pages which are statically served unless
- one (or a cronjob) removes the static contents. Then the
+ dynamically by the web server if missing. This way you can
+ have CGI-generated pages which are statically served unless an
+ admin (or a cron job) removes the static contents. Then the
contents gets refreshed.
Here a request to
Here a request for
+ situation, e.g., adding the [PT] flag if
using mod_alias
and
mod_userdir
, etc. Or rewriting a ruleset
to work in .htaccess
context instead
@@ -43,7 +43,7 @@
Web Cluster with Consistent URL Space
Structured Homedirs
Filesystem Reorganization
-Redirect Failing URLs to Another Webserver
+Redirect Failing URLs to Another Web Server
Archive Access Multiplexer
Browser Dependent Content
Dynamic Mirror
-Redirect Failing URLs to Another Webserver
+Redirect Failing URLs to Another Web Server
@@ -364,7 +364,7 @@ RewriteRule ^(.+) http://webserverB.dom/$1
The result is that this will work for all types of URLs
and is safe. But it does have a performance impact on
the web server, because for every request there is one
- more internal subrequest. So, if your webserver runs on a
+ more internal subrequest. So, if your web server runs on a
powerful CPU, use this one. If it is a slow machine, use
the first approach or better an ErrorDocument
CGI script.
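For comparison, the simpler first approach mentioned above is typically
just a file test plus a redirect; a sketch, where /your/docroot is a
placeholder for the real document root:

RewriteEngine on
RewriteCond   /your/docroot/%{REQUEST_FILENAME}   !-f
RewriteRule   ^(.+)   http://webserverB.dom/$1   [R]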
@@ -382,10 +382,10 @@ RewriteRule ^(.+) http://webserverB.dom/$1
mod_rewrite
?mirror
program which actually maintains an
explicit up-to-date copy of the remote data on the local
- machine. For a webserver we could use the program
+ machine. For a web server we could use the program
webcopy
which runs via HTTP. But both
- techniques have one major drawback: The local copy is
- always just as up-to-date as the last time we ran the program. It
- would be much better if the mirror is not a static one we
+ techniques have a major drawback: The local copy is
+ always only as up-to-date as the last time we ran the program. It
+ would be much better if the mirror were not a static one we
have to establish explicitly. Instead we want a dynamic
- mirror with data which gets updated automatically when
- there is need (updated on the remote host).
(flag [P]):
(www.quux-corp.dom
), while actually keeping
- and maintaining its data on a (internal) Intranet webserver
+ and maintaining its data on an (internal) Intranet web server
(www2.quux-corp.dom
) which is protected by a
- firewall. The trick is that on the external webserver we
- retrieve the requested data on-the-fly from the internal
+ firewall. The trick is that the external web server retrieves
+ the requested data on-the-fly from the internal
one.
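On the external server, this on-the-fly retrieval is essentially one
proxy-throughput rule; a hedged sketch, assuming the internal host exposes
the same /home/.../.www layout (the exact internal path depends on the
internal server's setup):

RewriteRule ^/home/([^/]+)/\.www/?(.*)   http://www2.quux-corp.dom/home/$1/.www/$2   [P,L]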
@@ -601,18 +601,18 @@ RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom
mod_rewrite:
mod_rewrite:
- the DNS round-robin feature of BIND.
+ DNS round-robin.
Here you just configure www[0-9].foo.com
- as usual in your DNS with A(address) records, e.g.,
www0 IN A 1.2.3.1
@@ -623,7 +623,7 @@ www4 IN A 1.2.3.5
www5 IN A 1.2.3.6
www IN A 1.2.3.1
@@ -635,17 +635,19 @@ www IN A 1.2.3.5
www.foo.com
gets
resolved, BIND
gives out www0-www5
- - but in a slightly permutated/rotated order every time.
+ - but in a permuted (rotated) order every time.
This way the clients are spread over the various
servers. But notice that this is not a perfect load
- balancing scheme, because DNS resolution information
- gets cached by the other nameservers on the net, so
+ balancing scheme, because DNS resolutions are
+ cached by clients and other nameservers, so
once a client has resolved www.foo.com
to a particular wwwN.foo.com
, all its
- subsequent requests also go to this particular name
- wwwN.foo.com
. But the final result is
- okay, because the requests are collectively
- spread over the various webservers.
lbnamed
which can be found at
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html.
- It is a Perl 5 program in conjunction with auxilliary
- tools which provides a real load-balancing for
+ It is a Perl 5 program which, in conjunction with auxiliary
+ tools, provides real load-balancing via
DNS.
www0.foo.com
to a proxy-only server,
- i.e., we configure this machine so all arriving URLs
- are just pushed through the internal proxy to one of
+ i.e., we configure this machine so all arriving URLs
+ are simply passed through its internal proxy to one of
the 5 other servers (www1-www5
). To
accomplish this we first establish a ruleset which
contacts a load balancing script lb.pl
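That ruleset can be sketched as a prg: map feeding a proxy-throughput rule
(the script path below is illustrative):

RewriteEngine on
RewriteMap  lb    prg:/path/to/lb.pl
RewriteRule ^/(.+)$   ${lb:$1}   [P,L]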
@@ -716,19 +718,23 @@ while (<STDIN>) {
www0.foo.com
still is overloaded? The
answer is yes, it is overloaded, but with plain proxy
throughput requests, only! All SSI, CGI, ePerl, etc.
- processing is completely done on the other machines.
- This is the essential point.
PATH_INFO
@@ -754,9 +760,9 @@ while (<STDIN>) {
.scgi
(for secure CGI) which will be processed
by the popular cgiwrap
program. The problem
here is that for instance if we use a Homogeneous URL Layout
- (see above) a file inside the user homedirs has the URL
- /u/user/foo/bar.scgi
. But
- cgiwrap
needs the URL in the form
+ (see above) a file inside the user homedirs might have a URL
+ like /u/user/foo/bar.scgi
, but
+ cgiwrap
needs URLs in the form
/~user/foo/bar.scgi/
. The following rule
solves the problem:
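A hedged sketch of such a rule, turning the homogeneous URL back into the
~user form before handing it to cgiwrap (the /internal/cgi/user/cgiwrap
mount point is only a placeholder for wherever cgiwrap is installed):

RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*)$   /internal/cgi/user/cgiwrap/~$1/$2.scgi$3   [PT]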
access.log
for a URL subtree) and
wwwidx
(which runs Glimpse on a URL
subtree). We have to provide the URL area to these
- programs so they know on which area they have to act on.
- But usually this is ugly, because they are all the times
- still requested from that areas, i.e., typically we would
+ programs so they know which area they are really working with.
+ But usually this is complicated, because they may still be
+ requested by the alternate URL form, i.e., typically we would
run the swwidx
program from within
/u/user/foo/
via hyperlink to
@@ -780,10 +786,10 @@ RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
/internal/cgi/user/swwidx?i=/u/user/foo/
cron
job) removes the static contents. Then the
contents gets refreshed.
page.html
leads to a
+ page.html
leads to an
internal run of a corresponding page.cgi
if
- page.html
is still missing or has filesize
+ page.html
is missing or has filesize
null. The trick here is that page.cgi
is a
- usual CGI script which (additionally to its STDOUT
)
+ CGI script which (additionally to its STDOUT
)
writes its output to the file page.html
.
- Once it was run, the server sends out the data of
+ Once it has completed, the server sends out
page.html
. When the webmaster wants to force
- a refresh the contents, he just removes
- page.html
(usually done by a cronjob).
page.html
(typically from cron
).
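The rewrite half of this trick can be sketched in per-directory context:
send the request to page.cgi only when page.html is missing or empty (the
-s test covers both cases), assuming page.cgi sits next to page.html and
CGI execution is enabled for it:

RewriteCond %{REQUEST_FILENAME}   !-s
RewriteRule ^page\.html$   page.cgi   [T=application/x-httpd-cgi,L]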
- Wouldn't it be nice while creating a complex webpage if
- the webbrowser would automatically refresh the page every
- time we write a new version from within our editor?
+ Wouldn't it be nice, while creating a complex web page, if
+ the web browser would automatically refresh the page every
+ time we save a new version from within our editor?
Impossible?
No! We just combine the MIME multipart feature, the
- webserver NPH feature and the URL manipulation power of
+ web server NPH feature, and the URL manipulation power of
mod_rewrite
. First, we establish a new
URL feature: Adding just :refresh
to any
- URL causes this to be refreshed every time it gets
+ URL causes the 'page' to be refreshed every time it is
updated on the filesystem.
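The URL-manipulation half might be sketched like this, assuming a
hypothetical NPH CGI script nph-refresh that does the actual multipart
streaming (script location and flag are illustrative):

RewriteRule ^(/[uge]/[^/]+/?.*):refresh   /internal/cgi/apache/nph-refresh?f=$1   [PT]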
@@ -1024,18 +1030,17 @@ exit(0);
The <VirtualHost>
feature of Apache is nice
- and works great when you just have a few dozens
+ and works great when you just have a few dozen
virtual hosts. But when you are an ISP and have hundreds of
- virtual hosts to provide this feature is not the best
- choice.
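A common map-based alternative can be sketched as follows, with an
illustrative txt: map that translates the lowercased Host header into a
per-site document root:

RewriteEngine on
RewriteMap  lowercase   int:tolower
RewriteMap  vhost       txt:/path/to/vhost.map
RewriteCond ${lowercase:%{HTTP_HOST}}   ^(.+)$
RewriteRule ^/(.*)$   ${vhost:%1}/htdocs/$1

Each map line would pair a hostname with its document root, e.g.
"www.customer1.dom  /www/customer1".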
- To provide this feature we map the remote webpage or even
- the complete remote webarea to our namespace by the use
- of the Proxy Throughput feature (flag [P]
):
+ To provide this feature we map the remote web page or even
+ the complete remote web area to our namespace using the
+ Proxy Throughput feature (flag [P]
):
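The mapping itself then boils down to one proxy-throughput rule per remote
area, for example (the local path and remote host are purely illustrative):

RewriteRule ^/mirror/foo/(.*)$   http://www.remote-host.dom/foo/$1   [P]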
## @@ -1173,7 +1178,7 @@ bsdti1.sdm.de -
We first have to make sure mod_rewrite
is below(!) mod_proxy
in the Configuration
- file when compiling the Apache webserver. This way it gets
+ file when compiling the Apache web server. This way it gets
called before mod_proxy
. Then we
configure the following for a host-dependent deny...
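Such a host-dependent deny can be sketched as a forbidding rule guarded by
a REMOTE_HOST condition (the hostname is illustrative); matching clients
get a 403 before mod_proxy ever sees the request:

RewriteCond %{REMOTE_HOST}   ^badguy\.example\.dom$   [NC]
RewriteRule ^.*$   -   [F]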
- Sometimes a very special authentication is needed, for
- instance a authentication which checks for a set of
+ Sometimes very special authentication is needed, for
+ instance authentication which checks for a set of
explicitly configured users. Only these should receive
access and without explicit prompting (which would occur
- when using the Basic Auth via mod_auth).
+ when using the Basic Auth via mod_auth_basic).
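One way such a check is often sketched is a pair of negated conditions in
front of a forbidding rule; the ident@host values below are illustrative
and assume ident lookups (IdentityCheck on) are available:

RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}   !^friend1@client1\.quux-corp\.dom$
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}   !^friend2@client2\.quux-corp\.dom$
RewriteRule ^/~quux/only-for-friends/   -   [F]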