From: Chris Pepper

We want to create a homogeneous and consistent URL
- layout across all WWW servers on an Intranet web cluster, i.e.,
+ layout across all WWW servers on an Intranet web cluster, i.e.,
all URLs (by definition server-local and thus
server-dependent!) become server independent!
What we want is to give the WWW namespace a single consistent
@@ -312,7 +312,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
Do you know the great CPAN (Comprehensive Perl Archive
Network) under http://www.perl.com/CPAN?
- This does a redirect to one of several FTP servers around
- the world which each carry a CPAN mirror and (theoretically)
- near the requesting client. Actually this
- can be called an FTP access multiplexing service.
+ CPAN automatically redirects browsers to one of many FTP
+ servers around the world (generally one near the requesting
+ client); each server carries a full CPAN mirror. This is
+ effectively an FTP access multiplexing service.
CPAN runs via CGI scripts, but how could a similar approach
be implemented via mod_rewrite?
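One possible sketch with mod_rewrite (the /CxAN/ prefix, the map file path, and the FTP hostnames are purely illustrative): a RewriteMap keyed on the requesting host's top-level domain selects a nearby mirror, and the client is redirected to it.

    RewriteEngine on
    # map of top-level domains to nearby mirrors (illustrative)
    RewriteMap  multiplex                txt:/path/to/map.cxan
    # first append the client's hostname, then key on its TLD
    RewriteRule ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
    RewriteRule ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]

where map.cxan would contain entries such as "de ftp://ftp.cxan.de/CxAN/".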
At least for important top-level pages it is sometimes
necessary to provide the optimum of browser dependent
- content, i.e., one has to provide one version for
+ content, i.e., one has to provide one version for
current browsers, a different version for the Lynx and text-mode
browsers, and another for other browsers.
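A sketch of such a ruleset keyed on the User-Agent header (the filenames foo.NS.html, foo.20.html, foo.32.html and the browser patterns are placeholders to adjust for your site):

    # full-featured browsers
    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
    RewriteRule ^foo\.html$         foo.NS.html  [L]
    # Lynx and old Mozilla versions
    RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*     [OR]
    RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
    RewriteRule ^foo\.html$         foo.20.html  [L]
    # everything else
    RewriteRule ^foo\.html$         foo.32.html  [L]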
- Assume there are nice webpages on remote hosts we want
+ Assume there are nice web pages on remote hosts we want
to bring into our namespace. For FTP servers we would use
the [PT] flag if
+ situation, e.g., adding the [PT] flag if
using .htaccess context instead
@@ -63,7 +63,7 @@ examples
mirror
program which actually maintains an
explicit up-to-date copy of the remote data on the local
- machine. For a webserver we could use the program
+ machine. For a web server we could use the program
webcopy
which runs via HTTP. But both
- techniques have one major drawback: The local copy is
- always just as up-to-date as the last time we ran the program. It
- would be much better if the mirror is not a static one we
+ techniques have a major drawback: The local copy is
+ always only as up-to-date as the last time we ran the program. It
+ would be much better if the mirror were not a static one we
have to establish explicitly. Instead we want a dynamic
- mirror with data which gets updated automatically when
- there is need (updated on the remote host).
- To provide this feature we map the remote webpage or even
- the complete remote webarea to our namespace by the use
+ To provide this feature we map the remote web page or even
+ the complete remote web area to our namespace by the use
of the Proxy Throughput feature (flag [P]):
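For example (hostnames and paths are illustrative), to make a remote area appear under /mirror/stuff/ on our own server:

    RewriteEngine on
    # fetch the remote content on the fly and serve it under our namespace
    RewriteRule   ^/mirror/stuff/(.*)$  http://www.remotesite.dom/stuff/$1  [P]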
This is a tricky way of virtually running a corporate
- (external) Internet webserver
+ (external) Internet web server
(www.quux-corp.dom
), while actually keeping
- and maintaining its data on a (internal) Intranet webserver
+ and maintaining its data on an (internal) Intranet web server
(www2.quux-corp.dom
) which is protected by a
- firewall. The trick is that on the external webserver we
- retrieve the requested data on-the-fly from the internal
+ firewall. The trick is that the external web server retrieves
+ the requested data on-the-fly from the internal
one.
- First, we have to make sure that our firewall still
- protects the internal webserver and that only the
- external webserver is allowed to retrieve data from it.
- For a packet-filtering firewall we could for instance
+ First, we must make sure that our firewall still
+ protects the internal web server and only the
+ external web server is allowed to retrieve data from it.
+ On a packet-filtering firewall, for instance, we could
configure a firewall ruleset like the following:
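A schematic ruleset (the exact syntax depends on your packet-filter product) that lets only the external server reach port 80 on the internal one might look like:

    ALLOW Host www.quux-corp.dom  Port >1024  -->  Host www2.quux-corp.dom  Port 80
    DENY  Host *                  Port *      -->  Host www2.quux-corp.dom  Port 80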
@@ -594,18 +594,18 @@ RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom
- There are a lot of possible solutions for this problem.
- We will discuss first a commonly known DNS-based variant
- and then the special one with
+ There are many possible solutions for this problem.
+ We will first discuss a common DNS-based method,
+ and then one based on
The simplest method for load-balancing is to use
- the DNS round-robin feature of BIND.
+ DNS round-robin.
Here you just configure www[0-9].foo.com
- as usual in your DNS with A(address) records, e.g.,
www0   IN  A   1.2.3.1
@@ -616,7 +616,7 @@ www4 IN A 1.2.3.5
www5   IN  A   1.2.3.6
- Then you additionally add the following entry:
+ Then you additionally add the following entries:
www    IN  A   1.2.3.1
@@ -628,17 +628,19 @@ www IN A 1.2.3.5
Now when www.foo.com gets resolved, BIND gives out www0-www5
- - but in a slightly permutated/rotated order every time.
+ - but in a permutated (rotated) order every time.
This way the clients are spread over the various servers.
But notice that this is not a perfect load
- balancing scheme, because DNS resolution information
- gets cached by the other nameservers on the net, so
+ balancing scheme, because DNS resolutions are
+ cached by clients and other nameservers, so
once a client has resolved www.foo.com
to a particular wwwN.foo.com, all its
- subsequent requests also go to this particular name
- wwwN.foo.com. But the final result is
+ subsequent requests will continue to go to the same
+ IP (and thus a single server), rather than being
+ distributed across the other available servers. But the
+ overall result is
okay, because the requests are collectively
- spread over the various webservers.
+ spread over the various web servers.
lbnamed
which can be found at
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html.
- It is a Perl 5 program in conjunction with auxilliary
- tools which provides a real load-balancing for
+ It is a Perl 5 program which, in conjunction with auxiliary
+ tools, provides real load-balancing via
DNS.
entry in the DNS. Then we convert
www0.foo.com
to a proxy-only server,
- i.e., we configure this machine so all arriving URLs
- are just pushed through the internal proxy to one of
+ i.e., we configure this machine so all arriving URLs
+ are simply passed through its internal proxy to one of
the 5 other servers (www1-www5
). To
accomplish this we first establish a ruleset which
contacts a load balancing script lb.pl
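A sketch of that ruleset on www0 (the path to lb.pl is illustrative): a prg: RewriteMap asks the script for a target host for every request, and the request is then proxied there:

    RewriteEngine on
    # lb.pl reads request URLs on stdin and prints back the chosen target
    RewriteMap    lb      prg:/path/to/lb.pl
    RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]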
@@ -710,19 +712,24 @@ while (<STDIN>) {
www0.foo.com
still is overloaded? The
answer is yes, it is overloaded, but with plain proxy
throughput requests, only! All SSI, CGI, ePerl, etc.
- processing is completely done on the other machines.
- This is the essential point.
+ processing is handled on the other machines.
+ For a complicated site, this may work well. The biggest
+ risk here is that www0 is now a single point of failure --
+ if it crashes, the other servers are inaccessible.
- There is a hardware solution available, too. Cisco
- has a beast called LocalDirector which does a load
- balancing at the TCP/IP level. Actually this is some
- sort of a circuit level gateway in front of a
- webcluster. If you have enough money and really need
- a solution with high performance, use this one.
+ Dedicated Load Balancers
+
+ There are more sophisticated solutions, as well. Cisco,
+ F5, and several other companies sell hardware load
+ balancers (typically used in pairs for redundancy), which
+ offer sophisticated load balancing and auto-failover
+ features. There are software packages which offer similar
+ features on commodity hardware, as well. If you have
+ enough money or need, check these out. The lb-l mailing list is a
+ good place to research.
- On the net there are a lot of nifty CGI programs. But
- their usage is usually boring, so a lot of webmaster
+ On the net there are many nifty CGI programs. But
+ their usage is usually boring, so a lot of webmasters
don't use them. Even Apache's Action handler feature for
MIME-types is only appropriate when the CGI programs
don't need special URLs (actually PATH_INFO
@@ -748,9 +755,9 @@ while (<STDIN>) {
.scgi
(for secure CGI) which will be processed
by the popular cgiwrap
program. The problem
here is that for instance if we use a Homogeneous URL Layout
- (see above) a file inside the user homedirs has the URL
- /u/user/foo/bar.scgi
. But
- cgiwrap
needs the URL in the form
+ (see above) a file inside the user homedirs might have a URL
+ like /u/user/foo/bar.scgi
, but
+ cgiwrap
needs URLs in the form
/~user/foo/bar.scgi/
. The following rule
solves the problem:
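A sketch of such a rule (the internal cgiwrap location is illustrative): the homogeneous URL is mapped back to the /~user/ form that cgiwrap expects and forced to be handled as CGI:

    RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-httpd-cgi]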
access.log
for a URL subtree) and
wwwidx
(which runs Glimpse on a URL
subtree). We have to provide the URL area to these
- programs so they know on which area they have to act on.
- But usually this is ugly, because they are all the times
- still requested from that areas, i.e., typically we would
+ programs so they know which area they are really working with.
+ But usually this is complicated, because they may still be
+ requested by the alternate URL form, i.e., typically we would
run the swwidx
program from within
/u/user/foo/
via hyperlink to
@@ -774,10 +781,10 @@ RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
/internal/cgi/user/swwidx?i=/u/user/foo/
- which is ugly. Because we have to hard-code
+ which is ugly, because we have to hard-code both
the location of the area and the location of the CGI inside the
- hyperlink. When we have to reorganize the area, we spend a
+ hyperlink. When we have to reorganize, we spend a
lot of time changing the various hyperlinks.
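One way to sketch a fix (using the /internal/cgi/user/swwidx location from the hyperlink above): derive the area from the URL itself, so a trailing /* on any homedir URL invokes the index program without hard-coding the area in the hyperlink:

    RewriteRule ^/([uge])/([^/]+)(/?.*)/\*$  /internal/cgi/user/swwidx?i=/$1/$2$3/

A hyperlink can then simply point to /u/user/foo/*.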
Here comes a really esoteric feature: Dynamically
- generated but statically served pages, i.e., pages should be
+ generated but statically served pages, i.e., pages should be
delivered as pure static pages (read from the filesystem
and just passed through), but they have to be generated
- dynamically by the webserver if missing. This way you can
- have CGI-generated pages which are statically served unless
- one (or a cronjob) removes the static contents. Then the
+ dynamically by the web server if missing. This way you can
+ have CGI-generated pages which are statically served unless an
+ admin (or a cron job) removes the static contents. Then the
contents gets refreshed.
- Here a request to page.html leads to a
+ Here a request for page.html leads to an
internal run of a corresponding page.cgi
if
- page.html
is still missing or has filesize
+ page.html
is missing or has filesize
null. The trick here is that page.cgi
is a
- usual CGI script which (additionally to its STDOUT
)
+ CGI script which (additionally to its STDOUT
)
writes its output to the file page.html
.
- Once it was run, the server sends out the data of
+ Once it has completed, the server sends out
page.html
. When the webmaster wants to force
- a refresh the contents, he just removes
- page.html (usually done by a cronjob).
+ a refresh of the contents, he just removes
+ page.html (typically from cron).
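A sketch of the corresponding ruleset (per-directory context, using the page.html/page.cgi names from the text): when page.html is missing or empty, the request is handed to page.cgi and run as a CGI script:

    RewriteCond %{REQUEST_FILENAME}   !-s
    RewriteRule ^page\.html$          page.cgi  [T=application/x-httpd-cgi,L]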
@@ -865,9 +872,9 @@ RewriteRule ^page\.html$ page.cgi [
- Wouldn't it be nice while creating a complex webpage if
- the webbrowser would automatically refresh the page every
- time we write a new version from within our editor?
+ Wouldn't it be nice, while creating a complex web page, if
+ the web browser would automatically refresh the page every
+ time we save a new version from within our editor?
Impossible?
No! We just combine the MIME multipart feature, the
- webserver NPH feature and the URL manipulation power of
+ web server NPH feature, and the URL manipulation power of
:refresh
to any
- URL causes this to be refreshed every time it gets
+ URL causes the 'page' to be refreshed every time it is
updated on the filesystem.
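The URL-manipulation part might be sketched like this (nph-refresh is a hypothetical NPH CGI script that does the actual multipart streaming): appending :refresh to a URL hands the underlying path to that script:

    RewriteRule ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1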
@@ -1019,18 +1026,17 @@ exit(0);
The
- To provide this feature we map the remote webpage or even
- the complete remote webarea to our namespace by the use
- of the Proxy Throughput feature (flag [P]):
+ To provide this feature we map the remote web page or even
+ the complete remote web area to our namespace using the
+ Proxy Throughput feature (flag [P]):
##
@@ -1168,7 +1174,7 @@ bsdti1.sdm.de -
We first have to make sure
- Sometimes a very special authentication is needed, for
- instance a authentication which checks for a set of
+ Sometimes very special authentication is needed, for
+ instance authentication which checks for a set of
explicitly configured users. Only these should receive
access and without explicit prompting (which would occur
- when using the Basic Auth via