From: Nick Kew
Date: Sat, 3 Mar 2018 00:06:24 +0000 (+0000)
Subject: Improve mod_proxy_html docs: add topic sections for customisation and i18n
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=d72803af617e4c0ca3ac4a1534fa87065e6d8b16;p=apache
Improve mod_proxy_html docs: add topic sections for customisation and i18n
problems that generate user and bugzilla questions.
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1825748 13f79535-47bb-0310-9956-ffa450edef68
---
diff --git a/docs/manual/mod/mod_proxy_html.xml b/docs/manual/mod/mod_proxy_html.xml
index 29594d1410..fde46e228f 100644
--- a/docs/manual/mod/mod_proxy_html.xml
+++ b/docs/manual/mod/mod_proxy_html.xml
@@ -47,12 +47,70 @@ rewritten to work through the gateway. mod_proxy_html serves to rewrite
<a href="http://appserver.example.com/foo/bar.html">foobar</a>
to
<a href="http://www.example.com/appserver/foo/bar.html">foobar</a>
making it accessible from outside.
-
-mod_proxy_html was originally developed at WebÞing, whose
-extensive documentation may be useful to users.
+Introduction
+mod_proxy_html was originally a third-party module for Apache HTTPD
+2.0.x and later versions. It was donated to the ASF in 2011 along with
+mod_xml2enc (see Internationalisation)
+and is a standard module in HTTPD 2.4 and development versions.
+
+Customised HTML Parsing
+Internally, mod_proxy_html uses the HTMLParser module from the
+third-party libxml2 library.
+Unlike other libxml2 parsers, HTMLParser deals with HTML without
+requiring it to be well-formed XML. In particular, it understands
+implied tags, such as an omitted closing </p>, and inserts them
+into the stream of SAX events consumed by mod_proxy_html. It also has
+explicit knowledge of the W3C HTML 4 and XHTML 1 standards, and can
+correct certain errors in documents written to them.
+mod_proxy_html offers a range of options to control HTML parsing.
+Error correction can be enabled (to your choice of HTML standard)
+or disabled using ProxyHTMLDocType.
+It can also be configured
+to treat non-standard elements and attributes as links that may
+need rewriting, and to rewrite links in embedded non-HTML content
+(inline stylesheets and scripts). Note that it is not suitable for external
+stylesheets or scripts: for those you should use a text-based filter such as
+mod_substitute or mod_sed.
+The main customisation directives are ProxyHTMLLinks
+and ProxyHTMLEvents. By default these are set in the
+configuration file proxy-html.conf, which also contains
+comments to help you customise your parser if required.
+For historical reasons, configuring mod_proxy_html to rewrite
+URLs in scripting events does not by default rewrite URLs in stylesheets.
+This can be changed by uncommenting the relevant line in
+proxy-html.conf as documented there.
+
+
+Internationalisation
+Internally, mod_proxy_html uses a smart HTML parser from the
+third-party libxml2 library.
+The parser works in UTF-8 internally. Handling the other encodings
+used by many non-English-language websites is therefore a
+somewhat complex task. If it is not handled correctly,
+pages that contain non-ASCII characters in encodings other than
+UTF-8 will display incorrectly.
+From the first release in 2003 to the donation to Apache in 2011,
+internationalisation (i18n) support developed from near nothing to a
+sophisticated framework capable of applying rules from HTTP, HTML and XML
+to detect a document's encoding and handle it correctly. However,
+this processing was common to mod_proxy_html and other modules using
+libxml2, so it made sense to move it to a separate module rather than
+maintain it in multiple places. That module is mod_xml2enc,
+and it must be loaded for i18n to work.
+The interaction of mod_proxy_html with mod_xml2enc is too complex to
+be configured using regular filter configuration, including
+mod_filter directives. Thus while mod_proxy_html can
+still be configured using regular filter directives, this will not support
+i18n at all. Instead, a new directive ProxyHTMLEnable
+has been introduced to configure both mod_proxy_html's filter and mod_xml2enc.
+It is recommended that you always use ProxyHTMLEnable even where i18n
+support is not required. Note that this is a change from earlier
+versions where filter directives were used to activate mod_proxy_html.
+
+
+
ProxyHTMLMeta
Turns on or off extra pre-parsing of metadata in HTML
@@ -376,7 +434,7 @@ size and avoid the need to resize the buffer dynamically during a request.
virtual host, directory
Version 2.4 and later; available as a third-party
-for earlier 2.x versions
+module for earlier 2.x versions
Specifies one or more attributes to treat as scripting events and
apply ProxyHTMLURLMaps to where enabled.
@@ -386,7 +444,10 @@ You can specify any number of attributes in one or more
one scope so that one overrides the other, you'll need to specify a complete
set in each of those scopes.
A default configuration is supplied in proxy-html.conf
-and defines the events in standard HTML 4 and XHTML 1.
+and defines the events in standard HTML 4 and XHTML 1. It can be
+extended to cover URLs embedded in inline CSS
+by adding the style attribute to ProxyHTMLEvents, although
+this is not enabled in the shipped default.
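
Reviewer note, not part of the patch: the setup described in the new Internationalisation and ProxyHTMLEvents text might look roughly like the sketch below. It assumes the gateway from the example at the top of the hunk (www.example.com fronting appserver.example.com under /appserver/). The module file paths and the location of proxy-html.conf are assumptions that vary by installation. ProxyHTMLExtended is not mentioned in this patch; it is included on the understanding that the module requires it before scripting events and inline styles are rewritten.

```apache
# Sketch only; adjust the module paths and the include path for your install.
# mod_proxy and mod_proxy_http are assumed to be loaded already.
LoadModule proxy_html_module modules/mod_proxy_html.so
# mod_xml2enc must be loaded for i18n support to work.
LoadModule xml2enc_module modules/mod_xml2enc.so

# Shipped defaults for ProxyHTMLLinks and ProxyHTMLEvents
# (this path is an assumption).
Include conf/extra/proxy-html.conf

# Treat the style attribute as an event so URLs in inline CSS are rewritten.
# Kept at the same scope as the shipped defaults, because these settings
# are not merged across scopes.
ProxyHTMLEvents style

ProxyPass        /appserver/ http://appserver.example.com/
ProxyPassReverse /appserver/ http://appserver.example.com/

<Location "/appserver/">
    # Enables the mod_proxy_html filter together with mod_xml2enc,
    # in place of the older filter-directive setup.
    ProxyHTMLEnable On
    # Rewrite scripting events and inline styles as well as plain links.
    ProxyHTMLExtended On
    ProxyHTMLURLMap http://appserver.example.com/ /appserver/
</Location>
```

The ProxyHTMLEvents line sits at server level so that it extends the set defined in proxy-html.conf. If it were moved inside the Location block, the complete set of events would have to be repeated there.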