From: Nick Kew
Date: Sat, 3 Mar 2018 00:06:24 +0000 (+0000)
Subject: Improve mod_proxy_html docs: add topic sections for customisation and i18n
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=d72803af617e4c0ca3ac4a1534fa87065e6d8b16;p=apache

Improve mod_proxy_html docs: add topic sections for customisation and i18n
problems that generate user and bugzilla questions.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1825748 13f79535-47bb-0310-9956-ffa450edef68
---
diff --git a/docs/manual/mod/mod_proxy_html.xml b/docs/manual/mod/mod_proxy_html.xml
index 29594d1410..fde46e228f 100644
--- a/docs/manual/mod/mod_proxy_html.xml
+++ b/docs/manual/mod/mod_proxy_html.xml
@@ -47,12 +47,70 @@ rewritten to work through the gateway.
mod_proxy_html serves to rewrite <a href="http://appserver.example.com/foo/bar.html">foobar</a> to
<a href="http://www.example.com/appserver/foo/bar.html">foobar</a> making it accessible from outside.

- -

mod_proxy_html was originally developed at WebÞing, whose
-extensive documentation may be useful to users.

+
Introduction +

mod_proxy_html was originally a third-party module for Apache HTTPD
+2.0.x and later versions. It was donated to the ASF in 2011 along with
+mod_xml2enc (see Internationalisation)
+and is a standard module in HTTPD 2.4 and development versions.

+
+
Customised HTML Parsing +

Internally, mod_proxy_html uses the HTMLParser module from the
+third-party libxml2 library.
+Unlike other libxml2 parsers, HTMLParser deals with HTML without
+requiring it to be well-formed XML. In particular, it understands
+implied tags (such as an omitted closing </p>) and inserts them
+into the stream of SAX events used by mod_proxy_html. It also has
+explicit knowledge of the W3C standards HTML 4 and XHTML 1, and can
+correct certain errors in documents written to them.

+

mod_proxy_html offers a range of options to control HTML parsing.
+Error correction can be enabled (to your choice of HTML standard)
+or disabled using ProxyHTMLDocType.
+In response to popular demand, it can also be configured
+to treat non-standard elements and attributes as links that may
+need rewriting, and to rewrite links in embedded non-HTML content
+(stylesheets and scripts). Note that it is not suitable for external
+stylesheets or scripts: for those you should use another filter such as
+mod_substitute or mod_sed.
+The main customisation directives are ProxyHTMLLinks
+and ProxyHTMLEvents. By default these are set in the
+configuration file proxy-html.conf, which also contains
+comments to help you customise your parser if required.
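Here is a minimal sketch of such customisation. It assumes proxy-html.conf is already included and that these lines sit in the same scope, so they add to its defaults rather than replace them. The `my-widget` element, its `data-src` attribute and the `onswipe` event are hypothetical non-standard markup, chosen only to show how extra attributes are declared. Check the directive reference pages before relying on any of it.

```apache
# Sketch only: assumes mod_proxy_html is loaded and proxy-html.conf
# (which declares the standard HTML 4 / XHTML 1 links and events) is
# included in the same scope, so these lines extend its defaults.

# Select XHTML 1 as the HTML standard for output.
ProxyHTMLDocType XHTML

# Hypothetical non-standard element: treat its data-src attribute as a
# link that ProxyHTMLURLMap rules may rewrite.
# Syntax: ProxyHTMLLinks element attribute [attribute ...]
ProxyHTMLLinks  my-widget  data-src

# Hypothetical non-standard event attribute. Event contents are only
# rewritten when ProxyHTMLExtended is On.
ProxyHTMLEvents onswipe
```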

+For historical reasons, configuring mod_proxy_html to rewrite
+URLs in scripting events does not by default rewrite URLs in stylesheets.
+This can be changed by uncommenting the relevant line in
+proxy-html.conf as documented there.
+
+
+
Internationalisation +

Internally, mod_proxy_html uses a smart HTML parser from the
+third-party libxml2 library.
+The parser uses Unicode (UTF-8) internally. This makes it a
+somewhat complex task to handle the other encodings required to process
+many non-English-language websites. If this is not handled correctly,
+websites that use non-ASCII characters in encodings other than
+UTF-8 will display incorrectly.

+

From the first release in 2003 to the donation to Apache in 2011,
+internationalisation (i18n) support developed from almost nothing to a
+sophisticated framework capable of applying rules from HTTP, HTML and XML
+to detect a document's encoding and handle it correctly. Because
+this processing was common to mod_proxy_html and other modules using
+libxml2, it made sense to move it to a separate module rather than
+maintain it in multiple places. That module is mod_xml2enc,
+which must be loaded for i18n to work.
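Loading the modules might look like the following sketch. The file paths assume a typical modules/ layout and will differ between builds and packages. mod_proxy and mod_proxy_http are listed only because a reverse-proxy setup usually needs them. The xml2EncDefault value is just an example.

```apache
# Sketch: adjust module paths for your build or distribution.
LoadModule proxy_module       modules/mod_proxy.so
LoadModule proxy_http_module  modules/mod_proxy_http.so
# mod_xml2enc provides the i18n support; without it there is none.
LoadModule xml2enc_module     modules/mod_xml2enc.so
LoadModule proxy_html_module  modules/mod_proxy_html.so

# Optional: encoding to assume when nothing can be detected from the
# HTTP headers or the document itself (ISO-8859-1 is only an example).
xml2EncDefault ISO-8859-1
```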

+

The interaction of mod_proxy_html with mod_xml2enc is too complex to
+be configured using regular filter configuration, including
+mod_filter directives. Thus, while mod_proxy_html can
+still be configured using regular filter directives, this will not support
+i18n at all. Instead, a new directive, ProxyHTMLEnable,
+has been introduced to configure both mod_proxy_html's filter and mod_xml2enc.
+It is recommended that you always use ProxyHTMLEnable, even where i18n
+support is not required. Note that this is a change from earlier
+versions, where filter directives were used to activate mod_proxy_html.
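As a sketch of this recommendation, a reverse-proxy section for the appserver.example.com gateway example at the top of this page might enable the filter with ProxyHTMLEnable rather than with SetOutputFilter or mod_filter directives. It assumes the modules are loaded and proxy-html.conf is included for the default link and event definitions. The URL mapping shown is one plausible choice, not the only one.

```apache
# Sketch: assumes mod_proxy, mod_proxy_http, mod_xml2enc and
# mod_proxy_html are loaded and proxy-html.conf is included.
ProxyPass        /appserver/ http://appserver.example.com/
ProxyPassReverse /appserver/ http://appserver.example.com/

<Location /appserver/>
    # Sets up mod_proxy_html's filter together with mod_xml2enc,
    # in place of SetOutputFilter or mod_filter directives.
    ProxyHTMLEnable On
    # Rewrite absolute links to the backend into links via the gateway.
    ProxyHTMLURLMap http://appserver.example.com/ /appserver/
</Location>
```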

+ +
+
ProxyHTMLMeta
Turns on or off extra pre-parsing of metadata in HTML
@@ -376,7 +434,7 @@ size and avoid the need to resize the buffer dynamically during a request.
virtual host, directory
Version 2.4 and later; available as a third-party
-for earlier 2.x versions
+module for earlier 2.x versions

Specifies one or more attributes to treat as scripting events and
apply ProxyHTMLURLMaps to where enabled.
@@ -386,7 +444,10 @@ You can specify any number of attributes in one or more
one scope so that one overrides the other, you'll need to specify a complete
set in each of those scopes.

A default configuration is supplied in proxy-html.conf
-and defines the events in standard HTML 4 and XHTML 1.

+and defines the events in standard HTML 4 and XHTML 1. This can be
+extended to apply to URLs embedded in CSS stylesheet attributes
+by adding the style attribute to ProxyHTMLEvents, although
+this is not enabled in the shipped default.
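Here is a sketch of that change. It assumes the line sits in the same scope as the ProxyHTMLEvents lines in proxy-html.conf, so it adds to the shipped list rather than replacing it. Event attributes, including style once declared, are only processed when ProxyHTMLExtended is On. The Location path and backend host reuse the appserver.example.com gateway example from the start of this page and are illustrative.

```apache
# Sketch: same scope as the defaults in proxy-html.conf, so this
# extends the shipped event list instead of overriding it.
ProxyHTMLEvents style

<Location /appserver/>
    ProxyHTMLEnable   On
    # Needed for rewriting inside event attributes, embedded scripts
    # and embedded stylesheets.
    ProxyHTMLExtended On
    ProxyHTMLURLMap   http://appserver.example.com/ /appserver/
</Location>
```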