From e88b1ad8ff78ef80029dbf638c3b17a19571a51e Mon Sep 17 00:00:00 2001
From: Michael Smith
Date: Thu, 26 May 2005 23:29:25 +0000
Subject: [PATCH] Make language codes RFC compliant (closes #1208931; thanks to Bernd Groh for reporting).

::PROBLEM:

Stylesheets output two-part language codes in the form "zh_CN". But
underscores in language codes are actually neither RFC compliant nor
compliant with the HTML 4.0 rec. The separator should be a hyphen.

To quote the specs:

Section 8.1.1, "Language Codes"[1], in the HTML 4.0 Rec. states that:

  [RFC1766] defines and explains the language codes that MUST be
  used in HTML documents. Briefly, language codes consist of a
  primary code and a possibly empty series of subcodes:

    language-code = primary-code ( "-" subcode )*

And in RFC 1766, "Tags for the Identification of Languages"[2], the
EBNF for "language tag" is given as:

    Language-Tag = Primary-tag *( "-" Subtag )
    Primary-tag = 1*8ALPHA
    Subtag = 1*8ALPHA

[1] http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.1
[2] http://www.ietf.org/rfc/rfc1766.txt

::CAUSE:

Stylesheets simply pass through language codes unaltered. So if users
put "zh_CN" in their source, they will get "zh_CN" in their HTML
output.

::FIX:

Added a new boolean config parameter, "l10n.lang.value.rfc.compliant",
set to 1 by default. If it is non-zero, any underscore in a language
code will be converted to a hyphen in HTML output. If it is zero, the
language code will be left as-is.

::AFFECTS:

This change affects any HTML output that contains two-part language
codes.
---
 xsl/common/l10n.xsl                          |  9 ++-
 xsl/html/param.ent                           |  1 +
 xsl/html/param.xweb                          |  2 +
 xsl/params/l10n.lang.value.rfc.compliant.xml | 60 ++++++++++++++++++++
 4 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 xsl/params/l10n.lang.value.rfc.compliant.xml

diff --git a/xsl/common/l10n.xsl b/xsl/common/l10n.xsl
index 743b1868d..2e2cdfd0e 100644
--- a/xsl/common/l10n.xsl
+++ b/xsl/common/l10n.xsl
@@ -137,7 +137,14 @@
-
+
+
+
+
+
+
+
+
diff --git a/xsl/html/param.ent b/xsl/html/param.ent
index a331164f5..ed4397212 100644
--- a/xsl/html/param.ent
+++ b/xsl/html/param.ent
@@ -239,6 +239,7 @@
+
diff --git a/xsl/html/param.xweb b/xsl/html/param.xweb
index 9d412236e..f31d37ca5 100644
--- a/xsl/html/param.xweb
+++ b/xsl/html/param.xweb
@@ -406,6 +406,7 @@ url="http://docbook.sourceforge.net/projects/xsl/doc/tools/profiling.html">http:
 &l10n.gentext.language;
 &l10n.gentext.default.language;
 &l10n.gentext.use.xref.language;
+&l10n.lang.value.rfc.compliant;
 The Stylesheet
@@ -575,6 +576,7 @@ around all these parameters.
+
diff --git a/xsl/params/l10n.lang.value.rfc.compliant.xml b/xsl/params/l10n.lang.value.rfc.compliant.xml
new file mode 100644
index 000000000..8168d4893
--- /dev/null
+++ b/xsl/params/l10n.lang.value.rfc.compliant.xml
@@ -0,0 +1,60 @@
+
+
+l10n.lang.value.rfc.compliant
+boolean
+
+
+l10n.lang.value.rfc.compliant
+Make value of lang attribute RFC compliant?
+
+
+
+
+
+
+
+
+Description
+
+If non-zero, ensure that the values for all lang attributes in HTML output are RFC
+compliant Section 8.1.1, Language Codes, in the HTML 4.0 Recommendation states that:
+[RFC1766] defines and explains the language codes
+that must be used in HTML documents.
+Briefly, language codes consist of a primary code and a possibly
+empty series of subcodes:
+
+language-code = primary-code ( "-" subcode )*
+
+And in RFC 1766, Tags for the Identification
+of Languages, the EBNF for "language tag" is given as:
+
+Language-Tag = Primary-tag *( "-" Subtag )
+Primary-tag = 1*8ALPHA
+Subtag = 1*8ALPHA
+
+.
+
+by taking any underscore characters in any lang values found in source documents, and
+replacing them with hyphen characters in output HTML files. For
+example, zh_CN in a source document becomes
+zh-CN in the HTML output from that source.
+
+
+This parameter does not cause any case change in lang values, because RFC 1766
+explicitly states that all "language tags" (as it calls them) "are
+to be treated as case insensitive".
+
+
-- 2.40.0
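
The behavior described under ::FIX: can be illustrated with a minimal sketch. This is not the stylesheet code (which is XSLT and whose markup is not reproduced here); it is a Python illustration of the described logic only. The function name `html_lang_value` and the argument name `rfc_compliant` are hypothetical stand-ins; only the parameter semantics (non-zero converts underscores to hyphens, zero leaves the value as-is, case is never changed) come from the commit message.

```python
def html_lang_value(lang: str, rfc_compliant: int = 1) -> str:
    """Return a lang value as it would appear in HTML output.

    `rfc_compliant` is a hypothetical stand-in for the
    l10n.lang.value.rfc.compliant parameter (default 1).
    """
    if rfc_compliant != 0:
        # Non-zero: replace every underscore with a hyphen.
        # Case is deliberately left untouched.
        return lang.replace("_", "-")
    # Zero: pass the source value through unaltered.
    return lang


if __name__ == "__main__":
    print(html_lang_value("zh_CN"))     # parameter left at its default of 1
    print(html_lang_value("zh_CN", 0))  # parameter set to zero
```

With the default setting the first call yields `zh-CN`; with the parameter set to zero the second call returns `zh_CN` unchanged.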