From: Norman Walsh
Date: Mon, 4 Feb 2002 21:29:44 +0000 (+0000)
Subject: Snapshot
X-Git-Tag: release/1.79.1~6^2~5942
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=5debae562aa8ef6ce63db2e5fc316d7816c3b6d9;p=docbook-dsssl

Snapshot
---

diff --git a/xmlcharent/spec/entities.xml b/xmlcharent/spec/entities.xml
index 3efb76810..c5ff47ec3 100644
--- a/xmlcharent/spec/entities.xml
+++ b/xmlcharent/spec/entities.xml
@@ -1,10 +1,12 @@
+ %glyphs; + +
@@ -27,7 +29,7 @@ ]>
-XML Character Entities Version 0.1
+XML Character Entities Version &version;
 OASIS DocBook Technical Committee
@@ -39,11 +41,10 @@
-19 Nov 2001
-
-$Id$
+04 Feb 2001
+$Id$
-2001
+2001 2002
 The Organization for the Advancement of Structured Information Standards [OASIS]. All Rights Reserved.
@@ -88,15 +89,17 @@ sent to
 Abstract
-Non-normative Annex D of defines 19
-standard character entity sets. The SGML declarations for these
-entities use the specific character data (SDATA) entity type. The
-SDATA entity type is not supported in XML. This &standard; defines a
-set of XML alternatives to the 19 standard character entity
-sets.
+
+This &standard; defines XML encodings of the 19 standard
+character entity sets defined in Non-normative Annex D of .
+
+
+ Working Draft
+ 04 Feb 2001
+ Working Draft
 19 Nov 2001
@@ -104,8 +107,12 @@ sets.
+This &standard; defines XML encodings of the standard SGML
+character entity sets.
+
 Non-normative Annex D of defines 19
-standard character entity sets (
+standard SGML character entity sets:
+
 Added Latin 1
 Added Latin 2
 Greek Letters
@@ -125,27 +132,414 @@ standard character entity sets (
 Added Math Symbols: Negated Relations
 Added Math Symbols: Arrow Relations
 Added Math Symbols: Delimiters
-). The SGML declarations for these entities use the
-specific character data (SDATA) entity type. The SDATA entity type is
-not supported in XML, so alternative XML declarations must be used.
-This &standard; defines a set of XML alternatives to the 19 standard
-character entity sets.
+. The SGML declarations for these entities use the
+specific character data (SDATA) entity type that is
+not supported in XML, so alternative XML declarations are necessary.
+
-In XML, the specific character data of each entity can be expressed
+In XML, the specific character data of most entities can be expressed
 as a character.
XML Character Entity Sets - -The Unicode reference glyphs in this document are examples -only. Some characters have more than one Unicode representation and -different Unicode characters may be appropriate in different -contexts. The glyph images offer only one of many possible -representations for the specified character. +The character entity sets defined by this &standard; are +summarized in through . + +In order to use these entities in a document, they must be declared. +Entities can be declared in the external subset or the internal subset, +as described in . An example document, with the +declaration in the internal subset, is shown in . + + +Declaring and Using the ISO Latin 1 Character Entity Set +<!DOCTYPE doc [ +<!ENTITY iso-lat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" + "&baseuri;/isolat1.ent"> +]> +<doc> +<p>This document declares the ISO Latin 1 Character Entity Set, providing +access to the ISO Latin 1 entities, such as "&eacute;" and "&copy;".</p> +</doc> + + + +Non-validating XML Parsers may choose not to process externally +declared entities. This &standard; does not alter the semantics of XML +processors. If a processor does not see the declaration for an entity, +it will not be able to report the correct replacement text for that +entity. -
Added Latin 1 +
Multi-Character Replacements + +The replacement text of some entities includes more than a single +Unicode character. Some characters are composed with the +combining reverse solidus overlay (20E5) +and some are composed with a variation selector +(FE00, FE01, …). + + +
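To make the multi-character case concrete, the following Python sketch lists the code points in a composed replacement text. It is an illustration only: the base character (U+2282) is an assumption chosen for the example, not a mapping taken from this &standard;, and the helper name `describe` is hypothetical.

```python
import unicodedata

# Code points named in the text above.
REVERSE_SOLIDUS_OVERLAY = "\u20e5"
VARIATION_SELECTOR_1 = "\ufe00"


def describe(replacement_text):
    """Return 'U+XXXX NAME' for each code point in a replacement text."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in replacement_text
    ]


# Hypothetical replacement text: a base character followed by the overlay.
# U+2282 is used purely for illustration.
composed = "\u2282" + REVERSE_SOLIDUS_OVERLAY

for line in describe(composed):
    print(line)
```

An application that assumes one entity expands to exactly one character (for example when computing column widths) may need to count code points this way rather than count entity references.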
+ +
Duplicate Entities + +Historically, the inodot entity is multiply +defined in iso-lat2.ent and iso-amso.ent. If both entity sets are +included, some parsers will warn about redefinition of this entity. +The warning can be ignored. + +
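The reason the warning is harmless is that XML 1.0 makes the first declaration of an entity binding and ignores later ones. The Python sketch below shows this with the standard library parser. The second replacement text is deliberately different and entirely hypothetical, so that the example can show which declaration wins; the U+0131 value for inodot is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity, as happens when two entity sets
# that both define it are included. The second value is a hypothetical
# stand-in so the binding declaration can be identified.
DOC = """<!DOCTYPE doc [
<!ENTITY inodot "&#x131;">
<!ENTITY inodot "REDEFINED">
]>
<doc>&inodot;</doc>"""

root = ET.fromstring(DOC)
resolved = root.text  # replacement text from the first declaration

print("U+%04X" % ord(resolved))
```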
+ +
Entities with no Mapping
+
+There are a small number of entities that have no
+ representation. These entities are all mapped to
+the Unicode character FFFD, the
+replacement character.
+
+ Entity Name   Entity Set     Description
+ fjlig         iso-pub.ent    Small fj ligature
+ gnap          iso-amsn.ent   Greater, not approximate
+ jnodot        iso-amso.ent   Small j, no dot
+ lnap          iso-amsn.ent   Less, not approximate
+ lpargt        iso-amsc.ent   Greater than, left arc
+ nsmid         iso-amsn.ent   Negated short mid
+ prnE          iso-amsn.ent   Precedes, not double equals
+ rpargt        iso-amsc.ent   Right paren, greater than
+ scnE          iso-amsn.ent   Succeeds, not double equals
+ smid          iso-amsr.ent   shortmid r
+ vsubnE        iso-amsn.ent   Subset not double equals, variant
+
+Users needing these characters will have to rely on the private use
+area or other non-portable mechanisms to access them.
+
+
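Because unmapped entities all expand to FFFD, a document that uses one can be detected after parsing. The Python sketch below is one way to do that, assuming the entities are declared in the internal subset. The sample document and the function name `unmapped_paragraphs` are hypothetical.

```python
import xml.etree.ElementTree as ET

REPLACEMENT = "\ufffd"  # the replacement character named above

# Hypothetical document: fjlig has no mapping, eacute does.
DOC = """<!DOCTYPE doc [
<!ENTITY fjlig "&#xFFFD;">
<!ENTITY eacute "&#xE9;">
]>
<doc><p>caf&eacute;</p><p>&fjlig;ord</p></doc>"""


def unmapped_paragraphs(xml_text):
    """Return the indexes of <p> elements whose text contains U+FFFD."""
    root = ET.fromstring(xml_text)
    return [
        index
        for index, para in enumerate(root.iter("p"))
        if REPLACEMENT in "".join(para.itertext())
    ]


print(unmapped_paragraphs(DOC))
```

A check like this cannot tell which of the unmapped entities was used, since they all share the same replacement text.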
+ +
Entities with Substituted Mappings
+
+There are a few more for which there is no specific
+ representation but where a reasonable
+substitution has been used:
+
+ Entity Name   Entity Set     Substitution   Description
+ bepsi         iso-amsr.ent   220D           Back epsilon: such that
+ ges           iso-amsr.ent   2265           Greater-or-equal, slanted
+ gvnE          iso-amsn.ent   2269           Gt, vert, not double equals
+ iff           iso-tech.ent   21D4           If and only if
+ les           iso-amsr.ent   2264           Less-than-or-equal, slanted
+ lozf          iso-pub.ent    2726           Lozenge, filled
+ lvnE          iso-amsn.ent   2268           Less, vert, not double equals
+ nge           iso-amsn.ent   2271           Neither greater-than nor equal to
+ nle           iso-amsn.ent   2270           Not less-than-or-equal
+ npre          iso-amsn.ent   22E0           Not precedes, equals
+ nsce          iso-amsn.ent   22E1           Not succeeds, equals
+ nspar         iso-amsn.ent   2226           Not short parallel
+ pre           iso-amsr.ent   227C           Precedes, equals
+ spar          iso-amsr.ent   2225           Short parallel
+ ssetmn        iso-amsb.ent   2216           Small set minus (reverse solidus)
+ star          iso-pub.ent    22C6           Star operator
+ starf         iso-pub.ent    2605           Black star
+ thkap         iso-amsr.ent   2248           Thick approximate
+ thksim        iso-amsr.ent   223C           Thick similar
+ vsubne        iso-amsn.ent   228A           Subset, not equals, variant
+ vsupnE        iso-amsn.ent   228B           Subset not double equals, variant
+ vsupne        iso-amsn.ent   228B           Superset, not equals, variant
+ xhArr         iso-amsa.ent   2194           Long left and right double arr
+ xharr         iso-amsa.ent   2194           Long left and right arr
+ xlArr         iso-amsa.ent   21D0           Long left double arrow
+ xrArr         iso-amsa.ent   21D2           Long right double arr
+ ssmile        iso-amsr.ent   2323           Small smile
+ sfrown        iso-amsr.ent   2322           Small frown
+
+Users needing alternate glyphs for these characters will have to
+rely on redefining them to use the private use area or other
+non-portable mechanisms to access them.
+
+
+ +
+ +
XML Character Elements + +Named XML entities (except for the five +predefined entities) cannot +be used if they are not declared. Entity declaration requires either an external +or an internal subset. Some classes of applications forbid the occurrence of +markup declarations in documents. For these documents, named character entities are +inaccessible. + +In this section, we introduce an XML vocabulary with the semantics of +character entity reference. This &standard; defines the semantics of elements +and attributes declared in the &charns; +namespace. + +This namespace contains exactly one element, char. +The char element has two attributes, +entity and +name. They are mutually exclusive. + +The entity attribute identifies +characters by their character entity names. (The set of valid names +is the closed set of names associated with character entity sets defined +by this &standard;.) Case is significant in entity names. + +The name attribute identifies +characters by their Unicode character names. (The set of valid names +is the set of character names published in the +specification, or any later version of that specification.) Case is +insignificant in character names. + +The definition of this namespace is +shown in figure . + +
+The RELAX NG Definition of the <literal>&charns;</literal> Namespace + + + + + + +
+ + +shows a sample document using this mechanism. + + +Declaring and Using the ISO Latin 1 Character Entity Set +<doc xmlns:e="&charns;"> +<p>This document uses the character names element to access +character entities, such as "<e:char name="eacute"/>" and +"<e:char name="COPYRIGHT SIGN"/>".</p> +</doc> + + +The character names element is limited to contexts where elements may +occur. In particular, elements may not occur in XML attribute values. Note, +however, that internationalization requirements such as bidirectional language +support and Ruby already require structure in arbitrary contexts. It is probably +an error to use attributes for human-readable content. + +
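A processor for the character element could be sketched in Python as follows. The sketch rests on several assumptions: the namespace URI is a placeholder (the real value of &charns; is not shown here), the HTML entity table from the Python standard library stands in for the entity sets of this &standard;, and the rule that exactly one of the two attributes must be present is one reading of "mutually exclusive". The names `resolve_char` and `flatten` are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # stand-in entity table (assumption)

CHAR_NS = "http://example.org/ns/char"  # hypothetical placeholder URI
CHAR_TAG = "{%s}char" % CHAR_NS


def resolve_char(attrs):
    """Map the attributes of a char element to its character."""
    has_entity = "entity" in attrs
    has_name = "name" in attrs
    if has_entity == has_name:
        raise ValueError("exactly one of 'entity' or 'name' is required")
    if has_entity:
        # Case is significant in entity names: no normalization.
        return chr(name2codepoint[attrs["entity"]])
    # Case is insignificant in character names: normalize before lookup.
    return unicodedata.lookup(attrs["name"].upper())


def flatten(elem):
    """Return the text of elem with every char element resolved."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == CHAR_TAG:
            parts.append(resolve_char(child.attrib))
        else:
            parts.append(flatten(child))
        parts.append(child.tail or "")
    return "".join(parts)


DOC = (
    '<doc xmlns:e="%s"><p>"<e:char entity="eacute"/>" and '
    '"<e:char name="copyright sign"/>"</p></doc>' % CHAR_NS
)
text = flatten(ET.fromstring(DOC))
print(text)
```

The lookup by `entity` raises `KeyError` for an unknown name, which matches the closed-set wording above; a real processor would presumably report a more specific error.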
+Added Latin 1 Identifiers for this entity set: @@ -160,9 +554,9 @@ The following character entities are defined in this entity set: &iso-lat1-table; -
+ -
Added Latin 2 +Added Latin 2 Identifiers for this entity set: @@ -176,9 +570,9 @@ The following character entities are defined in this entity set: &iso-lat2-table; -
+ -
Greek Letters +Greek Letters Identifiers for this entity set: @@ -192,9 +586,9 @@ The following character entities are defined in this entity set: &iso-grk1-table; -
+ -
Monotoniko Greek +Monotoniko Greek Identifiers for this entity set: @@ -208,9 +602,9 @@ The following character entities are defined in this entity set: &iso-grk2-table; -
+ -
Russian Cyrillic +Russian Cyrillic Identifiers for this entity set: @@ -224,9 +618,9 @@ The following character entities are defined in this entity set: &iso-cyr1-table; -
+ -
Non-Russian Cyrillic +Non-Russian Cyrillic Identifiers for this entity set: @@ -240,9 +634,9 @@ The following character entities are defined in this entity set: &iso-cyr2-table; -
+ -
Numeric and Special Graphic +Numeric and Special Graphic Identifiers for this entity set: @@ -256,9 +650,9 @@ The following character entities are defined in this entity set: &iso-num-table; -
+ -
Diacritical Marks +Diacritical Marks Identifiers for this entity set: @@ -272,9 +666,9 @@ The following character entities are defined in this entity set: &iso-dia-table; -
+ -
Publishing +Publishing Identifiers for this entity set: @@ -288,9 +682,9 @@ The following character entities are defined in this entity set: &iso-pub-table; -
+ -
Box and Line Drawing +Box and Line Drawing Identifiers for this entity set: @@ -304,9 +698,9 @@ The following character entities are defined in this entity set: &iso-box-table; -
+ -
General Technical +General Technical Identifiers for this entity set: @@ -320,9 +714,9 @@ The following character entities are defined in this entity set: &iso-tech-table; -
+ -
Greek Symbols +Greek Symbols Identifiers for this entity set: @@ -336,9 +730,9 @@ The following character entities are defined in this entity set: &iso-grk3-table; -
+ -
Alternative Greek Symbols +Alternative Greek Symbols Identifiers for this entity set: @@ -352,9 +746,9 @@ The following character entities are defined in this entity set: &iso-grk4-table; -
+ -
Added Math Symbols: Ordinary +Added Math Symbols: Ordinary Identifiers for this entity set: @@ -368,9 +762,9 @@ The following character entities are defined in this entity set: &iso-amso-table; -
+ -
Added Math Symbols: Binary Operators +Added Math Symbols: Binary Operators Identifiers for this entity set: @@ -384,9 +778,9 @@ The following character entities are defined in this entity set: &iso-amsb-table; -
+ -
Added Math Symbols: Relations +Added Math Symbols: Relations Identifiers for this entity set: @@ -400,9 +794,9 @@ The following character entities are defined in this entity set: &iso-amsr-table; -
+ -
Added Math Symbols: Negated Relations +Added Math Symbols: Negated Relations Identifiers for this entity set: @@ -416,9 +810,9 @@ The following character entities are defined in this entity set: &iso-amsn-table; -
+ -
Added Math Symbols: Arrow Relations +Added Math Symbols: Arrow Relations Identifiers for this entity set: @@ -432,9 +826,9 @@ The following character entities are defined in this entity set: &iso-amsa-table; -
+ -
Added Math Symbols: Delimiters +Added Math Symbols: Delimiters Identifiers for this entity set: @@ -448,10 +842,16 @@ The following character entities are defined in this entity set: &iso-amsc-table; -
- + -Unicode Glyphs +Unicode Glyphs + +The Unicode reference glyphs in this document are examples +only. Some characters have more than one Unicode representation and +different Unicode characters may be appropriate in different +contexts. The glyph images offer only one of many possible +representations for the specified character. + Most of the glyphs this reference are from the TmsPF Roman font by @@ -475,13 +875,12 @@ Technical Report #8, which describes Unicode Version 2.1. Dennis Evans -Patricia Gee-Best Dick Hamilton Nancy (Paisner) Harrison Sabine Ocker -Michael Sabrio Michael Smith -Norman Walsh (Chair,Editor) +Bob Stayton +Norman Walsh (Chair) @@ -491,7 +890,9 @@ Technical Report #8, which describes Unicode Version 2.1. Normative - + + +