From: Peter Eisentraut
Date: Tue, 7 Sep 2010 18:54:08 +0000 (+0000)
Subject: Clarify that surrogate pairs are not encoded in UTF-8 directly
X-Git-Tag: REL9_0_0~18
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=f48fb5d823d1b62e57ab6a41928baccc10f4b559;p=postgresql

Clarify that surrogate pairs are not encoded in UTF-8 directly
---

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 562568c4af..5cace96a45 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,4 +1,4 @@
-
+
 SQL Syntax
@@ -236,12 +236,15 @@ U&"d!0061t!+000061" UESCAPE '!'
 The Unicode escape syntax works only when the server encoding is
- UTF8. When other server encodings are used, only code points in
- the ASCII range (up to \007F) can be specified.
- Both the 4-digit and the 6-digit form can be used to specify
- UTF-16 surrogate pairs to compose characters with code points
- larger than U+FFFF (although the availability of
- the 6-digit form technically makes this unnecessary).
+ UTF8. When other server encodings are used, only code
+ points in the ASCII range (up to \007F) can be
+ specified. Both the 4-digit and the 6-digit form can be used to
+ specify UTF-16 surrogate pairs to compose characters with code
+ points larger than U+FFFF, although the availability of the
+ 6-digit form technically makes this unnecessary. (When surrogate
+ pairs are used when the server encoding is UTF8, they
+ are first combined into a single code point that is then encoded
+ in UTF-8.)
@@ -431,13 +434,15 @@ SELECT 'foo' 'bar';
 The Unicode escape syntax works fully only when the server
- encoding is UTF-8. When other server encodings are used, only
- code points in the ASCII range (up to \u007F) can be
- specified. Both the 4-digit and the 8-digit form can be used to
- specify UTF-16 surrogate pairs to compose characters with code
- points larger than U+FFFF (although the
- availability of the 8-digit form technically makes this
- unnecessary).
+ encoding is UTF8. When other server encodings are
+ used, only code points in the ASCII range (up
+ to \u007F) can be specified. Both the 4-digit and
+ the 8-digit form can be used to specify UTF-16 surrogate pairs to
+ compose characters with code points larger than U+FFFF, although
+ the availability of the 8-digit form technically makes this
+ unnecessary. (When surrogate pairs are used when the server
+ encoding is UTF8, they are first combined into a
+ single code point that is then encoded in UTF-8.)
@@ -518,13 +523,15 @@ U&'d!0061t!+000061' UESCAPE '!'
 The Unicode escape syntax works only when the server encoding is
- UTF8. When other server encodings are used, only code points in
- the ASCII range (up to \007F) can be
- specified.
- Both the 4-digit and the 6-digit form can be used to specify
- UTF-16 surrogate pairs to compose characters with code points
- larger than U+FFFF (although the availability
- of the 6-digit form technically makes this unnecessary).
+ UTF8. When other server encodings are used, only
+ code points in the ASCII range (up to \007F)
+ can be specified. Both the 4-digit and the 6-digit form can be
+ used to specify UTF-16 surrogate pairs to compose characters with
+ code points larger than U+FFFF, although the availability of the
+ 6-digit form technically makes this unnecessary. (When surrogate
+ pairs are used when the server encoding is UTF8, they
+ are first combined into a single code point that is then encoded
+ in UTF-8.)
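
The parenthetical added in each hunk describes a two-step process: a surrogate pair is first combined into a single code point, and only that code point is encoded in UTF-8. The Python sketch below illustrates that arithmetic using the standard UTF-16 surrogate formula. It is not PostgreSQL's implementation, and the function names `combine_surrogates` and `encode_pair_utf8` are hypothetical names chosen for this illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_pair_utf8(high: int, low: int) -> bytes:
    """Encode the combined code point, not the two surrogates, as UTF-8."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


if __name__ == "__main__":
    code_point = combine_surrogates(0xD83D, 0xDE00)
    print(f"U+{code_point:04X}")                    # U+1F600
    print(encode_pair_utf8(0xD83D, 0xDE00).hex())   # f09f9880

    # Encoding a lone surrogate directly is rejected by a strict UTF-8 encoder.
    try:
        chr(0xD83D).encode("utf-8")
    except UnicodeEncodeError:
        print("a surrogate cannot be encoded in UTF-8 directly")
```

Under this reading of the documentation, the surrogate-pair form `U&'\D83D\DE00'` and the 6-digit form `U&'\+01F600'` would be expected to denote the same character on a UTF8 server, since both resolve to code point U+1F600 before encoding.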