From 579a358e61292774a9ab57fe7e92441777e48be4 Mon Sep 17 00:00:00 2001 From: Georg Brandl Date: Fri, 18 Sep 2009 21:35:59 +0000 Subject: [PATCH] #6930: clarify description about byteorder handling in UTF decoder routines. --- Doc/c-api/unicode.rst | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst index 1249ed725f..4ab1c21bfe 100644 --- a/Doc/c-api/unicode.rst +++ b/Doc/c-api/unicode.rst @@ -414,10 +414,13 @@ These are the UTF-32 codec APIs: *byteorder == 0: native order *byteorder == 1: big endian - and then switches if the first four bytes of the input data are a byte order mark - (BOM) and the specified byte order is native order. This BOM is not copied into - the resulting Unicode string. After completion, *\*byteorder* is set to the - current byte order at the end of input data. + If ``*byteorder`` is zero, and the first four bytes of the input data are a + byte order mark (BOM), the decoder switches to this byte order and the BOM is + not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or + ``1``, any byte order mark is copied to the output. + + After completion, *\*byteorder* is set to the current byte order at the end + of input data. In a narrow build codepoints outside the BMP will be decoded as surrogate pairs. @@ -442,8 +445,7 @@ These are the UTF-32 codec APIs: .. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) Return a Python bytes object holding the UTF-32 encoded value of the Unicode - data in *s*. If *byteorder* is not ``0``, output is written according to the - following byte order:: + data in *s*. Output is written according to the following byte order:: byteorder == -1: little endian byteorder == 0: native byte order (writes a BOM mark) @@ -487,10 +489,14 @@ These are the UTF-16 codec APIs: *byteorder == 0: native order *byteorder == 1: big endian - and then switches if the first two bytes of the input data are a byte order mark - (BOM) and the specified byte order is native order. This BOM is not copied into - the resulting Unicode string. After completion, *\*byteorder* is set to the - current byte order at the. + If ``*byteorder`` is zero, and the first two bytes of the input data are a + byte order mark (BOM), the decoder switches to this byte order and the BOM is + not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or + ``1``, any byte order mark is copied to the output (where it will result in + either a ``\ufeff`` or a ``\ufffe`` character). + + After completion, *\*byteorder* is set to the current byte order at the end + of input data. If *byteorder* is *NULL*, the codec starts in native order mode. @@ -520,8 +526,7 @@ These are the UTF-16 codec APIs: .. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) Return a Python string object holding the UTF-16 encoded value of the Unicode - data in *s*. If *byteorder* is not ``0``, output is written according to the - following byte order:: + data in *s*. Output is written according to the following byte order:: byteorder == -1: little endian byteorder == 0: native byte order (writes a BOM mark) -- 2.50.1