bpo-28749: Fixed the documentation of the mapping codec APIs. (#487) (#715)

author Serhiy Storchaka <storchaka@gmail.com>
Sun, 19 Mar 2017 18:26:42 +0000 (20:26 +0200)
committer GitHub <noreply@github.com>
Sun, 19 Mar 2017 18:26:42 +0000 (20:26 +0200)

diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index 256fef3b2774a92b14d872147457392f086507a0..54d0373881d60ba55dd694370b8983c42f2aa2e3 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -1388,77 +1388,78 @@ Character Map Codecs
  This codec is special in that it can be used to implement many different codecs
  (and this is in fact what was done to obtain most of the standard codecs
  included in the :mod:`encodings` package). The codec uses mapping to encode and
-decode characters.
-
-Decoding mappings must map single string characters to single Unicode
-characters, integers (which are then interpreted as Unicode ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-Encoding mappings must map single Unicode characters to single string
-characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-The mapping objects provided must only support the __getitem__ mapping
-interface.
-
-If a character lookup fails with a LookupError, the character is copied as-is
-meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
-resp. Because of this, mappings only need to contain those mappings which map
-characters to different code points.
+decode characters.  The mapping objects provided must support the
+:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
  
  These are the mapping codec APIs:
  
-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
                                PyObject *mapping, const char *errors)
  
-   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
-   the given *mapping* object.  Return *NULL* if an exception was raised by the
-   codec. If *mapping* is *NULL* latin-1 decoding will be done. Else it can be a
-   dictionary mapping byte or a unicode string, which is treated as a lookup table.
-   Byte values greater that the length of the string and U+FFFE "characters" are
-   treated as "undefined mapping".
+   Create a Unicode object by decoding *size* bytes of the encoded string
+   *data* using the given *mapping* object.  Return *NULL* if an exception
+   was raised by the codec.
+
+   If *mapping* is *NULL*, Latin-1 decoding will be applied.  Otherwise
+   *mapping* must map byte ordinals (integers in the range from 0 to 255)
+   to Unicode strings, integers (which are then interpreted as Unicode
+   ordinals) or ``None``.  Unmapped data bytes (ones which cause a
+   :exc:`LookupError`), as well as ones which get mapped to ``None``,
+   ``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
+   an error.
  
  
  .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
  
-   Encode a Unicode object using the given *mapping* object and return the result
-   as Python string object.  Error handling is "strict".  Return *NULL* if an
+   Encode a Unicode object using the given *mapping* object and return the
+   result as a bytes object.  Error handling is "strict".  Return *NULL* if an
     exception was raised by the codec.
  
-The following codec API is special in that maps Unicode to Unicode.
-
+   The *mapping* object must map Unicode ordinal integers to bytes objects,
+   integers in the range from 0 to 255 or ``None``.  Unmapped character
+   ordinals (ones which cause a :exc:`LookupError`) as well as ones mapped
+   to ``None`` are treated as "undefined mapping" and cause an error.
  
-.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
-                              PyObject *table, const char *errors)
-
-   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
-   character mapping *table* to it and return the resulting Unicode object.  Return
-   *NULL* when an exception was raised by the codec.
  
-   The *mapping* table must map Unicode ordinal integers to Unicode ordinal
-   integers or ``None`` (causing deletion of the character).
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+                              PyObject *mapping, const char *errors)
  
-   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
-   and sequences work well.  Unmapped character ordinals (ones which cause a
-   :exc:`LookupError`) are left untouched and are copied as-is.
+   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
+   *mapping* object and return the result as a bytes object.  Return *NULL* if
+   an exception was raised by the codec.
  
     .. deprecated-removed:: 3.3 4.0
        Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-      :c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
-      <codec-registry>`
+      :c:func:`PyUnicode_AsCharmapString` or
+      :c:func:`PyUnicode_AsEncodedString`.
  
  
-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+The following codec API is special in that it maps Unicode to Unicode.
+
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
                                PyObject *mapping, const char *errors)
  
-   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
-   *mapping* object and return a Python string object. Return *NULL* if an
-   exception was raised by the codec.
+   Translate a Unicode object using the given *mapping* object and return the
+   resulting Unicode object.  Return *NULL* if an exception was raised by the
+   codec.
+
+   The *mapping* object must map Unicode ordinal integers to Unicode strings,
+   integers (which are then interpreted as Unicode ordinals) or ``None``
+   (causing deletion of the character).  Unmapped character ordinals (ones
+   which cause a :exc:`LookupError`) are left untouched and are copied as-is.
+
+
+.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+                              PyObject *mapping, const char *errors)
+
+   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
+   character *mapping* table to it and return the resulting Unicode object.
+   Return *NULL* when an exception was raised by the codec.
  
     .. deprecated-removed:: 3.3 4.0
        Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-      :c:func:`PyUnicode_AsCharmapString` or
-      :c:func:`PyUnicode_AsEncodedString`.
+      :c:func:`PyUnicode_Translate` or :ref:`generic codec based API
+      <codec-registry>`.
  
  
  MBCS codecs for Windows
diff --git a/Include/unicodeobject.h b/Include/unicodeobject.h
index 9308a6aa96af9009290ae22dfc9f144e80579767..0accc1d5c2ddadd81e3f0ad695f4e98c8d0bbc2c 100644
--- a/Include/unicodeobject.h
+++ b/Include/unicodeobject.h
@@ -1570,50 +1570,41 @@ PyAPI_FUNC(PyObject*) PyUnicode_EncodeASCII(
  
     This codec uses mappings to encode and decode characters.
  
-   Decoding mappings must map single string characters to single
-   Unicode characters, integers (which are then interpreted as Unicode
-   ordinals) or None (meaning "undefined mapping" and causing an
-   error).
-
-   Encoding mappings must map single Unicode characters to single
-   string characters, integers (which are then interpreted as Latin-1
-   ordinals) or None (meaning "undefined mapping" and causing an
-   error).
-
-   If a character lookup fails with a LookupError, the character is
-   copied as-is meaning that its ordinal value will be interpreted as
-   Unicode or Latin-1 ordinal resp. Because of this mappings only need
-   to contain those mappings which map characters to different code
-   points.
+   Decoding mappings must map byte ordinals (integers in the range from 0 to
+   255) to Unicode strings, integers (which are then interpreted as Unicode
+   ordinals) or None.  Unmapped data bytes (ones which cause a LookupError)
+   as well as ones mapped to None, 0xFFFE or '\ufffe' are treated as
+   "undefined mapping" and cause an error.
+
+   Encoding mappings must map Unicode ordinal integers to bytes objects,
+   integers in the range from 0 to 255 or None.  Unmapped character
+   ordinals (ones which cause a LookupError) as well as ones mapped to
+   None are treated as "undefined mapping" and cause an error.
  
  */
  
  PyAPI_FUNC(PyObject*) PyUnicode_DecodeCharmap(
      const char *string,         /* Encoded string */
      Py_ssize_t length,          /* size of string */
-    PyObject *mapping,          /* character mapping
-                                   (char ordinal -> unicode ordinal) */
+    PyObject *mapping,          /* decoding mapping */
      const char *errors          /* error handling */
      );
  
  PyAPI_FUNC(PyObject*) PyUnicode_AsCharmapString(
      PyObject *unicode,          /* Unicode object */
-    PyObject *mapping           /* character mapping
-                                   (unicode ordinal -> char ordinal) */
+    PyObject *mapping           /* encoding mapping */
      );
  
  #ifndef Py_LIMITED_API
  PyAPI_FUNC(PyObject*) PyUnicode_EncodeCharmap(
      const Py_UNICODE *data,     /* Unicode char buffer */
      Py_ssize_t length,          /* Number of Py_UNICODE chars to encode */
-    PyObject *mapping,          /* character mapping
-                                   (unicode ordinal -> char ordinal) */
+    PyObject *mapping,          /* encoding mapping */
      const char *errors          /* error handling */
      );
  PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
      PyObject *unicode,          /* Unicode object */
-    PyObject *mapping,          /* character mapping
-                                   (unicode ordinal -> char ordinal) */
+    PyObject *mapping,          /* encoding mapping */
      const char *errors          /* error handling */
      );
  #endif
@@ -1622,8 +1613,8 @@ PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
     character mapping table to it and return the resulting Unicode
     object.
  
-   The mapping table must map Unicode ordinal integers to Unicode
-   ordinal integers or None (causing deletion of the character).
+   The mapping table must map Unicode ordinal integers to Unicode strings,
+   Unicode ordinal integers or None (causing deletion of the character).
  
     Mapping tables may be dictionaries or sequences. Unmapped character
     ordinals (ones which cause a LookupError) are left untouched and
@@ -1915,8 +1906,8 @@ PyAPI_FUNC(PyObject*) PyUnicode_RSplit(
  /* Translate a string by applying a character mapping table to it and
     return the resulting Unicode object.
  
-   The mapping table must map Unicode ordinal integers to Unicode
-   ordinal integers or None (causing deletion of the character).
+   The mapping table must map Unicode ordinal integers to Unicode strings,
+   Unicode ordinal integers or None (causing deletion of the character).
  
     Mapping tables may be dictionaries or sequences. Unmapped character
     ordinals (ones which cause a LookupError) are left untouched and