Describe 'surrogateescape' in the documentation.

author Andrew Kuchling <amk@amk.ca>

Sun, 16 Jun 2013 16:58:48 +0000 (12:58 -0400)

committer Andrew Kuchling <amk@amk.ca>

Sun, 16 Jun 2013 16:58:48 +0000 (12:58 -0400)
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst

index 0d38253cd250344b014e8488e5a48e45ed7445fc..e80fc3a33d2503855307768e4b704e5aef914ec3 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -78,7 +78,11 @@ It defines the following functions:
       reference (for encoding only)
     * ``'backslashreplace'``: replace with backslashed escape sequences (for
       encoding only)
-   * ``'surrogateescape'``: replace with surrogate U+DCxx, see :pep:`383`
+   * ``'surrogateescape'``: on decoding, replace with code points in the Unicode
+     Private Use Area ranging from U+DC80 to U+DCFF.  These private code
+     points will then be turned back into the same bytes when the
+     ``surrogateescape`` error handler is used when encoding the data.
+     (See :pep:`383` for more.)
  
     as well as any other error handling name defined via :func:`register_error`.
  
diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst

index 3059e178152d1f406e3a2472185eeacac9d709cd..04fb95ee2f4365625389d2d07b5c0a661419dc40 100644
--- a/Doc/library/functions.rst
+++ b/Doc/library/functions.rst
@@ -895,16 +895,36 @@ are always available.  They are listed here in alphabetical order.
     the list of supported encodings.
  
     *errors* is an optional string that specifies how encoding and decoding
-   errors are to be handled--this cannot be used in binary mode.  Pass
-   ``'strict'`` to raise a :exc:`ValueError` exception if there is an encoding
-   error (the default of ``None`` has the same effect), or pass ``'ignore'`` to
-   ignore errors.  (Note that ignoring encoding errors can lead to data loss.)
-   ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
-   where there is malformed data.  When writing, ``'xmlcharrefreplace'``
-   (replace with the appropriate XML character reference) or
-   ``'backslashreplace'`` (replace with backslashed escape sequences) can be
-   used.  Any other error handling name that has been registered with
-   :func:`codecs.register_error` is also valid.
+   errors are to be handled--this cannot be used in binary mode.
+   A variety of standard error handlers are available, though any
+   error handling name that has been registered with
+   :func:`codecs.register_error` is also valid.  The standard names
+   are:
+
+   * ``'strict'`` to raise a :exc:`ValueError` exception if there is
+     an encoding error.  The default value of ``None`` has the same
+     effect.
+
+   * ``'ignore'`` ignores errors.  Note that ignoring encoding errors
+     can lead to data loss.
+
+   * ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
+     where there is malformed data.
+
+   * ``'surrogateescape'`` will represent any incorrect bytes as code
+     points in the Unicode Private Use Area ranging from U+DC80 to
+     U+DCFF.  These private code points will then be turned back into
+     the same bytes when the ``surrogateescape`` error handler is used
+     when writing data.  This is useful for processing files in an
+     unknown encoding.
+
+   * ``'xmlcharrefreplace'`` is only supported when writing to a file.
+     Characters not supported by the encoding are replaced with the
+     appropriate XML character reference ``&#nnn;``.
+
+   * ``'backslashreplace'`` (also only supported when writing)
+     replaces unsupported characters with Python's backslashed escape
+     sequences.
  
     .. index::
        single: universal newlines; open() built-in function
diff --git a/Lib/codecs.py b/Lib/codecs.py

index 48d4c9c73952ef448ba2c3bc0268e9d2da3903fe..6a6eb900c72ee7ea6d5b8d1aaa603c4f1ec2ca34 100644
--- a/Lib/codecs.py
+++ b/Lib/codecs.py
@@ -105,6 +105,7 @@ class Codec:
                      Python will use the official U+FFFD REPLACEMENT
                      CHARACTER for the builtin Unicode codecs on
                      decoding and '?' on encoding.
+         'surrogateescape' - replace with private codepoints U+DCnn.
           'xmlcharrefreplace' - Replace with the appropriate XML
                                 character reference (only for encoding).
           'backslashreplace'  - Replace with backslashed escape sequences
diff --git a/Modules/_io/_iomodule.c b/Modules/_io/_iomodule.c

index b5cd1769d4f441ace8e850101ac6dd71a41471ab..4a7e758cda5e657fbb1f525a4a9612f59c181cca 100644
--- a/Modules/_io/_iomodule.c
+++ b/Modules/_io/_iomodule.c
@@ -168,8 +168,8 @@ PyDoc_STRVAR(open_doc,
  "'strict' to raise a ValueError exception if there is an encoding error\n"
  "(the default of None has the same effect), or pass 'ignore' to ignore\n"
  "errors. (Note that ignoring encoding errors can lead to data loss.)\n"
-"See the documentation for codecs.register for a list of the permitted\n"
-"encoding error strings.\n"
+"See the documentation for codecs.register or run 'help(codecs.Codec)'\n"
+"for a list of the permitted encoding error strings.\n"
  "\n"
  "newline controls how universal newlines works (it only applies to text\n"
  "mode). It can be None, '', '\\n', '\\r', and '\\r\\n'.  It works as\n"
diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c

index cff9c6e9372d025bf4f6d3ec079f4186ecd9277b..cd751c1400371aa40a41fec913f71c08d0a13a94 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -642,8 +642,9 @@ PyDoc_STRVAR(textiowrapper_doc,
      "encoding gives the name of the encoding that the stream will be\n"
      "decoded or encoded with. It defaults to locale.getpreferredencoding(False).\n"
      "\n"
-    "errors determines the strictness of encoding and decoding (see the\n"
-    "codecs.register) and defaults to \"strict\".\n"
+    "errors determines the strictness of encoding and decoding (see\n"
+    "help(codecs.Codec) or the documentation for codecs.register) and\n"
+    "defaults to \"strict\".\n"
      "\n"
      "newline controls how line endings are handled. It can be None, '',\n"
      "'\\n', '\\r', and '\\r\\n'.  It works as follows:\n"
Doc/library/codecs.rst
Doc/library/functions.rst
Lib/codecs.py
Modules/_io/_iomodule.c
Modules/_io/textio.c
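
The round trip that the new 'surrogateescape' text describes can be sketched as follows. This is a minimal illustration, assuming Python 3 and UTF-8 as the encoding. The sample bytes and the helper names `roundtrip` and `copy_via_text` are made up for the example and are not part of this commit. Each undecodable byte 0xNN is decoded to the lone code point U+DCNN. The range U+DC80 to U+DCFF lies in the low surrogate block, which is why such text cannot be encoded again under 'strict'.

```python
import os
import tempfile

# Illustrative sample: 0xff and 0xfe never occur in valid UTF-8.
raw = b"caf\xff\xfe.txt"


def roundtrip(data, encoding="utf-8"):
    """Decode and re-encode data using the surrogateescape handler."""
    text = data.decode(encoding, errors="surrogateescape")
    return text, text.encode(encoding, errors="surrogateescape")


def copy_via_text(data, encoding="utf-8"):
    """Write data to a file, read it in text mode, and write it back out."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.bin")
        dst = os.path.join(tmp, "out.bin")
        with open(src, "wb") as f:
            f.write(data)
        # newline="" disables newline translation so bytes pass through as-is.
        with open(src, encoding=encoding, errors="surrogateescape",
                  newline="") as f:
            content = f.read()
        with open(dst, "w", encoding=encoding, errors="surrogateescape",
                  newline="") as f:
            f.write(content)
        with open(dst, "rb") as f:
            return f.read()


text, restored = roundtrip(raw)
print(ascii(text))        # the two bad bytes show up as \udcff and \udcfe
print(restored == raw)    # the original bytes come back unchanged

# Under the default 'strict' handler the lone surrogates are rejected.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)

# The handlers documented as encoding-only, applied to a non-ASCII character.
word = "caf\xe9"
print(word.encode("ascii", errors="replace"))
print(word.encode("ascii", errors="xmlcharrefreplace"))
print(word.encode("ascii", errors="backslashreplace"))
```

The same handler has to be used on both sides. Writing the decoded text with a different handler, or with a different encoding, will not reproduce the original bytes.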