Victor Stinner [Wed, 29 Aug 2018 20:21:32 +0000 (22:21 +0200)]
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().
_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.
Changes:
* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
now use the interpreter configuration rather than
Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
private functions to only modify Py_FileSystemDefaultEncoding and
Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
_PyCoreConfig for the warning.
Victor Stinner [Wed, 29 Aug 2018 09:25:15 +0000 (11:25 +0200)]
bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)
On Windows, the LC_CTYPE is now set to the user preferred locale at
startup: _Py_SetLocaleFromEnv(LC_CTYPE) is now called during the
Python initialization. Previously, the LC_CTYPE locale was "C" at
startup, but changed when calling setlocale(LC_CTYPE, "") or
setlocale(LC_ALL, "").
pymain_read_conf() now also calls _Py_SetLocaleFromEnv(LC_CTYPE) to
behave as _Py_InitializeCore(). Moreover, it doesn't save/restore the
LC_ALL anymore.
On Windows, standard streams like sys.stdout now always use
surrogateescape error handler by default (ignore the locale).
Victor Stinner [Wed, 29 Aug 2018 07:58:12 +0000 (09:58 +0200)]
bpo-34485: stdout uses surrogateescape on POSIX locale (GH-8986)
Standard streams like sys.stdout now use the "surrogateescape" error
handler, instead of "strict", on the POSIX locale (when the C locale is not
coerced and the UTF-8 Mode is disabled).
Victor Stinner [Tue, 28 Aug 2018 22:16:53 +0000 (00:16 +0200)]
bpo-34485: Fix _Py_InitializeCore() for C locale coercion (GH-8979)
* _Py_InitializeCore() now sets the LC_CTYPE locale to the user
preferred locale before checking if the C locale should be coerced
or not in _PyCoreConfig_Read().
* Fix pymain_read_conf(): remember if the C locale has been coerced
when the configuration should be read again if the encoding has
changed.
Victor Stinner [Tue, 28 Aug 2018 21:26:33 +0000 (23:26 +0200)]
bpo-34485: Enhance init_sys_streams() (GH-8978)
Python now gets the locale encoding with C code to initialize the encoding
of standard streams like sys.stdout. Moreover, the encoding is now
initialized to the Python codec name to get a normalized encoding name and
to ensure that the codec is loaded. The change avoids importing
_bootlocale and _locale modules at startup by default.
When the PYTHONIOENCODING environment variable only contains an encoding,
the error handler is now is now set explicitly to "strict".
Rename also get_default_standard_stream_error_handler() to
get_stdio_errors().
Reduce the buffer to format the "cpXXX" string (Windows locale encoding).
Victor Stinner [Tue, 28 Aug 2018 15:27:36 +0000 (17:27 +0200)]
bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)
On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns
"ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale
is not coerced).
nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1
encoding in practice.
Victor Stinner [Tue, 28 Aug 2018 10:35:44 +0000 (12:35 +0200)]
bpo-34527: POSIX locale enables the UTF-8 Mode (GH-8972)
* The UTF-8 Mode is now also enabled by the "POSIX" locale, not only
by the "C" locale.
* On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also forces
the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if
the LC_CTYPE locale is "C".
* test_utf8_mode.test_cmd_line() checks also that the command line
arguments are decoded from UTF-8 when the the UTF-8 Mode is enabled
with POSIX locale or C locale.
Elias Zamaria [Mon, 27 Aug 2018 06:59:28 +0000 (23:59 -0700)]
bpo-32968: Make modulo and floor division involving Fraction and float consistent with other operations (#5956)
Make mixed-type `%` and `//` operations involving `Fraction` and `float` objects behave like all other mixed-type arithmetic operations: first the `Fraction` object is converted to a `float`, then the `float` operation is performed as normal. This fixes some surprising corner cases, like `Fraction('1/3') % inf` giving a NaN.
Alexey Izbyshev [Fri, 24 Aug 2018 04:39:45 +0000 (07:39 +0300)]
closes bpo-34468: Objects/rangeobject.c: Fix an always-false condition in range_repr() (GH-8880)
Also, propagate the error from PyNumber_AsSsize_t() because we don't care
only about OverflowError which is not reported if the second argument is NULL.
Paul Ganssle [Thu, 23 Aug 2018 15:06:20 +0000 (11:06 -0400)]
bpo-34454: fix .fromisoformat() methods crashing on inputs with surrogate code points (GH-8862)
The current C implementations **crash** if the input includes a surrogate
Unicode code point, which is not possible to encode in UTF-8.
Important notes:
1. It is possible to pass a non-UTF-8 string as a separator to the
`.isoformat()` methods.
2. The pure-Python `datetime.fromisoformat()` implementation accepts
strings with a surrogate as the separator.
In `datetime.fromisoformat()`, in the special case of non-UTF-8 separators,
this implementation will take a performance hit by making a copy of the
input string and replacing the separator with 'T'.
Co-authored-by: Alexey Izbyshev <izbyshev@ispras.ru> Co-authored-by: Paul Ganssle <paul@ganssle.io>
Michael Osipov [Thu, 23 Aug 2018 13:27:19 +0000 (15:27 +0200)]
bpo-34412: Make signal.strsignal() work on HP-UX (GH-8786)
Introduce a configure check for strsignal(3) which defines HAVE_STRSIGNAL for
signalmodule.c. Add some common signals on HP-UX. This change applies for
Windows and HP-UX.
Victor Stinner [Thu, 23 Aug 2018 10:23:46 +0000 (12:23 +0200)]
bpo-34207: Fix pymain_read_conf() for UTF-8 Mode (GH-8868)
bpo-34170, bpo-34207: pymain_read_conf() now sets Py_UTF8Mode to
config->utf8_mode. pymain_read_conf() calls indirectly
Py_DecodeLocale() and Py_EncodeLocale() which depend on Py_UTF8Mode.
Berker Peksag [Wed, 15 Aug 2018 10:03:41 +0000 (13:03 +0300)]
bpo-34384: Fix os.readlink() on Windows (GH-8740)
os.readlink() now accepts path-like and bytes objects on Windows.
Previously, support for path-like and bytes objects was only
implemented on Unix.
This commit also merges Unix and Windows implementations of
os.readlink() in one function and adds basic unit tests to increase
test coverage of the function.
Stéphane Wirtel [Thu, 9 Aug 2018 15:05:31 +0000 (17:05 +0200)]
bpo-34324: Doc README wrong directory name for venv (GH-8650)
In the documentation, the `env` directory is specified when we execute
the `make venv` command. But in the code, `make venv` will create the
virtualenv inside the `venv` directory (defined by `VENVDIR`)