closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)' 5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
Tal Einat [Tue, 3 Sep 2019 20:52:58 +0000 (23:52 +0300)]
bpo-38022: IDLE: upgrade help.html to sphinx 2.x HTML5 output (GH-15664)
The HTML5 output from Sphinx 2.x adds '<p>' tags within list elements. Using a new prevtag attribute, ignore these instead of emitting unwanted '\n\n'.
Also stop looking for 'first' classes on tags (no longer present) and fix the bug of double-spacing instead of single spacing after <pre> blocks.
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
bpo-15999: Clean up of handling boolean arguments. (GH-15610)
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
Ashwin Ramaswami [Sat, 31 Aug 2019 15:25:35 +0000 (10:25 -0500)]
bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239)
Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- a case without trailing whitespace
- an invalid encoded word
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. The Python semantics requires a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.
Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such object.
Nick Coghlan [Thu, 29 Aug 2019 13:26:53 +0000 (23:26 +1000)]
bpo-37947: Avoid double-decrement in symtable recursion counting (GH-15593)
With `symtable_visit_expr` now correctly adjusting the recursion depth for named
expressions, `symtable_handle_namedexpr` should be leaving it alone.
Also adds a new check to `PySymtable_BuildObject` that raises `SystemError`
if a successful first symbol analysis pass fails to keep the stack depth
accounting clean.
Greg Price [Tue, 27 Aug 2019 18:16:31 +0000 (11:16 -0700)]
bpo-37936: Remove some .gitignore rules that were intended locally. (GH-15542)
These appeared in commit c5ae169e1. The comment on them, as well as
the presence among them of a rule for the .gitignore file itself,
indicate that the author intended these lines to remain only in their
own local working tree -- not to get committed even to their own repo,
let alone merged upstream.
They did nevertheless get committed, because it turns out that Git
takes no notice of what .gitignore says about files that it's already
tracking... for example, this .gitignore file itself.
Give effect to these lines' original intention, by deleting them. :-)
Git tip, for reference: the `.git/info/exclude` file is a handy way
to do exactly what these lines were originally intended to do. A
related handy file is `~/.config/git/ignore`. See gitignore(5),
aka `git help ignore`, for details.
Nick Coghlan [Sun, 25 Aug 2019 13:45:40 +0000 (23:45 +1000)]
bpo-37757: Disallow PEP 572 cases that expose implementation details (GH-15131)
- drop TargetScopeError in favour of raising SyntaxError directly
as per the updated PEP 572
- comprehension iteration variables are explicitly local, but
named expression targets in comprehensions are nonlocal or
global. Raise SyntaxError as specified in PEP 572
- named expression targets in the outermost iterable of a
comprehension have an ambiguous target scope. Avoid resolving
that question now by raising SyntaxError. PEP 572
originally required this only for cases where the bound name
conflicts with the iteration variable in the comprehension,
but CPython can't easily restrict the exception to that case
(as it doesn't know the target variable names when visiting
the outermost iterator expression)
These were caused by keeping around a reference to the Squeezer
instance and calling it's load_font() upon config changes, which
sometimes happened even if the shell window no longer existed.
This change completely removes that mechanism, instead having the
editor window properly update its width attribute, which can then
be used by Squeezer.