Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. (GH-15825)
This is the sort of `goto` that requires the reader to stare hard at
the code to unpick what it's doing.
On doing so, the answer is... not very much!
* It jumps from the bottom of the loop to almost the top; the effect
is to bypass the loop condition `s < end` and also the
`if`-condition `*s != '\\'`, acting as if both are true.
* We've just decremented `s`, after incrementing it in the `switch`
condition. So it has the same value as when `s == end` failed.
Before that was another increment... and before that we had
`s < end`. So `s < end` true, then increment, then `s == end`
false... that means `s < end` is still true.
* Also this means `s` points to the same character as it did for the
`switch` condition. And there was a `case '\\'`, which we didn't
hit -- so `*s != '\\'` is also true.
* That means this has no effect on the behavior! The most it might do
is an optimization -- we get to skip those two checks, because (as
just proven above) we know they're true.
* But gosh, this is the *invalid escape sequence* path. This does not
seem like the kind of code path that calls for extreme optimization
tricks.
So, take the `goto` and the label out.
Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code. Or perhaps it won't! That'll be OK.
But then, crucially, if some future edit to this loop causes the
reasoning above to *stop* holding true... the compiler will adjust
this jump accordingly. One of us fallible humans might not.
bpo-36502: Update link to UAX #44, the Unicode doc on the UCD. (GH-15301)
The link we have points to the version from Unicode 6.0.0, dated 2010.
There have been numerous updates to it since then:
https://www.unicode.org/reports/tr44/#Modifications
Change the link to one that points to the current version. Also, use HTTPS.
bpo-35941: Fix performance regression in new code (GH-12610)
Accumulate certificates in a set instead of doing a costly list contain
operation. A Windows cert store can easily contain over hundred
certificates. The old code would result in way over 5,000 comparison
operations
Signed-off-by: Christian Heimes <christian@python.org>
In debug mode, visit_decref() now calls _PyObject_IsFreed() to ensure
that the object is not freed. If it's freed, the program fails with
an assertion error and Python dumps informations about the freed
object.
bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302)
Since PEP 393 in Python 3.3, this value is always 0x10ffff, the
maximum codepoint in Unicode; there's no longer such a thing as a
UCS-2 build of Python, which couldn't properly represent some
characters.
There are a couple of spots left where we still condition on the value
of this constant. Take them out.
Victor Stinner [Mon, 9 Sep 2019 14:55:58 +0000 (16:55 +0200)]
bpo-38006: Avoid closure in weakref.WeakValueDictionary (GH-15641)
weakref.WeakValueDictionary defines a local remove() function used as
callback for weak references. This function was created with a
closure. Modify the implementation to avoid the closure.
Mark files as executable that are meant as scripts. (GH-15354)
This is the converse of GH-15353 -- in addition to plenty of
scripts in the tree that are marked with the executable bit
(and so can be directly executed), there are a few that have
a leading `#!` which could let them be executed, but it doesn't
do anything because they don't have the executable bit set.
Here's a command which finds such files and marks them. The
first line finds files in the tree with a `#!` line *anywhere*;
the next-to-last step checks that the *first* line is actually of
that form. In between we filter out files that already have the
bit set, and some files that are meant as fragments to be
consumed by one or another kind of preprocessor.
bpo-26185: Fix repr() on empty ZipInfo object (#13441)
* bpo-26185: Fix repr() on empty ZipInfo object
It was failing on AttributeError due to inexistant
but required attributes file_size and compress_size.
They are now initialized to 0 in ZipInfo.__init__().
* Remove useless hasattr() in ZipInfo._open_to_write()
* Completely remove file_size setting in _open_to_write().
bpo-37702: Fix SSL's certificate-store leak on Windows (GH-15632)
ssl_collect_certificates function in _ssl.c has a memory leak.
Calling CertOpenStore() and CertAddStoreToCollection(), a store's refcnt gets incremented by 2.
But CertCloseStore() is called only once and the refcnt leaves 1.
bpo-37705: Improve the implementation of winerror_to_errno() (GH-15623)
winerror_to_errno() is no longer automatically generated.
Do not rely on the old _dosmapperr() function.
Add ERROR_NO_UNICODE_TRANSLATION (1113) -> EILSEQ.
bpo-37936: Avoid ignoring files that we actually do track. (GH-15451)
There were about 14 files that are actually in the repo but that are
covered by the rules in .gitignore.
Git itself takes no notice of what .gitignore says about files that
it's already tracking... but the discrepancy can be confusing to a
human that adds a new file unexpectedly covered by these rules, as
well as to non-Git software that looks at .gitignore but doesn't
implement this wrinkle in its semantics. (E.g., `rg`.)
Several of these are from rules that apply more broadly than
intended: for example, `Makefile` applies to `Doc/Makefile` and
`Tools/freeze/test/Makefile`, whereas `/Makefile` means only the
`Makefile` at the repo's root.
And the `Modules/Setup` rule simply wasn't updated after 961d54c5c.
bpo-37445: Include FORMAT_MESSAGE_IGNORE_INSERTS in FormatMessageW() calls (GH-14462)
If FormatMessageW() is passed the FORMAT_MESSAGE_FROM_SYSTEM flag without FORMAT_MESSAGE_IGNORE_INSERTS, it will fail if there are insert sequences in the message definition.
Restart lines now always start with '=' and never end with ' ' and fill the width of the window unless that would require ending with ' ', which could be wrapped by itself and possible confusing the user.
closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)' 5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
Tal Einat [Tue, 3 Sep 2019 20:52:58 +0000 (23:52 +0300)]
bpo-38022: IDLE: upgrade help.html to sphinx 2.x HTML5 output (GH-15664)
The HTML5 output from Sphinx 2.x adds '<p>' tags within list elements. Using a new prevtag attribute, ignore these instead of emitting unwanted '\n\n'.
Also stop looking for 'first' classes on tags (no longer present) and fix the bug of double-spacing instead of single spacing after <pre> blocks.
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
bpo-15999: Clean up of handling boolean arguments. (GH-15610)
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
Ashwin Ramaswami [Sat, 31 Aug 2019 15:25:35 +0000 (10:25 -0500)]
bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239)
Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- a case without trailing whitespace
- an invalid encoded word