Pablo Galindo [Thu, 22 Aug 2019 01:38:39 +0000 (02:38 +0100)]
Refactor Parser/pgen and add documentation and explanations (GH-15373)
* Refactor Parser/pgen and add documentation and explanations
To improve the readability and maintainability of the parser
generator perform the following transformations:
* Separate the metagrammar parser in its own class to simplify
the parser generator logic.
* Create separate classes for DFAs and NFAs and move methods that
act exclusively on them from the parser generator to these
classes.
* Add docstrings and comment documenting the process to go from
the grammar file into NFAs and then DFAs. Detail some of the
algorithms and give some background explanations of some concepts
that will helps readers not familiar with the parser generation
process.
* Select more descriptive names for some variables and variables.
* PEP8 formatting and quote-style homogenization.
The output of the parser generator remains the same (Include/graminit.h
and Python/graminit.c remain untouched by running the new parser generator).
bsiem [Wed, 21 Aug 2019 23:00:39 +0000 (01:00 +0200)]
bpo-37482: Fix email address name with encoded words and special chars (GH-14561)
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed withing double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.
In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.
From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
Brett Cannon [Wed, 21 Aug 2019 22:58:01 +0000 (15:58 -0700)]
bpo-37663: have venv activation scripts all consistently use __VENV_PROMPT__ for prompt customization (GH-14941)
The activation scripts generated by venv were inconsistent in how they changed the shell's prompt. Some used `__VENV_PROMPT__` exclusively, some used `__VENV_PROMPT__` if it was set even though by default `__VENV_PROMPT__` is always set and the fallback matched the default, and one ignored `__VENV_PROMPT__` and used `__VENV_NAME__` instead (and even used a differing format to the default prompt). This change now has all activation scripts use `__VENV_PROMPT__` only and relies on the fact that venv sets that value by default.
The color of the customization is also now set in fish to the blue from the Python logo for as hex color support is built into that shell (much like PowerShell where the built-in green color is used).
Steve Dower [Wed, 21 Aug 2019 22:27:33 +0000 (15:27 -0700)]
bpo-37834: Normalise handling of reparse points on Windows (GH-15231)
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
Added back mention that ensure_future actually scheduled obj. This documentation just mentions what ensure_future returns, so I did not realize that ensure_future also schedules obj.
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
Victor Stinner [Wed, 21 Aug 2019 12:40:42 +0000 (13:40 +0100)]
bpo-37851: faulthandler allocates its stack on demand (GH-15358)
The faulthandler module no longer allocates its alternative stack at
Python startup. Now the stack is only allocated at the first
faulthandler usage.
faulthandler no longer ignores memory allocation failure when
allocating the stack. sigaltstack() failure now raises an OSError
exception, rather than being ignored.
The alternative stack is no longer used if sigaction() is
not available. In practice, sigaltstack() should only be available
when sigaction() is avaialble, so this change should have no effect
in practice.
faulthandler.dump_traceback_later() internal locks are now only
allocated at the first dump_traceback_later() call, rather than
always being allocated at Python startup.
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
Greg Price [Wed, 21 Aug 2019 04:53:59 +0000 (21:53 -0700)]
Unmark files as executable that can't actually be executed. (GH-15353)
There are plenty of legitimate scripts in the tree that begin with a
`#!`, but also a few that seem to be marked executable by mistake.
Found them with this command -- it gets executable files known to Git,
filters to the ones that don't start with a `#!`, and then unmarks
them as executable:
$ git ls-files --stage \
| perl -lane 'print $F[3] if (!/^100644/)' \
| while read f; do
head -c2 "$f" | grep -qxF '#!' \
|| chmod a-x "$f"; \
done
Looking at the list by hand confirms that we didn't sweep up any
files that should have the executable bit after all. In particular
* The `.psd` files are images from Photoshop.
* The `.bat` files sure look like things that can be run.
But we have lots of other `.bat` files, and they don't have
this bit set, so it must not be needed for them.
Greg Price [Wed, 21 Aug 2019 03:50:51 +0000 (20:50 -0700)]
bpo-35518: Skip test that relies on a deceased network service. (GH-15349)
If this service had thoroughly vanished, we could just ignore the
test until someone gets around to either recreating such a service
or redesigning the test to somehow work locally. The
`support.transient_internet` mechanism catches the failure to
resolve the domain name, and skips the test.
But in fact the domain snakebite.net does still exist, as do its
nameservers -- and they can be quite slow to reply. As a result
this test can easily take 20-30s before it gets auto-skipped.
Jeroen Demeyer [Fri, 16 Aug 2019 10:41:27 +0000 (12:41 +0200)]
bpo-37540: vectorcall: keyword names must be strings (GH-14682)
The fact that keyword names are strings is now part of the vectorcall and `METH_FASTCALL` protocols. The biggest concrete change is that `_PyStack_UnpackDict` now checks that and raises `TypeError` if not.
Alex Gaynor [Thu, 15 Aug 2019 12:31:28 +0000 (08:31 -0400)]
Replace usage of the obscure PEM_read_bio_X509_AUX with the more standard PEM_read_bio_X509 (GH-15303)
X509_AUX is an odd, note widely used, OpenSSL extension to the X509 file format. This function doesn't actually use any of the extra metadata that it parses, so just use the standard API.
faulthandler now allocates a dedicated stack of SIGSTKSZ*2 bytes,
instead of just SIGSTKSZ bytes. Calling the previous signal handler
in faulthandler signal handler uses more than SIGSTKSZ bytes of stack
memory on some platforms.
Artem Khramov [Wed, 14 Aug 2019 21:21:48 +0000 (03:21 +0600)]
bpo-37811: FreeBSD, OSX: fix poll(2) usage in sockets module (GH-15202)
FreeBSD implementation of poll(2) restricts the timeout argument to be
either zero, or positive, or equal to INFTIM (-1).
Unless otherwise overridden, socket timeout defaults to -1. This value
is then converted to milliseconds (-1000) and used as argument to the
poll syscall. poll returns EINVAL (22), and the connection fails.
This bug was discovered during the EINTR handling testing, and the
reproduction code can be found in
https://bugs.python.org/issue23618 (see connect_eintr.py,
attached). On GNU/Linux, the example runs as expected.
This change is trivial:
If the supplied timeout value is negative, truncate it to -1.
Greg Price [Wed, 14 Aug 2019 11:05:19 +0000 (04:05 -0700)]
bpo-36502: Correct documentation of str.isspace() (GH-15019)
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
Greg Price [Wed, 14 Aug 2019 02:28:38 +0000 (19:28 -0700)]
bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)
Much like the lower-level logic in commit ef2af1ad4, we had
4 copies of this logic, written in a couple of different ways.
They're all implementing the same standard, so write it just once.
Josh Holland [Tue, 13 Aug 2019 19:05:09 +0000 (20:05 +0100)]
bpo-37814: Document the empty tuple type annotation syntax (GH-15208)
https://bugs.python.org/issue37814:
> The empty tuple syntax in type annotations, `Tuple[()]`, is not obvious from the examples given in the documentation (I naively expected `Tuple[]` to work); it has been documented in PEP 484 and in mypy, but not in the documentation for the typing module.
Greg Price [Tue, 13 Aug 2019 05:59:30 +0000 (22:59 -0700)]
bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)
The `expand` option was introduced in 2000 in commit fad27aee1.
It appears to have been always set since it was committed, and
what it does is tell the code to do something essential. So,
just always do that, and cut the option.
Also cut the `linebreakprops` option, which isn't consulted anymore.
Greg Price [Tue, 13 Aug 2019 05:58:01 +0000 (22:58 -0700)]
bpo-37758: Clean out vestigial script-bits from test_unicodedata. (GH-15126)
This file started life as a script, before conversion to a
`unittest` test file. Clear out some legacies of that conversion
that are a bit confusing about how it works.
Most notably, it's unlikely there's still a good reason to try
to recover from `unicodedata` failing to import -- as there was
when that logic was first added, when the module was very new.
So take that out entirely. Keep `self.db` working, though, to
avoid a noisy diff.
Greg Price [Tue, 13 Aug 2019 05:20:56 +0000 (22:20 -0700)]
bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)
There were 10 copies of this, and almost as many distinct versions of
exactly how it was written. They're all implementing the same
standard. Pull them out to the top, so the more interesting logic
that remains becomes easier to read.
Derek Keeler [Mon, 12 Aug 2019 20:06:02 +0000 (13:06 -0700)]
bpo-37354: Make Powershell Activate.ps1 script static to allow for signing (GH-14967)
- Remove use of replacement text in the script
- Make use of the pyvenv.cfg file for prompt value.
- Add parameters to allow more flexibility
- Make use of the current path, and assumptions about where env puts things, to compensate
- Make the script a bit more 'idiomatic' Powershell
- Add script documentation (Get-Help .\.venv\Scripts\Activate.ps1 shows PS help page now
Gregory P. Smith [Sat, 10 Aug 2019 07:19:07 +0000 (00:19 -0700)]
bpo-32912: Revert SyntaxWarning on invalid escape sequences. (GH-15195)
DeprecationWarning will continue to be emitted for invalid escape
sequences in string and bytes literals just as it did in 3.7.
SyntaxWarning may be emitted in the future. But per mailing list
discussion, we don't yet know when because we haven't settled on how to
do so in a non-disruptive manner.
Ngalim Siregar [Fri, 9 Aug 2019 14:22:16 +0000 (21:22 +0700)]
bpo-37642: Update acceptable offsets in timezone (GH-14878)
This fixes an inconsistency between the Python and C implementations of
the datetime module. The pure python version of the code was not
accepting offsets greater than 23:59 but less than 24:00. This is an
accidental legacy of the original implementation, which was put in place
before tzinfo allowed sub-minute time zone offsets.