Alex Dowad [Wed, 7 Oct 2020 20:30:34 +0000 (22:30 +0200)]
Don't pass invalid JIS X 0212, JIS X 0213, and Windows-CP932 characters through
Similarly to JIS X 0208, mbstring would pass kuten codes which are not mapped
in the JIS X 0212, JIS X 0213, or CP932 character sets through silently when
converting to another Japanese encoding.
Alex Dowad [Wed, 7 Oct 2020 20:12:27 +0000 (22:12 +0200)]
Don't pass invalid JIS X 0208 characters through
Many Japanese encodings, such as JIS7/8, Shift JIS, ISO-2022-JP, EUC-JP, and
so on encode characters from the JIS X 0208 character set. JIS X 0208 is based
on the concept of a 94x94 table, with numbered rows and columns. However,
more than a thousand of the cells in that table are empty; JIS X 0208 does not
actually use all 94x94=8,836 possible kuten codes.
mbstring had a dubious feature whereby, if a Japanese string contained one of
these 'unmapped' kuten codes, and it was being converted to another Japanese
encoding which was also based on JIS X 0208, the non-existent character would
be silently passed through, and the unmapped kuten code would be re-encoded
using the normal encoding method of the target text encoding.
Again, this _only_ happened if converting the text with the funky kuten code
to a Japanese encoding. If one tried converting it to Unicode, mbstring would
treat that as an error.
If somebody, somewhere, made their own private extension to JIS X 0208, and
used the regular Japanese encodings like Shift JIS and EUC-JP to encode this
private character set, then this feature might conceivably be useful. But how
likely is that? If someone is using Shift JIS, EUC-JP, ISO-2022-JP, etc. to
encode a funky version of JIS X 0208 with extra characters added, then that
should be treated as a separate text encoding.
The code which flags such characters with MBFL_WCSPLANE_JIS0208 is retained
solely for error reporting in `mbfl_filt_conv_illegal_output`.
Alex Dowad [Sun, 4 Oct 2020 20:29:34 +0000 (22:29 +0200)]
Enhance handling of CP932 text encoding
- Don't allow control characters to appear in the middle of a multi-byte
character. (This was a strange feature of mbstring; it doesn't make much
sense, and iconv doesn't allow it.)
- Treat truncated multi-byte characters as an error.
Nikita Popov [Wed, 25 Nov 2020 14:57:11 +0000 (15:57 +0100)]
Reindent ext/mysqli tests
Reindent ext/mysqli tests on PHP-7.4, so they match with the
indentation on PHP-8.0. Otherwise merging test changes across
branches is very unpleasant.
Nikita Popov [Wed, 25 Nov 2020 11:25:07 +0000 (12:25 +0100)]
Fix ref source management during unserialization
Only register the slot for adding ref sources later if we didn't
immediately register one. Also avoids leaking a ref source if
it is added early and the assignment fails.
Calvin Buckley [Tue, 24 Nov 2020 19:45:34 +0000 (15:45 -0400)]
sockets: Fix variable/macro name collision on AIX
The name "rem_size" is used by a macro in a system header on AIX,
specifically `sys/xmem.h`. Without changing the name, you get the
name mangled like so:
```
In file included from /usr/include/sys/uio.h:92:0,
from /QOpenSys/pkgs/lib/gcc/powerpc-ibm-aix6.1.0.0/6.3.0/include-fixed-7.1/sys/socket.h:83,
from /usr/include/sys/syslog.h:151,
from /usr/include/syslog.h:29,
from /home/calvin/rpmbuild/BUILD/php-8.0.0RC5/main/php_syslog.h:27,
from /home/calvin/rpmbuild/BUILD/php-8.0.0RC5/main/php.h:318,
from /home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c:17:
/home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c: In function 'zif_socket_cmsg_space':
/home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c:298:10: error: expected '=', ',', ';', 'asm' or '__attribute__' before '.' token
size_t rem_size = ZEND_LONG_MAX - entry->size;
^
/home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c:298:10: error: expected expression before '.' token
/home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c:299:18: error: 'u2' undeclared (first use in this function)
size_t n_max = rem_size / entry->var_el_size;
^
/home/calvin/rpmbuild/BUILD/php-8.0.0RC5/ext/sockets/sendrecvmsg.c:299:18: note: each undeclared identifier is reported only once for each function it appears in
```
...because of the declaration in `sys/xmem.h`:
```
```
This just renames the variable so that it won't trip on this
definition. Tested to fix the build on IBM i PASE.
Nikita Popov [Tue, 24 Nov 2020 14:52:41 +0000 (15:52 +0100)]
Fixed bug #80377
Make sure the $PHP_THREAD_SAFETY variable is always available
when configuring extensions. It was previously available for
phpized extensions, but for in-tree builds it was being set
too late.
Then, use $PHP_THREAD_SAFETY instead of $enable_zts to check for
ZTS in bundled extensions, which makes sure these checks also
work for phpize builds.
Nikita Popov [Tue, 24 Nov 2020 14:52:41 +0000 (15:52 +0100)]
Fixed bug #80377
Use $PHP_THREAD_SAFETY instead of $enable_zts to check for ZTS.
This variable is also available for phpize builds, while enable_zts
is only present for in-tree builds.
Nikita Popov [Tue, 24 Nov 2020 10:46:03 +0000 (11:46 +0100)]
Use pkg-config for libargon2
We already tried this in PHP 7.4, but ran into issues, because
alpine did not support pkg-config for libargon2 (or had a broken
pc file, not sure). The Alpine issue has been resolved in the
meantime, so let's give this another try.
Nikita Popov [Tue, 24 Nov 2020 10:31:53 +0000 (11:31 +0100)]
Fixed bug #80404
For a division like [1..1]/[2..2] produce [0..1] as a result, which
would be the integer envelope of the floating-point result.
The implementation is pretty ugly (we're now taking min/max across
eight values...) but I couldn't come up with a more elegant way
to handle this that doesn't make things a lot more complex (the
division sign handling is the annoying issue here).
Nikita Popov [Thu, 19 Nov 2020 11:46:52 +0000 (12:46 +0100)]
Use MIN/MAX when dumping RANGE[]
It's very common that one of the bounds is LONG_MIN or LONG_MAX.
Dump them as MIN/MAX instead of the int representation in that
case, as it makes the dump less noisy.
Fix #72964: White space not unfolded for CC/Bcc headers
`\r\n` does only terminate a header, if not followed by `\t` or ` `.
We have to cater to that when determining the end position of the
respective headers.
Fix #80345: PHPIZE configuration has outdated PHP_RELEASE_VERSION
We must not redefine the version "constants" for phpize builds, because
these have already generated in phpize.js, from where we pass these
variables forward to configure.js.
We also add `PHP_EXTRA_VERSION` and `PHP_VERSION_STRING` to the files
for completeness.