Nikita Popov [Thu, 15 Oct 2020 14:42:59 +0000 (16:42 +0200)]
Simplify and fix generator tree management
This makes a number of related changes to the generator tree
management that should make it easier to understand,
more robust, and faster for the common linear-chain case. Fixes
https://bugs.php.net/bug.php?id=80240, which was the original
motivation here.
* Generators now only add a ref to their direct parent.
* Nodes only store their children, not their leaves, which avoids
any need for leaf updating. This means it's no longer possible
to fetch the child for a certain leaf, which is something we
only needed in one place (update_current). If multi-children
nodes are involved, this will require doing a walk in the other
direction (from leaf to root). It does not affect the common
case of single-child nodes.
* The root/leaf pointers are now seen as a pair. One leaf generator
can point to the current root. If a different leaf generator is
used, we'll move the root pointer over to that one. Again, this
is a cache to make the common linear-chain case fast; trees may
need to scan up the parent links.
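The root/leaf pairing described above can be sketched roughly as follows. This is a minimal Python model, not the php-src implementation: the `Node` and `Tree` classes, the `cached_root` and `cache_owner` fields, and `current_root()` are all hypothetical names chosen for illustration.

```python
from typing import List, Optional


class Node:
    """A generator in a delegation tree (hypothetical model, not php-src code)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent  # each node references only its direct parent
        self.children: List["Node"] = []  # children only, no list of leaves
        self.cached_root: Optional["Node"] = None  # only meaningful on a leaf
        if parent is not None:
            parent.children.append(self)


class Tree:
    """Tracks which single leaf currently owns the cached root pointer."""

    def __init__(self) -> None:
        self.cache_owner: Optional[Node] = None

    def current_root(self, leaf: Node) -> Node:
        # Fast path: this leaf already holds the root/leaf pair.
        if self.cache_owner is leaf and leaf.cached_root is not None:
            return leaf.cached_root
        # Slow path: walk from the leaf up the parent links to the root.
        root = leaf
        while root.parent is not None:
            root = root.parent
        # Move the root pointer over to the leaf that is being used now.
        if self.cache_owner is not None:
            self.cache_owner.cached_root = None
        leaf.cached_root = root
        self.cache_owner = leaf
        return root
```

In a linear chain there is only one leaf, so every lookup after the first takes the fast path. With several leaves, switching between them pays for one upward walk each time the cache moves.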
Nikita Popov [Wed, 21 Oct 2020 15:03:54 +0000 (17:03 +0200)]
Update bcmath.scale when calling bcscale()
We should keep the value of bcmath.scale and the internal
bc_precision global synchronized.
Probably more important than the ability to retrieve bcmath.scale
via ini_get(), this also makes sure that the set scale does not
leak into the next request, as it currently does.
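A toy model may help show why keeping the two values in sync stops the leak. The sketch below is Python and purely illustrative: `BcMathModel` and its methods are hypothetical, and it assumes that a setting changed at runtime is rolled back at the end of the request while an internal global changed on its own is not.

```python
class BcMathModel:
    """Toy model of one ini setting plus one internal global (hypothetical)."""

    def __init__(self, configured_scale: int = 0) -> None:
        self.configured_scale = configured_scale  # value from configuration
        self.ini_scale = configured_scale         # what ini_get() would report
        self.bc_precision = configured_scale      # internal global used for math

    def bcscale_unsynchronized(self, scale: int) -> None:
        # Behaviour before the change, as described: only the global moves.
        self.bc_precision = scale

    def bcscale_synchronized(self, scale: int) -> None:
        # Behaviour after the change, as described: both move together.
        self.ini_scale = scale
        self.bc_precision = scale

    def end_request(self) -> None:
        # Assumption: only settings modified at runtime are restored, and
        # the global is reset as part of restoring the setting.
        if self.ini_scale != self.configured_scale:
            self.ini_scale = self.configured_scale
            self.bc_precision = self.configured_scale
```

With the unsynchronized variant, the setting never looks modified, so nothing is restored and the scale carries over into the next request.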
Fix #80242: imap_mail_compose() segfaults for multipart with rfc822
libc-client expects `TYPEMESSAGE` with an explicit subtype of `RFC822`
to have a `nested.msg` (otherwise there will be a segfault during
free), but not to have any `contents.text.data` (this will leak
otherwise).
In libc-client 2007f `data` is declared as `unsigned char *`; there may
be variants which declare it as `void *`, but in any case picky
compilers may warn about a pointer type mismatch in the conditional
(and error with `-Werror`), so we're adding a `char *` cast for good
measure.
The original fix for that bug[1] broke the formerly working composition
of message/rfc822 messages, which results in a segfault when freeing
the message body now. While `imap_mail_compose()` does not really
support composition of meaningful message/rfc822 messages (although
libc-client appears to support that), some code may still use this to
compose partial messages and use string manipulation to create the
final message.
The point is that libc-client expects `TYPEMESSAGE` with an explicit
subtype of `RFC822` to have a `nested.msg` (otherwise there will be a
segfault during free), but not to have any `contents.text.data` (this
will leak otherwise).
Nikita Popov [Tue, 20 Oct 2020 08:50:50 +0000 (10:50 +0200)]
Fix CCM tag length setting for old OpenSSL versions
While OpenSSL 1.1 allows unconditionally setting the CCM tag length
even for decryption, some older versions apparently do not. As such,
we do need to treat CCM and OCB separately after all.
Nikita Popov [Wed, 14 Oct 2020 11:03:03 +0000 (13:03 +0200)]
Fix bug #79983: Add support for OCB mode
OCB mode ciphers were already exposed to openssl_encrypt/decrypt,
but misbehaved, because they were not treated as AEAD ciphers.
From that perspective, OCB should be treated the same way as GCM.
In OpenSSL 1.1 the necessary controls were unified under
EVP_CTRL_AEAD_* (and OCB is only supported since OpenSSL 1.1).
This macro is defined to zero as of PHP 5.0.0, and as the comment
indicates, is no longer relevant. Thus, we remove the definition and
all usages from the core and bundled extensions.
Alex Dowad [Fri, 16 Oct 2020 20:03:27 +0000 (22:03 +0200)]
Do not pass invalid ISO-8859-{3,6,7,8} characters through silently
mbstring has a bad habit of passing invalid characters through silently
when converting to the same (or a "compatible") encoding.
For example, if you give it an invalid JIS X 0208 kuten code encoded with SJIS,
and try to convert that to EUC-JP, mbstring will just quietly re-encode the
invalid code in the EUC-JP representation.
At the same time, some parts of the code (like `mb_check_encoding`) assume that
invalid characters will be treated as... well, invalid. Let's unbreak things
by actually catching errors and reporting them, instead of swallowing them.
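The difference between passing invalid bytes through and reporting them can be illustrated with Python's built-in codecs. This is only an analogy for the intended behaviour, not mbstring code; `convert_strict` and `is_valid` are hypothetical helper names.

```python
def convert_strict(data: bytes, src: str, dst: str) -> bytes:
    """Decode with `src` and re-encode with `dst`, raising on invalid input."""
    # errors="strict" makes unmapped bytes raise instead of passing through.
    text = data.decode(src, errors="strict")
    return text.encode(dst, errors="strict")


def is_valid(data: bytes, encoding: str) -> bool:
    """Report whether `data` is well-formed in `encoding`."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Here the validity check and the conversion agree by construction, because both rely on the same strict decoder.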
Alex Dowad [Sat, 19 Sep 2020 18:34:13 +0000 (20:34 +0200)]
Add identify filter for ISO-8859-6 (Latin/Arabic)
Note that some text encoding conversion libraries, such as Solaris iconv
and FreeBSD iconv, map 0x30-0x39 to the Arabic-Indic digits rather than
the 'regular' ASCII digits. (That is, to Unicode codepoints U+0660-U+0669.)
Further, Windows CP28596 adds more mappings to use the unused bytes in
ISO-8859-6.
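The two digit mappings mentioned above differ only by a fixed offset. A minimal Python sketch, with hypothetical function names, assuming the variant mapping is a straight one-to-one shift of 0x30-0x39 onto U+0660-U+0669:

```python
def map_digit_ascii(byte: int) -> str:
    """Map a digit byte 0x30-0x39 to the ASCII digit with the same value."""
    if not 0x30 <= byte <= 0x39:
        raise ValueError("not a digit byte")
    return chr(byte)


def map_digit_arabic_indic(byte: int) -> str:
    """Map a digit byte 0x30-0x39 to the Arabic-Indic digit U+0660-U+0669."""
    if not 0x30 <= byte <= 0x39:
        raise ValueError("not a digit byte")
    # Shift by the distance between U+0030 and U+0660.
    return chr(0x0660 + (byte - 0x30))
```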
Alex Dowad [Sat, 19 Sep 2020 18:27:55 +0000 (20:27 +0200)]
Add identify filter for ISO-8859-3 (Latin-3)
There are some bytes in this encoding which are not mapped to any character.
Notably, Microsoft added their own mappings for these 'unused' bytes in their
version of Latin-3, called CP28593.
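An identify filter for this encoding essentially has to reject input containing any of the unmapped bytes. The following Python sketch shows the idea only; it is not the mbstring filter, the names are hypothetical, and the byte set is the list of positions left unassigned in standard ISO-8859-3 (vendor variants such as CP28593 may differ).

```python
# Byte values with no assigned character in standard ISO-8859-3.
UNMAPPED_ISO_8859_3 = frozenset({0xA5, 0xAE, 0xBE, 0xC3, 0xD0, 0xE3, 0xF0})


def could_be_iso_8859_3(data: bytes) -> bool:
    """Return False as soon as a byte with no ISO-8859-3 mapping is seen."""
    return not any(byte in UNMAPPED_ISO_8859_3 for byte in data)
```

A filter like this can only rule the encoding out. Input that passes may still be valid in many other single-byte encodings.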
Alex Dowad [Mon, 7 Sep 2020 06:42:16 +0000 (08:42 +0200)]
Add identify filter for ISO-8859-16 (Latin-10) encoding
Interestingly, it looks like the original author intended to add an identify filter
for this encoding, but never did so. The needed struct is there, but was never added
to the list of identify filters in mbfl_ident.c.