If Zip operations fails due to PHP error conditions before libzip even
has been called, there is no meaningful indication what failed; the
functions just return false, and the Zip status indicated that no error
did occur. Therefore we raise `E_WARNING` in these cases.
Nikita Popov [Thu, 5 Nov 2020 10:58:31 +0000 (11:58 +0100)]
Fix multiple trait fixup
If a trait method is inherited, preloading trait fixup might be
performed on it multiple times. Usually this is fine, because
the opcodes pointer will have already been updated, and will thus
not be found in the xlat table.
However, it can happen that the new opcodes pointer is the same
as one of the old opcodes pointers, if the pointer has been reused
by the allocator. In this case we will look up the wrong op array
and overwrite the trait method with an unrelated trait method.
We fix this by indexing the xlat table not by the opcodes pointer,
but by the refcount pointer. The refcount pointer is not changed
during optimization, and accurately represents which op arrays
should use the same opcodes.
Fixes bug #80307. The test case does not reproduce the bug, because
this depends on a lot of "luck" with the allocator. The test case
merely illustrates a case where orig_op_array would have been NULL
in the original code.
Fix #80266: parse_url silently drops port number 0
As of commit 81b2f3e[1], `parse_url()` accepts URLs with a zero port,
but does not report that port, what is wrong in hindsight.
Since the port number is stored as `unsigned short` there is no way to
distinguish between port zero and no port. For BC reasons, we thus
introduce `parse_url_ex2()` which accepts an output parameter that
allows that distinction, and use the new function to fix the behavior.
The introduction of `parse_url_ex2()` has been suggested by Nikita.
Nikita Popov [Wed, 4 Nov 2020 13:51:44 +0000 (14:51 +0100)]
Assert that references are not persisted
There should not be any need to persist references, and it's unlikely
that persisting a reference will behave correctly at runtime, because
we don't have a concept of an immutable reference.
Nikita Popov [Tue, 3 Nov 2020 15:45:13 +0000 (16:45 +0100)]
Don't disable early binding during preloading script
We should only disable early binding during the opcache_compile_file()
calls, not inside the preloading script or anything it includes.
The right condition to check for is whether we compile the file
without execution, as declaring classes is "execution".
Nikita Popov [Tue, 3 Nov 2020 13:15:05 +0000 (14:15 +0100)]
Fix variance checks on resolved union types
This is a bit annoying: When preloading is used, types might be
resolved during inheritance checks, so we need to deal with CE
types rather than just NAME types everywhere.
Nikita Popov [Tue, 3 Nov 2020 10:50:14 +0000 (11:50 +0100)]
Don't ignore internal classes during preloading
When preloading, it's fine to make use of internal class information,
as we do not support Windows. It is also necessary to allow proper
variance checks against internal classes.
Alex Dowad [Sun, 18 Oct 2020 15:49:57 +0000 (17:49 +0200)]
Fix mbstring support for ARMSCII-8
- Identify filter was completely wrong.
- Respect `mb_substitute_character` rather than converting invalid bytes to
Unicode 0xFFFD (generic replacement character).
- Don't convert Unicode 0xFFFD to a valid ARMSCII-8 character.
- When converting ARMSCII-8 to ARMSCII-8, don't pass invalid bytes through
silently.
Alex Dowad [Sat, 29 Aug 2020 17:12:28 +0000 (19:12 +0200)]
Optimize (AND FIX) mb_check_encoding (cut execution time by 50%+)
Previously, `mb_check_encoding` did an awful lot of unneeded work. In order to
determine whether a string was valid or not, it would convert the whole string
into wchar (code points), which required dynamically allocating a (potentially
large) buffer. Then it would turn right around and convert that big 'ol buffer
of code points back to the original encoding again. Finally, it would check
whether any invalid bytes were detected during that long and onerous process.
The thing is, mbstring _already_ has machinery for detecting whether a string
is valid in a certain encoding or not, and it doesn't require copying any data
around or allocating buffers. Better yet, it can fail fast when an invalid byte
is found. Why not use it? It's sure a lot faster!
Further, the legacy code was also badly broken. Why? Because aside from
checking whether illegal characters were detected, it would also check whether
the conversion to and from wchars was lossless. But, some encodings have
more than one valid encoding for the same character. In such cases, it is
not possible to make the conversion to and from wchars lossless for every
valid character. So `mb_check_encoding` would actually reject good strings
in a lot of encodings!
Alex Dowad [Sun, 18 Oct 2020 13:30:03 +0000 (15:30 +0200)]
Fix mbstring support for CP1254 encoding
One funny thing: while the original author used Unicode 0xFFFD (generic
replacement character) for invalid bytes in CP1251 and CP1252, for CP1254
they used 0xFFFE, which is not a valid Unicode codepoint at all, but is a
reversed byte-order mark. Probably this was by mistake.
Anyways,
- Fixed identify filter, which was completely wrong.
- Don't convert Unicode 0xFFFE to a random (but valid) CP1254 byte.
- When converting CP1254 to CP1254, don't pass invalid bytes through silently.
Alex Dowad [Sun, 18 Oct 2020 13:12:11 +0000 (15:12 +0200)]
Fix mbstring support for CP1251 encoding
- Identify filter was as wrong as wrong can be.
- Invalid CP1251 byte 0x98 was converted to Unicode 0xFFFD (generic
replacement character), rather than respecting `mb_substitute_character`.
- Unicode 0xFFFD was converted to some random CP1251 byte.
- When converting CP1251 to CP1251, don't pass invalid bytes through silently.