Fix #80345: PHPIZE configuration has outdated PHP_RELEASE_VERSION
We must not redefine the version "constants" for phpize builds, because
these have already been generated in phpize.js, from where we pass these
variables forward to configure.js.
We also add `PHP_EXTRA_VERSION` and `PHP_VERSION_STRING` to the files
for completeness.
Nikita Popov [Thu, 19 Nov 2020 09:29:32 +0000 (10:29 +0100)]
Export zend_is_callable_at_frame
Export the zend_is_callable_impl() function as
zend_is_callable_at_frame() for use by extensions. As twose pointed
out, an extension may want to retrieve fcc for a private method.
Nikita Popov [Wed, 18 Nov 2020 15:43:45 +0000 (16:43 +0100)]
Fix curl_multi_getcontent() parameter name
While the function name starts with curl_multi_*, the function
actually accepts a CurlHandle. As such, it should also use just
$handle as the parameter name.
`configure` for `phpize` builds on Windows creates Makefile and
config.pickle.h and includes the latter via the command line option
`/FI`. That implies that config.pickle.h is always included before
config.w32.h, which means that standard definitions always override
extension specific definitions, while it should be the other way round.
Therefore, we change the inclusion order by including config.pickle.h
at the end of config.w32.h if the former is available, and also make
sure to avoid any potential C4005 warnings by `#undef`ining the macros
before defining them.
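
A minimal sketch of the header pattern described above, not the actual
contents of config.w32.h. The macro name HAVE_EXAMPLE_FEATURE is
hypothetical, and the `__has_include` test is only one assumed way of
checking whether config.pickle.h is available.

```c
/* Tail of a config.w32.h-like header (sketch). */

/* A standard definition made earlier in the header (hypothetical name). */
#define HAVE_EXAMPLE_FEATURE 0

/* Pull in the extension specific definitions last, so that they win.
 * Assumption: the compiler supports __has_include. */
#if defined(__has_include)
# if __has_include("config.pickle.h")
#  include "config.pickle.h"
# endif
#endif

/* What a generated config.pickle.h entry could look like: the #undef
 * comes first, so the redefinition does not raise MSVC warning C4005
 * (macro redefinition). */
#undef HAVE_EXAMPLE_FEATURE
#define HAVE_EXAMPLE_FEATURE 1
```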
Nikita Popov [Tue, 17 Nov 2020 09:18:37 +0000 (10:18 +0100)]
Fix incorrectly optimized out live range
For x ? y : z style structures, the live range starts at z, but
may also hold the value of y. Make sure that the refcounting check
takes this into account, by checking the type of a potential phi
user.
Fix #74558: Can't rebind closure returned by Closure::fromCallable()
Failure to rebind such closures is not necessarily related to them
being created by `ReflectionFunctionAbstract::getClosure()`, so we fix
the error message.
Strip trailing line breaks and periods from Windows error messages
PHP error messages should not contain line breaks, so we remove these
from the Windows specific error messages. We also remove trailing
periods for the same reason.
Nikita Popov [Thu, 12 Nov 2020 14:09:18 +0000 (15:09 +0100)]
Don't assume libmysqlclient library name
By simply dropping the additional checks, in line with the general
guideline of trusting the output of config scripts (this should
be migrated to pkg-config though).
Also drop the code for manually adding -z if mysql_config does not
-- that's not our problem.
Nikita Popov [Wed, 11 Nov 2020 10:51:20 +0000 (11:51 +0100)]
Retain reference to share handle from curl handle
Not keeping a reference will not result in use after free, because
curl protects against it, but it will result in a memory leak,
because curl_share_cleanup() will fail. We should make sure that
the share handle object stays alive as long as the curl handles
use it.
Alex Dowad [Fri, 4 Sep 2020 20:17:59 +0000 (22:17 +0200)]
Enhance mbstring support for UCS-2 text
- For consistency with UTF-16, UTF-32, and UCS-4, strip leading byte
order marks.
- Treat it as an error if string is truncated (i.e. has an odd number
of bytes).
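
The two rules above can be sketched as follows. This is an illustration
only, assuming big-endian input; ucs2be_decode() and UCS2_ERROR_MARKER
are hypothetical names, not mbstring APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UCS2_ERROR_MARKER 0xFFFFFFFFu /* hypothetical "bad input" sentinel */

/* Decode big-endian UCS-2 into code units. Returns the number of entries
 * written to `out`, which needs room for (len + 1) / 2 entries. */
size_t ucs2be_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);

    /* Strip a leading byte order mark (U+FEFF). */
    if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF) {
        i = 2;
    }
    for (; i + 1 < len; i += 2) {
        out[n++] = ((uint32_t)in[i] << 8) | in[i + 1];
    }
    /* An odd number of bytes means the string was truncated. */
    if (i < len) {
        out[n++] = UCS2_ERROR_MARKER;
    }
    return n;
}
```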
Alex Dowad [Mon, 9 Nov 2020 19:40:08 +0000 (21:40 +0200)]
Unicode -> SJIS-mac conversion doesn't reject valid codepoints after a bad transcoding hint
To give the background on this issue, here is an excerpt from JAPANESE.txt,
from the Unicode Consortium:
Apple has defined a block of 32 corporate characters as "transcoding
hints." These are used in combination with standard Unicode characters
to force them to be treated in a special way for mapping to other
encodings; they have no other effect. Sixteen of these transcoding
hints are "grouping hints" - they indicate that the next 2-4 Unicode
characters should be treated as a single entity for transcoding. The
other sixteen transcoding hints are "variant tags" - they are like
combining characters, and can follow a standard Unicode (or a sequence
consisting of a base character and other combining characters) to
cause it to be treated in a special way for transcoding. These always
terminate a combining-character sequence.
The transcoding coding hints used in this mapping table are:
0xF860 group next 2 characters as a single entity for transcoding
0xF861 group next 3 characters as a single entity for transcoding
0xF862 group next 4 characters as a single entity for transcoding
0xF87A variant tag for "negative" (i.e. black & white reversed)
0xF87E variant tag for vertical form
0xF87F variant tag for other alternate form
For example, the Apple addition character 0x85AB is Roman numeral
thirteen. There is no single Unicode for this (although there are
standard Unicodes for Roman numerals 1-12). Using the grouping hint
0xF862 in combination with standard Unicodes, we can map this as
0xF862+0x0058+0x0049+0x0049+0x0049 (i.e. X + I + I + I).
Our SJIS-mac conversion code actually recognizes some special sequences
which start with an Apple 'transcoding hint'. However, if a transcoding
hint is misplaced and is not followed by one of the expected sequences,
we can just emit one error marker for the bad transcoding hint and then
process the following codepoint as normal.
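
A toy sketch of that behaviour, not the mbstring code. The only value
taken from the excerpt is the grouping hint 0xF860; the "special
sequence" (hint, 'T', 'B'), its output code 0x100, and TOY_ERROR are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ERROR   0xFFFFu /* hypothetical error marker */
#define HINT_GROUP2 0xF860u /* "group next 2 characters" */

/* Toy regular path: ASCII passes through, anything else is an error. */
static unsigned ascii_or_error(uint32_t cp)
{
    return cp < 0x80 ? (unsigned)cp : TOY_ERROR;
}

/* Encode `len` codepoints. Returns the number of entries written to
 * `out`, which needs room for `len` entries. */
size_t toy_encode_with_hints(const uint32_t *in, size_t len, unsigned *out)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);

    while (i < len) {
        if (in[i] == HINT_GROUP2) {
            if (i + 2 < len && in[i + 1] == 'T' && in[i + 2] == 'B') {
                out[n++] = 0x100; /* hypothetical collapsed character */
                i += 3;
            } else {
                /* Misplaced hint: one marker for the hint alone. The
                 * next codepoint is handled by the regular path. */
                out[n++] = TOY_ERROR;
                i += 1;
            }
            continue;
        }
        out[n++] = ascii_or_error(in[i]);
        i += 1;
    }
    return n;
}
```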
Alex Dowad [Sat, 19 Sep 2020 13:16:51 +0000 (15:16 +0200)]
SJIS-mac encoding conversion: Stop the carnage of innocent Unicode codepoints
When converting Unicode to MacJapanese, some special sequences of Unicode
codepoints are collapsed into a single SJIS character. When the implementation
sees a codepoint which *might* begin such a sequence, it is cached and examined
again after the next codepoint arrives.
If it turns out that it wasn't one of the 'special' sequences, then a 'fallback'
conversion table is consulted to convert the cached codepoint. Then we re-enter
the regular conversion code to convert the immediately following codepoint.
BUT, local variables need to be reinitialized properly when doing this!
Because the locals weren't reinitialized, the sad result was that some codepoints
would get chopped up into bit salad and emitted as something totally bogus
(which might not even be valid SJIS-mac text at all).
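
One way to avoid this class of bug is to keep every per-codepoint local
inside the regular conversion routine, so re-entering it always starts
from scratch. The sketch below is a toy streaming filter under that
assumption; struct toy_filter, the special sequence ('X', 'I') and the
output code 0x100 are hypothetical and unrelated to the real tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ERROR 0xFFFFu /* hypothetical error marker */

struct toy_filter {
    uint32_t cache;  /* codepoint that might start a sequence, 0 = none */
    unsigned out[8];
    size_t n;
};

static void emit(struct toy_filter *f, unsigned code)
{
    assert(f->n < 8);
    f->out[f->n++] = code;
}

static void convert_regular(struct toy_filter *f, uint32_t cp)
{
    /* Every local is derived from `cp` here, so nothing computed for a
     * previously cached codepoint can leak into this conversion. */
    unsigned code = (cp < 0x80) ? (unsigned)cp : TOY_ERROR;
    emit(f, code);
}

void toy_feed(struct toy_filter *f, uint32_t cp)
{
    if (f->cache != 0) {
        uint32_t first = f->cache;
        f->cache = 0;
        if (first == 'X' && cp == 'I') {
            emit(f, 0x100); /* the sequence collapses to one character */
            return;
        }
        /* Not a special sequence: fallback for the cached codepoint,
         * then fall through and convert `cp` as a fresh codepoint. */
        convert_regular(f, first);
    }
    if (cp == 'X') {
        f->cache = cp; /* might begin a special sequence */
        return;
    }
    convert_regular(f, cp);
}

/* Call at end of input to emit a codepoint that is still cached. */
void toy_flush(struct toy_filter *f)
{
    if (f->cache != 0) {
        uint32_t first = f->cache;
        f->cache = 0;
        convert_regular(f, first);
    }
}
```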
Alex Dowad [Sat, 19 Sep 2020 12:27:14 +0000 (14:27 +0200)]
Convert Unicode halfwidth Yen sign to MacJapanese halfwidth Yen sign
Since 1993, Unicode has had a specific codepoint for a fullwidth Yen sign.
Likewise, MacJapanese has separate kuten codes for halfwidth and fullwidth
Yen signs. But mbstring mapped _both_ Yen sign codepoints to the
MacJapanese fullwidth Yen sign.
It's probably more appropriate to map the 'ordinary' Yen sign to the
MacJapanese halfwidth Yen sign. Besides, this means that the conversion
between Unicode and MacJapanese is closer to being lossless and reversible.
Alex Dowad [Wed, 9 Sep 2020 19:18:54 +0000 (21:18 +0200)]
SJIS-mac encoding conversion: handle invalid (or truncated) 2nd byte for Kanji correctly
Also, don't accept 1st bytes above 0xED, since none of the possible 2-byte
sequences starting with 0xEE and above are actually mapped to any character.
Alex Dowad [Thu, 17 Sep 2020 20:34:59 +0000 (22:34 +0200)]
Don't mangle non-Japanese chars which appear after a 'combining' kana in SJIS-2004
Unicode has 'combining' characters which join with the preceding base character.
Japanese hiragana and katakana with the 'two dots' voice mark can be represented
in this way, with one Unicode character for the 'base' kana and another one which
adds the voice mark.
In SJIS-2004, however, there are dedicated characters for voiced and unvoiced
kana. So some special checks are done to identify sequences of Unicode characters
which need to be 'collapsed' into a single SJIS-2004 character.
If a kana is immediately followed by some other unrelated character, like a
Cyrillic letter, then the cached kana should be output 'as is' and we
proceed with encoding the unrelated character. When doing this, though,
we need to re-initialize local variables, or else the unrelated character
will be mangled in some cases.
Alex Dowad [Tue, 8 Sep 2020 20:57:28 +0000 (22:57 +0200)]
SJIS-2004 encoding conversion: handle invalid (or truncated) 2nd byte for Kanji correctly
If the 2nd byte of a 2-byte character is invalid, then mb_substitute_character()
should be respected. Instead, what mbstring was doing was 'swallowing' the
first byte, then emitting the 2nd byte as if it were an ASCII character.
Likewise, if the 2nd byte is missing, instead of just keeping quiet, report an
illegal character as specified by mb_substitute_character().
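
A rough sketch of the intended behaviour, using the generic Shift_JIS
lead and trail byte ranges rather than the SJIS-2004 tables. BAD_CHAR
stands in for whatever mb_substitute_character() is configured to
produce, and consuming both bytes of a bad pair is an assumption made
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_CHAR 0xFFFFFFFFu /* hypothetical illegal-character marker */

static int is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

static int is_trail(unsigned char c)
{
    return c >= 0x40 && c <= 0xFC && c != 0x7F;
}

/* Scan Shift_JIS-like input. Valid 2-byte characters are reported as
 * (lead << 8 | trail) instead of being mapped through a real table.
 * Returns the number of entries written to `out` (at most `len`). */
size_t toy_sjis_scan(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);

    while (i < len) {
        unsigned char c = in[i++];
        if (!is_lead(c)) {
            out[n++] = c; /* single byte, passed through in this toy */
            continue;
        }
        if (i == len) {
            out[n++] = BAD_CHAR; /* 2nd byte missing: report, don't hide */
            break;
        }
        unsigned char c2 = in[i++];
        /* Invalid 2nd byte: one marker, and c2 is NOT emitted as ASCII. */
        out[n++] = is_trail(c2) ? (((uint32_t)c << 8) | c2) : BAD_CHAR;
    }
    return n;
}
```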