granicus.if.org Git - php/commit

author	Nikita Popov <nikita.ppv@gmail.com>
	Thu, 3 Aug 2017 19:05:27 +0000 (21:05 +0200)
committer	Nikita Popov <nikita.ppv@gmail.com>
	Thu, 3 Aug 2017 19:12:41 +0000 (21:12 +0200)
commit	a8a9e93e9a902ffd4099e3ba2a7a269da09120c5
tree	c47e9dd3525e778050e958c410d667ccea06e638
parent	355743600d2531b2ce4f4f048883ee34a0697b4e

Revert/fix mb_substitute_character() codepoint checks

The introduced checks did not handle "non-Unicode" encodings
correctly: for those encodings they interpreted the passed integer
as a character in the internal encoding, whereas the substitute
character is always a Unicode codepoint.

Additionally, checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not the *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: characters outside the Basic
Multilingual Plane are now accepted and surrogate codepoints are
rejected. Unlike the original patch, no distinction between UTF-8
and non-UTF-8 encodings is made for the surrogate checks, as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however, we do not support that.
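
The range check described above can be sketched roughly as follows.
This is an illustration only, not the code from
ext/mbstring/mbstring.c: the function name
is_valid_substitute_codepoint is hypothetical, and treating U+0000 as
acceptable is an assumption that the real lower bound may not share.

```c
#include <stdbool.h>

/* Hypothetical helper illustrating the simple range check; this is
 * not the actual mbstring.c implementation. */
bool is_valid_substitute_codepoint(long cp)
{
    /* Reject anything outside the Unicode range U+0000..U+10FFFF.
     * Assumption: U+0000 is accepted here; the real lower bound may
     * differ. */
    if (cp < 0 || cp > 0x10FFFF) {
        return false;
    }

    /* Surrogates (U+D800..U+DFFF) are never valid on their own,
     * whatever the encoding. */
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return false;
    }

    /* No check against a specific encoding: the target encoding of a
     * later conversion is not known at this point. */
    return true;
}
```

Under these assumptions, a codepoint outside the Basic Multilingual
Plane such as 0x1F600 is accepted, while 0xD800 and 0x110000 are
rejected.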

ext/mbstring/mbstring.c
ext/mbstring/tests/bug69079.phpt
ext/mbstring/tests/bug69086.phpt
ext/mbstring/tests/mb_chr.phpt
ext/mbstring/tests/mb_substitute_character_basic.phpt
ext/mbstring/tests/mb_substitute_character_variation1.phpt