From 5e66b8f2326abd6fdcea1f2f73dd5f2b33ddfded Mon Sep 17 00:00:00 2001 From: Anatol Belski Date: Mon, 20 Jun 2016 18:02:51 +0200 Subject: [PATCH] notes to UPGRADING.INTERNALS --- UPGRADING.INTERNALS | 47 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/UPGRADING.INTERNALS b/UPGRADING.INTERNALS index 6e30f7cad8..476e7c255e 100644 --- a/UPGRADING.INTERNALS +++ b/UPGRADING.INTERNALS @@ -2,6 +2,8 @@ PHP 7.1 INTERNALS UPGRADE NOTES 0. Wiki Examples 1. Internal API changes + e. Codepage handling on Windows + f. Path handling on Windows 2. Build system changes a. Unix build system changes @@ -21,6 +23,51 @@ changes. See: https://wiki.php.net/phpng-upgrading 1. Internal API changes ======================== + e. Codepage handling on Windows + + A set of new APIs was introduced, which allows to handle codepage + conversions. The corresponding prototypes and macros are contained + in win32/codepage.h. + + Functions with php_win32_cp_* signatures provide handling for various + codepage aspects. Primarily they are in use at various places in the + I/O utils code and directly in the core where necessary, providing + conversions to/from UTF-16. Arbitrary conversions between codepages + are possible as well, whereby UTF-16 will be always an intermediate + state in this case. + + For input length arguments, the macro PHP_WIN32_CP_IGNORE_LEN can be + passed, then the API will calculate the length. For output length + arguments, the macro PHP_WIN32_CP_IGNORE_LEN_P can be passed, then + the API won't set the output length. + + The mapping between encodings and codepages is provided by the predefined + array of data contained in win32/cp_enc_map.c. To change the data, + a generator win32/cp_enc_map_gen.c needs to be edited and run. + + f. Path handling on Windows + + A set of new APIs was introduced, which allows to handle UTF-8 paths. The + corresponding prototypes and macros are contained in win32/ioutil.h. + + Functions with php_win32_ioutil_* signatures provide POSIX I/O analogues. + These functions are integrated in various places across the code base to + support Unicode filenames. While accepting char * arguments, internally + the conversion to wchar_t * happens. Internally almost no ANSI APIs are + used, but directly their wide equivalents. The string conversion rules + correspond to those already present in the core and depend on the current + encoding settings. Doing so allows to move away from the ANSI Windows API + with its dependency on the system OEM/ANSI codepage. + + Thanks to the wide API usage, the long paths are now supported as well. The + PHP_WIN32_IOUTIL_MAXPATHLEN macro is defined to 2048 bytes and will override + the MAXPATHLEN in files where the header is included. + + The most optimal use case for scripts is utilizing UTF-8 for any I/O + related functions. UTF-8 filenames are supported on any system disregarding + the system OEM/ANSI codepage. + + ======================== 2. Build system changes ======================== -- 2.40.0