Fredrik Roubert [Thu, 12 Mar 2020 21:45:00 +0000 (22:45 +0100)]
ICU-20803 Pass ByteSink to _canonicalize().
This eliminates the need for the fixed size scratch buffer inside of
locale_set_default_internal() and also eliminates the need for counting
bytes, something that ByteSink and CharString now will handle correctly,
when needed.
None of this should have any externally visible effect (apart from
removing the arbitrary size limit imposed by the fixed size scratch
buffer), it's all about cleaning up implementation internals.
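The idea of writing to a sink instead of a fixed-size buffer can be sketched as follows. This is a minimal illustration in standard C++, not the ICU code: `Sink`, `StringSink` and `canonicalizeToSink` are hypothetical names, and the toy "canonicalization" only maps '-' to '_'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a byte sink interface; not ICU's actual class.
class Sink {
public:
    virtual ~Sink() = default;
    virtual void Append(const char* bytes, std::size_t n) = 0;
};

// A sink that grows a std::string, so there is no fixed capacity to overflow.
class StringSink : public Sink {
public:
    explicit StringSink(std::string& dest) : dest_(dest) {}
    void Append(const char* bytes, std::size_t n) override {
        dest_.append(bytes, n);
    }
private:
    std::string& dest_;
};

// Toy canonicalizer (assumption: only '-' is rewritten, to '_').
// It writes straight to the sink and never counts or checks remaining capacity.
void canonicalizeToSink(const std::string& id, Sink& sink) {
    for (char c : id) {
        if (c == '-') {
            c = '_';
        }
        sink.Append(&c, 1);
    }
}
```

With this shape the caller chooses where the bytes go, and input longer than any former scratch buffer is handled the same way as short input.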
Steven R. Loomis [Fri, 20 Mar 2020 21:41:01 +0000 (14:41 -0700)]
ICU-20976 GCC 8 fixes phase 1
Some initial fixes for GCC 8
- setup a GCC 8 buildbot with -Wextra
- rewrite ucol_sit to use CharString
- workaround for gcc7+ on mac
see https://github.com/arbor-sim/arbor/issues/562#issuecomment-409970434
- fix ucnv_2022 strcpy site
Fix the issue identified by Coverity.
The problem was in code handling the mapping from the table build time
representation of a set of status values for an RBBI rule to the corresponding
status data as saved in a binary RBBI rule file.
The problem was benign: the RBBI data built by the incorrect code would
still operate correctly, although it might not match byte-for-byte the data
built by ICU4C. (The problem was in Java only.)
contains shadow stack (SHSTK) and indirect branch tracking (IBT). When
CET is enabled, ELF object files must be marked with a .note.gnu.property
section. GCC provides <cet.h>, which can be included in assembly code
to generate the CET marker when compiling with -fcf-protection.
Steven R. Loomis [Sat, 31 Aug 2019 00:31:38 +0000 (17:31 -0700)]
ICU-20797 fix UBS compilation error and UBS in test code
Two issues here:
- fix 2 build issues in i18n when compiling with clang++ -fsanitize=undefined:
the following two symbols were not exported (and they should be):
typeinfo for icu::CollationCacheEntry
typeinfo for icu::numparse::impl::CodePointMatcher
- remove an undefined behavior warning in NumberFormatTestTuple. Minor, but very annoying
when repeated many times during every test run, and it tends to mask real errors.
> numberformattesttuple.cpp:319:5: runtime error: member access within null pointer of type 'NumberFormatTestTuple'
Markus Scherer [Thu, 5 Mar 2020 23:03:42 +0000 (15:03 -0800)]
ICU-20915 LocaleMatcher no match: always getSupportedIndex()=-1; remove defaultLocaleIndex field; constructor check if locales are equivalent to default, not just equal; simplify locale sorting; minor builder & test deflaking
Jeff Genovy [Thu, 5 Mar 2020 22:33:13 +0000 (14:33 -0800)]
ICU-21000 Fix abort called by DateTimePatternGenerator::getDefaultHourCycle
If you call the API getDefaultHourCycle on an empty DateTimePatternGenerator
instance (i.e. no locale), it calls UPRV_UNREACHABLE, which calls abort().
We should return an error code instead of aborting.
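The difference between aborting and reporting an error can be sketched in standard C++ as follows. This is an illustration only: `Status`, `HourCycle`, `Generator` and this `getDefaultHourCycle` are hypothetical stand-ins, not the ICU declarations, and the choice of error value is an assumption.

```cpp
// Hypothetical status and result types, for illustration only.
enum class Status { Ok, UnsupportedError };
enum class HourCycle { H11, H12, H23, H24 };

// Hypothetical generator: defaultHourChar == 0 models an empty instance
// that was created without any locale data.
struct Generator {
    char defaultHourChar = 0;
};

// Reports an error through `status` instead of aborting on the
// "cannot happen" branch.
HourCycle getDefaultHourCycle(const Generator& g, Status& status) {
    if (status != Status::Ok) {
        return HourCycle::H23;  // an earlier failure is passed through unchanged
    }
    switch (g.defaultHourChar) {
        case 'K': return HourCycle::H11;  // hours 0-11
        case 'h': return HourCycle::H12;  // hours 1-12
        case 'H': return HourCycle::H23;  // hours 0-23
        case 'k': return HourCycle::H24;  // hours 1-24
        default:
            status = Status::UnsupportedError;  // empty instance: no default known
            return HourCycle::H23;              // placeholder; caller must check status
    }
}
```

A caller checks the status after the call and ignores the returned value when it signals failure.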
Yoshito Umaoka [Tue, 25 Feb 2020 02:48:56 +0000 (21:48 -0500)]
ICU-20975 BRS ICU 66rc - J API signature, API change report and serialization test data
- Added ICU 66.1 serialization test data and removed ICU 61.1 serialization test data.
- Added ICU 66 API signature file and removed ICU 56 API signature file
- Updated API change report
Andy Heninger [Fri, 14 Feb 2020 05:40:28 +0000 (21:40 -0800)]
ICU-20876 Regex Grapheme Cluster matching with Break Iterators.
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.
The old implementation had fallen behind the Unicode UAX-29 specification for
grapheme clusters, and could not be easily updated.
The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.
Also note that this change adds a new dependency on Break Iteration. Regex
patterns that include \X for matching grapheme clusters, and that previously
worked with ICU builds configured with no break iteration, will now fail on
such builds. Patterns without \X are not affected.
Jeff Genovy [Tue, 7 Jan 2020 09:38:21 +0000 (01:38 -0800)]
ICU-20322 On MinGW, move the DLLs to the "bin" directory.
This change builds on Vincent Torri's changes.
This installs the ICU DLL files in $prefix/bin instead of $prefix/lib.
Note: To disable this change in behavior, edit the "mh-mingw*" file(s).
If you set the variable MINGW_MOVEDLLSTOBINDIR to NO instead of YES,
the previous behavior of installing the DLLs into the lib folder
is retained.
Andrew Paprocki [Tue, 12 Nov 2019 00:46:05 +0000 (19:46 -0500)]
ICU-20895 ICU_TIMEZONE_FILES_DIR_PREFIX_ENV_VAR
Adds `ICU_TIMEZONE_FILES_DIR_PREFIX_ENV_VAR`, similar to
`ICU_DATA_DIR_PREFIX_ENV_VAR`, that specifies an environment variable
to retrieve and prepend to the ICU time zone data file path.
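The lookup-and-prepend step might look roughly like this. It is a sketch in standard C++ that assumes plain string concatenation; `withEnvPrefix` is a hypothetical helper, not a function from ICU.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: prepend the value of the environment variable named
// `prefixVar` to `path`. If the variable is not set, `path` is returned as is.
std::string withEnvPrefix(const char* prefixVar, const std::string& path) {
    const char* prefix = std::getenv(prefixVar);
    if (prefix == nullptr) {
        return path;
    }
    return std::string(prefix) + path;
}
```

In this sketch the name of the environment variable is passed at run time; in the commit it is supplied through the `ICU_TIMEZONE_FILES_DIR_PREFIX_ENV_VAR` build-time define.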
Andy Heninger [Sun, 2 Feb 2020 04:20:37 +0000 (20:20 -0800)]
ICU-20939 Fix problem w regexp \b boundaries & UTF-8 text
In regular expressions, word boundaries found with \b were incorrect when
two conditions held together: the pattern was in Unicode mode, meaning that
an ICU word break iterator is used to find the boundaries, and the text being
matched was UTF-8 encoded.
The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
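To see why mixing the two index spaces gives wrong boundaries, the following standard C++ sketch computes the UTF-16 index that corresponds to a UTF-8 byte offset. `utf16IndexForUtf8Offset` is a hypothetical helper written for this illustration, not ICU code, and it assumes well-formed UTF-8 and an offset on a code point boundary.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units that encode the first `byteIndex` bytes of
// well-formed UTF-8 text. Assumes `byteIndex` lies on a code point boundary.
std::size_t utf16IndexForUtf8Offset(const std::string& utf8, std::size_t byteIndex) {
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteIndex && i < utf8.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) == 0x80) {
            continue;  // continuation byte: belongs to a code point already counted
        }
        units += (b >= 0xF0) ? 2 : 1;  // 4-byte sequences need a surrogate pair
    }
    return units;
}
```

For the text "café x" encoded as UTF-8, the boundary after "café" is at byte offset 5 but at UTF-16 index 4. Comparing a converted index against a boundary reported in UTF-8 offsets therefore misplaces the boundary as soon as the text contains non-ASCII characters.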
Compiled regular expression patterns make use of several shared common
UnicodeSets. This change simplifies the creation and use of these
static UnicodeSets.
- Pointer fields to the static sets are removed from the compiled patterns,
and the static variables are accessed directly. The deleted pointers
were a hold-over from earlier code that did not use shared statics.
- The UnicodeSet pattern literals are changed from hex constants to
u"string literals".
- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
to the number of UnicodeSets actually required. Doing this required
a change to regexcst.pl to export the required size. Changing and
rerunning this perl code resulted in massive but benign changes to
the generated file regexcst.h, the result of perl having changed its
order of enumeration of hashes since the file was last regenerated.
- UnicodeSets are frozen when possible. Should result in faster matching.