Markus Scherer [Sat, 9 Feb 2019 22:20:56 +0000 (14:20 -0800)]
ICU-20467 get XLocaleMatcher ready for drop-in
Get XLocaleMatcher ready for replacing the LocaleMatcher code.
More simplifications beyond ICU-20330 PR #409, smaller data, some more optimizations.
New API ready to be moved over.
- less work for region partitions distance lookup:
  - encode each array of single-character partition strings as one string
  - look up each desired partition only once, not for each (desired, supported) pair
  - look up the * fallback region distance only for the first mismatch, not for each non-matching pair
  - skip region distance lookup if minRegionDistance>=remainingThreshold
- locale distance table: remove subtables that contain only *-* with default script/region distance
- mark intermediate subtag matches via last-character bit 7, not also with a match value
- likely subtags data: prune trailing *-only levels, and skip *-only script levels; likely subtags perf test
- likely subtags: skip_script=1; LSR.indexForRegion(ill-formed)=0 not negative
- likely subtags small optimization: array lookup for first letter of language subtag
- defaultDemotionPerDesiredLocale=distance(en, en-GB)
- favor=script: still reject a script mismatch
- if an explicit default locale is given, prefer that (by LSR), not the first supported locale
- XLocaleMatcher.Builder: copy supported locales into a List not a Set to preserve input indexes; duplicates are harmless
- match by LSR only, not exact locale match; results consistent with no fastpath, simpler, sometimes a little slower
- internal getBestMatch() returns just the suppIndex
- store the best desired locale & index in an LSR iterator
- make an LSR from Locale without ULocale detour
- adjust the XLocaleMatcher API as proposed; remove unused internal methods; clean up LocalePriorityList docs
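The "last-character bit 7" marking above can be illustrated with a small stand-alone C++ sketch. It assumes only that subtags are ASCII, so bit 7 is otherwise unused. `appendMarkedSubtag` and `splitMarkedKey` are hypothetical names, and this is not the actual XLocaleMatcher/LocaleDistance data layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: append an ASCII subtag to a key, setting bit 7 (0x80)
// on its last character so that the end of the subtag is self-delimiting.
void appendMarkedSubtag(std::string &key, const std::string &subtag) {
    assert(!subtag.empty());
    for (char c : subtag) {
        assert((static_cast<unsigned char>(c) & 0x80) == 0);  // ASCII only
    }
    key.append(subtag, 0, subtag.size() - 1);
    key.push_back(static_cast<char>(static_cast<unsigned char>(subtag.back()) | 0x80));
}

// Hypothetical helper: split a marked key back into its subtags.
std::vector<std::string> splitMarkedKey(const std::string &key) {
    std::vector<std::string> subtags;
    std::string current;
    for (char c : key) {
        unsigned char b = static_cast<unsigned char>(c);
        current.push_back(static_cast<char>(b & 0x7f));  // strip the marker bit
        if (b & 0x80) {  // bit 7 set: last character of a subtag
            subtags.push_back(current);
            current.clear();
        }
    }
    return subtags;
}
```

Because each subtag boundary is visible in the key bytes themselves, a trie walk that reaches a byte with bit 7 set knows it has consumed a complete subtag. That suggests why a separate match value is not needed merely to mark such an intermediate state.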
Andy Heninger [Mon, 11 Mar 2019 23:36:33 +0000 (16:36 -0700)]
ICU-20488 mutex static constructor fixes.
Remove the ICU library code's dependencies on static constructors
that were introduced by using std::mutex and condition variables. The
mutexes are lazily initialized by embedding them as local static variables
in getter functions and relying on the C++ compiler/runtime to initialize
them in a thread-safe way.
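A minimal sketch of the getter pattern described above, assuming C++11 or later, where initialization of function-local statics is guaranteed to be thread-safe. `getCacheMutex`, `getCacheCondition`, and `incrementCacheValue` are hypothetical names, not ICU functions.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical getters: each mutex/condition variable is a function-local
// static, so it is constructed on first use rather than by a static
// constructor at library load time. Since C++11 the compiler/runtime
// guarantees that this first-use initialization is thread-safe.
static std::mutex &getCacheMutex() {
    static std::mutex m;
    return m;
}

static std::condition_variable &getCacheCondition() {
    static std::condition_variable cv;
    return cv;
}

// Plain int with constant initialization: no static constructor needed.
static int gCacheValue = 0;

// Hypothetical use: every access goes through the getters.
int incrementCacheValue() {
    std::lock_guard<std::mutex> lock(getCacheMutex());
    int result = ++gCacheValue;
    getCacheCondition().notify_all();  // wake any waiters on the value
    return result;
}
```

Nothing is constructed at library load time; the first caller of each getter triggers construction, and concurrent first callers wait until that construction completes.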
ICU-20470 skip data/rules.mk regen for source tarball
- If icu/source/data/locales/root.txt is missing, skip
python rules.mk generation.
- Also, create build directories properly as needed
- Also includes noise changes to configure
(configure was probably generated using unreleased
autoconf 2.70 or 2.69 + patches)
- eac8f4b31ab7395abb3a216aa17bafe7af6314ed did not
regen configure properly, so BUILDTOOL_OPTS is now
ICU_DATA_BUILDTOOL_OPTS
Shane Carr [Tue, 26 Feb 2019 20:52:40 +0000 (12:52 -0800)]
ICU-10923 Fixing dependency graph and filter logic for collation.
- Fixes filterrb.cpp to check for wildcard when at a leaf.
- Adds additional verbose logging to genrb.
- Fixes filtration to add deps to dep_targets instead of dep_files.
- Splits dep_files into common_dep_files and specific_dep_files.
Methods implemented as 'inline' but not declared 'inline' cause clang++
to emit compilation warnings on Windows. This adds 'inline' to the
relevant method declarations.
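For illustration, a hypothetical class `Counter` whose method is declared `inline` in the class body so that the declaration matches its out-of-class `inline` definition, the kind of declaration/definition pairing described above:

```cpp
// Hypothetical header-style class: the in-class declaration carries
// 'inline' so that it agrees with the inline definition that follows.
class Counter {
public:
    inline int next();  // declared 'inline' to match the definition below

private:
    int value_ = 0;
};

// Out-of-class definition in the same header, also marked 'inline'.
inline int Counter::next() {
    return ++value_;
}
```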
Norbert Runge [Wed, 20 Feb 2019 23:53:03 +0000 (15:53 -0800)]
ICU-20390 Removes the duplicate and obsolete .cpyskip.txt file
in the tools/scripts/cpysearch/ directory. The actual .cpyskip.txt file is
now at the top level of the repository. Updates readme.txt accordingly.
Fredrik Roubert [Wed, 20 Feb 2019 23:23:02 +0000 (00:23 +0100)]
ICU-20158 Pass ByteSink all the way to _uloc_(addLikely|minimize)Subtags().
This eliminates the need for scratch buffers in any code path that ends
with these functions and also eliminates the need for counting bytes,
something that ByteSink will now handle correctly when needed.
Existing calls to uloc_addLikelySubtags() and uloc_minimizeSubtags()
throughout ICU4C implementation code are also updated to instead use
either the Locale or ulocimp_* functions with the new API.
None of this should have any externally visible effect; it's all about
cleaning up implementation internals.
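The ByteSink approach can be sketched without ICU headers. ICU's real types are `icu::ByteSink` and `icu::CheckedArrayByteSink` in `unicode/bytestream.h`; to keep the example self-contained, it defines hypothetical stand-ins `MiniSink` and `MiniCheckedArraySink`, plus a hypothetical producer `writeTag`. The producer only appends, while the sink decides where the bytes go and keeps counting past the end of a fixed buffer, so the caller learns the required length without a separate scratch buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical minimal stand-in for a byte sink: producers just Append()
// bytes and never manage scratch buffers or count bytes themselves.
class MiniSink {
public:
    virtual ~MiniSink() = default;
    virtual void Append(const char *bytes, int32_t n) = 0;
};

// Hypothetical fixed-buffer sink: copies what fits, but counts every byte
// appended so the caller can detect overflow and the required length.
class MiniCheckedArraySink : public MiniSink {
public:
    MiniCheckedArraySink(char *dest, int32_t capacity)
        : dest_(dest), capacity_(capacity) {}

    void Append(const char *bytes, int32_t n) override {
        if (written_ < capacity_) {
            int32_t toCopy = std::min(n, capacity_ - written_);
            std::memcpy(dest_ + written_, bytes, static_cast<size_t>(toCopy));
            written_ += toCopy;
        }
        appended_ += n;  // keep counting even after the buffer is full
    }

    int32_t NumberOfBytesAppended() const { return appended_; }
    bool Overflowed() const { return appended_ > capacity_; }

private:
    char *dest_;
    int32_t capacity_;
    int32_t written_ = 0;
    int32_t appended_ = 0;
};

// Hypothetical producer: writes "language_REGION" straight into the sink.
void writeTag(const std::string &lang, const std::string &region, MiniSink &sink) {
    sink.Append(lang.data(), static_cast<int32_t>(lang.size()));
    if (!region.empty()) {
        sink.Append("_", 1);
        sink.Append(region.data(), static_cast<int32_t>(region.size()));
    }
}
```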
Norbert Runge [Fri, 8 Feb 2019 21:47:52 +0000 (13:47 -0800)]
ICU-20217 Interprets fuzzer data as UChar* instead of UTF-8. The conversion
from assumed UTF-8 resulted in an extremely large percentage of Unicode
replacement characters in the data passed to the API under test.
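A minimal sketch of treating raw fuzzer input as UTF-16 code units rather than decoding it as UTF-8, assuming ICU's default `UChar` is `char16_t`. `fuzzBytesToUChars` is a hypothetical helper; it uses `memcpy` to avoid unaligned reads, drops an odd trailing byte, and uses host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: reinterpret raw fuzzer bytes as UTF-16 code units
// instead of decoding them as UTF-8, so arbitrary input does not mostly
// turn into U+FFFD replacement characters.
std::u16string fuzzBytesToUChars(const uint8_t *data, size_t size) {
    std::u16string result(size / 2, u'\0');  // an odd trailing byte is dropped
    if (!result.empty()) {
        // memcpy avoids unaligned char16_t reads from the byte buffer.
        std::memcpy(&result[0], data, result.size() * sizeof(char16_t));
    }
    return result;
}
```

Every 16-bit value passes through unchanged, including unpaired surrogates, which is arguably what a fuzzer wants when exercising ICU's handling of ill-formed UTF-16.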
ICU-20217 Uses fuzzer-generated bytes instead of a random number generator
to select locales, converters, etc. This way the fuzzer can control
the selections.
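A sketch of letting the fuzzer input drive a selection instead of a random number generator. `pickIndex`, `pickLocale`, and the locale list are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: consume one fuzzer byte to pick an index in
// [0, count), so the fuzzer (not a random number generator) controls the
// choice. Returns false if no byte is left or there is nothing to choose.
bool pickIndex(const uint8_t *&data, size_t &size, size_t count, size_t &index) {
    if (size == 0 || count == 0) {
        return false;
    }
    index = data[0] % count;
    ++data;  // the consumed byte is no longer part of the remaining input
    --size;
    return true;
}

// Hypothetical locale list for illustration only.
static const char *const kLocales[] = {"en", "de", "ja", "ar"};

const char *pickLocale(const uint8_t *&data, size_t &size) {
    size_t index = 0;
    if (!pickIndex(data, size, sizeof(kLocales) / sizeof(kLocales[0]), index)) {
        return kLocales[0];  // hypothetical default when input is exhausted
    }
    return kLocales[index];
}
```

The modulo introduces a slight bias toward low indexes when the count does not divide 256. For fuzzing that is usually harmless, since the goal is for each choice to be reachable and reproducible from the input, not uniformly distributed.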
ICU-20217 Minor follow-ups from code review.
Removes the fuzzer target break_iterator_utf32_fuzzer, which does nothing
useful beyond what the regular break iterator fuzzer target already does.
ICU-20217 Fixes for-loop body.
ICU-20217 Uses an allocated buffer to pass head-truncated fuzzer data to the
API under test. The fuzzer may otherwise not detect buffer underflow.
ICU-20217 Typing fix.
ICU-20217 Fixing typing.
ICU-20217 Improve fuzzer targets, move truncated fuzzer data into a
new buffer so that buffer underflow does not go undetected.
ICU-20217 Fixes buffer management of fuzzer-provided data.
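A minimal sketch of the buffer-copy idea in the last few entries: after some leading bytes are consumed, the remaining tail is copied into its own exactly-sized heap allocation before being passed to the API under test. `copyTail` is a hypothetical helper, and the sketch assumes the target is built with AddressSanitizer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical helper: copy the unconsumed tail of the fuzzer input into a
// fresh, exactly-sized heap allocation. If the API under test reads before
// the start of this buffer, AddressSanitizer can report the out-of-bounds
// access. A pointer into the middle of the original fuzzer buffer would
// instead sit just past the consumed bytes, which are valid memory, so such
// a read could go undetected.
std::unique_ptr<uint8_t[]> copyTail(const uint8_t *data, size_t size,
                                    size_t consumed, size_t &tailSize) {
    tailSize = consumed < size ? size - consumed : 0;
    std::unique_ptr<uint8_t[]> tail(new uint8_t[tailSize]);  // size 0 is valid
    if (tailSize > 0) {
        std::memcpy(tail.get(), data + consumed, tailSize);
    }
    return tail;
}
```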