Fredrik Roubert [Wed, 20 Feb 2019 23:23:02 +0000 (00:23 +0100)]
ICU-20158 Pass ByteSink all the way to _uloc_(addLikely|minimize)Subtags().
This eliminates the need for scratch buffers in any code path that ends
with these functions and also eliminates the need for counting bytes,
something that ByteSink will now handle correctly when needed.
Existing calls to uloc_addLikelySubtags() and uloc_minimizeSubtags()
throughout ICU4C implementation code are also updated to instead use
either the Locale or ulocimp_* functions with the new API.
None of this should have any externally visible effect; it is purely a
cleanup of implementation internals.
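For illustration, a minimal sketch of the sink pattern this change relies on, written against the standard library only. Sink, StringSink, FixedArraySink and writeTag are hypothetical names chosen for this example and are not ICU's classes or functions; ICU's own interface is icu::ByteSink from unicode/bytestream.h. The point it shows is that the producer only ever calls Append(), so it needs neither a scratch buffer nor its own byte counting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a byte sink: the producer only calls Append().
class Sink {
public:
    virtual ~Sink() = default;
    virtual void Append(const char* bytes, std::size_t n) = 0;
};

// Collects everything into a std::string; no size has to be known up front.
class StringSink : public Sink {
public:
    explicit StringSink(std::string& dest) : dest_(dest) {}
    void Append(const char* bytes, std::size_t n) override { dest_.append(bytes, n); }
private:
    std::string& dest_;
};

// Writes into a caller-provided array and keeps counting once it is full,
// so the caller can learn the required size without the producer counting.
class FixedArraySink : public Sink {
public:
    FixedArraySink(char* buf, std::size_t capacity) : buf_(buf), capacity_(capacity) {
        assert(buf != nullptr || capacity == 0);
    }
    void Append(const char* bytes, std::size_t n) override {
        if (total_ < capacity_) {
            std::size_t room = capacity_ - total_;
            std::memcpy(buf_ + total_, bytes, n < room ? n : room);
        }
        total_ += n;
    }
    std::size_t needed() const { return total_; }
    bool overflowed() const { return total_ > capacity_; }
private:
    char* buf_;
    std::size_t capacity_;
    std::size_t total_ = 0;
};

// Hypothetical producer: emits "language_Script_REGION" piece by piece.
void writeTag(const std::string& lang, const std::string& script,
              const std::string& region, Sink& sink) {
    sink.Append(lang.data(), lang.size());
    if (!script.empty()) {
        sink.Append("_", 1);
        sink.Append(script.data(), script.size());
    }
    if (!region.empty()) {
        sink.Append("_", 1);
        sink.Append(region.data(), region.size());
    }
}
```

The same writeTag() call serves both a growable string and a fixed C buffer; only the sink passed in differs.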
Norbert Runge [Fri, 8 Feb 2019 21:47:52 +0000 (13:47 -0800)]
ICU-20217 Interprets fuzzer data as UChar* instead of UTF-8. The conversion
from assumed UTF-8 resulted in an extremely large percentage of Unicode
replacement characters in the data passed to the API under test.
ICU-20217 Uses fuzzer generated bytes to make random selection of locales, converters,
etc., replacing the random number generator. This way the fuzzer can control
the selections.
ICU-20217 Minor follow-ups from code review.
Removes fuzzer target break_iterator_utf32_fuzzer, which does not do
anything useful that the regular break iterator fuzzer target does not already do.
ICU-20217 Fixes for-loop body.
ICU-20217 Uses an allocated buffer to pass head-truncated fuzzer data to the
API under test. The fuzzer may otherwise not detect buffer underflow.
ICU-20217 Typing fix.
ICU-20217 Fixing typing.
ICU-20217 Improve fuzzer targets, move truncated fuzzer data into a
new buffer so that a buffer underflow does not go undetected.
ICU-20217 Fixes buffer management of fuzzer-provided data.
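A rough sketch of the idea behind these fuzzer changes, under stated assumptions: FuzzInput and prepareInput are hypothetical names, and the real targets may split the input differently. The first byte of the fuzzer data drives the selection instead of a random number generator. The remaining bytes are reinterpreted as UTF-16 code units and copied into a freshly allocated buffer of exactly the right size. If the API under test were handed data + 1 directly, a read before the start would still land inside the fuzzer's own buffer and could go unnoticed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical container for one prepared fuzzer input.
struct FuzzInput {
    std::size_t selection;             // index chosen by the fuzzer itself
    std::unique_ptr<char16_t[]> text;  // exact-size copy of the payload
    std::size_t length;                // number of char16_t units in text
};

// Uses the first byte as a selector and copies the rest into a new
// allocation, so an underflow hits memory outside any valid buffer.
FuzzInput prepareInput(const std::uint8_t* data, std::size_t size,
                       std::size_t choiceCount) {
    FuzzInput in{0, nullptr, 0};
    if (size == 0 || choiceCount == 0) {
        return in;
    }
    assert(data != nullptr);
    in.selection = data[0] % choiceCount;
    in.length = (size - 1) / sizeof(char16_t);
    in.text.reset(new char16_t[in.length]);
    if (in.length > 0) {
        // memcpy avoids an unaligned reinterpret_cast of data + 1.
        std::memcpy(in.text.get(), data + 1, in.length * sizeof(char16_t));
    }
    return in;
}
```

In a libFuzzer target, a helper like this would be called at the top of LLVMFuzzerTestOneInput(const uint8_t*, size_t), with in.text.get() and in.length passed on to the API under test.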
Jeff Genovy [Sun, 17 Feb 2019 21:06:38 +0000 (13:06 -0800)]
ICU-20419 Export internal StackUResourceBundle helper, so it can be used in the i18n library.
Replace all current usages of ures_initStackObject() in the i18n library with the StackUResourceBundle helper.
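As a hedged illustration of why a stack helper is preferable to a manual init call, here is a generic RAII sketch. Resource, resourceInit, resourceClose and StackResource are hypothetical stand-ins and are not the ICU types or functions named above. The wrapper pairs initialization and cleanup so that no code path can forget the close.

```cpp
#include <cassert>

// Hypothetical C-style resource with explicit init and close functions.
struct Resource {
    bool open;
};

int liveResources = 0;  // counts resources that are initialized but not closed

void resourceInit(Resource* r) {
    r->open = true;
    ++liveResources;
}

void resourceClose(Resource* r) {
    if (r->open) {
        r->open = false;
        --liveResources;
        assert(liveResources >= 0);
    }
}

// RAII wrapper: the constructor initializes, the destructor closes.
class StackResource {
public:
    StackResource() { resourceInit(&res_); }
    ~StackResource() { resourceClose(&res_); }
    StackResource(const StackResource&) = delete;
    StackResource& operator=(const StackResource&) = delete;
    Resource* get() { return &res_; }
private:
    Resource res_;
};
```

Callers declare a StackResource local and pass get() wherever the raw pointer is expected; early returns no longer leak.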
Jeff Genovy [Fri, 15 Feb 2019 05:24:37 +0000 (21:24 -0800)]
ICU-20210 ICU-20211 Cherry-pick fixes from CLDR to unblock exhaustive tests.
Cherry-pick cldrbug 11492: Bad symbols for NaN in sv, ksh, kl, se locales.
Cherry-pick cldrbug 11491: sd, month name for July uses character not in exemplars.
Updated the various ICU4J *.jar files as well.
Shane Carr [Fri, 25 Jan 2019 03:44:17 +0000 (19:44 -0800)]
ICU-11725 Promoting tech-previews in DecimalFormat to @draft.
- Changes Java DecimalFormat boolean get* methods to is*.
- Makes the new draft methods non-virtual.
- Removes obsolete template class in header file.
- Adds proper U_HIDE tags in unum.h and decimfmt.h
If a file with an input line larger than INT32_MAX (i.e. 2 GB) contains
a UTF-8 character after that limit, escapesrc crashes on 64-bit systems
or does not remove incomplete files on 32-bit systems.
The issue is that an unchecked cast from size_t to int32_t can turn
negative, which results in negative offsets during array access.
This eventually leads to an out-of-bounds read, which most likely
crashes the tool.
This patch sets a fixed limit of 1 GB to make sure that no side effects
occur if the line is exactly INT32_MAX or a few bytes less. It should
still be way more than anyone would really need.
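A minimal sketch of such a length check, assuming a hypothetical helper checkedLineLength and constant kMaxLineLength; the actual patch may be structured differently. The size_t value is compared against the cap before it is narrowed, so the int32_t result can never be negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical cap mirroring the 1 GB limit described above.
constexpr std::size_t kMaxLineLength = std::size_t{1} << 30;

static_assert(kMaxLineLength <=
                  static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()),
              "the cap must fit into int32_t");

// Stores the length as int32_t and returns true only if the narrowing is safe.
bool checkedLineLength(std::size_t length, std::int32_t& out) {
    if (length > kMaxLineLength) {
        return false;  // reject instead of letting the cast wrap to a negative value
    }
    out = static_cast<std::int32_t>(length);
    assert(out >= 0);  // guaranteed by the cap
    return true;
}
```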
If gencnval encounters an empty input file, the function resolveAlias
triggers an out-of-bounds write because uniqueAliasArr points to a
zero-byte allocation.
This patch protects the code in question with a check that
knownAliasesCount is not 0.
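The guard can be pictured with the following simplified sketch. uniqueAliases is a hypothetical function that only mimics the shape of the problem (an unconditional write to element 0 of the output array); it is not the code of resolveAlias.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification: copies a sorted alias list into `unique`,
// dropping adjacent duplicates, and returns the number of entries written.
// `unique` must have room for knownCount entries.
std::size_t uniqueAliases(const std::uint16_t* known, std::size_t knownCount,
                          std::uint16_t* unique) {
    if (knownCount == 0) {
        return 0;  // without this check, unique[0] below is an out-of-bounds write
    }
    assert(known != nullptr && unique != nullptr);
    std::size_t uniqueCount = 0;
    unique[uniqueCount++] = known[0];
    for (std::size_t i = 1; i < knownCount; ++i) {
        if (known[i] != known[i - 1]) {
            unique[uniqueCount++] = known[i];
        }
    }
    return uniqueCount;
}
```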
Shane Carr [Mon, 11 Feb 2019 23:53:31 +0000 (15:53 -0800)]
ICU-20342 Adding FormattedDateInterval in C and C++.
- Adds first "span" field category
- Re-implements DateIntervalFormat#fallbackFormat to use FieldPositionHandler
- New temporary wiring in SimpleFormatter
Shane Carr [Sat, 9 Feb 2019 06:08:16 +0000 (22:08 -0800)]
ICU-13256 Implementing FormattedRelativeDateTime in C, C++, and Java.
- Adds additional logic to NumberStringBuilder.
- Extends logic of number::impl::Field type.
- Adds tests for RBNF support.
- Adds tests from ftang's original PR.
Markus Scherer [Tue, 8 Jan 2019 01:41:08 +0000 (17:41 -0800)]
ICU-20330 simplify LocaleMatcher code:
- widen API from LocalePriorityList to Iterable
- merge getBestMatch(multiple locales) and getBestMatch(single locale) into one function
- process desired locales incrementally, create fewer objects
- reject poor matches early: use bestDistance-demotion for threshold
- add API for java.util.Locale, convert incrementally
- new feature: tracks indexes of supported and desired locales, which eliminates conversion of result objects in wrappers around getBestMatch(), as shown by the java.util.Locale API here
- simpler data structures, more serialization-friendly (easier to port to C++)
- e.g., use a BytesTrie each for likelySubtags & locale distance, instead of layers of TreeMap
- un-hardcode locale matcher data; use modern resource bundle functions
- split builder code & runtime code into separate classes
- move LSR to simple top-level value class, cache regionIndex in LSR
- simpler handling of private use languages and pseudolocales
- simplify RegionMapper
- LocaleDistance builder: move the node distance into the DistanceTable, remove DistanceNode
- support distance rules with region codes, not just with variables
- enforce & use distance rule constraints:
- no rule with *,supported or desired,*
- no rule with language * and script/region non-*
- distance trie collapses a (desired, supported)=(ANY, ANY) pair into a single *
- look up each desired language only once for all supported LSRs
- remove layers-of-Maps compaction (trie builder compacts)
- remove unused XML printing
- remove other unused code
- make XLocaleMatcherTest.testPerf() exercise locale distance lookup code
Jeff Genovy [Mon, 15 Oct 2018 18:07:39 +0000 (11:07 -0700)]
ICU-20204 ICU4C: Use the CreateFileMapping API for both the UWP and Win32 versions.
- CreateFileMappingW is marked for both desktop and UWP apps, so we can call that in both code paths.
- We can use the W version of the CreateFileMapping API instead of the A version since we pass NULL for the name anyway.
- We can call the same API CreateFile[A|W] from both the UWP and Win32 versions of the code, reducing one of the UWP forks.
- Add a work-around for older versions of the Windows 10 SDK UWP headers.
- Remove the code that was creating a custom security descriptor (but setting everything to NULL) and pass null to the API directly. This way we get the default security descriptor instead of a NULL DACL.
- Change to use nullptr instead of NULL in C++ code.
Jeff Genovy [Mon, 4 Feb 2019 22:29:09 +0000 (14:29 -0800)]
ICU-13847 ICU-20381 Improve handling of errors (Out-of-Memory) in DecimalFormat class.
- Use move assignment for fields->formatter (LocalizedNumberFormatter) instead of creating new heap object every time.
- Add test cases for DecimalFormat object in invalid state.
- Protect against self-assignment in assignment operator.
- Fix segmentation fault when attempting to compare valid and invalid DecimalFormat objects.
- Changes based on review feedback from Shane.
- Fix minor typos in the public header file.
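The assignment and comparison points in the list above can be sketched as follows. This is an illustration under assumptions, not the DecimalFormat implementation: Format, Fields, makeFormatter and makeInvalid are hypothetical, and a null fields_ pointer stands in for an object whose allocation failed. Leaving the target untouched when either side is invalid is one possible policy chosen for the sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical internal state; `formatter` stands in for a derived object.
struct Fields {
    std::string pattern;
    std::string formatter;
};

// Hypothetical helper that rebuilds the derived object from a pattern.
std::string makeFormatter(const std::string& pattern) {
    return "formatter:" + pattern;
}

class Format {
public:
    explicit Format(const std::string& pattern)
        : fields_(new Fields{pattern, makeFormatter(pattern)}) {}
    Format(Format&&) = default;

    // Simulates an object left in an invalid state by a failed allocation.
    static Format makeInvalid() { return Format(); }

    Format& operator=(const Format& rhs) {
        if (this == &rhs) {
            return *this;  // self-assignment guard
        }
        if (fields_ == nullptr || rhs.fields_ == nullptr) {
            return *this;  // assumption: nothing safe to copy, leave target as is
        }
        fields_->pattern = rhs.fields_->pattern;
        // Move-assigns the temporary into the existing member instead of
        // allocating a new heap object for it.
        fields_->formatter = makeFormatter(rhs.fields_->pattern);
        return *this;
    }

    bool operator==(const Format& other) const {
        if (this == &other) {
            return true;
        }
        if (fields_ == nullptr || other.fields_ == nullptr) {
            return false;  // never dereference an invalid object
        }
        return fields_->pattern == other.fields_->pattern;
    }

    std::string pattern() const { return fields_ ? fields_->pattern : std::string(); }

private:
    Format() = default;
    std::unique_ptr<Fields> fields_;
};
```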
Norbert Runge [Fri, 1 Feb 2019 18:22:10 +0000 (10:22 -0800)]
ICU-20217 Replaces seed corpus zip files with the original txt files.
The problem is that Docker receives zip files only as LFS links when
cloning ICU from GitHub. Converting the txt files into zip files, which
is the required corpus format for the fuzzer, will be done by the oss-fuzz
build script.
ICU-20217 Adds fuzzer seed corpus files to the list of files that don't have
copyright notice.
Norbert Runge [Thu, 31 Jan 2019 22:22:38 +0000 (14:22 -0800)]
ICU-20386 Adds workaround to icu4c/source/data/Makefile.in: Helps Python find
the buildtools directory it needs when running the ICU4C unit tests in an
out-of-source installation.
The change is a quick workaround for now for an issue that can have wide impact.