Fredrik Roubert [Wed, 2 Sep 2020 19:15:56 +0000 (21:15 +0200)]
ICU-21035 Delete meaningless calls to uloc_getKeywordValue().
These two calls to uloc_getKeywordValue() write to the buffer "id", which
is then immediately overwritten by calls to idForLocale(), so they
can simply be removed without any loss of functionality.
Fredrik Roubert [Wed, 2 Sep 2020 15:04:58 +0000 (17:04 +0200)]
ICU-21035 Pass ByteSink from Locale::getKeywordValue() to uloc_getKeywordValue().
This eliminates the need for a scratch buffer in Locale::getKeywordValue()
and also the need for counting bytes required in uloc_getKeywordValue(),
something that ByteSink will now handle correctly.
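The idea behind passing a sink instead of a fixed buffer can be sketched as
follows. This is a minimal illustration only: Sink, StringSink and
writeKeywordValue() are hypothetical names invented for this sketch and are
not ICU's ByteSink API, and the keyword parsing is deliberately simplified.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical minimal sink interface (ICU's real ByteSink is richer).
class Sink {
public:
    virtual ~Sink() = default;
    virtual void append(const char* bytes, std::size_t n) = 0;
};

// Hypothetical sink that grows a std::string as bytes arrive, so the
// producer never has to know the required capacity in advance.
class StringSink : public Sink {
public:
    explicit StringSink(std::string& dest) : dest_(dest) {}
    void append(const char* bytes, std::size_t n) override {
        dest_.append(bytes, n);
    }
private:
    std::string& dest_;
};

// Hypothetical, simplified lookup: finds `keyword` in the
// "@key=value;key2=value2" part of a locale ID and writes its value
// to the sink. Returns true if the keyword was found.
inline bool writeKeywordValue(std::string_view localeID,
                              std::string_view keyword,
                              Sink& sink) {
    const std::size_t at = localeID.find('@');
    if (at == std::string_view::npos) {
        return false;
    }
    std::string_view rest = localeID.substr(at + 1);
    while (!rest.empty()) {
        const std::size_t semi = rest.find(';');
        const std::string_view item = rest.substr(0, semi);
        const std::size_t eq = item.find('=');
        if (eq != std::string_view::npos && item.substr(0, eq) == keyword) {
            const std::string_view value = item.substr(eq + 1);
            sink.append(value.data(), value.size());
            return true;
        }
        if (semi == std::string_view::npos) {
            break;
        }
        rest.remove_prefix(semi + 1);
    }
    return false;
}
```

Because the sink owns the growth policy, the caller needs neither a scratch
buffer nor a second pass to count the bytes required.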
Move UniqueCharStrings and CharStringMap from
loclikelysubtags.{h,cpp} to separate header files
so we can reuse them to implement
https://github.com/unicode-org/icu/pull/1254
Jeff Genovy [Wed, 2 Sep 2020 00:53:58 +0000 (17:53 -0700)]
ICU-21108 Update to use/support VS2019, and add extra CI builds for VS2017 and VS2019.
Change to use the Windows 10 SDK for Win32 (Win7) builds on the VS2019 ADO images.
The Windows 10 SDK is backwards compatible to Windows 7, if WINVER and
_WIN32_WINNT are set before compiling.
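As a sketch of what setting these macros looks like in source form (the
actual ICU build may set them through project or compiler settings instead,
which is an assumption not confirmed by this message): 0x0601 is the
documented version value for Windows 7.

```cpp
// Target Windows 7 (version value 0x0601) even when building against
// the Windows 10 SDK. These must be defined before any Windows header
// is included.
#ifndef WINVER
#define WINVER 0x0601
#endif
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601
#endif

#ifdef _WIN32
#include <windows.h>
#endif
```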
The sole caller of _getVariantEx() is _getVariant(), and this in
turn has only 3 callers: one is commented out (so it can be
deleted), one doesn't actually do anything (so it too can be
deleted), and one can be replaced by inlining the use of
CheckedArrayByteSink.
This also allows _getVariantEx() to be renamed to _getVariant() as it's
the only such function left now.
Fix kilogram parsing: ignore 'kilogram' as a stem, we have 'gram'.
Failures in the added unit test before the fix: withSIPrefix resulted
in 'microkilogram', and kilogram's prefix was considered to be
"ONE" (i.e. 10^0).
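The failure mode can be illustrated with a small prefix-plus-stem parser.
This is a sketch under assumptions: parseUnit(), withPrefix(), and the tiny
prefix and stem tables are hypothetical and are not ICU's implementation.
The point is that 'kilogram' is absent from the stem table, so it parses as
'kilo' + 'gram'.

```cpp
#include <array>
#include <optional>
#include <string>
#include <string_view>

struct Prefix {
    std::string_view name;
    int power10;
};

struct ParsedUnit {
    int power10;            // SI prefix as a power of ten; 0 means none
    std::string_view stem;  // points into the static stem table
};

// Hypothetical, heavily abbreviated tables. Note: no "kilogram" stem.
constexpr std::array<Prefix, 4> kPrefixes{{
    {"kilo", 3}, {"centi", -2}, {"milli", -3}, {"micro", -6}}};
constexpr std::array<std::string_view, 2> kStems{{"gram", "meter"}};

inline std::optional<std::string_view> findStem(std::string_view s) {
    for (std::string_view stem : kStems) {
        if (s == stem) {
            return stem;
        }
    }
    return std::nullopt;
}

// Splits a simple unit identifier into an SI prefix and a stem.
inline std::optional<ParsedUnit> parseUnit(std::string_view id) {
    if (auto stem = findStem(id)) {
        return ParsedUnit{0, *stem};
    }
    for (const Prefix& p : kPrefixes) {
        if (id.size() > p.name.size() &&
            id.substr(0, p.name.size()) == p.name) {
            if (auto stem = findStem(id.substr(p.name.size()))) {
                return ParsedUnit{p.power10, *stem};
            }
        }
    }
    return std::nullopt;
}

// Rebuilds the identifier with a different SI prefix.
inline std::string withPrefix(const ParsedUnit& unit, int power10) {
    for (const Prefix& p : kPrefixes) {
        if (p.power10 == power10) {
            return std::string(p.name) + std::string(unit.stem);
        }
    }
    return std::string(unit.stem);
}
```

If "kilogram" were listed in kStems, parseUnit("kilogram") would return
power10 == 0 with stem "kilogram", and re-prefixing with micro would yield
"microkilogram", which matches the failures described above.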
Fredrik Roubert [Fri, 21 Aug 2020 19:45:07 +0000 (21:45 +0200)]
ICU-21035 Remove obsolete use of CharString::getAppendBuffer().
The complicated buffer allocation code is inherited from times past but
no longer serves any purpose; it's now possible to simply call
the CharString copy constructor instead.
By always calling the dynamic memory allocation implementation directly
instead, the fixed memory buffer boundary gets pushed one step further
towards the edges.
Yoshito Umaoka [Wed, 19 Aug 2020 19:08:25 +0000 (15:08 -0400)]
ICU-21219 Fix for Java version number overflow problem
The internal API VersionInfo.javaVersion() maps the Java version number to 4 integer fields. Each field must be at most 255. However, recent OpenJDK 8 updates exceed this range.
Luckily, we have only one reference in our code base that checks the Java version. CharsetUTF16 uses maxBytePerChar = 4 for Java 5 and older, and maxBytePerChar = 2 for newer Java versions. Because we no longer support the Java 5 runtime, we don't need this conditional check.
We don't have any other uses of VersionInfo.javaVersion(). Java's version range is not something we can control, so I decided to delete this internal-use-only API completely.
Frank Tang [Fri, 14 Aug 2020 23:39:26 +0000 (16:39 -0700)]
ICU-21159 Document U_USING_DEFAULT_WARNING in .h
Document the fact that
uloc_getDisplay(Language|Script|Country|Variant|Keyword|KeywordValue)
falls back to the code itself, case-canonicalized in some cases, and
sets the status to U_USING_DEFAULT_WARNING.
No change to the implementation behavior. This only completes the missing
comments, tweaks line wrapping, removes double spaces, and adds a test to
validate the pre-existing behavior that is now documented.
Frank Tang [Wed, 29 Jul 2020 00:05:26 +0000 (17:05 -0700)]
ICU-20684 Fix uninitialized in isMatchAtCPBoundary
Downstream bug https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15505
Fix Fuzzer-detected Use-of-uninitialized-value in isMatchAtCPBoundary
To reproduce the bug with the new test case, configure and build with
CFLAGS="-fsanitize=memory" CXXFLAGS="-fsanitize=memory" ./runConfigureICU \
--enable-debug --disable-release Linux --disable-layoutex
Andy Heninger [Wed, 8 Jul 2020 00:12:09 +0000 (17:12 -0700)]
ICU-21178 Add check for corrupt rbbitst.txt data.
In the test data from rbbitst.txt, two or more adjacent boundary markers with
no intervening test data were accepted, with no indication of a problem.
This situation occurred, as described in bug ICU-21178, with a bad import of
some test cases from CLDR. PR #1194 corrected the problem with the test data
in ICU4C. This PR adds code to flag this situation in the test data, and
also propagates the data fix to ICU4J's copy of rbbitst.txt.
Andy Heninger [Sat, 27 Jun 2020 00:52:40 +0000 (17:52 -0700)]
ICU-13590 RBBI, improve handling of concurrent look-ahead rules.
Change the mapping from rule number to boundary position to use a simple array
instead of a linear search lookup map.
Look-ahead rules have a preceding context, a boundary position, and following context.
In the implementation, when the preceding context matches, the potential boundary
position is saved. Then, if the following context proves to match, the saved boundary is
returned as an actual boundary.
Look-ahead rules are numbered, and the implementation maintains a map from
rule number to the tentative saved boundary position.
In an earlier improvement to the rule builder, the rule numbering was changed from the
original sparse numbering to a contiguous sequence, in anticipation of
changing the mapping from number to position to use a simple array.
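With contiguous rule numbers, the saved positions can live in a plain array
indexed by rule number. The class below is a minimal sketch of that idea;
LookAheadPositions and its members are hypothetical names for illustration,
not the names used in the ICU sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for tentative boundary positions, one slot per
// look-ahead rule. Rule numbers are assumed contiguous in
// [0, ruleCount), so each lookup is a single array access.
class LookAheadPositions {
public:
    explicit LookAheadPositions(int32_t ruleCount)
        : positions_(static_cast<std::size_t>(ruleCount), -1) {}

    // Called when a rule's preceding context has matched.
    void save(int32_t rule, int32_t position) {
        positions_[static_cast<std::size_t>(rule)] = position;
    }

    // Called when the following context has matched; returns the saved
    // boundary, or -1 if none was saved for this rule.
    int32_t get(int32_t rule) const {
        return positions_[static_cast<std::size_t>(rule)];
    }

private:
    std::vector<int32_t> positions_;
};
```

Each rule has its own slot, so several look-ahead rules can be in progress
at the same time without overwriting one another's saved position.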