allenwtsu [Mon, 10 Jan 2022 14:07:13 +0000 (22:07 +0800)]
ICU-21699 Fix CjkBreakEngine performance issue
1. vector.contains() compares elements sequentially, which is O(n).
When the vector is large, this degrades performance.
Remove the unnecessary vector.contains() check in C++.
2. In Java's CjkBreakEngine, replace "vector.contains()" with "if(pos > previous)" to handle duplicate breakpoint positions.
This keeps the C++ and Java implementations in sync.
Test: ant checkTest -Dtestclass='com.ibm.icu.dev.test.rbbi.RBBITest'
(RBBITest#TestBreakAllChars() can generate duplicate positions for word breaks. It passes with this change.)
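As an illustration of the "if(pos > previous)" idea from item 2, here is a minimal C++ sketch. It is not the ICU code: appendBreak, collectBreaks and demoCollectBreaks are hypothetical helpers, and std::vector stands in for ICU's vector class. The sketch also assumes that candidate positions arrive in non-decreasing order, so a single comparison against the last stored position is enough to skip duplicates without a linear search.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not ICU's code: stores `pos` only if it lies strictly
// beyond the last recorded break. Assumes candidates are non-decreasing, so
// this O(1) comparison replaces an O(n) contains() scan.
inline void appendBreak(std::vector<std::int32_t>& breaks, std::int32_t pos) {
    if (breaks.empty() || pos > breaks.back()) {
        breaks.push_back(pos);
    }
}

// Collects break positions from a candidate list that may repeat positions.
inline std::vector<std::int32_t> collectBreaks(
        const std::vector<std::int32_t>& candidates) {
    std::vector<std::int32_t> breaks;
    for (std::int32_t pos : candidates) {
        appendBreak(breaks, pos);
    }
    return breaks;
}

// Usage: repeated candidates collapse to a single stored break.
inline void demoCollectBreaks() {
    const std::vector<std::int32_t> result = collectBreaks({0, 2, 2, 5, 5, 9});
    assert((result == std::vector<std::int32_t>{0, 2, 5, 9}));
}
```

If candidates could arrive out of order, this comparison alone would not be sufficient; that case is outside what the sketch covers.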
Andy Heninger [Sat, 20 Nov 2021 00:33:58 +0000 (16:33 -0800)]
ICU-21763 UVector cleanup in Formatting Code
Revise uses of UVector in Formatting related code to better handle memory
allocation failures. This is one of an ongoing series of commits to address
similar problems with UVector usage throughout ICU.
The changes primarily involve switching uses of UVector::addElementX() to the
new adoptElement() or addElement() functions, as appropriate, and using
LocalPointers for tracking memory ownership.
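The ownership pattern described here can be sketched roughly as follows. This is a toy model, not ICU source: Status, Rule, OwningVector, addRule and demoAdopt are hypothetical names, std::unique_ptr stands in for LocalPointer, and a fixed capacity simulates an allocation failure inside the container. The adopt behavior modeled is the one this log describes for UVector and UStack: nothing is added when the incoming status is already a failure, and the incoming object is deleted whenever it cannot be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Toy stand-ins, not ICU types.
enum class Status { Ok, OutOfMemory };

struct Rule {
    static int live;  // number of Rule objects currently alive
    int id;
    explicit Rule(int i) : id(i) { ++live; }
    ~Rule() { --live; }
};
int Rule::live = 0;

class OwningVector {
public:
    // `capacity` simulates the point at which growing the vector fails.
    explicit OwningVector(std::size_t capacity) : capacity_(capacity) {}

    // Adopts `rule`: on every failure path the incoming object is deleted,
    // so the caller never has to clean up after a failed add.
    void adoptElement(Rule* rule, Status& status) {
        std::unique_ptr<Rule> guard(rule);
        if (status != Status::Ok) return;  // vector left unmodified
        if (items_.size() >= capacity_) { status = Status::OutOfMemory; return; }
        items_.push_back(std::move(guard));
    }
    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::unique_ptr<Rule>> items_;
};

// Call site: the new object is held by an owning pointer until handed over.
inline void addRule(OwningVector& rules, int id, Status& status) {
    if (status != Status::Ok) return;
    std::unique_ptr<Rule> rule(new (std::nothrow) Rule(id));
    if (!rule) { status = Status::OutOfMemory; return; }
    rules.adoptElement(rule.release(), status);
}

// Usage: the second add fails, yet no Rule is leaked.
inline void demoAdopt() {
    OwningVector rules(1);
    Status status = Status::Ok;
    addRule(rules, 1, status);
    assert(status == Status::Ok && rules.size() == 1);
    addRule(rules, 2, status);
    assert(status == Status::OutOfMemory);
    assert(rules.size() == 1 && Rule::live == 1);
}
```

With this convention the call site needs no cleanup branch after the add, because ownership has a single holder at every point.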
Andy Heninger [Fri, 12 Nov 2021 00:12:44 +0000 (16:12 -0800)]
ICU-21763 UVector cleanup, continued.
Revise uses of UVector in the next batch of files to better handle memory
allocation failures. This is one of an ongoing series of commits to address
similar problems with UVector usage throughout ICU.
The changes primarily involve switching uses of UVector::addElementX() to the
new adoptElement() or addElement() functions, as appropriate, and using
LocalPointers for tracking memory ownership.
Andy Heninger [Tue, 9 Nov 2021 20:53:59 +0000 (12:53 -0800)]
ICU-21763 UVector cleanup in Locale & Region Code
Revise uses of UVector in Locale and Region related code to better handle
memory allocation failures. This is one of an ongoing series of commits to
address similar problems with UVector usage throughout ICU.
The changes involve switching uses of UVector::addElementX() to the new
adoptElement() or addElement() functions, as appropriate, and using
LocalPointers for tracking memory ownership.
In Region::loadRegionData(), improved the overall error detection and recovery.
Andy Heninger [Sat, 30 Oct 2021 00:17:41 +0000 (17:17 -0700)]
ICU-21778 UnicodeString::clone error handling fix
Change UnicodeString::clone() to return a nullptr if the underlying copy
constructor produces a bogus string. This can happen if the copy constructor
encounters a memory allocation failure in allocating the copy's internal string
buffer, or if the string being copied was already bogus.
The change is consistent with other ICU clone functions, which are generally
defined to return nullptr in case of errors.
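A rough sketch of the clone behavior described above, using a toy class rather than UnicodeString itself. ToyString, its bogus flag and demoClone are illustrative assumptions; only the rule "return nullptr when the copy is bogus" comes from the commit message.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Toy string with a "bogus" (invalid) state, loosely modeled on the idea
// in the commit message. Not ICU's UnicodeString.
class ToyString {
public:
    explicit ToyString(std::string text) : text_(std::move(text)) {}

    // The copy inherits the bogus state of its source.
    ToyString(const ToyString& other)
        : text_(other.text_), bogus_(other.bogus_) {}

    void setToBogus() { text_.clear(); bogus_ = true; }
    bool isBogus() const { return bogus_; }

    // Returns a heap-allocated copy, or nullptr if the allocation fails
    // or the resulting copy is bogus. The caller owns the result.
    ToyString* clone() const {
        ToyString* copy = new (std::nothrow) ToyString(*this);
        if (copy != nullptr && copy->isBogus()) {
            delete copy;
            copy = nullptr;
        }
        return copy;
    }

private:
    std::string text_;
    bool bogus_ = false;
};

// Usage: a valid string clones normally, a bogus one yields nullptr.
inline void demoClone() {
    ToyString good("abc");
    std::unique_ptr<ToyString> copy(good.clone());
    assert(copy && !copy->isBogus());

    ToyString bad("abc");
    bad.setToBogus();
    assert(bad.clone() == nullptr);
}
```

Callers therefore need a single null check to cover both the allocation failure and the bogus-source case.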
Andy Heninger [Sat, 2 Oct 2021 05:20:18 +0000 (22:20 -0700)]
ICU-21763 UVector cleanup in vtzone.cpp
Revise uses of UVector in vtzone.cpp to better handle memory allocation
failures. This is one of an ongoing series of commits to address similar
problems with UVector usage throughout ICU.
The changes primarily involve switching uses of UVector::addElementX() to the
new adoptElement() or addElement() functions, as appropriate, and using
LocalPointers for tracking memory ownership.
Andy Heninger [Mon, 11 Oct 2021 18:02:42 +0000 (11:02 -0700)]
ICU-21763 UVector cleanup in time zone code
Revise uses of UVector in time zone related code to better handle memory
allocation failures. This is one of an ongoing series of commits to address
similar problems with UVector usage throughout ICU.
The changes involve switching uses of UVector::addElementX() to the new
adoptElement() or addElement() functions, as appropriate, and using
LocalPointers for tracking memory ownership.
ICU-21649 Adds build and execution of Unicode update tools to GHA CI.
Checks that the build process completes without failure and that the
generated data is identical with the data currently in the repository.
ICU-21581 Creates a new workflow to be activated upon merge into main or
maintenance branches only and adds GHA CI automation of BRS task 'Test ICU4J
with only little-endian ICU4C data', cf.:
https://unicode-org.github.io/icu/processes/release/tasks/integration.html#test-icu4j-with-only-little-endian-icu4c-data.
ICU-21581 Adds copyright notice and comment to new GHA script.
Update ICU (main branch and upcoming version 70) halfway to 2021b.
- with Samoa & Jordan rule updates
- with corrected pre-1993 transitions in Malawi (?), Portugal, etc.
- without for now (due to release timing) renaming Pacific/Enderbury to Pacific/Kanton
- without merging many zones whose timestamps agree since 1970
Erik Torres [Wed, 22 Sep 2021 22:51:08 +0000 (22:51 +0000)]
ICU-21581 BRSRC 70.1 Version update and regenerate configure for v70.1
In this PR, I am updating the version number from 70.0.1 for the BRS task.
Previously, we had frontloaded part of this, so the diffs in this PR are not as numerous.
It has also been decided that we should differentiate frontloaded tasks from RC tasks by using the following version numbers:
Frontload version number: XX.X.X (70.0.1 -> Major.minor.patch)
RC/GA version number: XX.X (70.1 -> Major.minor)
I've added some documentation for this, for future releases :)
Andy Heninger [Wed, 15 Sep 2021 21:51:48 +0000 (14:51 -0700)]
ICU-21662 UVector cleanup in rbtz.cpp
Revise uses of UVector in rbtz.cpp to better handle memory allocation failures.
This is one of an ongoing series of commits to address similar problems with
UVector usage throughout ICU.
The changes include
- Use LocalPointers and UVector deleter functions to simplify OOM checking and recovery.
- Fix RuleBasedTimeZone::addTransitionRule(rule) to have standard ICU adopt behavior
when errors occur, meaning automatic deletion of the incoming rule. This simplifies
both the implementation of the function and the code at the call sites.
- Update addTransitionRule() call sites. Includes modifying the Dangi calendar initialization
to not silently ignore errors.
- struct Transition is changed to derive from UMemory, which allows the use of LocalPointers.
Steven R. Loomis [Sat, 18 Sep 2021 00:51:23 +0000 (19:51 -0500)]
ICU-21756 icu4j: port UnicodeKnownIssues.java from CLDR
- port of CLDR-14588
- use a fake Consumer<String>
- currently logs after each test class, not ideal but better
- Formerly ICU-12589 but that is not as related
- add unit test
Andy Heninger [Sat, 11 Sep 2021 23:57:03 +0000 (16:57 -0700)]
ICU-21662 UVector cleanup in basictz.cpp
Revise uses of UVector in basictz.cpp to better handle memory allocation failures.
This is one of an ongoing series of commits to address similar problems with
UVector usage throughout ICU.
The changes primarily involve switching to the use of LocalPointers for
the tracking of memory ownership, and to simplify cleanup in case of errors.
In the function BasicTimeZone::getTimeZoneRulesAfter(), also switched some additional
allocated memory to use LocalPointer or LocalMemory, for consistency in
memory handling.
Andy Heninger [Fri, 10 Sep 2021 21:59:46 +0000 (14:59 -0700)]
ICU-21662 UVector Error Handling in Regex
Clean up some oversights from commit 0ec329c. This was triggered by fuzz testing finding
a memory leak with the original commit, see https://oss-fuzz.com/testcase-detail/4656781834452992
I was unable to replicate the fuzzing failure, but reviewing the nearby code showed
some likely problems.
In this commit,
- Fix UStack::pop() to not delete the popped element when a deleter function is present.
This was a bug, but because there were no stacks with deleters, it was not causing trouble.
- Change RegexCompile::compileSet() to delete the set if it cannot be added to the internal
vector of sets. I suspect this is the cause of the fuzzing leak - 0ec329c changed the
behavior of UVector in the presence of errors.
- Changed RegexCompile::fSetStack to use an object deleter function. This fixes the
leak checking at the points new elements are pushed onto this stack.
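The pop() fix in the first bullet can be pictured with a toy stack. This is a hedged sketch, not UStack: ToyStack, Node, deleteNode and demoPop are hypothetical, and returning nullptr from an empty stack is an assumption of the sketch. The point shown is that pop() hands the element back to the caller without running the deleter, while elements still on the stack are released by the stack itself.

```cpp
#include <cassert>
#include <vector>

struct Node {
    static int live;  // number of Node objects currently alive
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

using Deleter = void (*)(void*);

inline void deleteNode(void* p) { delete static_cast<Node*>(p); }

// Toy stack of untyped pointers with an optional deleter function.
class ToyStack {
public:
    explicit ToyStack(Deleter deleter) : deleter_(deleter) {}
    ToyStack(const ToyStack&) = delete;
    ToyStack& operator=(const ToyStack&) = delete;

    // Elements still on the stack are owned by it and released here.
    ~ToyStack() {
        for (void* item : items_) {
            if (deleter_ != nullptr) deleter_(item);
        }
    }

    void push(void* item) { items_.push_back(item); }

    // Removes and returns the top element WITHOUT calling the deleter:
    // ownership moves to the caller. Returns nullptr when empty.
    void* pop() {
        if (items_.empty()) return nullptr;
        void* top = items_.back();
        items_.pop_back();
        return top;
    }

private:
    std::vector<void*> items_;
    Deleter deleter_;
};

// Usage: the popped node stays alive until the caller deletes it.
inline void demoPop() {
    {
        ToyStack stack(deleteNode);
        stack.push(new Node());
        stack.push(new Node());
        Node* top = static_cast<Node*>(stack.pop());
        assert(top != nullptr && Node::live == 2);
        delete top;
        assert(Node::live == 1);
    }
    assert(Node::live == 0);  // the remaining node was freed by the stack
}
```

Had pop() invoked the deleter, the returned pointer would already be dangling when the caller received it.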
Rich Gillam [Fri, 6 Aug 2021 21:00:41 +0000 (14:00 -0700)]
ICU-13353 Fixed several problems preventing horizontal resource inheritance from working as intended, and added a
Java version of a unit test I'd previously only added on the C++ side.
ICU-21735 Fixed two old references to UNUM_CURRENCY_CASH, one in a comment, one in a unit-test failure message--
to refer to the now-correct UNUM_CASH_CURRENCY.
Andy Heninger [Thu, 2 Sep 2021 23:09:04 +0000 (16:09 -0700)]
ICU-21662 Improve UVector error handling.
This is the next installment of UVector error handling cleanup. It includes:
- Revise UStack to follow the conventions of UVector, to leave the stack
unmodified when there is an incoming error code. And, for stacks with a
deleter function, to delete the incoming object if it cannot be
successfully pushed.
- Review all usage of UStack in ICU; adjust call sites as needed.
- Review all uses of UVector::addElementX() in the implementation of
- Regular Expressions
- Break Iteration
- toolutil/xmlparser
changing to the updated functions, and adjusting call sites as needed.