Jeff Genovy [Fri, 27 Aug 2021 02:23:16 +0000 (19:23 -0700)]
ICU-21710 Additional clean up after removing BOYER_MOORE code from usearch.cpp
Changes:
- We can completely remove the shift tables and related fields from
data structs.
- Creation of UStringSearch objects should be faster now,
as it doesn't waste time computing the unused shift tables.
- The sizeof(UStringSearch) is decreased from 5240 to 3192, so
this should help to reduce memory for applications that create many string search objects.
Note regarding the comments on initialize(): it does not actually set an illegal argument error if the pattern is all ignorables. Added a test case for it.
Fredrik Roubert [Thu, 26 Aug 2021 13:15:30 +0000 (15:15 +0200)]
ICU-20973 Use standard keywords true & false to initialize type bool.
Now that all equality operators return standard bool (commit 633438f),
it no longer makes any sense to use the ICU4C constants TRUE & FALSE
or local variables of type UBool for their return value.
Fredrik Roubert [Wed, 25 Aug 2021 16:11:20 +0000 (18:11 +0200)]
ICU-20973 Rewrite polymorphic CacheKeyBase equality operators for C++20.
The existing polymorphic equality operators that use different types for
the `this` and `other` objects are ambiguous with C++20 resolution rules
that require equality for reversed arguments.
In order to resolve that, while also possibly making the implementation
somewhat simpler overall, the implementation classes (LocaleCacheKey
and DateFmtBestPatternKey) now get normal (non-polymorphic) equality
operators that are trivially non-ambiguous (and as a bonus also don't
need any type casts), while the dynamic type checking logic is moved
into protected helper functions, which in the end are invoked
(without any ambiguity) by friend operators in the base class.
This way, all equality testing of cache key objects ends up taking one
of these two possible paths:
1. Both sides of the equality operator are of the same implementation
type (i.e. LocaleCacheKey or DateFmtBestPatternKey):
The type specific equality operator is called directly, comparing the
relevant attributes of the two objects directly.
2. The two sides of the equality operator are either of different types
or of some base class type:
The friend equality operators of CacheKeyBase call the virtual helper
function to figure out whether the two objects are actually of the
same type. If they are, and this type is an implementation type,
the helper does the necessary type cast to get to 1.
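The two paths described above can be illustrated with a minimal sketch. This is not the ICU source: LocaleKey and PatternKey are hypothetical, simplified stand-ins for LocaleCacheKey and DateFmtBestPatternKey, the helper name equals() is an assumption, and the keys here hold only a std::string.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical, simplified stand-in for the cache key base class.
class CacheKeyBase {
public:
    virtual ~CacheKeyBase() = default;

    // Path 2: friend operators on the base type delegate to the virtual helper.
    friend bool operator==(const CacheKeyBase& lhs, const CacheKeyBase& rhs) {
        return lhs.equals(rhs);
    }
    friend bool operator!=(const CacheKeyBase& lhs, const CacheKeyBase& rhs) {
        return !lhs.equals(rhs);
    }

protected:
    // Dynamic type check lives here (helper name is an assumption).
    virtual bool equals(const CacheKeyBase& other) const = 0;
};

// Hypothetical implementation type (stand-in for LocaleCacheKey).
class LocaleKey : public CacheKeyBase {
public:
    explicit LocaleKey(std::string loc) : fLoc(std::move(loc)) {}

    // Path 1: same-type comparison, no casts, nothing to disambiguate.
    bool operator==(const LocaleKey& other) const { return fLoc == other.fLoc; }
    bool operator!=(const LocaleKey& other) const { return !(*this == other); }

protected:
    bool equals(const CacheKeyBase& other) const override {
        if (this == &other) return true;
        if (typeid(*this) != typeid(other)) return false;
        // Same dynamic type: cast once, then take path 1.
        return *this == static_cast<const LocaleKey&>(other);
    }

private:
    std::string fLoc;
};

// Second hypothetical implementation type (stand-in for DateFmtBestPatternKey).
class PatternKey : public CacheKeyBase {
public:
    explicit PatternKey(std::string skeleton) : fSkeleton(std::move(skeleton)) {}

    bool operator==(const PatternKey& other) const { return fSkeleton == other.fSkeleton; }
    bool operator!=(const PatternKey& other) const { return !(*this == other); }

protected:
    bool equals(const CacheKeyBase& other) const override {
        if (this == &other) return true;
        if (typeid(*this) != typeid(other)) return false;
        return *this == static_cast<const PatternKey&>(other);
    }

private:
    std::string fSkeleton;
};

// Usage: exercises both paths.
void cacheKeyUsageExample() {
    LocaleKey a("en"), b("en"), c("fr");
    PatternKey p("en");
    const CacheKeyBase& ra = a;
    const CacheKeyBase& rb = b;
    const CacheKeyBase& rp = p;

    assert(a == b);    // path 1: type-specific operator
    assert(a != c);    // path 1
    assert(ra == rb);  // path 2: same dynamic type, ends up in path 1
    assert(ra != rp);  // path 2: different dynamic types
}
```

Because each implementation type only declares an operator taking its own type, the reversed-argument candidates that C++20 synthesizes are symmetric, and overload resolution has a single best match in every case shown.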
ICU-21639 Added an internal utility class to streamline preflighting and heap-allocating a char buffer for a locale ID
and changed several internal methods in ULocale to use it, so that they work correctly on locale IDs that are longer
than ULOC_FULLNAME_CAPACITY.
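A rough sketch of the preflight-then-allocate pattern this refers to, under stated assumptions: copyLowercased() is a hypothetical C-style function (not an ICU API) that returns the full length it needs regardless of the capacity passed in, and the 16-byte stack buffer is an arbitrary stand-in for a fixed capacity such as ULOC_FULLNAME_CAPACITY.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style API: writes at most `capacity` chars of the lowercased
// ID into dest and always returns the full length required (excluding NUL).
std::size_t copyLowercased(const char* id, char* dest, std::size_t capacity) {
    const std::size_t needed = std::strlen(id);
    for (std::size_t i = 0; i < needed && i < capacity; ++i) {
        dest[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(id[i])));
    }
    return needed;
}

// Try a fixed-size stack buffer first; if the preflight reports that more
// room is needed, allocate exactly that much on the heap and call again.
std::string lowercasedId(const char* id) {
    char stackBuf[16];  // arbitrary small capacity for illustration
    const std::size_t needed = copyLowercased(id, stackBuf, sizeof(stackBuf));
    if (needed <= sizeof(stackBuf)) {
        return std::string(stackBuf, needed);
    }
    std::vector<char> heapBuf(needed);
    copyLowercased(id, heapBuf.data(), heapBuf.size());
    return std::string(heapBuf.data(), needed);
}

// Usage: a short ID fits in the stack buffer, a long one takes the heap path.
void preflightUsageExample() {
    assert(lowercasedId("EN_us") == "en_us");
    assert(lowercasedId("ABCDEFGHIJKLMNOPQRSTUVWXYZ") == "abcdefghijklmnopqrstuvwxyz");
}
```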
Andy Heninger [Tue, 27 Jul 2021 22:51:34 +0000 (15:51 -0700)]
ICU-21662 Rename UVector::addElement().
This is the first step towards improving the error handling and out-of-memory
behavior of UVector::addElement(). A followup PR will add back a new addElement()
with corrected error handling, then additional followups will switch call sites
from the original (renamed) function to the new addElement().
This commit includes no logic or behavior changes; it only renames the existing functions.
Some links fixed, copyright notices added, filenames improved. Sidebar
navigation links to the new pages but needs some further
improvements. Updated /trac/ticket/ links, and /trac/changeset/ links
where I could find the corresponding git commit. Also tweaked
userguide/dev/editing.md to clarify 'root directory'. Apply branch
rename: s/master/main/.
Frank Tang [Thu, 29 Apr 2021 07:07:45 +0000 (00:07 -0700)]
ICU-21569 Add GA to test LSTM configuration
1. Add GA to test BreakIterator under LSTM configuration (remove Thai
and Burmese dictionary and include Thai and Burmese LSTM)
2. Add LSTMDataName for the purpose of testing.
3. Add file-based test code to test BreakIterator match results from test
file generated by Python code in
https://github.com/unicode-org/lstm_word_segmentation/blob/master/segment_text.py
4. Fix a LSTMBreakEngine::divideUpDictionaryRange bug: the return value
should contain only the number of words found, even when the passed-in
foundBreaks already contains some data.
5. Change the cintltest TestSwapData from testing thaidict to laodict so
it will not break while we filter out thaidict under the LSTM
configuration.
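The kind of bug described in item 4 can be sketched as follows. This is an illustration only, not the LSTMBreakEngine code: appendBreaks() is a hypothetical function that appends break positions to a caller-supplied vector and must report only the entries it added itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends a break position after every `step` units
// inside (start, end). Assumes step > 0. foundBreaks may already hold data.
std::size_t appendBreaks(int start, int end, int step, std::vector<int>& foundBreaks) {
    const std::size_t before = foundBreaks.size();
    for (int pos = start + step; pos < end; pos += step) {
        foundBreaks.push_back(pos);
    }
    // Report only what this call added, not foundBreaks.size(),
    // which would also count entries that were there on entry.
    return foundBreaks.size() - before;
}

// Usage: the vector already holds two breaks before the call.
void appendBreaksUsageExample() {
    std::vector<int> breaks{1, 2};
    const std::size_t added = appendBreaks(0, 10, 3, breaks);  // adds 3, 6, 9
    assert(added == 3);
    assert(breaks.size() == 5);
}
```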
Andy Heninger [Sun, 25 Apr 2021 22:45:18 +0000 (15:45 -0700)]
ICU-21591 Release lock in SimpleDateFormat::tzFormat in case of failure
Also remove the use of the unsafe double-checked lock idiom in the same
function, SimpleDateFormat::tzFormat(). Synchronization now always uses a
mutex, which is slower, but the cost shouldn't be significant in the
context of format or parse operations.
Added synchronization to one more unsafe direct reference to a const
SimpleDateFormat::fTimeZoneFormat, in the assignment operator.
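A minimal sketch of the locking approach described here, using standard C++ rather than ICU's own mutex types. The Formatter class, its helper() method, and the simulateFailure flag are all hypothetical. The point is that the lock is always taken, with no unlocked first check, and that a scoped guard releases it on the failure path as well.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical class with a lazily created, shared helper object.
class Formatter {
public:
    // Returns the helper, or nullptr if creating it failed.
    const std::string* helper(bool simulateFailure = false) {
        // Always lock; the guard unlocks on every return path.
        std::lock_guard<std::mutex> lock(fMutex);
        if (!fHelper) {
            if (simulateFailure) {
                return nullptr;  // early exit, mutex is still released
            }
            fHelper = std::make_unique<std::string>("tz-format");
        }
        return fHelper.get();
    }

private:
    std::mutex fMutex;
    std::unique_ptr<std::string> fHelper;
};

// Usage: a failed call must not leave the mutex held, otherwise the
// second call would block forever.
void formatterUsageExample() {
    Formatter f;
    assert(f.helper(true) == nullptr);
    const std::string* h = f.helper();
    assert(h != nullptr && *h == "tz-format");
    assert(f.helper() == h);  // later calls reuse the same object
}
```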