Andy Heninger [Thu, 2 Sep 2021 23:09:04 +0000 (16:09 -0700)]
ICU-21662 Improve UVector error handling.
This is the next installment of UVector error handling cleanup. It includes:
- Revise UStack to follow the conventions of UVector: leave the stack
unmodified when there is an incoming error code and, for stacks with a
deleter function, delete the incoming object if it cannot be
successfully pushed.
- Review all usage of UStack in ICU; adjust call sites as needed.
- Review all uses of UVector::addElementX() in the implementation of the
following, changing to the updated functions and adjusting call sites as needed:
  - Regular Expressions
  - Break Iteration
  - toolutil/xmlparser
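The UStack convention above (leave the stack unmodified on an incoming error; delete an object that cannot be pushed) can be sketched as follows. This is a hypothetical, simplified illustration, not ICU's UStack: `AdoptingStack`, `ErrorCode`, and `isFailure()` are stand-ins invented for this example (ICU itself uses `UErrorCode` and `U_FAILURE()`).

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for ICU's UErrorCode (values > 0 are failures).
enum ErrorCode { ZERO_ERROR = 0, ILLEGAL_ARGUMENT_ERROR = 1, MEMORY_ALLOCATION_ERROR = 7 };
inline bool isFailure(ErrorCode e) { return e > ZERO_ERROR; }

// Hypothetical stack that owns its elements (the "has a deleter" case).
template <typename T>
class AdoptingStack {
public:
    AdoptingStack() = default;
    AdoptingStack(const AdoptingStack&) = delete;
    AdoptingStack& operator=(const AdoptingStack&) = delete;
    ~AdoptingStack() {
        for (T* p : items_) delete p;
    }

    // Takes ownership of obj. On an incoming or new failure, the stack is
    // left unmodified and obj is deleted, so the caller never leaks it.
    T* push(T* obj, ErrorCode& status) {
        if (!isFailure(status)) {
            try {
                items_.push_back(obj);
                return obj;
            } catch (const std::bad_alloc&) {
                status = MEMORY_ALLOCATION_ERROR;
            }
        }
        delete obj;  // store failed: dispose of the incoming object
        return nullptr;
    }

    // Releases ownership of the top element back to the caller.
    T* pop() {
        if (items_.empty()) return nullptr;
        T* top = items_.back();
        items_.pop_back();
        return top;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<T*> items_;
};
```

With this convention a call site can pass a freshly allocated object straight to `push()` without a separate cleanup branch: if the push fails, the object has already been deleted.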
Markus Scherer [Fri, 3 Sep 2021 21:53:25 +0000 (14:53 -0700)]
ICU-21652 add emoji properties of strings
- 7 new properties: API constants & property names
- u_stringHasBinaryProperty(s, property) & UCharacter.hasBinaryProperty(s, property)
- two additional source data files
- new genprops part for writing new binary data file uemoji.icu
- data for existing emoji properties moved from uprops.icu (hardcoded in C++) to uemoji.icu (always loaded)
- new EmojiProps implementation
Rich Gillam [Tue, 17 Aug 2021 23:28:32 +0000 (16:28 -0700)]
ICU-21460 Changed the ULocale initializers to allow locale IDs that use BCP47 syntax, but with '_' as a field delimiter.
(APIs that specifically require BCP47 syntax are unaffected; they still require '-'.)
Andy Heninger [Wed, 28 Jul 2021 18:20:53 +0000 (11:20 -0700)]
ICU-21662 Improve UVector error handling.
- Add updated versions of UVector::addElement() and ensureCapacity() that respect
incoming errors.
Follow-on to c26aebe, which renamed the original versions.
- Add UVector::adoptElement() as a replacement for addElement() when the vector
has a deleter function set, meaning that it adopts ownership of its elements.
The intent is to make the behavior clearer at the call sites when looking
at unfamiliar code.
- Make all functions with an incoming failure, as indicated by a UErrorCode parameter,
leave the vector unchanged.
- Change all functions that store object pointers into the vector such that,
when the store cannot be completed for any reason _and_ the vector has a deleter function,
then the incoming object is deleted.
This change can simplify the error-handling code around calls to the affected functions
(addElement() and insertElementAt(), in particular).
- Add index bounds checking on functions where it was possible, that is, on functions
with both UErrorCode and index parameters.
- Changed to more modern C++ idioms in some parts of the UVector implementation.
- Review and update as required all uses of the UVector functions
setElementAt(), insertElementAt(), setSize(), and sortedInsert(),
these being the functions with changed behavior on error conditions
(aside from addElement()).
This PR will be followed by more, switching call sites in various ICU services
from UVector::addElementX() (old behavior on errors)
to UVector::addElement() (new behavior on errors).
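A rough sketch of the revised UVector conventions (an incoming error leaves the vector unchanged, a rejected object is deleted when a deleter is set, and index arguments are bounds-checked) might look like the following. `SketchVector`, `ErrorCode`, `Deleter`, and `isFailure()` are hypothetical stand-ins, not ICU's classes; only the method names `adoptElement()` and `insertElementAt()` come from the commit message, and ICU's real signatures and internals may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for ICU's UErrorCode (values > 0 are failures).
enum ErrorCode { ZERO_ERROR = 0, ILLEGAL_ARGUMENT_ERROR = 1, MEMORY_ALLOCATION_ERROR = 7 };
inline bool isFailure(ErrorCode e) { return e > ZERO_ERROR; }

using Deleter = void (*)(void*);

class SketchVector {
public:
    explicit SketchVector(Deleter d = nullptr) : deleter_(d) {}
    SketchVector(const SketchVector&) = delete;
    SketchVector& operator=(const SketchVector&) = delete;
    ~SketchVector() {
        if (deleter_ != nullptr) {
            for (void* p : elems_) deleter_(p);
        }
    }

    // Adopting append: on any failure (incoming or new), obj is deleted
    // via the deleter (if one is set) and the vector is left unchanged.
    void adoptElement(void* obj, ErrorCode& status) {
        insertElementAt(obj, size(), status);
    }

    // Inserts at 0 <= index <= size(); an out-of-range index is reported
    // as ILLEGAL_ARGUMENT_ERROR rather than silently ignored.
    void insertElementAt(void* obj, int32_t index, ErrorCode& status) {
        if (!isFailure(status) && (index < 0 || index > size())) {
            status = ILLEGAL_ARGUMENT_ERROR;
        }
        if (!isFailure(status)) {
            try {
                elems_.insert(elems_.begin() + index, obj);
                return;
            } catch (const std::bad_alloc&) {
                status = MEMORY_ALLOCATION_ERROR;
            }
        }
        if (deleter_ != nullptr) deleter_(obj);  // store failed: delete obj
    }

    int32_t size() const { return static_cast<int32_t>(elems_.size()); }

    void* elementAt(int32_t i) const {
        return (i >= 0 && i < size()) ? elems_[static_cast<std::size_t>(i)] : nullptr;
    }

private:
    std::vector<void*> elems_;
    Deleter deleter_;
};
```

Routing `adoptElement()` through `insertElementAt()` keeps the ownership rule in one place in this sketch; the commit message does not say whether ICU is structured the same way.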
Markus Scherer [Thu, 26 Aug 2021 21:44:49 +0000 (14:44 -0700)]
ICU-20769 refactor UResourceBundle.fResData & init_resb_result()
- replace UResourceBundle.fResData with access via fData
- change ResourceDataValue.resData into a pointer
- move C tests to C++ that stack-allocate now-C++ UResourceBundle
- init_resb_result():
- remove redundant ResourceData parameter
- simplify memory management, and various small simplifications
- /LOCALE/path reuse valid locale, no ures_open()
- pull alias-fetching code into separate function getAliasTargetAsResourceBundle()
- `const char *` for keyPath pieces
- s/noAlias/recursionDepth/
- getAllItemsWithFallback() set correct parentBundle.fValidLocaleDataEntry
Rich Gillam [Sat, 28 Aug 2021 00:03:22 +0000 (17:03 -0700)]
ICU-21671 Corrected a bug in SimpleDateFormat.subParse() that caused quarter names beginning
with numbers to always be parsed as though the number were the whole quarter name.
Jeff Genovy [Fri, 27 Aug 2021 02:23:16 +0000 (19:23 -0700)]
ICU-21710 Additional clean up after removing BOYER_MOORE code from usearch.cpp
Changes:
- We can completely remove the shift tables and related fields from
data structs.
- Creation of UStringSearch objects should be faster now,
as it doesn't waste time computing the unused shift tables.
- The sizeof(UStringSearch) is decreased from 5240 to 3192 bytes, so
this should help reduce memory usage for applications that create many string search objects.
Note regarding the comments on initialize(): it does not actually set an illegal argument error if the pattern consists entirely of ignorables. Added a test case for this.
Fredrik Roubert [Thu, 26 Aug 2021 13:15:30 +0000 (15:15 +0200)]
ICU-20973 Use standard keywords true & false to initialize type bool.
Now that all equality operators return standard bool (commit 633438f),
it no longer makes any sense to use the ICU4C constants TRUE & FALSE
or local variables of type UBool for their return value.
Fredrik Roubert [Wed, 25 Aug 2021 16:11:20 +0000 (18:11 +0200)]
ICU-20973 Rewrite polymorphic CacheKeyBase equality operators for C++20.
The existing polymorphic equality operators that use different types for
the `this` and `other` objects are ambiguous under C++20 overload
resolution rules, which also consider candidates with reversed arguments.
In order to resolve that, while also possibly making the implementation
somewhat simpler overall, the implementation classes (LocaleCacheKey
and DateFmtBestPatternKey) now get normal (non-polymorphic) equality
operators that are trivially non-ambiguous (and as a bonus also don't
need any type casts), while the dynamic type checking logic is moved
into protected helper functions, which in the end are invoked
(without any ambiguity) by friend operators in the base class.
This way, all equality testing of cache key objects ends up taking one
of these two possible paths:
1. Both sides of the equality operator are of the same implementation
type (i.e. LocaleCacheKey or DateFmtBestPatternKey):
The type-specific equality operator is called directly and compares the
relevant attributes of the two objects.
2. The two sides of the equality operator are either of different types
or of some base class type:
The friend equality operators of CacheKeyBase call the virtual helper
function to determine whether the two objects are actually of the
same type and, if they are and this type is an implementation type,
perform the necessary type cast to get to case 1.
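The two paths above can be illustrated with a small self-contained sketch. `KeyBase`, `LocaleKey`, `PatternKey`, and the helper `equalsSameType()` are hypothetical simplifications of CacheKeyBase, LocaleCacheKey, and DateFmtBestPatternKey; ICU's actual helper names and signatures may differ. Each operator here takes the same type on both sides, so a C++20 reversed candidate is the same function with its arguments swapped and the non-reversed form is chosen without ambiguity.

```cpp
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical simplified base class, modeled on CacheKeyBase.
class KeyBase {
public:
    virtual ~KeyBase() = default;

    // Friend operators take the base type on both sides.
    friend bool operator==(const KeyBase& a, const KeyBase& b) {
        return a.equalsSameType(b);
    }
    friend bool operator!=(const KeyBase& a, const KeyBase& b) {
        return !(a == b);
    }

protected:
    // True only if `other` has the same dynamic type as *this and the
    // type-specific comparison says the keys are equal.
    virtual bool equalsSameType(const KeyBase& other) const = 0;
};

class LocaleKey : public KeyBase {
public:
    explicit LocaleKey(std::string locale) : locale_(std::move(locale)) {}

    // Path 1: same implementation type on both sides, no casts needed.
    bool operator==(const LocaleKey& other) const { return locale_ == other.locale_; }
    bool operator!=(const LocaleKey& other) const { return !(*this == other); }

protected:
    // Path 2: dynamic type check, then cast and reuse path 1.
    bool equalsSameType(const KeyBase& other) const override {
        return typeid(*this) == typeid(other) &&
               *this == static_cast<const LocaleKey&>(other);
    }

private:
    std::string locale_;
};

class PatternKey : public KeyBase {
public:
    explicit PatternKey(int skeletonId) : skeletonId_(skeletonId) {}
    bool operator==(const PatternKey& other) const { return skeletonId_ == other.skeletonId_; }
    bool operator!=(const PatternKey& other) const { return !(*this == other); }

protected:
    bool equalsSameType(const KeyBase& other) const override {
        return typeid(*this) == typeid(other) &&
               *this == static_cast<const PatternKey&>(other);
    }

private:
    int skeletonId_;
};
```

Comparing two `LocaleKey` objects takes path 1 directly; comparing through `const KeyBase&` references takes path 2 and returns false when the dynamic types differ.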
ICU-21639 Added an internal utility class to streamline preflighting and heap-allocating a char buffer for a locale ID
and changed several internal methods in ULocale to use it, so that they work correctly on locale IDs that are longer
than ULOC_FULLNAME_CAPACITY.
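The preflighting pattern mentioned above (call once to learn the required length, allocate a large-enough heap buffer only when needed, then call again) can be sketched as follows. `getCanonicalId()`, `fillWithPreflight()`, `ErrorCode`, and the 16-byte inline buffer are hypothetical; the inline buffer merely stands in for a fixed-size buffer such as one of ULOC_FULLNAME_CAPACITY chars, and this is not ICU's actual utility class.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for ICU's UErrorCode (values > 0 are failures).
enum ErrorCode { ZERO_ERROR = 0, BUFFER_OVERFLOW_ERROR = 15 };
inline bool isFailure(ErrorCode e) { return e > ZERO_ERROR; }

// Hypothetical ICU-style getter: copies src (with its NUL) into dest if it
// fits, otherwise sets BUFFER_OVERFLOW_ERROR. Always returns the full length.
int32_t getCanonicalId(const char* src, char* dest, int32_t capacity, ErrorCode& status) {
    if (isFailure(status)) return 0;
    const int32_t length = static_cast<int32_t>(std::strlen(src));
    if (length < capacity) {
        std::memcpy(dest, src, static_cast<std::size_t>(length) + 1);
    } else {
        status = BUFFER_OVERFLOW_ERROR;
    }
    return length;
}

// Calls fill() with a small stack buffer first; on overflow, retries once
// with a heap buffer sized from the length reported by the first call.
template <typename Fill>
std::string fillWithPreflight(Fill fill, ErrorCode& status) {
    if (isFailure(status)) return std::string();
    constexpr int32_t kInlineCapacity = 16;
    char inlineBuf[kInlineCapacity];
    int32_t length = fill(inlineBuf, kInlineCapacity, status);
    if (status == BUFFER_OVERFLOW_ERROR) {
        status = ZERO_ERROR;  // expected on the preflight pass; retry
        std::vector<char> heapBuf(static_cast<std::size_t>(length) + 1);
        length = fill(heapBuf.data(), length + 1, status);
        if (isFailure(status)) return std::string();
        return std::string(heapBuf.data(), static_cast<std::size_t>(length));
    }
    if (isFailure(status)) return std::string();
    return std::string(inlineBuf, static_cast<std::size_t>(length));
}
```

Short IDs never touch the heap, while IDs longer than the inline buffer still succeed instead of being truncated or rejected.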
Andy Heninger [Tue, 27 Jul 2021 22:51:34 +0000 (15:51 -0700)]
ICU-21662 Rename UVector::addElement().
This is the first step towards improving the error handling and out-of-memory
behavior of UVector::addElement(). A followup PR will add back a new addElement()
with corrected error handling, then additional followups will switch call sites
from the original (renamed) function to the new addElement().
This commit includes no logic or behavior changes; it only renames the existing functions.
Fixed some links, added copyright notices, improved filenames, and added
Sidebar navigation links to the new pages (these still need some further
improvements). Updated /trac/ticket/ links, and /trac/changeset/ links
where I could find the corresponding git commit. Also tweaked
userguide/dev/editing.md to clarify 'root directory'. Applied the branch
rename: s/master/main/.