From: Markus Scherer
Date: Tue, 9 Sep 2014 22:05:13 +0000 (+0000)
Subject: ICU-7118 document that compare() is often more efficient than getSortKey()
X-Git-Tag: milestone-59-0-1~1575
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=8c4f2b0036721fc270ada65eacd69498e99f0ea3;p=icu

ICU-7118 document that compare() is often more efficient than getSortKey()

X-SVN-Rev: 36414
---

diff --git a/icu4c/source/i18n/unicode/coll.h b/icu4c/source/i18n/unicode/coll.h
index a7932759de4..1d2ff5b37f5 100644
--- a/icu4c/source/i18n/unicode/coll.h
+++ b/icu4c/source/i18n/unicode/coll.h
@@ -135,20 +135,12 @@ class CollationKey;
 * \endcode
 *
 * \htmlonly\endhtmlonly
-*
-* For comparing strings exactly once, the compare method
-* provides the best performance. When sorting a list of strings however, it
-* is generally necessary to compare each string multiple times. In this case,
-* sort keys provide better performance. The getSortKey methods
+*
+* The getSortKey methods
 * convert a string to a series of bytes that can be compared bitwise against
 * other sort keys using strcmp(). Sort keys are written as
-* zero-terminated byte strings. They consist of several substrings, one for
-* each collation strength level, that are delimited by 0x01 bytes.
-* If the string code points are appended for UCOL_IDENTICAL, then they are
-* processed for correct code point order comparison and may contain 0x01
-* bytes but not zero bytes.
-*
-*
+* zero-terminated byte strings.
+*
 * Another set of APIs returns a CollationKey object that wraps
 * the sort key bytes instead of returning the bytes themselves.
 *
@@ -482,11 +474,14 @@ public:
     /**
      * Transforms the string into a series of characters that can be compared
      * with CollationKey::compareTo. It is not possible to restore the original
-     * string from the chars in the sort key. The generated sort key handles
-     * only a limited number of ignorable characters.
+     * string from the chars in the sort key.
      * Use CollationKey::equals or CollationKey::compare to compare the
      * generated sort keys.
      * If the source string is null, a null collation key will be returned.
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source the source string to be transformed into a sort key.
      * @param key the collation key to be filled in
      * @param status the error code status.
@@ -501,11 +496,14 @@ public:
     /**
      * Transforms the string into a series of characters that can be compared
      * with CollationKey::compareTo. It is not possible to restore the original
-     * string from the chars in the sort key. The generated sort key handles
-     * only a limited number of ignorable characters.
+     * string from the chars in the sort key.
      * Use CollationKey::equals or CollationKey::compare to compare the
      * generated sort keys.
      * If the source string is null, a null collation key will be returned.
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source the source string to be transformed into a sort key.
      * @param sourceLength length of the collation key
      * @param key the collation key to be filled in
@@ -980,6 +978,10 @@ public:
      * Get the sort key as an array of bytes from a UnicodeString.
      * Sort key byte arrays are zero-terminated and can be compared using
      * strcmp().
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source string to be processed.
      * @param result buffer to store result in. If NULL, number of bytes needed
      *        will be returned.
@@ -996,6 +998,10 @@ public:
      * Get the sort key as an array of bytes from a UChar buffer.
      * Sort key byte arrays are zero-terminated and can be compared using
      * strcmp().
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source string to be processed.
      * @param sourceLength length of string to be processed.
      *        If -1, the string is 0 terminated and length will be decided by the
diff --git a/icu4c/source/i18n/unicode/tblcoll.h b/icu4c/source/i18n/unicode/tblcoll.h
index 64f092c63f2..40d50902174 100644
--- a/icu4c/source/i18n/unicode/tblcoll.h
+++ b/icu4c/source/i18n/unicode/tblcoll.h
@@ -6,7 +6,7 @@
  */

 /**
- * \file
+ * \file
  * \brief C++ API: The RuleBasedCollator class implements the Collator abstract base class.
  */

@@ -343,34 +343,38 @@ public:
                                               UErrorCode &status) const;

     /**
-     * Transforms a specified region of the string into a series of characters
-     * that can be compared with CollationKey.compare. Use a CollationKey when
-     * you need to do repeated comparisions on the same string. For a single
-     * comparison the compare method will be faster.
-     * @param source the source string.
-     * @param key the transformed key of the source string.
-     * @param status the error code status.
-     * @return the transformed key.
-     * @see CollationKey
-     * @stable ICU 2.0
-     */
+     * Transforms the string into a series of characters
+     * that can be compared with CollationKey.compare().
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
+     * @param source the source string.
+     * @param key the transformed key of the source string.
+     * @param status the error code status.
+     * @return the transformed key.
+     * @see CollationKey
+     * @stable ICU 2.0
+     */
     virtual CollationKey& getCollationKey(const UnicodeString& source,
                                           CollationKey& key,
                                           UErrorCode& status) const;

     /**
-     * Transforms a specified region of the string into a series of characters
-     * that can be compared with CollationKey.compare. Use a CollationKey when
-     * you need to do repeated comparisions on the same string. For a single
-     * comparison the compare method will be faster.
-     * @param source the source string.
-     * @param sourceLength the length of the source string.
-     * @param key the transformed key of the source string.
-     * @param status the error code status.
-     * @return the transformed key.
-     * @see CollationKey
-     * @stable ICU 2.0
-     */
+     * Transforms a specified region of the string into a series of characters
+     * that can be compared with CollationKey.compare.
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
+     * @param source the source string.
+     * @param sourceLength the length of the source string.
+     * @param key the transformed key of the source string.
+     * @param status the error code status.
+     * @return the transformed key.
+     * @see CollationKey
+     * @stable ICU 2.0
+     */
     virtual CollationKey& getCollationKey(const UChar *source,
                                           int32_t sourceLength,
                                           CollationKey& key,
@@ -609,6 +613,10 @@ public:

     /**
      * Get the sort key as an array of bytes from a UnicodeString.
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source string to be processed.
      * @param result buffer to store result in. If NULL, number of bytes needed
      *        will be returned.
@@ -622,6 +630,10 @@ public:

     /**
      * Get the sort key as an array of bytes from a UChar buffer.
+     *
+     * Note that sort keys are often less efficient than simply doing comparison.
+     * For more details, see the ICU User Guide.
+     *
      * @param source string to be processed.
      * @param sourceLength length of string to be processed. If -1, the string
      *        is 0 terminated and length will be decided by the function.
diff --git a/icu4c/source/i18n/unicode/ucol.h b/icu4c/source/i18n/unicode/ucol.h
index 8919b094ff7..8aecc38c254 100644
--- a/icu4c/source/i18n/unicode/ucol.h
+++ b/icu4c/source/i18n/unicode/ucol.h
@@ -970,6 +970,9 @@ ucol_normalizeShortDefinitionString(const char *source,
  * Get a sort key for a string from a UCollator.
  * Sort keys may be compared using strcmp.
  *
+ * Note that sort keys are often less efficient than simply doing comparison.
+ * For more details, see the ICU User Guide.
+ *
  * Like ICU functions that write to an output buffer, the buffer contents
  * is undefined if the buffer capacity (resultLength parameter) is too small.
  * Unlike ICU functions that write a string to an output buffer,