From 5799b7184975e6c5768bf4f5e57516bbad85d9cc Mon Sep 17 00:00:00 2001
From: Yoshito Umaoka
- * SearchIterator is an abstract base class that defines a protocol
- * for text searching. Subclasses provide concrete implementations of
- * various search algorithms. A concrete subclass, StringSearch, is
- * provided that implements language-sensitive pattern matching based
- * on the comparison rules defined in a RuleBasedCollator
- * object. Instances of SearchIterator maintain a current position and
- * scan over the target text, returning the indices where a match is
- * found and the length of each match. Generally, the sequence of forward
- * matches will be equivalent to the sequence of backward matches.One
- * case where this statement may not hold is when non-overlapping mode
- * is set on and there are continuous repetitive patterns in the text.
- * Consider the case searching for pattern "aba" in the text
- * "ababababa", setting overlapping mode off will produce forward matches
- * at offsets 0, 4. However when a backwards search is done, the
- * results will be at offsets 6 and 2. If matches searched for have boundary restrictions. BreakIterators
- * can be used to define the valid boundaries of such a match. Once a
- * BreakIterator is set, potential matches will be tested against the
- * BreakIterator to determine if the boundaries are valid and that all
- * characters in the potential match are equivalent to the pattern
- * searched for. For example, looking for the pattern "fox" in the text
- * "foxy fox" will produce match results at offset 0 and 5 with length 3
- * if no BreakIterators were set. However if a WordBreakIterator is set,
- * the only match that would be found will be at the offset 5. Since,
- * the SearchIterator guarantees that if a BreakIterator is set, all its
- * matches will match the given pattern exactly, a potential match that
- * passes the BreakIterator might still not produce a valid match. For
- * instance the pattern "e" will not be found in the string
- * "\u00e9" (latin small letter e with acute) if a
- * CharacterBreakIterator is used. Even though "e" is
- * a part of the character "\u00e9" and the potential match at
- * offset 0 length 1 passes the CharacterBreakIterator test, "\u00e9"
- * is not equivalent to "e", hence the SearchIterator rejects the potential
- * match. By default, the SearchIterator
- * does not impose any boundary restriction on the matches, it will
- * return all results that match the pattern. Illustrating with the
- * above example, "e" will
- * be found in the string "\u00e9" if no BreakIterator is
- * specified. SearchIterator also provides a means to handle overlapping
- * matches via the API setOverlapping(boolean). For example, if
- * overlapping mode is set, searching for the pattern "abab" in the
- * text "ababab" will match at positions 0 and 2, whereas if
- * overlapping is not set, SearchIterator will only match at position
- * 0. By default, overlapping mode is not set. The APIs in SearchIterator are similar to that of other text
- * iteration classes such as BreakIterator. Using this class, it is
- * easy to scan through text looking for all occurances of a
- * match.
- * Example of use:
+ * Other options for searching include using a BreakIterator to restrict
+ * the points at which matches are detected.
+ *
+ * SearchIterator provides an API that is similar to that of
+ * other text iteration classes such as BreakIterator. Using
+ * this class, it is easy to scan through text looking for all occurrences of
+ * a given pattern. The following example uses a StringSearch
+ * object to find all instances of "fox" in the target string. Any other
+ * subclass of SearchIterator can be used in an identical
+ * manner.
+ *
- *
+ * SearchIterator defines a protocol for text searching.
+ * Subclasses provide concrete implementations of various search algorithms.
+ * For example, StringSearch implements language-sensitive pattern
+ * matching based on the comparison rules defined in a
+ * RuleBasedCollator object.
+ *
- *
* String target = "The quick brown fox jumped over the lazy fox";
* String pattern = "fox";
* SearchIterator iter = new StringSearch(pattern, target);
- * for (int pos = iter.first(); pos != SearchIterator.DONE;
- * pos = iter.next()) {
- * // println matches at offset 16 and 41 with length 3
- * System.out.println("Found match at " + pos + ", length is "
- * + iter.getMatchLength());
- * }
- * target = "ababababa";
- * pattern = "aba";
- * iter.setTarget(new StringCharacterIterator(pattern));
- * iter.setOverlapping(false);
- * System.out.println("Overlapping mode set to false");
- * System.out.println("Forward matches of pattern " + pattern + " in text "
- * + text + ": ");
- * for (int pos = iter.first(); pos != SearchIterator.DONE;
- * pos = iter.next()) {
- * // println matches at offset 0 and 4 with length 3
- * System.out.println("offset " + pos + ", length "
- * + iter.getMatchLength());
+ * for (int pos = iter.first(); pos != SearchIterator.DONE;
+ * pos = iter.next()) {
+ * System.out.println("Found match at " + pos +
+ * ", length is " + iter.getMatchLength());
* }
- * System.out.println("Backward matches of pattern " + pattern + " in text "
- * + text + ": ");
- * for (int pos = iter.last(); pos != SearchIterator.DONE;
- * pos = iter.previous()) {
- * // println matches at offset 6 and 2 with length 3
- * System.out.println("offset " + pos + ", length "
- * + iter.getMatchLength());
- * }
- * System.out.println("Overlapping mode set to true");
- * System.out.println("Index set to 2");
- * iter.setIndex(2);
- * iter.setOverlapping(true);
- * System.out.println("Forward matches of pattern " + pattern + " in text "
- * + text + ": ");
- * for (int pos = iter.first(); pos != SearchIterator.DONE;
- * pos = iter.next()) {
- * // println matches at offset 2, 4 and 6 with length 3
- * System.out.println("offset " + pos + ", length "
- * + iter.getMatchLength());
- * }
- * System.out.println("Index set to 2");
- * iter.setIndex(2);
- * System.out.println("Backward matches of pattern " + pattern + " in text "
- * + text + ": ");
- * for (int pos = iter.last(); pos != SearchIterator.DONE;
- * pos = iter.previous()) {
- * // println matches at offset 0 with length 3
- * System.out.println("offset " + pos + ", length "
- * + iter.getMatchLength());
- * }
- *
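The removed class comment described how overlapping mode changes the offsets reported for "aba" in "ababababa" and for "abab" in "ababab". The sketch below reproduces those offsets with plain `String.indexOf` and `String.lastIndexOf`. It is only an illustration of the stepping rule, under the assumption of exact code-unit matching: `OverlapSketch`, `forward`, and `backward` are hypothetical names, and the real `StringSearch` compares collation elements rather than raw characters.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {
    // Forward scan. After a match at i, resume at i + 1 when overlapping
    // matches are allowed, otherwise just past the end of the match.
    static List<Integer> forward(String text, String pattern, boolean overlap) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<Integer> hits = new ArrayList<>();
        int from = 0;
        while (true) {
            int i = text.indexOf(pattern, from);
            if (i < 0) {
                break;
            }
            hits.add(i);
            from = overlap ? i + 1 : i + pattern.length();
        }
        return hits;
    }

    // Backward scan. The next match must start at or before `from`; in
    // non-overlapping mode it must also end before the previous match starts.
    static List<Integer> backward(String text, String pattern, boolean overlap) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<Integer> hits = new ArrayList<>();
        int from = text.length() - pattern.length();
        while (from >= 0) {
            int i = text.lastIndexOf(pattern, from);
            if (i < 0) {
                break;
            }
            hits.add(i);
            from = overlap ? i - 1 : i - pattern.length();
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(forward("ababababa", "aba", false));  // [0, 4]
        System.out.println(backward("ababababa", "aba", false)); // [6, 2]
        System.out.println(forward("ababab", "abab", true));     // [0, 2]
        System.out.println(forward("ababab", "abab", false));    // [0]
    }
}
```

The forward and backward sequences differ in non-overlapping mode because each direction consumes the text greedily from its own end, which is the asymmetry the old class comment pointed out.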
* Determines whether overlapping matches are returned. See the class * documentation for more information about overlapping matches. - *
** The default setting of this property is false - *
+ * * @param allowOverlap flag indicator if overlapping matches are allowed * @see #isOverlapping * @stable ICU 2.8 */ - public void setOverlapping(boolean allowOverlap) - { + public void setOverlapping(boolean allowOverlap) { search_.isOverlap_ = allowOverlap; } - + /** - * Set the BreakIterator that is used to restrict the points at which - * matches are detected. - * Using null as the parameter is legal; it means that break - * detection should not be attempted. - * See class documentation for more information. + * Set the BreakIterator that will be used to restrict the points + * at which matches are detected. + * * @param breakiter A BreakIterator that will be used to restrict the - * points at which matches are detected. - * @see #getBreakIterator + * points at which matches are detected. If a match is + * found, but the match's start or end index is not a + * boundary as determined by the {@link BreakIterator}, + * the match will be rejected and another will be searched + * for. If this parameter is null, no break + * detection is attempted. * @see BreakIterator * @stable ICU 2.0 */ - public void setBreakIterator(BreakIterator breakiter) - { + public void setBreakIterator(BreakIterator breakiter) { search_.setBreakIter(breakiter); if (search_.breakIter() != null) { // Create a clone of CharacterItearator, so it won't @@ -313,8 +236,9 @@ public abstract class SearchIterator /** * Set the target text to be searched. Text iteration will then begin at - * the start of the text string. This method is useful if you want to + * the start of the text string. This method is useful if you want to * reuse an iterator to search within a different body of text. 
+ * * @param text new text iterator to look for match, * @exception IllegalArgumentException thrown when text is null or has * 0 length @@ -343,128 +267,103 @@ public abstract class SearchIterator } } - //TODO: We should add APIs below to match ICU4C APIs + //TODO: We may add APIs below to match ICU4C APIs // setCanonicalMatch - // setElementComparison // public getters ---------------------------------------------------- - + /** - *- * Returns the index of the most recent match in the target text. - * This call returns a valid result only after a successful call to - * {@link #first}, {@link #next}, {@link #previous}, or {@link #last}. - * Just after construction, or after a searching method returns - * DONE, this method will return DONE. - *
- *- * Use getMatchLength to get the length of the matched text. - * getMatchedText will return the subtext in the searched - * target text from index getMatchStart() with length getMatchLength(). - *
- * @return index to a substring within the text string that is being - * searched. - * @see #getMatchLength - * @see #getMatchedText - * @see #first - * @see #next - * @see #previous - * @see #last - * @see #DONE - * @stable ICU 2.8 - */ - public int getMatchStart() - { + * Returns the index to the match in the text string that was searched. + * This call returns a valid result only after a successful call to + * {@link #first}, {@link #next}, {@link #previous}, or {@link #last}. + * Just after construction, or after a searching method returns + * {@link #DONE}, this method will return {@link #DONE}. + *+ * Use {@link #getMatchLength} to get the matched string length. + * + * @return index of a substring within the text string that is being + * searched. + * @see #first + * @see #next + * @see #previous + * @see #last + * @stable ICU 2.0 + */ + public int getMatchStart() { return search_.matchedIndex_; } /** - * Return the index in the target text at which the iterator is currently - * positioned. - * If the iteration has gone past the end of the target text, or past - * the beginning for a backwards search, {@link #DONE} is returned. - * @return index in the target text at which the iterator is currently - * positioned. + * Return the current index in the text being searched. + * If the iteration has gone past the end of the text + * (or past the beginning for a backwards search), {@link #DONE} + * is returned. + * + * @return current index in the text being searched. * @stable ICU 2.8 - * @see #first - * @see #next - * @see #previous - * @see #last - * @see #DONE */ public abstract int getIndex(); - + /** - *
- * Returns the length of the most recent match in the target text. - * This call returns a valid result only after a successful - * call to {@link #first}, {@link #next}, {@link #previous}, or - * {@link #last}. - * Just after construction, or after a searching method returns - * DONE, this method will return 0. See getMatchStart() for - * more details. - *
- * @return The length of the most recent match in the target text, or 0 if - * there is no match. - * @see #getMatchStart - * @see #getMatchedText + * Returns the length of text in the string which matches the search + * pattern. This call returns a valid result only after a successful call + * to {@link #first}, {@link #next}, {@link #previous}, or {@link #last}. + * Just after construction, or after a searching method returns + * {@link #DONE}, this method will return 0. + * + * @return The length of the match in the target text, or 0 if there + * is no match currently. * @see #first * @see #next * @see #previous * @see #last - * @see #DONE * @stable ICU 2.0 */ - public int getMatchLength() - { + public int getMatchLength() { return search_.matchedLength(); } - + /** * Returns the BreakIterator that is used to restrict the indexes at which * matches are detected. This will be the same object that was passed to - * the constructor or tosetBreakIterator
.
- * If the BreakIterator has not been set, null will be returned.
- * See setBreakIterator for more information.
+ * the constructor or to {@link #setBreakIterator}.
+ * If the {@link BreakIterator} has not been set, null will be returned.
+ * See {@link #setBreakIterator} for more information.
+ *
* @return the BreakIterator set to restrict logic matches
* @see #setBreakIterator
* @see BreakIterator
* @stable ICU 2.0
*/
- public BreakIterator getBreakIterator()
- {
+ public BreakIterator getBreakIterator() {
return search_.breakIter();
}
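The `setBreakIterator` documentation in this patch says a match is rejected when its start or end index is not a boundary according to the `BreakIterator`. A minimal sketch of that filtering rule follows, assuming exact substring matching and the JDK's `java.text.BreakIterator` rather than ICU's `com.ibm.icu.text.BreakIterator`. `BoundaryFilterSketch` and `matches` are hypothetical names, and the collation-level equivalence check that `SearchIterator` also applies is not modeled.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BoundaryFilterSketch {
    // Collects the start offsets of pattern in text, keeping a candidate only
    // if both its start and end fall on a boundary of the given breaker.
    // A null breaker means no boundary restriction, as in the Javadoc above.
    static List<Integer> matches(String text, String pattern, BreakIterator breaker) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<Integer> hits = new ArrayList<>();
        if (breaker != null) {
            breaker.setText(text);
        }
        for (int i = text.indexOf(pattern); i >= 0; i = text.indexOf(pattern, i + 1)) {
            int end = i + pattern.length();
            if (breaker == null || (breaker.isBoundary(i) && breaker.isBoundary(end))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "foxy fox";
        // No breaker: both occurrences are reported.
        System.out.println(matches(text, "fox", null)); // [0, 5]
        // Word breaker: offset 3 is inside "foxy", so the match at 0 is rejected.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        System.out.println(matches(text, "fox", words)); // [5]
    }
}
```

This mirrors the "foxy fox" example from the removed class comment: without a breaker the pattern is found at offsets 0 and 5, and with a word breaker only the match at offset 5 survives.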
-
+
/**
- * Return the target text that is being searched.
- * @return target text being searched.
- * @see #setTarget
+ * Return the string text to be searched.
+ * @return text string to be searched.
* @stable ICU 2.0
*/
- public CharacterIterator getTarget()
- {
+ public CharacterIterator getTarget() {
return search_.text();
}
-
+
/**
* Returns the text that was matched by the most recent call to
- * {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
- * If the iterator is not pointing at a valid match, for instance just
- * after construction or after DONE has been returned, an empty
- * String will be returned. See getMatchStart for more information
- * @see #getMatchStart
- * @see #getMatchLength
+ * {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
+ * If the iterator is not pointing at a valid match (e.g. just after
+ * construction or after {@link #DONE} has been returned),
+ * returns an empty string.
+ *
+ * @return the substring in the target text of the most recent match,
+ * or null if there is no match currently.
* @see #first
* @see #next
* @see #previous
* @see #last
- * @see #DONE
- * @return the substring in the target text of the most recent match
* @stable ICU 2.0
*/
- public String getMatchedText()
- {
+ public String getMatchedText() {
if (search_.matchedLength() > 0) {
int limit = search_.matchedIndex_ + search_.matchedLength();
StringBuilder result = new StringBuilder(search_.matchedLength());
@@ -481,31 +380,22 @@ public abstract class SearchIterator
}
// miscellaneous public methods -----------------------------------------
-
+
/**
- * Search forwards in the target text for the next valid match,
- * starting the search from the current iterator position. The iterator is
- * adjusted so that its current index, as returned by {@link #getIndex},
- * is the starting position of the match if one was found. If a match is
- * found, the index of the match is returned, otherwise DONE is
- * returned. If overlapping mode is set, the beginning of the found match
- * can be before the end of the current match, if any.
- * @return The starting index of the next forward match after the current
- * iterator position, or
- * DONE if there are no more matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #following
- * @see #preceding
- * @see #previous
- * @see #first
- * @see #last
- * @see #DONE
+ * Returns the index of the next point at which the text matches the
+ * search pattern, starting from the current position.
+ * The iterator is adjusted so that its current index (as returned by
+ * {@link #getIndex}) is the match position if one was found.
+ * If a match is not found, {@link #DONE} will be returned and
+ * the iterator will be adjusted to a position after the end of the text
+ * string.
+ *
+ * @return The index of the next match after the current position,
+ * or {@link #DONE} if there are no more matches.
+ * @see #getIndex
* @stable ICU 2.0
*/
- public int next()
- {
+ public int next() {
int index = getIndex(); // offset = getOffset() in ICU4C
int matchindex = search_.matchedIndex_;
int matchlength = search_.matchedLength();
@@ -545,29 +435,19 @@ public abstract class SearchIterator
}
/**
- * Search backwards in the target text for the next valid match,
- * starting the search from the current iterator position. The iterator is
- * adjusted so that its current index, as returned by {@link #getIndex},
- * is the starting position of the match if one was found. If a match is
- * found, the index is returned, otherwise DONE is returned. If
- * overlapping mode is set, the end of the found match can be after the
- * beginning of the previous match, if any.
- * @return The starting index of the next backwards match after the current
- * iterator position, or
- * DONE if there are no more matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #following
- * @see #preceding
- * @see #next
- * @see #first
- * @see #last
- * @see #DONE
+ * Returns the index of the previous point at which the string text
+ * matches the search pattern, starting at the current position.
+ * The iterator is adjusted so that its current index (as returned by
+ * {@link #getIndex}) is the match position if one was found.
+ * If a match is not found, {@link #DONE} will be returned and
+ * the iterator will be adjusted to the index {@link #DONE}.
+ *
+ * @return The index of the previous match before the current position,
+ * or {@link #DONE} if there are no more matches.
+ * @see #getIndex
* @stable ICU 2.0
*/
- public int previous()
- {
+ public int previous() {
int index; // offset in ICU4C
if (search_.reset_) {
index = search_.endIndex(); // m_search_->textLength in ICU4C
@@ -611,34 +491,29 @@ public abstract class SearchIterator
/**
* Return true if the overlapping property has been set.
- * See setOverlapping(boolean) for more information.
+ * See {@link #setOverlapping(boolean)} for more information.
+ *
* @see #setOverlapping
* @return true if the overlapping property has been set, false otherwise
* @stable ICU 2.8
*/
- public boolean isOverlapping()
- {
+ public boolean isOverlapping() {
return search_.isOverlap_;
}
- //TODO: We should add APIs below to match ICU4C APIs
+ //TODO: We may add APIs below to match ICU4C APIs
// isCanonicalMatch
- // getElementComparison
/**
- *
- * Resets the search iteration. All properties will be reset to their
- * default values.
- *
- *
- * If a forward iteration is initiated, the next search will begin at the
- * start of the target text. Otherwise, if a backwards iteration is initiated,
- * the next search will begin at the end of the target text.
- *
- * @stable ICU 2.8
- */
- public void reset()
- {
+ * Resets the iteration.
+ * Search will begin at the start of the text string if a forward
+ * iteration is initiated before a backwards iteration. Otherwise if a
+ * backwards iteration is initiated before a forwards iteration, the
+ * search will begin at the end of the text string.
+ *
+ * @stable ICU 2.0
+ */
+ public void reset() {
setMatchNotFound();
setIndex(search_.beginIndex());
search_.isOverlap_ = false;
@@ -647,112 +522,103 @@ public abstract class SearchIterator
search_.isForwardSearching_ = true;
search_.reset_ = true;
}
-
+
/**
- * Return the index of the first forward match in the target text.
- * This method sets the iteration to begin at the start of the
- * target text and searches forward from there.
- * @return The index of the first forward match, or DONE
- * if there are no matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #following
- * @see #preceding
- * @see #next
- * @see #previous
- * @see #last
- * @see #DONE
+ * Returns the first index at which the string text matches the search
+ * pattern. The iterator is adjusted so that its current index (as
+ * returned by {@link #getIndex()}) is the match position if one
+ * was found.
+ * If a match is not found, {@link #DONE} will be returned and
+ * the iterator will be adjusted to the index {@link #DONE}.
+ *
+ * @return The character index of the first match, or
+ * {@link #DONE} if there are no matches.
+ *
+ * @see #getIndex
* @stable ICU 2.0
*/
- public final int first()
- {
+ public final int first() {
int startIdx = search_.beginIndex();
setIndex(startIdx);
return handleNext(startIdx);
}
/**
- * Return the index of the first forward match in target text that
- * is at or after argument position.
- * This method sets the iteration to begin at the specified
- * position in the the target text and searches forward from there.
- * @return The index of the first forward match, or DONE
- * if there are no matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #first
- * @see #preceding
- * @see #next
- * @see #previous
- * @see #last
- * @see #DONE
+ * Returns the first index equal or greater than position at which the
+ * string text matches the search pattern. The iterator is adjusted so
+ * that its current index (as returned by {@link #getIndex()}) is the
+ * match position if one was found.
+ * If a match is not found, {@link #DONE} will be returned and the
+ * iterator will be adjusted to the index {@link #DONE}.
+ *
+ * @param position where search is to start from.
+ * @return The character index of the first match following
+ * position, or {@link #DONE} if there are no matches.
+ * @throws IndexOutOfBoundsException If position is outside the text
+ * range for searching.
+ * @see #getIndex
* @stable ICU 2.0
*/
- public final int following(int position)
- {
+ public final int following(int position) {
setIndex(position);
return handleNext(position);
}
-
+
/**
- * Return the index of the first backward match in target text.
- * This method sets the iteration to begin at the end of the
- * target text and searches backwards from there.
- * @return The starting index of the first backward match, or
- * DONE
if there are no matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #first
- * @see #preceding
- * @see #next
- * @see #previous
- * @see #following
- * @see #DONE
+ * Returns the last index in the target text at which it matches the
+ * search pattern. The iterator is adjusted so that its current index
+ * (as returned by {@link #getIndex}) is the match position if one was
+ * found.
+ * If a match is not found, {@link #DONE} will be returned and
+ * the iterator will be adjusted to the index {@link #DONE}.
+ *
+ * @return The index of the last match, or {@link #DONE} if
+ * there are no matches.
+ * @see #getIndex
* @stable ICU 2.0
*/
- public final int last()
- {
+ public final int last() {
int endIdx = search_.endIndex();
setIndex(endIdx);
return handlePrevious(endIdx);
}
-
+
/**
- * Return the index of the first backwards match in target
- * text that ends at or before argument position.
- * This method sets the iteration to begin at the argument
- * position index of the target text and searches backwards from there.
- * @return The starting index of the first backwards match, or
- * DONE
- * if there are no matches.
- * @see #getMatchStart
- * @see #getMatchLength
- * @see #getMatchedText
- * @see #first
- * @see #following
- * @see #next
- * @see #previous
- * @see #last
- * @see #DONE
+ * Returns the first index less than position at which the string
+ * text matches the search pattern. The iterator is adjusted so that its
+ * current index (as returned by {@link #getIndex}) is the match
+ * position if one was found. If a match is not found,
+ * {@link #DONE} will be returned and the iterator will be
+ * adjusted to the index {@link #DONE}
+ * + * When the overlapping option ({@link #isOverlapping}) is off, the last index of the + * result match is always less than position. + * When the overlapping option is on, the result match may span across + * position. + * + * @param position where search is to start from. + * @return The character index of the first match preceding + * position, or {@link #DONE} if there are + * no matches. + * @throws IndexOutOfBoundsException If position is less than or greater than + * the text range for searching + * @see #getIndex * @stable ICU 2.0 */ - public final int preceding(int position) - { + public final int preceding(int position) { setIndex(position); return handlePrevious(position); } // protected constructor ---------------------------------------------- - + /** * Protected constructor for use by subclasses. * Initializes the iterator with the argument target text for searching * and sets the BreakIterator. * See class documentation for more details on the use of the target text - * and BreakIterator. + * and {@link BreakIterator}. + * * @param target The target text to be searched. * @param breaker A {@link BreakIterator} that is used to determine the * boundaries of a logical match. This argument can be null. @@ -790,7 +656,8 @@ public abstract class SearchIterator /** * Sets the length of the most recent match in the target text. * Subclasses' handleNext() and handlePrevious() methods should call this - * after they find a match in the target text. + * after they find a match in the target text. + * * @param length new length to set * @see #handleNext * @see #handlePrevious @@ -802,50 +669,41 @@ public abstract class SearchIterator } /** + * Abstract method which subclasses override to provide the mechanism + * for finding the next match in the target text. This allows different + * subclasses to provide different search algorithms. *
- * Abstract method that subclasses override to provide the mechanism - * for finding the next forwards match in the target text. This - * allows different subclasses to provide different search algorithms. - *
- *- * If a match is found, this function must call setMatchLength(int) to - * set the length of the result match. - * The iterator is adjusted so that its current index, as returned by - * {@link #getIndex}, is the starting position of the match if one was - * found. If a match is not found, DONE will be returned. - *
- * @param start index in the target text at which the forwards search - * should begin. - * @return the starting index of the next forwards match if found, DONE - * otherwise - * @see #setMatchLength(int) - * @see #handlePrevious(int) - * @see #DONE + * If a match is found, the implementation should return the index at + * which the match starts and should call + * {@link #setMatchLength} with the number of characters + * in the target text that make up the match. If no match is found, the + * method should return {@link #DONE}. + * + * @param start The index in the target text at which the search + * should start. + * @return index at which the match starts, else if match is not found + * {@link #DONE} is returned + * @see #setMatchLength * @stable ICU 2.0 */ protected abstract int handleNext(int start); - + /** + * Abstract method which subclasses override to provide the mechanism for + * finding the previous match in the target text. This allows different + * subclasses to provide different search algorithms. *- * Abstract method which subclasses override to provide the mechanism - * for finding the next backwards match in the target text. - * This allows different - * subclasses to provide different search algorithms. - *
- *- * If a match is found, this function must call setMatchLength(int) to - * set the length of the result match. - * The iterator is adjusted so that its current index, as returned by - * {@link #getIndex}, is the starting position of the match if one was - * found. If a match is not found, DONE will be returned. - *
- * @param startAt index in the target text at which the backwards search - * should begin. - * @return the starting index of the next backwards match if found, - * DONE otherwise - * @see #setMatchLength(int) - * @see #handleNext(int) - * @see #DONE + * If a match is found, the implementation should return the index at + * which the match starts and should call + * {@link #setMatchLength} with the number of characters + * in the target text that make up the match. If no match is found, the + * method should return {@link #DONE}. + * + * @param startAt The index in the target text at which the search + * should start. + * @return index at which the match starts, else if match is not found + * {@link #DONE} is returned + * @see #setMatchLength * @stable ICU 2.0 */ protected abstract int handlePrevious(int startAt); @@ -878,16 +736,16 @@ public abstract class SearchIterator */ STANDARD_ELEMENT_COMPARISON, /** - *Collation element comparison is modified to effectively provide behavior - * between the specified strength and strength - 1.
- * - *Collation elements in the pattern that have the base weight for the specified + * Collation element comparison is modified to effectively provide behavior + * between the specified strength and strength - 1. + *
+ * Collation elements in the pattern that have the base weight for the specified * strength are treated as "wildcards" that match an element with any other * weight at that collation level in the searched text. For example, with a * secondary-strength English collator, a plain 'e' in the pattern will match * a plain e or an e with any diacritic in the searched text, but an e with * diacritic in the pattern will only match an e with the same diacritic in - * the searched text.
+ * the searched text. * * @draft ICU 53 * @provisional This API might change or be removed in a future release. @@ -895,16 +753,16 @@ public abstract class SearchIterator PATTERN_BASE_WEIGHT_IS_WILDCARD, /** - *
Collation element comparison is modified to effectively provide behavior - * between the specified strength and strength - 1.
- * - *Collation elements in either the pattern or the searched text that have the + * Collation element comparison is modified to effectively provide behavior + * between the specified strength and strength - 1. + *
+ * Collation elements in either the pattern or the searched text that have the * base weight for the specified strength are treated as "wildcards" that match * an element with any other weight at that collation level. For example, with * a secondary-strength English collator, a plain 'e' in the pattern will match * a plain e or an e with any diacritic in the searched text, but an e with * diacritic in the pattern will only match an e with the same diacritic or a - * plain e in the searched text.
+ * plain e in the searched text. * * @draft ICU 53 * @provisional This API might change or be removed in a future release. @@ -913,9 +771,9 @@ public abstract class SearchIterator } /** - *Sets the collation element comparison type.
- * - *The default comparison type is {@link ElementComparisonType#STANDARD_ELEMENT_COMPARISON}.
+ * Sets the collation element comparison type. + *+ * The default comparison type is {@link ElementComparisonType#STANDARD_ELEMENT_COMPARISON}. * * @see ElementComparisonType * @see #getElementComparisonType() @@ -927,7 +785,7 @@ public abstract class SearchIterator } /** - *
Returns the collation element comparison type.
+ * Returns the collation element comparison type. * * @see ElementComparisonType * @see #setElementComparisonType(ElementComparisonType) diff --git a/icu4j/main/classes/collate/src/com/ibm/icu/text/StringSearch.java b/icu4j/main/classes/collate/src/com/ibm/icu/text/StringSearch.java index ed6ccf57172..c106297364e 100644 --- a/icu4j/main/classes/collate/src/com/ibm/icu/text/StringSearch.java +++ b/icu4j/main/classes/collate/src/com/ibm/icu/text/StringSearch.java @@ -14,150 +14,111 @@ import com.ibm.icu.util.ICUException; import com.ibm.icu.util.ULocale; // Java porting note: -// ICU4C implementation contains dead code in many places. +// +// ICU4C implementation contains dead code in many places. // While porting ICU4C linear search implementation, these dead codes // were not fully ported. The code block tagged by "// *** Boyer-Moore ***" // are those dead code, still available in ICU4C. -//TODO: ICU4C implementation does not seem to handle UCharacterIterator pointing +// ICU4C implementation does not seem to handle UCharacterIterator pointing // a fragment of text properly. ICU4J uses CharacterIterator to navigate through // the input text. We need to carefully review the code ported from ICU4C // assuming the start index is 0. -//TODO: ICU4C implementation initializes pattern.CE and pattern.PCE. It looks +// ICU4C implementation initializes pattern.CE and pattern.PCE. It looks // CE is no longer used, except a few places checking CELength. It looks this // is a left over from already disable Boyer-Moore search code. This Java implementation // preserves the code, but we should clean them up later. -//TODO: We need to update document to remove the term "Boyer-Moore search". - -/** - *
- * StringSearch is the concrete subclass of
- * SearchIterator that provides language-sensitive text searching
- * based on the comparison rules defined in a {@link RuleBasedCollator} object.
- *
- * StringSearch uses a version of the fast Boyer-Moore search
- * algorithm that has been adapted to work with the large character set of
- * Unicode. Refer to
- *
- * "Efficient Text Searching in Java", published in the
- * Java Report on February, 1999, for further information on the
- * algorithm.
- *
- * Users are also strongly encouraged to read the section on - * - * String Search and - * - * Collation in the user guide before attempting to use this class. - *
- *- * String searching becomes a little complicated when accents are encountered at - * match boundaries. If a match is found and it has preceding or trailing - * accents not part of the match, the result returned will include the - * preceding accents up to the first base character, if the pattern searched - * for starts an accent. Likewise, - * if the pattern ends with an accent, all trailing accents up to the first - * base character will be included in the result. - *
- *- * For example, if a match is found in target text "a\u0325\u0300" for - * the pattern - * "a\u0325", the result returned by StringSearch will be the index 0 and - * length 3 <0, 3>. If a match is found in the target - * "a\u0325\u0300" - * for the pattern "\u0300", then the result will be index 1 and length 2 - * <1, 2>. - *
- *- * In the case where the decomposition mode is on for the RuleBasedCollator, - * all matches that starts or ends with an accent will have its results include - * preceding or following accents respectively. For example, if pattern "a" is - * looked for in the target text "á\u0325", the result will be - * index 0 and length 2 <0, 2>. - *
- *- * The StringSearch class provides two options to handle accent matching - * described below: - *
+/** + * + * StringSearch is a {@link SearchIterator} that provides + * language-sensitive text searching based on the comparison rules defined + * in a {@link RuleBasedCollator} object. + * StringSearch ensures that language eccentricity can be + * handled, e.g. for the German collator, characters ß and SS will be matched + * if case is chosen to be ignored. + * See the + * "ICU Collation Design Document" for more information. *
- * Let S' be the sub-string of a text string S between the offsets start and
- * end <start, end>.
- *
- * A pattern string P matches a text string S at the offsets <start,
- * length>
+ * There are 2 match options for selection:
+ * Let S' be the sub-string of a text string S between the offsets start and
+ * end [start, end].
*
+ * A pattern string P matches a text string S at the offsets [start, end]
* if
*
- * option 1. P matches some canonical equivalent string of S'. Suppose the
- * RuleBasedCollator used for searching has a collation strength of
- * TERTIARY, all accents are non-ignorable. If the pattern
- * "a\u0300" is searched in the target text
- * "a\u0325\u0300",
- * a match will be found, since the target text is canonically
- * equivalent to "a\u0300\u0325"
- * option 2. P matches S' and if P starts or ends with a combining mark,
- * there exists no non-ignorable combining mark before or after S'
- * in S respectively. Following the example above, the pattern
- * "a\u0300" will not find a match in "a\u0325\u0300",
- * since
- * there exists a non-ignorable accent '\u0325' in the middle of
- * 'a' and '\u0300'. Even with a target text of
- * "a\u0300\u0325" a match will not be found because of the
- * non-ignorable trailing accent \u0325.
+ * option 1. Some canonical equivalent of P matches some canonical equivalent
+ * of S'
+ * option 2. P matches S' and if P starts or ends with a combining mark,
+ * there exists no non-ignorable combining mark before or after S'
+ * in S respectively.
  *
- * Option 2. will be the default mode for dealing with boundary accents unless
- * specified via the API setCanonical(boolean).
- * One restriction is to be noted for option 1. Currently there are no
- * composite characters that consists of a character with combining class > 0
- * before a character with combining class == 0. However, if such a character
- * exists in the future, the StringSearch may not work correctly with option 1
- * when such characters are encountered.
- *
+ * Option 2. will be the default.
  *
- * SearchIterator provides APIs to specify the starting position
- * within the text string to be searched, e.g. setIndex,
- * preceding and following. Since the starting position will
- * be set as it is specified, please take note that there are some dangerous
- * positions which the search may render incorrect results:
+ * This search has APIs similar to those of other text iteration mechanisms
+ * such as the break iterators in {@link BreakIterator}. Using these
+ * APIs, it is easy to scan through text looking for all occurrences of
+ * a given pattern. This search iterator allows changing of direction by
+ * calling {@link #reset} followed by {@link #next} or {@link #previous}.
+ * Though a direction change can occur without calling {@link #reset} first,
+ * this operation comes with some speed penalty.
+ * Match results in the forward direction will match the result matches in
+ * the backwards direction in the reverse order.
+ *
+ * {@link SearchIterator} provides APIs to specify the starting position
+ * within the text string to be searched, e.g. {@link SearchIterator#setIndex setIndex},
+ * {@link SearchIterator#preceding preceding} and {@link SearchIterator#following following}. Since the
+ * starting position will be set as it is specified, please take note that
+ * there are some danger points at which the search may render incorrect
+ * results:
  *
- * Though collator attributes will be taken into consideration while - * performing matches, there are no APIs provided in StringSearch for setting - * and getting the attributes. These attributes can be set by getting the - * collator from getCollator and using the APIs in - * com.ibm.icu.text.Collator. To update StringSearch to the new - * collator attributes, reset() or - * setCollator(RuleBasedCollator) has to be called. - *
+ * A {@link BreakIterator} can be used if only matches at logical breaks are desired.
+ * Using a {@link BreakIterator} will only give you results that exactly match the
+ * boundaries given by the {@link BreakIterator}. For instance the pattern "e" will
+ * not be found in the string "\u00e9" if a character break iterator is used.
  *
- * Consult the
- *
- * String Search user guide and the SearchIterator
- * documentation for more information and examples of use.
- *
+ * Though collator attributes will be taken into consideration while
+ * performing matches, there are no APIs here for setting and getting the
+ * attributes. These attributes can be set by getting the collator
+ * from {@link #getCollator} and using the APIs in {@link RuleBasedCollator}.
+ * Finally, to update StringSearch with the new collator attributes,
+ * {@link #reset} has to be called.
+ *
+ * Restriction:
+ * Currently there are no composite characters that consist of a
+ * character with combining class > 0 before a character with combining
+ * class == 0. However, if such a character exists in the future,
+ * StringSearch does not guarantee the results for option 1.
+ *
+ * Consult the {@link SearchIterator} documentation for information on + * and examples of how to use instances of this class to implement text + * searching. *
- * This class is not subclassable + * Note, StringSearch is not to be subclassed. *
  * @see SearchIterator
  * @see RuleBasedCollator
  * @author Laura Werner, synwee
- * @stable ICU 2.0
+ * @since ICU 2.0
  */
 // internal notes: all methods do not guarantee the correct status of the
 // characteriterator. the caller has to maintain the original index position
@@ -165,8 +126,9 @@ import com.ibm.icu.util.ULocale;
 public final class StringSearch extends SearchIterator {
 
     /**
-     * DONE is returned by previous() and next() after all valid matches have
-     * been returned, and by first() and last() if there are no matches at all.
+     * DONE is returned by {@link #previous()} and {@link #next()} after all valid matches have
+     * been returned, and by {@link SearchIterator#first() first()} and
+     * {@link SearchIterator#last() last()} if there are no matches at all.
      * @see #previous
      * @see #next
      * @stable ICU 2.0
@@ -198,19 +160,18 @@ public final class StringSearch extends SearchIterator {
     /**
      * Initializes the iterator to use the language-specific rules defined in
      * the argument collator to search for argument pattern in the argument
-     * target text. The argument breakiter is used to define logical matches.
+     * target text. The argument <code>breakiter</code> is used to define logical matches.
* See super class documentation for more details on the use of the target
- * text and BreakIterator.
+ * text and {@link BreakIterator}.
* @param pattern text to look for.
* @param target target text to search for pattern.
- * @param collator RuleBasedCollator that defines the language rules
+ * @param collator {@link RuleBasedCollator} that defines the language rules
* @param breakiter A {@link BreakIterator} that is used to determine the
* boundaries of a logical match. This argument can be null.
- * @exception IllegalArgumentException thrown when argument target is null,
+ * @throws IllegalArgumentException thrown when argument target is null,
* or of length 0
* @see BreakIterator
* @see RuleBasedCollator
- * @see SearchIterator
* @stable ICU 2.0
*/
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator,
@@ -259,14 +220,13 @@ public final class StringSearch extends SearchIterator {
/**
* Initializes the iterator to use the language-specific rules defined in
* the argument collator to search for argument pattern in the argument
- * target text. No BreakIterators are set to test for logical matches.
+ * target text. No {@link BreakIterator}s are set to test for logical matches.
* @param pattern text to look for.
* @param target target text to search for pattern.
- * @param collator RuleBasedCollator that defines the language rules
- * @exception IllegalArgumentException thrown when argument target is null,
+ * @param collator {@link RuleBasedCollator} that defines the language rules
+ * @throws IllegalArgumentException thrown when argument target is null,
* or of length 0
* @see RuleBasedCollator
- * @see SearchIterator
* @stable ICU 2.0
*/
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator) {
@@ -277,17 +237,12 @@ public final class StringSearch extends SearchIterator {
* Initializes the iterator to use the language-specific rules and
* break iterator rules defined in the argument locale to search for
* argument pattern in the argument target text.
- * See super class documentation for more details on the use of the target
- * text and BreakIterator.
* @param pattern text to look for.
* @param target target text to search for pattern.
* @param locale locale to use for language and break iterator rules
- * @exception IllegalArgumentException thrown when argument target is null,
+ * @throws IllegalArgumentException thrown when argument target is null,
* or of length 0. ClassCastException thrown if the collator for
* the specified locale is not a RuleBasedCollator.
- * @see BreakIterator
- * @see RuleBasedCollator
- * @see SearchIterator
* @stable ICU 2.0
*/
public StringSearch(String pattern, CharacterIterator target, Locale locale) {
@@ -299,11 +254,11 @@ public final class StringSearch extends SearchIterator {
* break iterator rules defined in the argument locale to search for
* argument pattern in the argument target text.
* See super class documentation for more details on the use of the target
- * text and BreakIterator.
+ * text and {@link BreakIterator}.
* @param pattern text to look for.
* @param target target text to search for pattern.
- * @param locale ulocale to use for language and break iterator rules
- * @exception IllegalArgumentException thrown when argument target is null,
+ * @param locale locale to use for language and break iterator rules
+ * @throws IllegalArgumentException thrown when argument target is null,
* or of length 0. ClassCastException thrown if the collator for
* the specified locale is not a RuleBasedCollator.
* @see BreakIterator
@@ -318,17 +273,12 @@ public final class StringSearch extends SearchIterator {
/**
* Initializes the iterator to use the language-specific rules and
* break iterator rules defined in the default locale to search for
- * argument pattern in the argument target text.
- * See super class documentation for more details on the use of the target
- * text and BreakIterator.
+ * argument pattern in the argument target text.
* @param pattern text to look for.
* @param target target text to search for pattern.
- * @exception IllegalArgumentException thrown when argument target is null,
+ * @throws IllegalArgumentException thrown when argument target is null,
* or of length 0. ClassCastException thrown if the collator for
* the default locale is not a RuleBasedCollator.
- * @see BreakIterator
- * @see RuleBasedCollator
- * @see SearchIterator
* @stable ICU 2.0
*/
public StringSearch(String pattern, String target) {
@@ -337,17 +287,14 @@ public final class StringSearch extends SearchIterator {
}
/**
+ * Gets the {@link RuleBasedCollator} used for the language rules.
* - * Gets the RuleBasedCollator used for the language rules. - *
-     *
-     * Since StringSearch depends on the returned RuleBasedCollator, any
-     * changes to the RuleBasedCollator result should follow with a call to
-     * either StringSearch.reset() or
-     * StringSearch.setCollator(RuleBasedCollator) to ensure the correct
-     * search behaviour.
+     * Since StringSearch depends on the returned {@link RuleBasedCollator}, any
+     * changes to the {@link RuleBasedCollator} result should be followed by a call to
+     * either {@link #reset()} or {@link #setCollator(RuleBasedCollator)} to ensure the correct
+     * search behavior.
      *
- * @return RuleBasedCollator used by this StringSearch + * @return {@link RuleBasedCollator} used by this StringSearch * @see RuleBasedCollator * @see #setCollator * @stable ICU 2.0 @@ -357,15 +304,11 @@ public final class StringSearch extends SearchIterator { } /** + * Sets the {@link RuleBasedCollator} to be used for language-specific searching. *- * Sets the RuleBasedCollator to be used for language-specific searching. - *
- *- * This method causes internal data such as Boyer-Moore shift tables - * to be recalculated, but the iterator's position is unchanged. - *
- * @param collator to use for this StringSearch - * @exception IllegalArgumentException thrown when collator is null + * The iterator's position will not be changed by this method. + * @param collator to use for this StringSearch + * @throws IllegalArgumentException thrown when collator is null * @see #getCollator * @stable ICU 2.0 */ @@ -390,7 +333,7 @@ public final class StringSearch extends SearchIterator { } /** - * Returns the pattern for which StringSearch is searching for. + * Returns the pattern for which StringSearch is searching for. * @return the pattern searched for * @stable ICU 2.0 */ @@ -399,13 +342,8 @@ public final class StringSearch extends SearchIterator { } /** - ** Set the pattern to search for. - *
- *- * This method causes internal data such as Boyer-Moore shift tables - * to be recalculated, but the iterator's position is unchanged. - *
+ * The iterator's position will not be changed by this method. * @param pattern for searching * @see #getPattern * @exception IllegalArgumentException thrown if pattern is null or of @@ -435,10 +373,8 @@ public final class StringSearch extends SearchIterator { } /** - ** Set the canonical match mode. See class documentation for details. * The default setting for this property is false. - *
* @param allowCanonical flag indicator if canonical matches are allowed * @see #isCanonical * @stable ICU 2.8 @@ -449,13 +385,7 @@ public final class StringSearch extends SearchIterator { } /** - * Set the target text to be searched. Text iteration will hence begin at - * the start of the text string. This method is useful if you want to - * re-use an iterator to search within a different body of text. - * @param text new text iterator to look for match, - * @exception IllegalArgumentException thrown when text is null or has - * 0 length - * @see #getTarget + * {@inheritDoc} * @stable ICU 2.8 */ @Override @@ -465,12 +395,7 @@ public final class StringSearch extends SearchIterator { } /** - * Return the index in the target text where the iterator is currently - * positioned at. - * If the iteration has gone past the end of the target text or past - * the beginning for a backwards search, {@link #DONE} is returned. - * @return index in the target text where the iterator is currently - * positioned at + * {@inheritDoc} * @stable ICU 2.8 */ @Override @@ -483,23 +408,7 @@ public final class StringSearch extends SearchIterator { } /** - *- * Sets the position in the target text which the next search will start - * from to the argument. This method clears all previous states. - *
- *- * This method takes the argument position and sets the position in the - * target text accordingly, without checking if position is pointing to a - * valid starting point to begin searching. - *
- *- * Search positions that may render incorrect results are highlighted in - * the class documentation. - *
- * @param position index to start next search from. - * @exception IndexOutOfBoundsException thrown if argument position is out - * of the target text range. - * @see #getIndex + * {@inheritDoc} * @stable ICU 2.8 */ @Override @@ -513,19 +422,7 @@ public final class StringSearch extends SearchIterator { } /** - *- * Resets the search iteration. All properties will be reset to the - * default value. - *
- *- * Search will begin at the start of the target text if a forward iteration - * is initiated before a backwards iteration. Otherwise if a - * backwards iteration is initiated before a forwards iteration, the search - * will begin at the end of the target text. - *
- *- * Canonical match option will be reset to false, ie an exact match. - *
+ * {@inheritDoc} * @stable ICU 2.8 */ @Override @@ -581,17 +478,7 @@ public final class StringSearch extends SearchIterator { } /** - *- * Concrete method to provide the mechanism - * for finding the next forwards match in the target text. - * See super class documentation for its use. - *
- * @param position index in the target text at which the forwards search - * should begin. - * @return the starting index of the next forwards match if found, DONE - * otherwise - * @see #handlePrevious(int) - * @see #DONE + * {@inheritDoc} * @stable ICU 2.8 */ @Override @@ -641,17 +528,7 @@ public final class StringSearch extends SearchIterator { } /** - *- * Concrete method to provide the mechanism - * for finding the next backwards match in the target text. - * See super class documentation for its use. - *
- * @param position index in the target text at which the backwards search - * should begin. - * @return the starting index of the next backwards match if found, DONE - * otherwise - * @see #handleNext(int) - * @see #DONE + * {@inheritDoc} * @stable ICU 2.8 */ @Override -- 2.40.0