StringSearch
class StringSearch : SearchIterator
kotlin.Any
   ↳ android.icu.text.SearchIterator
      ↳ android.icu.text.StringSearch
StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object. StringSearch handles language-specific peculiarities; for example, with the German collator, the characters ß and SS match each other if case is chosen to be ignored. See the "ICU Collation Design Document" for more information.
There are two match options to choose from.
Let S' be the substring of a text string S between the offsets start and end, [start, end].
A pattern string P matches a text string S at the offsets [start, end] if:
- Option 1: some canonical equivalent of P matches some canonical equivalent of S'.
- Option 2: P matches S', and if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S, respectively.
This search has APIs similar to those of other text iteration mechanisms, such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. The search iterator can change direction by calling reset followed by next or previous. A direction change can also occur without calling reset first, but this comes with some speed penalty. Matches found in the forward direction are the same as those found in the backward direction, in reverse order.
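As a rough illustration of the iteration APIs described above, the following sketch collects match offsets in both directions. It relies on the inherited SearchIterator methods first(), next(), last() and previous() and on the DONE constant. The helper names forwardMatches and backwardMatches are hypothetical, and ULocale.ENGLISH is an assumption made only to keep the collation rules fixed.

```kotlin
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: match start offsets found with first()/next().
fun forwardMatches(pattern: String, target: String): List<Int> {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    val starts = mutableListOf<Int>()
    var pos = search.first()
    while (pos != SearchIterator.DONE) {
        starts += pos
        pos = search.next()
    }
    return starts
}

// Hypothetical helper: match start offsets found with last()/previous().
fun backwardMatches(pattern: String, target: String): List<Int> {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    val starts = mutableListOf<Int>()
    var pos = search.last()
    while (pos != SearchIterator.DONE) {
        starts += pos
        pos = search.previous()
    }
    return starts
}
```

For the pattern "abc" in the text "abc abc abc", forwardMatches should yield 0, 4, 8 and backwardMatches the same offsets in reverse order.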
SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setIndex, preceding and following. Because the starting position is used exactly as specified, the search may return incorrect results when it starts at any of the following positions:
- In the midst of a substring that requires normalization.
- If the following match is to be found, the position should not be a second character that requires swapping with the preceding character. Conversely, if the preceding match is to be found, the position to search from should not be a first character that requires swapping with the next character. For example, certain Thai and Lao characters require swapping.
- If a following pattern match is to be found, any position within a contracting sequence except the first will fail. Vice versa if a preceding pattern match is to be found, an invalid starting point would be any character within a contracting sequence except the last.
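The positional APIs mentioned above can be sketched as follows. This is a minimal example, assuming plain ASCII text so that none of the listed danger points apply; firstMatchFrom is a hypothetical helper name, and ULocale.ENGLISH is an assumption used to fix the collation rules.

```kotlin
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: start of the first match found when searching forward
// from `position`, or null when following() reports SearchIterator.DONE.
// The caller must pass a position inside the target text and must avoid the
// danger points listed above; this sketch does not check for them.
fun firstMatchFrom(pattern: String, target: String, position: Int): Int? {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    val start = search.following(position)
    return if (start == SearchIterator.DONE) null else start
}
```

A backward counterpart would call preceding(position) in the same way.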
A BreakIterator can be used if only matches at logical breaks are desired. Using a BreakIterator yields only results that exactly match the boundaries given by the BreakIterator. For instance, the pattern "e" will not be found in the string "\u00e9" if a character break iterator is used.
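To illustrate matching at logical breaks, the sketch below restricts matches to word boundaries. It uses the four-argument constructor documented on this page together with BreakIterator.getWordInstance and Collator.getInstance; wholeWordMatches is a hypothetical helper name and the English locale is an assumption.

```kotlin
import android.icu.text.BreakIterator
import android.icu.text.Collator
import android.icu.text.RuleBasedCollator
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: offsets of matches whose start and end both fall on
// boundaries reported by a word BreakIterator.
fun wholeWordMatches(pattern: String, target: String): List<Int> {
    // The cast fails with ClassCastException if the locale's collator is not
    // a RuleBasedCollator.
    val collator = Collator.getInstance(ULocale.ENGLISH) as RuleBasedCollator
    val words = BreakIterator.getWordInstance(ULocale.ENGLISH)
    val search = StringSearch(pattern, StringCharacterIterator(target), collator, words)
    val starts = mutableListOf<Int>()
    var pos = search.first()
    while (pos != SearchIterator.DONE) {
        starts += pos
        pos = search.next()
    }
    return starts
}
```

With the pattern "cat" and the text "concatenate cat", the occurrence inside "concatenate" is expected to be skipped because it does not line up with word boundaries, leaving only the standalone word at offset 12.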
Options are provided to handle overlapping matches. For example, in English, overlapping matching produces the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matching produces only the result 0.
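The overlapping option can be sketched with the inherited SearchIterator.setOverlapping method. The helper name matchStarts is hypothetical; the pattern and text are the ones used in the description above.

```kotlin
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: match start offsets with overlapping matches
// switched on or off.
fun matchStarts(pattern: String, target: String, overlapping: Boolean): List<Int> {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    search.setOverlapping(overlapping)
    val starts = mutableListOf<Int>()
    var pos = search.first()
    while (pos != SearchIterator.DONE) {
        starts += pos
        pos = search.next()
    }
    return starts
}
```

Per the description above, matchStarts("abab", "ababab", true) should report 0 and 2, while passing false should report only 0.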
Options are also provided to implement "asymmetric search" as described in UTS #10 Unicode Collation Algorithm, specifically the ElementComparisonType values.
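A minimal sketch of selecting an asymmetric comparison mode follows. It assumes the inherited setElementComparisonType method and the SearchIterator.ElementComparisonType enum; asymmetricSearch is a hypothetical helper name, and the exact set of matches produced depends on the collator and is not shown here.

```kotlin
import android.icu.text.SearchIterator.ElementComparisonType
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: builds a search in which pattern elements that carry
// only a base weight are treated as wildcards, as in UTS #10 asymmetric search.
fun asymmetricSearch(pattern: String, target: String): StringSearch {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    search.setElementComparisonType(ElementComparisonType.PATTERN_BASE_WEIGHT_IS_WILDCARD)
    return search
}
```

The other enum values are STANDARD_ELEMENT_COMPARISON, which is the default, and ANY_BASE_WEIGHT_IS_WILDCARD.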
Though collator attributes are taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator and using the APIs in RuleBasedCollator. Finally, reset has to be called to update StringSearch to the new collator attributes.
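The attribute workflow described above might look like the following sketch, which lowers the collation strength to primary so that case and accent differences are ignored. primaryStrengthMatches is a hypothetical helper name and ULocale.ENGLISH is an assumption.

```kotlin
import android.icu.text.Collator
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: match offsets after switching the collator to
// primary strength.
fun primaryStrengthMatches(pattern: String, target: String): List<Int> {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    // Kotlin property syntax for getCollator().setStrength(Collator.PRIMARY).
    search.collator.strength = Collator.PRIMARY
    // reset() makes the search pick up the changed collator attribute.
    search.reset()
    val starts = mutableListOf<Int>()
    var pos = search.first()
    while (pos != SearchIterator.DONE) {
        starts += pos
        pos = search.next()
    }
    return starts
}
```

For the pattern "resume" in "R\u00e9sum\u00e9 resume", both words are expected to match at primary strength, at offsets 0 and 7.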
Restriction:
Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch does not guarantee the results for option 1.
Consult the SearchIterator documentation for information on and examples of how to use instances of this class to implement text searching.
Note, StringSearch is not to be subclassed.
Summary
Public constructors | |
---|---|
StringSearch(pattern: String!, target: CharacterIterator!, collator: RuleBasedCollator!, breakiter: BreakIterator!) | Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. |
StringSearch(pattern: String!, target: CharacterIterator!, collator: RuleBasedCollator!) | Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. |
StringSearch(pattern: String!, target: CharacterIterator!, locale: Locale!) | Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. |
StringSearch(pattern: String!, target: CharacterIterator!, locale: ULocale!) | Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. |
StringSearch(pattern: String!, target: String!) | Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text. |
Public methods | |
---|---|
RuleBasedCollator! | getCollator() Gets the RuleBasedCollator used for the language rules. |
Int | getIndex() Return the current index in the text being searched. |
String! | getPattern() Returns the pattern that StringSearch is searching for. |
Boolean | isCanonical() Determines whether canonical matches (option 1, as described in the class documentation) is set. |
Unit | reset() Resets the iteration. |
Unit | setCanonical(allowCanonical: Boolean) Set the canonical match mode. |
Unit | setCollator(collator: RuleBasedCollator!) Sets the RuleBasedCollator to be used for language-specific searching. |
Unit | setIndex(position: Int) Sets the position in the target text at which the next search will start. |
Unit | setPattern(pattern: String!) Set the pattern to search for. |
Unit | setTarget(text: CharacterIterator!) Set the target text to be searched. |
Protected methods | |
---|---|
Int | handleNext(position: Int) Abstract method which subclasses override to provide the mechanism for finding the next match in the target text. |
Int | handlePrevious(position: Int) Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text. |
Public constructors
StringSearch
StringSearch(
pattern: String!,
target: CharacterIterator!,
collator: RuleBasedCollator!,
breakiter: BreakIterator!)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. The argument breakiter is used to define logical matches. See the superclass documentation for more details on the use of the target text and BreakIterator.
Parameters | |
---|---|
pattern | String!: text to look for. |
target | CharacterIterator!: target text to search for pattern. |
collator | RuleBasedCollator!: RuleBasedCollator that defines the language rules |
breakiter | BreakIterator!: A BreakIterator that is used to determine the boundaries of a logical match. This argument can be null. |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when argument target is null, or of length 0 |
StringSearch
StringSearch(
pattern: String!,
target: CharacterIterator!,
collator: RuleBasedCollator!)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. No BreakIterator is set to test for logical matches.
Parameters | |
---|---|
pattern | String!: text to look for. |
target | CharacterIterator!: target text to search for pattern. |
collator | RuleBasedCollator!: RuleBasedCollator that defines the language rules |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when argument target is null, or of length 0 |
StringSearch
StringSearch(
pattern: String!,
target: CharacterIterator!,
locale: Locale!)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text.
Parameters | |
---|---|
pattern | String!: text to look for. |
target | CharacterIterator!: target text to search for pattern. |
locale | Locale!: locale to use for language and break iterator rules |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when argument target is null, or of length 0. |
java.lang.ClassCastException | thrown if the collator for the specified locale is not a RuleBasedCollator. |
StringSearch
StringSearch(
pattern: String!,
target: CharacterIterator!,
locale: ULocale!)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. See the superclass documentation for more details on the use of the target text and BreakIterator.
Parameters | |
---|---|
pattern | String!: text to look for. |
target | CharacterIterator!: target text to search for pattern. |
locale | ULocale!: locale to use for language and break iterator rules |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when argument target is null, or of length 0. |
java.lang.ClassCastException | thrown if the collator for the specified locale is not a RuleBasedCollator. |
StringSearch
StringSearch(
pattern: String!,
target: String!)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text.
Parameters | |
---|---|
pattern | String!: text to look for. |
target | String!: target text to search for pattern. |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when argument target is null, or of length 0. |
java.lang.ClassCastException | thrown if the collator for the default locale is not a RuleBasedCollator. |
Public methods
getCollator
fun getCollator(): RuleBasedCollator!
Gets the RuleBasedCollator used for the language rules.
Since StringSearch depends on the returned RuleBasedCollator, any changes to the returned RuleBasedCollator should be followed by a call to either reset() or setCollator(android.icu.text.RuleBasedCollator) to ensure correct search behavior.
Return | |
---|---|
RuleBasedCollator! | RuleBasedCollator used by this StringSearch |
getIndex
fun getIndex(): Int
Return the current index in the text being searched. If the iteration has gone past the end of the text (or past the beginning for a backwards search), DONE is returned.
Return | |
---|---|
Int | current index in the text being searched. |
getPattern
fun getPattern(): String!
Returns the pattern that StringSearch is searching for.
Return | |
---|---|
String! | the pattern searched for |
isCanonical
fun isCanonical(): Boolean
Determines whether canonical matching (option 1, as described in the class documentation) is enabled. See setCanonical(boolean) for more information.
Return | |
---|---|
Boolean | true if canonical matching is enabled, false otherwise |
reset
fun reset(): Unit
Resets the iteration. Search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string.
setCanonical
fun setCanonical(allowCanonical: Boolean): Unit
Set the canonical match mode. See class documentation for details. The default setting for this property is false.
Parameters | |
---|---|
allowCanonical | Boolean: flag indicating whether canonical matches are allowed |
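A minimal sketch of enabling canonical match mode follows. canonicalSearch is a hypothetical helper name and ULocale.ENGLISH is an assumption; which additional matches the mode produces depends on the text and collator and is not shown here.

```kotlin
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: builds a search with canonical matching (option 1 in
// the class documentation) enabled; the documented default is false.
fun canonicalSearch(pattern: String, target: String): StringSearch {
    val search = StringSearch(pattern, StringCharacterIterator(target), ULocale.ENGLISH)
    search.setCanonical(true)
    return search
}
```

The current mode can be read back through isCanonical().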
setCollator
fun setCollator(collator: RuleBasedCollator!): Unit
Sets the RuleBasedCollator to be used for language-specific searching.
The iterator's position will not be changed by this method.
Parameters | |
---|---|
collator | RuleBasedCollator!: to use for this StringSearch |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown when collator is null |
setIndex
fun setIndex(position: Int): Unit
Sets the position in the target text at which the next search will start. This method clears any previous match.
Parameters | |
---|---|
position | Int: position from which to start the next search |
setPattern
fun setPattern(pattern: String!): Unit
Set the pattern to search for. The iterator's position will not be changed by this method.
Parameters | |
---|---|
pattern | String!: for searching |
Exceptions | |
---|---|
java.lang.IllegalArgumentException | thrown if pattern is null or of length 0 |
setTarget
fun setTarget(text: CharacterIterator!): Unit
Set the target text to be searched. Text iteration will then begin at the start of the text string. This method is useful if you want to reuse an iterator to search within a different body of text.
Parameters | |
---|---|
text | CharacterIterator!: new text iterator in which to look for matches |
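Reusing one iterator across several bodies of text, as described above, could be sketched like this. countInEach is a hypothetical helper name, ULocale.ENGLISH is an assumption, and every text is assumed to be non-empty because a zero-length target is rejected.

```kotlin
import android.icu.text.SearchIterator
import android.icu.text.StringSearch
import android.icu.util.ULocale
import java.text.StringCharacterIterator

// Hypothetical helper: counts matches of `pattern` in each text while
// reusing a single StringSearch instance via setTarget().
fun countInEach(pattern: String, texts: List<String>): List<Int> {
    if (texts.isEmpty()) return emptyList()
    val search = StringSearch(pattern, StringCharacterIterator(texts[0]), ULocale.ENGLISH)
    return texts.map { text ->
        // Iteration begins again at the start of the new text.
        search.setTarget(StringCharacterIterator(text))
        var count = 0
        var pos = search.first()
        while (pos != SearchIterator.DONE) {
            count++
            pos = search.next()
        }
        count
    }
}
```

The pattern can be swapped in the same way with setPattern, which leaves the iterator's position unchanged.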
Protected methods
handleNext
protected fun handleNext(position: Int): Int
Abstract method which subclasses override to provide the mechanism for finding the next match in the target text. This allows different subclasses to provide different search algorithms.
If a match is found, the implementation should return the index at which the match starts and should call setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method should return DONE.
Parameters | |
---|---|
position | Int: The index in the target text at which the search should start. |
Return | |
---|---|
Int | index at which the match starts, or DONE if no match is found |
handlePrevious
protected fun handlePrevious(position: Int): Int
Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text. This allows different subclasses to provide different search algorithms.
If a match is found, the implementation should return the index at which the match starts and should call setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method should return DONE.
Parameters | |
---|---|
position | Int: The index in the target text at which the search should start. |
Return | |
---|---|
Int | index at which the match starts, or DONE if no match is found |