org.apache.lucene.analysis.icu.segmentation
Class ICUTokenizerConfig
java.lang.Object
org.apache.lucene.analysis.icu.segmentation.ICUTokenizerConfig
- Direct Known Subclasses:
- DefaultICUTokenizerConfig
public abstract class ICUTokenizerConfig
- extends Object
Class that allows for tailored Unicode Text Segmentation on
a per-writing-system basis.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
Method Summary
abstract com.ibm.icu.text.BreakIterator getBreakIterator(int script)
- Return a BreakIterator capable of processing a given script.
abstract String getType(int script, int ruleStatus)
- Return a token type value for a given script and BreakIterator rule status.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ICUTokenizerConfig
public ICUTokenizerConfig()
getBreakIterator
public abstract com.ibm.icu.text.BreakIterator getBreakIterator(int script)
- Return a BreakIterator capable of processing a given script.
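The contract of getBreakIterator can be illustrated with a minimal sketch. To stay runnable without Lucene or ICU4J on the classpath, it makes several assumptions: ScriptBreakConfigSketch is a hypothetical stand-in for this class, java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator, and the script codes SCRIPT_LATIN and SCRIPT_THAI are invented for the example. Handing out clones of prototype iterators is one possible design, not necessarily what any shipped subclass does.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in for the abstract method being illustrated. The real
// class returns com.ibm.icu.text.BreakIterator, not java.text.BreakIterator.
abstract class ScriptBreakConfigSketch {
    abstract BreakIterator getBreakIterator(int script);
}

class PerScriptBreakConfig extends ScriptBreakConfigSketch {
    // Hypothetical script codes, chosen for this sketch only.
    static final int SCRIPT_LATIN = 1;
    static final int SCRIPT_THAI = 2;

    // Prototypes are built once. A BreakIterator carries mutable state
    // (its text and current position), so callers receive clones.
    private final BreakIterator defaultProto =
        BreakIterator.getWordInstance(Locale.ROOT);
    private final BreakIterator thaiProto =
        BreakIterator.getWordInstance(Locale.forLanguageTag("th"));

    @Override
    BreakIterator getBreakIterator(int script) {
        BreakIterator proto = (script == SCRIPT_THAI) ? thaiProto : defaultProto;
        return (BreakIterator) proto.clone();
    }
}

class BreakIteratorConfigDemo {
    public static void main(String[] args) {
        PerScriptBreakConfig config = new PerScriptBreakConfig();
        BreakIterator words =
            config.getBreakIterator(PerScriptBreakConfig.SCRIPT_LATIN);
        words.setText("hello world");
        // Walk every boundary offset the iterator reports.
        for (int b = words.first(); b != BreakIterator.DONE; b = words.next()) {
            System.out.println(b);
        }
    }
}
```

Returning a fresh clone per call keeps two callers from disturbing each other's iteration position, at the cost of one allocation per request.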
getType
public abstract String getType(int script, int ruleStatus)
- Return a token type value for a given script and BreakIterator
rule status.
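As a rough illustration of how a subclass might map a script and rule status to a token type, here is a minimal sketch. TokenTypeConfigSketch is a hypothetical stand-in for this class, the script code SCRIPT_HAN and the returned type strings are invented for the example, and the rule-status thresholds are assumed to follow the word-break status ranges used by ICU's RuleBasedBreakIterator (numbers from 100, letters from 200, kana from 300, ideographs from 400).

```java
// Hypothetical stand-in for the abstract method being illustrated.
abstract class TokenTypeConfigSketch {
    abstract String getType(int script, int ruleStatus);
}

class SimpleTokenTypeConfig extends TokenTypeConfigSketch {
    // Hypothetical script code, chosen for this sketch only.
    static final int SCRIPT_HAN = 17;

    // Assumed lower bounds of the rule-status ranges; they mirror the
    // WORD_NUMBER, WORD_LETTER, WORD_KANA and WORD_IDEO convention in ICU.
    static final int WORD_NUMBER = 100;
    static final int WORD_LETTER = 200;
    static final int WORD_KANA = 300;
    static final int WORD_IDEO = 400;

    @Override
    String getType(int script, int ruleStatus) {
        // Check the highest range first so each status falls in one bucket.
        if (ruleStatus >= WORD_IDEO) return "<IDEOGRAPHIC>";
        if (ruleStatus >= WORD_KANA) return "<KANA>";
        if (ruleStatus >= WORD_LETTER) {
            // The script refines the type when the rule status alone is ambiguous.
            return script == SCRIPT_HAN ? "<IDEOGRAPHIC>" : "<ALPHANUM>";
        }
        if (ruleStatus >= WORD_NUMBER) return "<NUM>";
        return "<OTHER>";
    }
}

class TokenTypeConfigDemo {
    public static void main(String[] args) {
        SimpleTokenTypeConfig config = new SimpleTokenTypeConfig();
        System.out.println(config.getType(0, SimpleTokenTypeConfig.WORD_NUMBER));
        System.out.println(
            config.getType(SimpleTokenTypeConfig.SCRIPT_HAN, SimpleTokenTypeConfig.WORD_LETTER));
    }
}
```

Taking both arguments lets the same rule status yield different token types depending on the writing system, which is the kind of per-script tailoring the class description refers to.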