java.lang.Object
  org.apache.lucene.analysis.icu.segmentation.ICUTokenizerConfig
      org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig

public class DefaultICUTokenizerConfig
extends ICUTokenizerConfig

Default ICUTokenizerConfig that is generally applicable
 to many languages.
 
 Generally tokenizes Unicode text according to UAX#29 
 (BreakIterator.getWordInstance(ULocale.ROOT)), 
 but with the following tailorings:
 
DictionaryBasedBreakIterator
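
As a rough illustration of the word-boundary segmentation described above, the sketch below splits text with a word BreakIterator and keeps only segments that contain a letter or digit. It uses the JDK's java.text.BreakIterator as a stand-in for ICU's com.ibm.icu.text.BreakIterator so that it runs without ICU4J; the JDK rules are similar in spirit but are not guaranteed to match ICU's root rules. The class name WordBreakSketch and the helper words are hypothetical and are not part of Lucene.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakSketch {

    /** Splits text at word boundaries and keeps segments containing a letter or digit. */
    static List<String> words(String text) {
        // Assumption: the JDK iterator stands in for ICU's BreakIterator.getWordInstance(ULocale.ROOT).
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments that are only whitespace or punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world 42")); // prints [Hello, world, 42]
    }
}
```

A tokenizer built on this idea would additionally record offsets and a token type for each segment, which is the role the config's two methods play.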
   
| Field Summary | |
|---|---|
| static String | WORD_HANGUL: Token type for words containing Korean hangul |
| static String | WORD_HIRAGANA: Token type for words containing Japanese hiragana |
| static String | WORD_IDEO: Token type for words containing ideographic characters |
| static String | WORD_KATAKANA: Token type for words containing Japanese katakana |
| static String | WORD_LETTER: Token type for words that contain letters |
| static String | WORD_NUMBER: Token type for words that appear to be numbers |

| Constructor Summary | |
|---|---|
| DefaultICUTokenizerConfig() | Creates a new config. |

| Method Summary | |
|---|---|
| com.ibm.icu.text.BreakIterator | getBreakIterator(int script): Return a BreakIterator capable of processing a given script. |
| String | getType(int script, int ruleStatus): Return a token type value for a given script and BreakIterator rule status. |

| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Field Detail | 
|---|
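public static final String WORD_IDEO
Token type for words containing ideographic characters

public static final String WORD_HIRAGANA
Token type for words containing Japanese hiragana

public static final String WORD_KATAKANA
Token type for words containing Japanese katakana

public static final String WORD_HANGUL
Token type for words containing Korean hangul

public static final String WORD_LETTER
Token type for words that contain letters

public static final String WORD_NUMBER
Token type for words that appear to be numbers
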
| Constructor Detail | 
|---|
public DefaultICUTokenizerConfig()
Creates a new config.

| Method Detail | 
|---|
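public com.ibm.icu.text.BreakIterator getBreakIterator(int script)
Description copied from class: ICUTokenizerConfig
Return a BreakIterator capable of processing a given script.
Specified by: getBreakIterator in class ICUTokenizerConfig
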
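
One way to picture a per-script lookup like getBreakIterator is a dispatch that returns a specialized iterator for some scripts and a fresh copy of a shared default for the rest. The sketch below is only an illustration under assumptions: it uses the JDK's java.text.BreakIterator and Character.UnicodeScript in place of ICU's BreakIterator and int script codes, the choice of Thai as the special case is for illustration, and the names ScriptBreakIteratorSketch and forScript are hypothetical. It is not the implementation of this class.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ScriptBreakIteratorSketch {

    // Shared prototype. BreakIterator is stateful, so callers receive clones.
    private static final BreakIterator DEFAULT = BreakIterator.getWordInstance(Locale.ROOT);

    /** Returns a word iterator suited to the given script (illustrative dispatch only). */
    static BreakIterator forScript(Character.UnicodeScript script) {
        switch (script) {
            case THAI:
                // Assumption: a locale-specific iterator stands in for a dictionary-based one.
                return BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
            default:
                return (BreakIterator) DEFAULT.clone();
        }
    }

    public static void main(String[] args) {
        BreakIterator it = forScript(Character.UnicodeScript.LATIN);
        it.setText("one two");
        System.out.println(it.first() + " " + it.next()); // prints 0 3
    }
}
```

Returning a clone rather than the shared instance matters because setText and the iteration position are per-instance state.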
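public String getType(int script,
                      int ruleStatus)
Description copied from class: ICUTokenizerConfig
Return a token type value for a given script and BreakIterator rule status.
Specified by: getType in class ICUTokenizerConfig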
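
To make the relationship between the six WORD_* fields and the two getType arguments concrete, here is a minimal sketch of one plausible mapping: the rule status selects a broad category, and the script refines it where a category covers more than one field. Everything in it is an assumption for illustration. The STATUS_* and SCRIPT_* integers are stand-ins (ICU defines its own constants on RuleBasedBreakIterator and UScript), the string values of the WORD_* constants are placeholders because this page does not show the real ones, and the "OTHER" fallback is hypothetical.

```java
public class TokenTypeSketch {

    // Stand-in rule status tags (hypothetical values, not ICU's).
    static final int STATUS_NUMBER = 1;
    static final int STATUS_LETTER = 2;
    static final int STATUS_KANA = 3;
    static final int STATUS_IDEO = 4;

    // Stand-in script codes (hypothetical values, not ICU's UScript codes).
    static final int SCRIPT_LATIN = 0;
    static final int SCRIPT_HANGUL = 1;
    static final int SCRIPT_HIRAGANA = 2;
    static final int SCRIPT_KATAKANA = 3;

    // Placeholder type strings; the real field values are not given in this page.
    static final String WORD_IDEO = "IDEO";
    static final String WORD_HIRAGANA = "HIRAGANA";
    static final String WORD_KATAKANA = "KATAKANA";
    static final String WORD_HANGUL = "HANGUL";
    static final String WORD_LETTER = "LETTER";
    static final String WORD_NUMBER = "NUMBER";

    /** Maps a (script, ruleStatus) pair to a token type string. */
    static String getType(int script, int ruleStatus) {
        switch (ruleStatus) {
            case STATUS_IDEO:
                return WORD_IDEO;
            case STATUS_KANA:
                // Kana status covers two fields; the script decides which one.
                return script == SCRIPT_HIRAGANA ? WORD_HIRAGANA : WORD_KATAKANA;
            case STATUS_LETTER:
                // Hangul words get their own type; other letters share one.
                return script == SCRIPT_HANGUL ? WORD_HANGUL : WORD_LETTER;
            case STATUS_NUMBER:
                return WORD_NUMBER;
            default:
                return "OTHER"; // hypothetical fallback for unrecognized statuses
        }
    }

    public static void main(String[] args) {
        System.out.println(getType(SCRIPT_HANGUL, STATUS_LETTER)); // prints HANGUL
    }
}
```

A custom ICUTokenizerConfig subclass could override getType in the same shape to emit its own type strings.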