| Class Summary | Description |
|---|---|
| CJKAnalyzer | An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK characters with CJKBigramFilter, and filters stopwords with StopFilter. |
| CJKBigramFilter | Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
| CJKTokenizer | Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. |
| CJKWidthFilter | A TokenFilter that normalizes CJK width differences: it folds fullwidth ASCII variants into the equivalent basic Latin characters, and folds halfwidth Katakana variants into the equivalent kana. |
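The width normalization attributed to CJKWidthFilter can be illustrated with a small standalone sketch. This is not Lucene's implementation: it assumes the fullwidth ASCII variants occupy U+FF01 to U+FF5E (a fixed offset of 0xFEE0 from basic Latin), delegates each halfwidth Katakana character (U+FF65 to U+FF9F) to the NFKC mapping of `java.text.Normalizer`, and then runs an NFC pass so that a halfwidth voiced sound mark combines with the kana before it. The class name `WidthFoldSketch` is hypothetical, and the final NFC pass may also recompose other canonically equivalent sequences, which CJKWidthFilter does not do.

```java
import java.text.Normalizer;

// Hypothetical illustration of CJK width folding; not Lucene's CJKWidthFilter.
public class WidthFoldSketch {

    static String foldWidth(String in) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                // Fullwidth ASCII variant: shift down to the basic Latin block.
                sb.append((char) (c - 0xFEE0));
            } else if (c >= '\uFF65' && c <= '\uFF9F') {
                // Halfwidth Katakana: use NFKC's compatibility mapping to fullwidth kana.
                // Halfwidth voiced marks become combining marks (U+3099, U+309A).
                sb.append(Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC));
            } else {
                sb.append(c);
            }
        }
        // Compose kana + combining voiced mark into a single character (e.g. KA + mark -> GA).
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(foldWidth("\uFF21\uFF22\uFF23\uFF11\uFF12")); // fullwidth "ABC12"
        System.out.println(foldWidth("\uFF76\uFF9E"));                   // halfwidth "ｶﾞ"
    }
}
```

With this sketch, fullwidth `ＡＢＣ１２` folds to `ABC12`, and the halfwidth pair `ｶﾞ` folds to the single fullwidth character `ガ`.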
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams: overlapping groups of two adjacent CJK characters (Han, Hiragana, Katakana, or Hangul).
Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.
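Overlapping bigram indexing can be sketched in plain Java without Lucene. The class `BigramSketch` is hypothetical: it treats Han, Hiragana, Katakana, and Hangul code points as CJK (an assumption approximating CJKBigramFilter's default scripts), emits overlapping bigrams for each run of two or more CJK characters, emits a lone CJK character as a unigram, and passes runs of other letters and digits through as whole tokens. It does not reproduce StandardTokenizer's segmentation rules, lowercasing, or stopword removal, and characters in the Common script (such as the Katakana prolonged sound mark) break a CJK run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping CJK bigrams; not Lucene's CJKBigramFilter.
public class BigramSketch {

    // Scripts treated as CJK in this sketch (assumed to mirror the filter's defaults).
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            int start = i;
            if (isCjk(cps[i])) {
                // Collect a maximal run of CJK code points.
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    // A lone CJK character is kept as a unigram.
                    out.add(new String(cps, start, 1));
                } else {
                    // Each adjacent pair in the run becomes one overlapping bigram.
                    for (int j = start; j + 1 < i; j++) out.add(new String(cps, j, 2));
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Non-CJK letters and digits pass through as a single token.
                while (i < cps.length && !isCjk(cps[i]) && Character.isLetterOrDigit(cps[i])) i++;
                out.add(new String(cps, start, i - start));
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("\u6771\u4EAC\u90FD")); // 東京都
    }
}
```

For `東京都` the sketch yields `東京` and `京都`, so a query for `京都` can match text containing `東京都` without any word segmentation, at the cost of also indexing the spurious pair `東京` alongside real words.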