org.apache.lucene.analysis.cjk
Class CJKTokenizer
java.lang.Object
  
org.apache.lucene.util.AttributeSource
      
org.apache.lucene.analysis.TokenStream
          
org.apache.lucene.analysis.Tokenizer
              
org.apache.lucene.analysis.cjk.CJKTokenizer
- All Implemented Interfaces: 
 - Closeable
 
Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
@Deprecated
public final class CJKTokenizer
- extends Tokenizer
 
CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
 
  
 The tokens returned are overlapping bigrams: each pair of adjacent CJK characters forms one token, and consecutive tokens share a character.
 
 
 Example: "java C1C2C3C4" will be segmented to: "java" "C1C2" "C2C3" "C3C4".
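 The overlapping-bigram rule can be illustrated with a small standalone sketch. This is not the Lucene implementation: the class BigramSketch, its methods, and the set of Unicode blocks treated as CJK are assumptions made only for illustration, as is the choice to emit an isolated CJK character as a single-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping bigram segmentation; not Lucene code.
final class BigramSketch {

    // Assumption: only these four Unicode blocks count as CJK in this sketch.
    static boolean isCjk(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || block == Character.UnicodeBlock.HIRAGANA
                || block == Character.UnicodeBlock.KATAKANA
                || block == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                // Collect the whole run of CJK characters.
                int start = i;
                while (i < text.length() && isCjk(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // Assumption: a lone CJK character becomes its own token.
                    tokens.add(text.substring(start, i));
                } else {
                    // Emit every pair of adjacent characters, with overlap.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else if (Character.isLetterOrDigit(c)) {
                // A run of non-CJK letters or digits is kept as one token.
                int start = i;
                while (i < text.length()
                        && !isCjk(text.charAt(i))
                        && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                i++; // Skip whitespace and punctuation.
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "java" followed by four CJK ideographs, written as Unicode escapes.
        System.out.println(segment("java \u4e2d\u6587\u5206\u8bcd"));
    }
}
```

 For the input above, the sketch yields "java" followed by three two-character tokens, matching the "java" "C1C2" "C2C3" "C3C4" pattern in the example.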
 
 Additionally, the following is applied to Latin text (such as English):
 
 - Text is converted to lowercase.
 
 - Numeric digits, '+', '#', and '_' are tokenized as letters.
 
 - Full-width forms are converted to half-width forms.
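 The three Latin-text rules listed above can be sketched in plain Java. This is an illustrative approximation, not the tokenizer's actual code: the class LatinRulesSketch and its methods are hypothetical, only ASCII letters and digits are handled, and the full-width conversion assumes the standard offset between the full-width ASCII variants (U+FF01 to U+FF5E) and their half-width counterparts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the Latin-text rules; not Lucene code.
final class LatinRulesSketch {

    // Full-width ASCII variants U+FF01..U+FF5E map to U+0021..U+007E
    // by subtracting 0xFEE0. Other characters are returned unchanged.
    static char toHalfWidth(char c) {
        return (c >= 0xFF01 && c <= 0xFF5E) ? (char) (c - 0xFEE0) : c;
    }

    // Digits, '+', '#', and '_' are treated like letters.
    static boolean isTokenChar(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '+' || c == '#' || c == '_';
    }

    static List<String> tokenizeLatin(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = toHalfWidth(text.charAt(i));
            if (isTokenChar(c)) {
                // Lowercase each character as it is appended.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The last word is "Java" written in full-width forms.
        System.out.println(tokenizeLatin("C++ and C# use_case \uFF2A\uFF41\uFF56\uFF41"));
    }
}
```

 Under these assumptions, "C++" and "C#" survive as the tokens "c++" and "c#" instead of collapsing to "c", which is the practical effect of treating '+' and '#' as letters.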
 
 
 For more information on text segmentation for Asian languages (Chinese, Japanese, and Korean),
 please search Google.
 
 
 
| Fields inherited from class org.apache.lucene.analysis.Tokenizer | 
input | 
 
 
| Method Summary |
| void | end() | Deprecated. |
| boolean | incrementToken() | Deprecated. Returns true for the next token in the stream, or false at EOS. |
| void | reset() | Deprecated. |
 
 
| Methods inherited from class org.apache.lucene.util.AttributeSource | 
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState | 
 
 
CJKTokenizer
public CJKTokenizer(Reader in)
- Deprecated. 
- Construct a token stream processing the given input.
- Parameters:
 in - I/O reader
  
CJKTokenizer
public CJKTokenizer(AttributeSource source,
                    Reader in)
- Deprecated. 
 
CJKTokenizer
public CJKTokenizer(AttributeSource.AttributeFactory factory,
                    Reader in)
- Deprecated. 
 
incrementToken
public boolean incrementToken()
                       throws IOException
- Deprecated. 
- Returns true for the next token in the stream, or false at EOS.
 See http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.UnicodeBlock.html
 for details on Character.UnicodeBlock.
- Specified by:
 incrementToken in class TokenStream
 
- Returns:
 - false for end of stream, true otherwise
 - Throws:
 IOException - if a read error occurs in the underlying input
 
  
end
public final void end()
- Deprecated. 
- Overrides:
 end in class TokenStream
 
 
 
reset
public void reset()
           throws IOException
- Deprecated. 
- Overrides:
 reset in class TokenStream
 
- Throws:
 IOException
 
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.