org.apache.lucene.analysis.icu.segmentation
Class ICUTokenizer
java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.Tokenizer
              org.apache.lucene.analysis.icu.segmentation.ICUTokenizer
- All Implemented Interfaces: 
 - Closeable
 
public final class ICUTokenizer
extends Tokenizer
 
Breaks text into words according to UAX #29: Unicode Text Segmentation
 (http://www.unicode.org/reports/tr29/).
 
 Text is first broken at script boundaries; each resulting run is then
 segmented according to the BreakIterator and token typing provided by
 the ICUTokenizerConfig.
 
- See Also:
 ICUTokenizerConfig
 
WARNING: This API is experimental and might change in incompatible ways in the next release.
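 
The two-stage behaviour described above (break at script boundaries, then segment each run with a BreakIterator) can be illustrated with a minimal sketch that uses only JDK classes. This is not the ICUTokenizer implementation and involves neither ICU nor an ICUTokenizerConfig: java.text.BreakIterator and Character.UnicodeScript stand in for their ICU counterparts, the class ScriptRunWordSketch and its methods are hypothetical names, and attaching COMMON/INHERITED characters (spaces, punctuation, combining marks) to the current run is a simplifying assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or ICU.
final class ScriptRunWordSketch {

    /** Stage 1: split text into runs whose letters all belong to one script. */
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        Character.UnicodeScript current = null;
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Assumption: script-neutral characters never start a new run.
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    /** Stage 2: segment each run with a word BreakIterator, keeping only word-like pieces. */
    static List<String> words(String text) {
        List<String> words = new ArrayList<>();
        BreakIterator breaker = BreakIterator.getWordInstance(Locale.ROOT);
        for (String run : scriptRuns(text)) {
            breaker.setText(run);
            int from = breaker.first();
            for (int to = breaker.next(); to != BreakIterator.DONE; from = to, to = breaker.next()) {
                String piece = run.substring(from, to);
                if (hasLetterOrDigit(piece)) {
                    words.add(piece);
                }
            }
        }
        return words;
    }

    /** Filters out pieces made only of whitespace or punctuation. */
    private static boolean hasLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world"));
        // Latin letters immediately followed by Cyrillic letters form two runs.
        System.out.println(scriptRuns("abc\u0430\u0431\u0432"));
    }
}
```

In ICUTokenizer itself, the BreakIterator used for each script and the type assigned to each token come from the ICUTokenizerConfig rather than from fixed rules like the ones sketched here.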
 
  
 
 
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
 input
 
Constructor Summary
ICUTokenizer(Reader input)
          Construct a new ICUTokenizer that breaks text into words from the given
 Reader.
ICUTokenizer(Reader input, ICUTokenizerConfig config)
          Construct a new ICUTokenizer that breaks text into words from the given
 Reader, using a tailored BreakIterator configuration.
 
 
 
Methods inherited from class org.apache.lucene.util.AttributeSource
 addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
 
 
ICUTokenizer
public ICUTokenizer(Reader input)
- Construct a new ICUTokenizer that breaks text into words from the given
 Reader.
 
 The default script-specific handling is used.
- Parameters:
 input - Reader containing text to tokenize.
- See Also:
 DefaultICUTokenizerConfig
 
ICUTokenizer
public ICUTokenizer(Reader input,
                    ICUTokenizerConfig config)
- Construct a new ICUTokenizer that breaks text into words from the given
 Reader, using a tailored BreakIterator configuration.
- Parameters:
 input - Reader containing text to tokenize.
 config - Tailored BreakIterator configuration.
 
incrementToken
public boolean incrementToken()
                       throws IOException
- Specified by:
 incrementToken in class TokenStream
 
- Throws:
 IOException
 
reset
public void reset()
           throws IOException
- Overrides:
 reset in class TokenStream
 
- Throws:
 IOException
 
end
public void end()
- Overrides:
 end in class TokenStream
 
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.