org.apache.lucene.analysis.pattern
Class PatternTokenizer
java.lang.Object
  
org.apache.lucene.util.AttributeSource
      
org.apache.lucene.analysis.TokenStream
          
org.apache.lucene.analysis.Tokenizer
              
org.apache.lucene.analysis.pattern.PatternTokenizer
- All Implemented Interfaces: 
 - Closeable
 
public final class PatternTokenizer
- extends Tokenizer
 
This tokenizer uses regex pattern matching to construct distinct tokens
 from the input stream.  It takes two arguments:  "pattern" and "group".
 
 
 - "pattern" is the regular expression.
 
 - "group" says which group to extract into tokens.
 
  
 
 group=-1 (the default) is equivalent to "split".  In this case, the tokens will
 be equivalent to the output of String.split(java.lang.String), with empty
 tokens removed.
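
 The split behavior described above can be approximated with plain java.util.regex.
 The following is a minimal sketch, not the tokenizer's actual implementation; the
 class name SplitModeSketch and the helper splitTokens are hypothetical names used
 only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; it does not use or depend on Lucene.
class SplitModeSketch {

    // Approximates group = -1: split the input on the pattern,
    // then drop every empty piece.
    static List<String> splitTokens(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings so that the
        // explicit filter below is the only place they are removed.
        for (String piece : pattern.split(input, -1)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Consecutive and trailing delimiters produce empty pieces,
        // which are filtered out.
        System.out.println(splitTokens(Pattern.compile(","), "a,,b,c,")); // prints [a, b, c]
    }
}
```

 The sketch returns only the token text; it does not model the offsets or other
 attributes that a real token stream carries.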
 
 
 Using group >= 0 selects the matching group as the token.  For example, if you have:
 
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
 The output will be two tokens: 'bbb' and 'ccc' (including the ' marks).  With the same input
 but using group=1, the output would be: bbb and ccc (no ' marks).
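
 The group selection in this example can be reproduced with java.util.regex.Matcher.
 This is a sketch under the assumption that each successful match contributes the
 text of the selected group as one token; GroupModeSketch and groupTokens are
 hypothetical names, and the code does not call into Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating group >= 0.
class GroupModeSketch {

    // Collects the text of the given capturing group for every match,
    // skipping groups that did not participate or matched nothing.
    static List<String> groupTokens(Pattern pattern, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String token = matcher.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same pattern as in the example; the quote characters need no
        // escaping inside a Java regex.
        Pattern quoted = Pattern.compile("'([^']+)'");
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(quoted, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(quoted, 1, input)); // prints [bbb, ccc]
    }
}
```

 Group 0 is the whole match, so the quote marks are kept; group 1 is the
 parenthesized part of the pattern, so they are dropped.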
 
 NOTE: This Tokenizer does not output tokens that are of zero length.
- See Also:
 Pattern
 
 
 
| Fields inherited from class org.apache.lucene.analysis.Tokenizer | 
input | 
 
| Constructor Summary | 
PatternTokenizer(Reader input,
                 Pattern pattern,
                 int group)
 
          Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality). | 
 
 
 
| Methods inherited from class org.apache.lucene.util.AttributeSource | 
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState | 
 
 
PatternTokenizer
public PatternTokenizer(Reader input,
                        Pattern pattern,
                        int group)
                 throws IOException
- Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).
- Throws:
 IOException
 
incrementToken
public boolean incrementToken()
- Specified by:
 incrementToken in class TokenStream
 
 
end
public void end()
- Overrides:
 end in class TokenStream
 
 
reset
public void reset()
           throws IOException
- Overrides:
 reset in class TokenStream
 
- Throws:
 IOException
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.