org.apache.lucene.analysis.pattern
Class PatternTokenizerFactory
java.lang.Object
  
org.apache.lucene.analysis.util.AbstractAnalysisFactory
      
org.apache.lucene.analysis.util.TokenizerFactory
          
org.apache.lucene.analysis.pattern.PatternTokenizerFactory
public class PatternTokenizerFactory
- extends TokenizerFactory
 
Factory for PatternTokenizer.
 This tokenizer uses regex pattern matching to construct distinct tokens
 for the input stream.  It takes two arguments:  "pattern" and "group".
 
 
 - "pattern" is the regular expression.
 
 - "group" says which group to extract into tokens.
 
  
 
 group=-1 (the default) is equivalent to "split".  In this case the pattern is treated
 as a delimiter, and the tokens are the same as the output of the following call, with empty tokens removed:
 String.split(java.lang.String)
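
 The split behavior can be approximated with the JDK alone. The following is a minimal sketch, not the PatternTokenizer implementation: the class name SplitModeSketch, the method splitTokens, and the sample regex and input are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that mimics group = -1 using only java.util.regex.
class SplitModeSketch {

    // Splits the input on the pattern and drops zero-length pieces.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        // A negative limit keeps trailing empty strings, so the filter
        // below treats empty pieces the same way wherever they occur.
        for (String piece : Pattern.compile(regex).split(input, -1)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Comma-separated input with an empty field and a trailing comma.
        System.out.println(splitTokens("\\s*,\\s*", "a, b,,c ,")); // prints [a, b, c]
    }
}
```

 The empty piece between the two adjacent commas and the one after the trailing comma are discarded, which matches the "without empty tokens" rule described above.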
 
 
 Using group >= 0 selects the matching group as the token (group 0 is the entire match).  For example, if you have:
 
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
 the output will be two tokens: 'bbb' and 'ccc' (including the ' marks).  With the same input
 but using group=1, the output would be: bbb and ccc (no ' marks).
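
 The example above can be reproduced with java.util.regex. This is a sketch of the group-selection semantics only, assuming the tokenizer emits one token per match; the class GroupModeSketch and the method groupTokens are hypothetical names, not part of Lucene. The quote characters are written without a backslash here because java.util.regex does not require one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics group >= 0 using only java.util.regex.
class GroupModeSketch {

    // Collects the requested capture group from every match of the pattern.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            // Skip groups that did not participate and zero-length results.
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```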
 
 NOTE: This Tokenizer does not output tokens that are of zero length.
 
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
- Since:
 
  - Solr 1.2
 
- See Also:
 PatternTokenizer
 
 
 
 
 
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getLines, getLuceneMatchVersion, getPattern, getSnowballWordSet, getWordSet, setLuceneMatchVersion, splitFileNames
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
PATTERN
public static final String PATTERN
- See Also:
 - Constant Field Values
 
GROUP
public static final String GROUP
- See Also:
 - Constant Field Values
 
pattern
protected Pattern pattern
group
protected int group
PatternTokenizerFactory
public PatternTokenizerFactory()
init
public void init(Map<String,String> args)
- Initializes the factory from the given arguments. A configured "pattern" is required.
- Overrides:
 init in class AbstractAnalysisFactory
 
 
 
create
public Tokenizer create(Reader in)
- Creates a Tokenizer that tokenizes the input using the configured pattern and group.
- Specified by:
 create in class TokenizerFactory
 
 
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.