org.apache.lucene.analysis
Class StopAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.apache.lucene.analysis.StopAnalyzer
All Implemented Interfaces:
Closeable

public final class StopAnalyzer
extends StopwordAnalyzerBase

Filters LetterTokenizer with LowerCaseFilter and StopFilter.
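
As a rough illustration of the chain described above, the following plain-Java sketch splits text on non-letter characters, lower-cases each token, and drops tokens found in a stop set. It does not use Lucene. The class StopPipelineSketch, the method analyze, and the sample word list are assumptions made for this example, and the sample list is not the content of ENGLISH_STOP_WORDS_SET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for a letter tokenizer -> lower-case filter -> stop filter chain.
// Hypothetical names throughout; this is not the Lucene implementation.
final class StopPipelineSketch {

    // Small sample list chosen for this sketch only.
    static final Set<String> SAMPLE_STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "to")));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        // Iterate one position past the end so the final token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean isLetter = i < text.length() && Character.isLetter(text.charAt(i));
            if (isLetter) {
                // Tokens are maximal runs of letters, lower-cased as they are built.
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                String token = current.toString();
                // Stop filtering: keep the token only if it is not a stop word.
                if (!stopWords.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [quick, brown, fox, dogs]
        System.out.println(analyze("The Quick-Brown fox, and 2 dogs", SAMPLE_STOP_WORDS));
    }
}
```

Digits and punctuation act as token boundaries in this sketch, so "2" produces no token and "Quick-Brown" yields two tokens.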

You must specify the required Version compatibility when creating StopAnalyzer.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
ReusableAnalyzerBase.TokenStreamComponents
 
Field Summary
static Set<?> ENGLISH_STOP_WORDS_SET
          An unmodifiable set containing some common English words that are not usually useful for searching.
 
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
 
Constructor Summary
StopAnalyzer(Version matchVersion)
          Builds an analyzer which removes words in ENGLISH_STOP_WORDS_SET.
StopAnalyzer(Version matchVersion, File stopwordsFile)
          Builds an analyzer with the stop words from the given file.
StopAnalyzer(Version matchVersion, Reader stopwords)
          Builds an analyzer with the stop words from the given reader.
StopAnalyzer(Version matchVersion, Set<?> stopWords)
          Builds an analyzer with the stop words from the given set.
 
Method Summary
protected  ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
          Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
 
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENGLISH_STOP_WORDS_SET

public static final Set<?> ENGLISH_STOP_WORDS_SET
An unmodifiable set containing some common English words that are not usually useful for searching.
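
Because the set is unmodifiable, a caller who wants the default words plus a few extras would typically copy it into a new set and pass the copy to a constructor that accepts a Set. The sketch below shows that pattern with java.util collections only. StopSetSketch, BASE, rejectsAdd, and extend are hypothetical names, and BASE holds sample words rather than the real contents of ENGLISH_STOP_WORDS_SET.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrates working with an unmodifiable stop-word set using plain collections.
// Hypothetical names; BASE is a stand-in, not the Lucene field.
final class StopSetSketch {

    static final Set<String> BASE = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "the")));

    // Returns true if the set refuses modification.
    // Note: on a modifiable set this call really does add the probe word.
    static boolean rejectsAdd(Set<String> set) {
        try {
            set.add("probe");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copies the base set and adds extra words; the base set is left untouched.
    static Set<String> extend(Set<String> base, String... extra) {
        Set<String> copy = new HashSet<String>(base);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(BASE));
        System.out.println(extend(BASE, "foo").contains("foo"));
    }
}
```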

Constructor Detail

StopAnalyzer

public StopAnalyzer(Version matchVersion)
Builds an analyzer which removes words in ENGLISH_STOP_WORDS_SET.

Parameters:
matchVersion - See above

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    Set<?> stopWords)
Builds an analyzer with the stop words from the given set.

Parameters:
matchVersion - See above
stopWords - Set of stop words

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    File stopwordsFile)
             throws IOException
Builds an analyzer with the stop words from the given file.

Parameters:
matchVersion - See above
stopwordsFile - File to load stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader, Version)

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    Reader stopwords)
             throws IOException
Builds an analyzer with the stop words from the given reader.

Parameters:
matchVersion - See above
stopwords - Reader to load stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader, Version)
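
The sketch below shows one way a stop-word list could be read from a Reader, assuming the input holds one word per line with surrounding whitespace ignored. It uses only java.io and java.util and is not the WordlistLoader implementation. WordListSketch, readWordSet, and parse are hypothetical names, and skipping blank lines is a choice made for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Reads a one-word-per-line stop list into a set. Hypothetical helper, not Lucene code.
final class WordListSketch {

    static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader buffered = new BufferedReader(reader);
        try {
            String line;
            while ((line = buffered.readLine()) != null) {
                String word = line.trim();
                // Assumption of this sketch: blank lines are skipped.
                if (word.length() > 0) {
                    words.add(word);
                }
            }
        } finally {
            buffered.close();
        }
        return words;
    }

    // Convenience wrapper for in-memory text; a StringReader should not fail with IOException.
    static Set<String> parse(String text) {
        try {
            return readWordSet(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("the\n  and \n\nof\n"));
    }
}
```

Since the analyzer lower-cases tokens before stop filtering, the entries in such a list would normally be written in lower case.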

Method Detail

createComponents

protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                      Reader reader)
Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.

Specified by:
createComponents in class ReusableAnalyzerBase
Parameters:
fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
ReusableAnalyzerBase.TokenStreamComponents built from a LowerCaseTokenizer filtered with StopFilter