QueryAutoStopWordAnalyzer (Lucene 4.0.0 API)

org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.AnalyzerWrapper
          org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces:: Closeable

public final class QueryAutoStopWordAnalyzer
extends AnalyzerWrapper

An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
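The idea described above can be modeled without Lucene. The sketch below is a toy in plain Java, not the analyzer's implementation: it treats each document as the set of distinct terms it contains in one field, counts each term's document frequency, and flags terms whose frequency exceeds a threshold. The class and method names (`DocFreqModel`, `docFreqs`, `commonTerms`) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: each document is represented
// by the set of distinct terms it contains in a single field.
final class DocFreqModel {

    // Document frequency = number of documents that contain the term.
    static Map<String, Integer> docFreqs(List<Set<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Terms appearing in more than maxDocFreq documents are treated as stop words.
    static Set<String> commonTerms(List<Set<String>> docs, int maxDocFreq) {
        Set<String> stopWords = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreqs(docs).entrySet()) {
            if (e.getValue() > maxDocFreq) {
                stopWords.add(e.getKey());
            }
        }
        return stopWords;
    }
}
```

With four documents where "the" occurs in three of them, a threshold of 2 flags only "the"; a term occurring in exactly two documents is kept.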

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
`Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents`

Field Summary
`static float`	`defaultMaxDocFreqPercent`

Constructor Summary
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than `defaultMaxDocFreqPercent`
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Method Summary
`Term[]`	`getStopWords()` Provides information on which stop words have been identified for all fields
`String[]`	`getStopWords(String fieldName)` Provides information on which stop words have been identified for a field
`protected Analyzer`	`getWrappedAnalyzer(String fieldName)`
`protected Analyzer.TokenStreamComponents`	`wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)`

Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
`createComponents, getOffsetGap, getPositionIncrementGap, initReader`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, tokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

defaultMaxDocFreqPercent

public static final float defaultMaxDocFreqPercent

See Also:: Constant Field Values

Constructor Detail

QueryAutoStopWordAnalyzer

public QueryAutoStopWordAnalyzer(Version matchVersion,
                                 Analyzer delegate,
                                 IndexReader indexReader)
                          throws IOException

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent

Parameters:: matchVersion - Version to be used in StopFilter; delegate - Analyzer whose TokenStream will be filtered; indexReader - IndexReader to identify the stopwords from
Throws:: IOException - Can be thrown while reading from the IndexReader

QueryAutoStopWordAnalyzer

public QueryAutoStopWordAnalyzer(Version matchVersion,
                                 Analyzer delegate,
                                 IndexReader indexReader,
                                 int maxDocFreq)
                          throws IOException

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Parameters:: matchVersion - Version to be used in StopFilter; delegate - Analyzer whose TokenStream will be filtered; indexReader - IndexReader to identify the stopwords from; maxDocFreq - Document frequency that a term must exceed in order to be considered a stop word
Throws:: IOException - Can be thrown while reading from the IndexReader
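This constructor applies one absolute threshold to every indexed field. A hedged sketch of that step, again as a plain-Java model rather than Lucene code: the input is a hypothetical precomputed map from field name to per-term document frequencies (the real class reads these from the IndexReader), and a term becomes a stop word only when its frequency is strictly greater than `maxDocFreq`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: field name -> (term -> document frequency).
final class AllFieldsStopWords {

    static Map<String, Set<String>> compute(Map<String, Map<String, Integer>> docFreqsByField,
                                            int maxDocFreq) {
        Map<String, Set<String>> stopWordsPerField = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : docFreqsByField.entrySet()) {
            Set<String> stopWords = new HashSet<>();
            for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
                // Strictly greater: a term exactly at maxDocFreq is not a stop word.
                if (term.getValue() > maxDocFreq) {
                    stopWords.add(term.getKey());
                }
            }
            stopWordsPerField.put(field.getKey(), stopWords);
        }
        return stopWordsPerField;
    }
}
```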

QueryAutoStopWordAnalyzer

public QueryAutoStopWordAnalyzer(Version matchVersion,
                                 Analyzer delegate,
                                 IndexReader indexReader,
                                 float maxPercentDocs)
                          throws IOException

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs

Parameters:: matchVersion - Version to be used in StopFilter; delegate - Analyzer whose TokenStream will be filtered; indexReader - IndexReader to identify the stopwords from; maxPercentDocs - The maximum percentage of index documents, expressed as a fraction between 0.0 and 1.0, that may contain a term before the term is considered a stop word
Throws:: IOException - Can be thrown while reading from the IndexReader
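A percentage threshold has to become an absolute document count before it can be compared with a term's document frequency. The helper below is a minimal sketch, assuming the count is the reader's document count multiplied by `maxPercentDocs` and truncated toward zero; the exact rule (for example, whether deleted documents are counted) should be checked in the Lucene source. `PercentThreshold` and `toMaxDocFreq` are hypothetical names.

```java
// Hypothetical helper: converts a maxPercentDocs fraction into an absolute
// document-frequency threshold, assuming numDocs * fraction truncated toward zero.
final class PercentThreshold {

    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        return (int) (numDocs * maxPercentDocs);
    }
}
```

For example, with 1,000 documents and `maxPercentDocs = 0.5f` the threshold would be 500, so only terms found in more than 500 documents would be treated as stop words.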

QueryAutoStopWordAnalyzer

public QueryAutoStopWordAnalyzer(Version matchVersion,
                                 Analyzer delegate,
                                 IndexReader indexReader,
                                 Collection<String> fields,
                                 float maxPercentDocs)
                          throws IOException

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs

Parameters:: matchVersion - Version to be used in StopFilter; delegate - Analyzer whose TokenStream will be filtered; indexReader - IndexReader to identify the stopwords from; fields - Selection of fields to calculate stopwords for; maxPercentDocs - The maximum percentage of index documents, expressed as a fraction between 0.0 and 1.0, that may contain a term before the term is considered a stop word
Throws:: IOException - Can be thrown while reading from the IndexReader

QueryAutoStopWordAnalyzer

public QueryAutoStopWordAnalyzer(Version matchVersion,
                                 Analyzer delegate,
                                 IndexReader indexReader,
                                 Collection<String> fields,
                                 int maxDocFreq)
                          throws IOException

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq

Parameters:: matchVersion - Version to be used in StopFilter; delegate - Analyzer whose TokenStream will be filtered; indexReader - IndexReader to identify the stopwords from; fields - Selection of fields to calculate stopwords for; maxDocFreq - Document frequency that a term must exceed in order to be considered a stop word
Throws:: IOException - Can be thrown while reading from the IndexReader
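The two constructors that take a `fields` collection restrict the calculation to the given selection. A hedged plain-Java model of that behavior (not Lucene code; `SelectedFieldsStopWords` is a hypothetical name): only the selected fields get an entry, and fields outside the selection have no stop words at all. With the `float maxPercentDocs` variant, the fraction would first be turned into an absolute count.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: stop words are computed only for the selected fields.
final class SelectedFieldsStopWords {

    static Map<String, Set<String>> compute(Map<String, Map<String, Integer>> docFreqsByField,
                                            Collection<String> fields,
                                            int maxDocFreq) {
        Map<String, Set<String>> stopWordsPerField = new HashMap<>();
        for (String field : fields) {
            Set<String> stopWords = new HashSet<>();
            // A selected field with no indexed terms simply gets an empty set.
            Map<String, Integer> termDocFreqs = docFreqsByField.getOrDefault(field, Map.of());
            for (Map.Entry<String, Integer> term : termDocFreqs.entrySet()) {
                if (term.getValue() > maxDocFreq) {
                    stopWords.add(term.getKey());
                }
            }
            stopWordsPerField.put(field, stopWords);
        }
        return stopWordsPerField;
    }
}
```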

Method Detail

getWrappedAnalyzer

protected Analyzer getWrappedAnalyzer(String fieldName)

Specified by:: getWrappedAnalyzer in class AnalyzerWrapper

wrapComponents

protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
                                                        Analyzer.TokenStreamComponents components)

Specified by:: wrapComponents in class AnalyzerWrapper
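This page gives no description for `wrapComponents`, but the class description and the `matchVersion` parameter ("Version to be used in StopFilter") suggest its role: for a field with identified stop words, the delegate's tokens pass through a StopFilter so those words never reach the query. The list-based sketch below models only that observable behavior; it is an assumption, not the real method, which operates on `Analyzer.TokenStreamComponents`. `QueryTokenFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the query-time filtering step, using plain token lists
// instead of Lucene's TokenStream API.
final class QueryTokenFilter {

    static List<String> filter(String fieldName,
                               List<String> delegateTokens,
                               Map<String, Set<String>> stopWordsPerField) {
        Set<String> stopWords = stopWordsPerField.get(fieldName);
        if (stopWords == null) {
            // No stop words were computed for this field: pass tokens through unchanged.
            return delegateTokens;
        }
        List<String> kept = new ArrayList<>();
        for (String token : delegateTokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```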

getStopWords

public String[] getStopWords(String fieldName)

Provides information on which stop words have been identified for a field

Parameters:: fieldName - The field for which the stop words identified at construction time will be returned
Returns:: the stop words identified for a field
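A per-field lookup like this one can be modeled as reading a map from field name to stop word set. The sketch below is hypothetical (`PerFieldLookup` is not a Lucene class) and assumes that a field with no computed stop words yields an empty array rather than `null`; verify that detail against the Lucene source before relying on it.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of a per-field stop word lookup.
final class PerFieldLookup {

    static String[] getStopWords(Map<String, Set<String>> stopWordsPerField, String fieldName) {
        Set<String> stopWords = stopWordsPerField.get(fieldName);
        // Assumption: a field without computed stop words yields an empty array, not null.
        return stopWords != null ? stopWords.toArray(new String[0]) : new String[0];
    }
}
```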

getStopWords

public Term[] getStopWords()

Provides information on which stop words have been identified for all fields

Returns:: the stop words (as terms)
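Because a Term pairs a field name with its text, the all-fields variant can be modeled as flattening the per-field sets into one array of (field, text) pairs. In this sketch `FieldTerm` is a hypothetical stand-in for Lucene's `Term` and `AllStopWords` is a hypothetical name; the order of the resulting array is not meaningful.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Lucene's Term: a (field, text) pair.
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

// Hypothetical model: flattens per-field stop words into one array of (field, text) pairs.
final class AllStopWords {

    static FieldTerm[] getStopWords(Map<String, Set<String>> stopWordsPerField) {
        List<FieldTerm> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : stopWordsPerField.entrySet()) {
            for (String text : e.getValue()) {
                all.add(new FieldTerm(e.getKey(), text));
            }
        }
        return all.toArray(new FieldTerm[0]);
    }
}
```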
