java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.AnalyzerWrapper
      org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
public final class QueryAutoStopWordAnalyzer
extends AnalyzerWrapper
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term occurred in around 50% of documents, causing TermQueries for that term to take 2 seconds.
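The query-time protection described above can be pictured with a small plain-Java model. This is only a sketch, not Lucene code: the class `QueryStopWordSketch`, the method `dropStopWords`, and the sample stop words are hypothetical, and the real analyzer filters a TokenStream rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of query-time stop word filtering; not Lucene code. */
public class QueryStopWordSketch {

    /** Returns the query tokens that are not in the stop word set, in their original order. */
    public static List<String> dropStopWords(List<String> queryTokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : queryTokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stop words, as if they had been derived from the index beforehand.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));
        List<String> query = Arrays.asList("the", "history", "of", "lucene");
        System.out.println(dropStopWords(query, stopWords)); // prints [history, lucene]
    }
}
```

Because the very common terms never reach the query, no TermQuery is built for them and their long postings are not read.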
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer |
|---|
| Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents |
| Field Summary | |
|---|---|
| static float | defaultMaxDocFreqPercent |
| Constructor Summary | |
|---|---|
| QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader) | Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent. |
| QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) | Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs. |
| QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) | Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq. |
| QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs) | Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs. |
| QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq) | Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq. |
| Method Summary | |
|---|---|
| Term[] | getStopWords(): Provides information on which stop words have been identified for all fields. |
| String[] | getStopWords(String fieldName): Provides information on which stop words have been identified for a field. |
| protected Analyzer | getWrappedAnalyzer(String fieldName) |
| protected Analyzer.TokenStreamComponents | wrapComponents(String fieldName, Analyzer.TokenStreamComponents components) |
| Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper |
|---|
| createComponents, getOffsetGap, getPositionIncrementGap, initReader |
| Methods inherited from class org.apache.lucene.analysis.Analyzer |
|---|
| close, tokenStream |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final float defaultMaxDocFreqPercent
| Constructor Detail |
|---|
public QueryAutoStopWordAnalyzer(Version matchVersion,
Analyzer delegate,
IndexReader indexReader)
throws IOException
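Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.
Parameters:
matchVersion - Version to be used in StopFilter
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
Throws:
IOException - Can be thrown while reading from the IndexReader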
public QueryAutoStopWordAnalyzer(Version matchVersion,
Analyzer delegate,
IndexReader indexReader,
int maxDocFreq)
throws IOException
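Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.
Parameters:
matchVersion - Version to be used in StopFilter
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxDocFreq - Document frequency that terms must exceed in order to be treated as stopwords
Throws:
IOException - Can be thrown while reading from the IndexReader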
public QueryAutoStopWordAnalyzer(Version matchVersion,
Analyzer delegate,
IndexReader indexReader,
float maxPercentDocs)
throws IOException
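Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.
Parameters:
matchVersion - Version to be used in StopFilter
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxPercentDocs - The maximum percentage, expressed as a fraction between 0.0 and 1.0, of index documents that may contain a term; a term found in more documents than this is considered a stop word
Throws:
IOException - Can be thrown while reading from the IndexReader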
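The relationship between maxPercentDocs and maxDocFreq can be sketched in plain Java. This is an illustrative model, not the library's code: it assumes the fraction is converted to an absolute count as `(int) (numDocs * maxPercentDocs)`, and the class `DocFreqThresholdSketch`, its methods, and the document frequencies are all hypothetical. The strict "greater than" comparison follows the constructor descriptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model of the stop word threshold; not Lucene code. */
public class DocFreqThresholdSketch {

    /** Converts a fraction of the index (0.0 to 1.0) into an absolute document count (assumed rule). */
    public static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        return (int) (numDocs * maxPercentDocs);
    }

    /** Selects the terms whose document frequency is strictly greater than maxDocFreq. */
    public static Set<String> selectStopWords(Map<String, Integer> docFreqs, int maxDocFreq) {
        Set<String> stopWords = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for a 1000-document index.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 950);
        docFreqs.put("lucene", 120);
        docFreqs.put("search", 400);

        int maxDocFreq = toMaxDocFreq(1000, 0.5f); // 500 documents
        System.out.println(selectStopWords(docFreqs, maxDocFreq)); // prints [the]
    }
}
```

Under this model, passing a float selects a threshold relative to index size, while passing an int fixes the threshold regardless of how large the index grows.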
public QueryAutoStopWordAnalyzer(Version matchVersion,
Analyzer delegate,
IndexReader indexReader,
Collection<String> fields,
float maxPercentDocs)
throws IOException
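Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.
Parameters:
matchVersion - Version to be used in StopFilter
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxPercentDocs - The maximum percentage, expressed as a fraction between 0.0 and 1.0, of index documents that may contain a term; a term found in more documents than this is considered a stop word
Throws:
IOException - Can be thrown while reading from the IndexReader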
public QueryAutoStopWordAnalyzer(Version matchVersion,
Analyzer delegate,
IndexReader indexReader,
Collection<String> fields,
int maxDocFreq)
throws IOException
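Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
Parameters:
matchVersion - Version to be used in StopFilter
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxDocFreq - Document frequency that terms must exceed in order to be treated as stopwords
Throws:
IOException - Can be thrown while reading from the IndexReader
| Method Detail |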
|---|
protected Analyzer getWrappedAnalyzer(String fieldName)
getWrappedAnalyzer in class AnalyzerWrapper
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
Analyzer.TokenStreamComponents components)
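wrapComponents in class AnalyzerWrapper
public String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field.
Parameters:
fieldName - The field for which stop words identified in "addStopWords" method calls will be returned
public Term[] getStopWords()
Provides information on which stop words have been identified for all fields.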
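The difference between the two getStopWords variants, one field versus all fields, can be modelled in plain Java. This is a hedged sketch rather than the actual implementation: the class `StopWordsPerFieldSketch` and its methods are hypothetical, a "field:text" string stands in for a Term, and returning an empty array for an unknown field is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of per-field stop word lookup; not Lucene code. */
public class StopWordsPerFieldSketch {
    // Field name mapped to the stop words identified for that field (hypothetical data).
    private final Map<String, Set<String>> stopWordsPerField = new LinkedHashMap<>();

    /** Registers the stop words for one field, replacing any earlier entry. */
    public void put(String field, String... words) {
        stopWordsPerField.put(field, new LinkedHashSet<>(Arrays.asList(words)));
    }

    /** Per-field view: the stop words for one field, or an empty array if none are known. */
    public String[] stopWordsFor(String field) {
        Set<String> words = stopWordsPerField.get(field);
        return words == null ? new String[0] : words.toArray(new String[0]);
    }

    /** All-fields view: one "field:text" entry per stop word, standing in for a Term. */
    public List<String> allStopWords() {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : stopWordsPerField.entrySet()) {
            for (String word : entry.getValue()) {
                all.add(entry.getKey() + ":" + word);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        StopWordsPerFieldSketch sketch = new StopWordsPerFieldSketch();
        sketch.put("title", "the");
        sketch.put("body", "the", "of");
        System.out.println(Arrays.toString(sketch.stopWordsFor("body"))); // prints [the, of]
        System.out.println(sketch.allStopWords()); // prints [title:the, body:the, body:of]
    }
}
```

The all-fields view needs the field name attached to each word because the same text can be a stop word in one field and not in another.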