java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
public final class ShingleAnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
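To make "token-based n-gram" concrete, the following is a minimal plain-Java sketch that joins runs of consecutive tokens into shingles. It is an illustration only, not the ShingleFilter implementation: the class name ShingleSketch and the method shingles are hypothetical, and the order in which shingles are listed is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT Lucene's ShingleFilter.
// ShingleSketch and shingles(...) are hypothetical names.
class ShingleSketch {

    // Returns every run of minSize..maxSize consecutive tokens,
    // joined with the given separator.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        if (minSize < 1 || minSize > maxSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the window from minSize up to maxSize while it still fits.
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // Prints: [please divide, please divide this, divide this, divide this sentence, this sentence]
        System.out.println(shingles(tokens, 2, 3, " "));
    }
}
```

With fewer tokens than minSize, the sketch returns an empty list, which is the "no shingles are available" case that the outputUnigramsIfNoShingles option addresses.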
Constructor Summary | |
---|---|
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles) | Creates a new ShingleAnalyzerWrapper. |
ShingleAnalyzerWrapper(Version matchVersion) | Wraps StandardAnalyzer. |
ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize) | Wraps StandardAnalyzer. |
Method Summary | |
---|---|
int | getMaxShingleSize() - The max shingle (token ngram) size. |
int | getMinShingleSize() - The min shingle (token ngram) size. |
String | getTokenSeparator() |
boolean | isOutputUnigrams() |
boolean | isOutputUnigramsIfNoShingles() |
TokenStream | reusableTokenStream(String fieldName, Reader reader) - Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
void | setMaxShingleSize(int maxShingleSize) - Deprecated. Setting maxShingleSize after Analyzer instantiation prevents reuse. Configure maxShingleSize during construction. |
void | setMinShingleSize(int minShingleSize) - Deprecated. Setting minShingleSize after Analyzer instantiation prevents reuse. Configure minShingleSize during construction. |
void | setOutputUnigrams(boolean outputUnigrams) - Deprecated. Setting outputUnigrams after Analyzer instantiation prevents reuse. Configure outputUnigrams during construction. |
void | setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles) - Deprecated. Setting outputUnigramsIfNoShingles after Analyzer instantiation prevents reuse. Configure outputUnigramsIfNoShingles during construction. |
void | setTokenSeparator(String tokenSeparator) - Deprecated. Setting tokenSeparator after Analyzer instantiation prevents reuse. Configure tokenSeparator during construction. |
TokenStream | tokenStream(String fieldName, Reader reader) - Creates a TokenStream which tokenizes all the text in the provided Reader. |
Methods inherited from class org.apache.lucene.analysis.Analyzer |
---|
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles)
Creates a new ShingleAnalyzerWrapper.
defaultAnalyzer - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token ngram) size
maxShingleSize - Max shingle size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
public ShingleAnalyzerWrapper(Version matchVersion)
Wraps StandardAnalyzer.
public ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
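The interaction between outputUnigrams and outputUnigramsIfNoShingles described for the six-argument constructor can be restated as a small decision function. This is a hedged sketch of the documented rule only, not code from Lucene: the class UnigramPolicySketch, the method emitsUnigrams, and the tokenCount parameter are hypothetical names introduced for illustration.

```java
// Illustrative restatement of the documented flag semantics; NOT Lucene code.
// UnigramPolicySketch, emitsUnigrams and tokenCount are hypothetical names.
class UnigramPolicySketch {

    // Returns true when the original (single) tokens would be passed
    // to the output stream under the documented rule.
    static boolean emitsUnigrams(int tokenCount, int minShingleSize,
                                 boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        if (outputUnigrams) {
            // Unigrams are always output, whether or not shingles are available.
            return true;
        }
        // No shingles can be formed from fewer than minShingleSize tokens.
        boolean noShingles = tokenCount < minShingleSize;
        return outputUnigramsIfNoShingles && noShingles;
    }
}
```

For example, with outputUnigrams set to false and outputUnigramsIfNoShingles set to true, a one-token input with minShingleSize 2 would still yield its unigram, while a five-token input would yield shingles only.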
Method Detail |
---|
public int getMaxShingleSize()
@Deprecated public void setMaxShingleSize(int maxShingleSize)
maxShingleSize - max shingle size
public int getMinShingleSize()
@Deprecated public void setMinShingleSize(int minShingleSize)
Set the min shingle size (default: 2).
This method requires that the passed-in minShingleSize is not greater than maxShingleSize, so make sure that maxShingleSize is set before calling this method.
minShingleSize - min size of output shingles
public String getTokenSeparator()
@Deprecated public void setTokenSeparator(String tokenSeparator)
tokenSeparator - used to separate input stream tokens in output shingles
public boolean isOutputUnigrams()
@Deprecated public void setOutputUnigrams(boolean outputUnigrams)
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
public boolean isOutputUnigramsIfNoShingles()
@Deprecated public void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
Shall we override the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream)? (default: false.)
Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
outputUnigramsIfNoShingles - Whether or not to output a single unigram when no shingles are available.
public TokenStream tokenStream(String fieldName, Reader reader)
Description copied from class: Analyzer
Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Description copied from class: Analyzer
Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
Overrides: reusableTokenStream in class Analyzer
Throws: IOException