java.lang.Object
  org.apache.lucene.search.Similarity
    org.apache.lucene.search.DefaultSimilarity
      org.apache.lucene.misc.SweetSpotSimilarity
public class SweetSpotSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.
For lengthNorm, a global min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following a sqrt function.
A per-field min/max can be specified if different fields have different sweet spots.
For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
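The plateau can be illustrated with a small stand-alone sketch of the formula this page documents for computeLengthNorm. The class name LengthNormSketch and the sample values (plateau from 3 to 5, steepness 0.5) are assumptions chosen for illustration; this is not the library's source code.

```java
class LengthNormSketch {
    // Hypothetical stand-alone transcription of the documented formula:
    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero anywhere inside [min, max]; grows linearly outside it.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // Illustrative values only: plateau from 3 to 5 terms, steepness 0.5.
        for (int n : new int[] {1, 3, 4, 5, 9}) {
            System.out.println(n + " terms -> norm " + lengthNorm(n, 3, 5, 0.5f));
        }
    }
}
```

Lengths 3, 4 and 5 all yield a norm of 1.0 in this sketch, while lengths below 3 or above 5 yield smaller values.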
Field Summary |
---|
Fields inherited from class org.apache.lucene.search.DefaultSimilarity |
---|
discountOverlaps |
Fields inherited from class org.apache.lucene.search.Similarity |
---|
NO_DOC_ID_PROVIDED |
Constructor Summary | |
---|---|
SweetSpotSimilarity() | |
Method Summary | |
---|---|
float | baselineTf(float freq) - Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min) ...but with a special case check for 0. |
float | computeLengthNorm(String fieldName, int numTerms) - Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ). |
float | computeNorm(String fieldName, FieldInvertState state) - Implemented as state.getBoost() * lengthNorm(fieldName, numTokens) where numTokens does not count overlap tokens if discountOverlaps is true by default or true for this specific field. |
float | hyperbolicTf(float freq) - Uses a hyperbolic tangent function that allows for a hard max... |
void | setBaselineTfFactors(float base, float min) - Sets the baseline and minimum function variables for baselineTf. |
void | setHyperbolicTfFactors(float min, float max, double base, float xoffset) - Sets the function variables for the hyperbolicTf functions. |
void | setLengthNormFactors(int min, int max, float steepness) - Sets the default function variables used by lengthNorm when no field specific variables have been set. |
void | setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps) - Sets the function variables used by lengthNorm for a specific named field. |
float | tf(int freq) - Delegates to baselineTf. |
Methods inherited from class org.apache.lucene.search.DefaultSimilarity |
---|
coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf |
Methods inherited from class org.apache.lucene.search.Similarity |
---|
decodeNorm, decodeNormValue, encodeNorm, encodeNormValue, getDefault, getNormDecoder, idfExplain, idfExplain, idfExplain, lengthNorm, scorePayload, setDefault |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SweetSpotSimilarity()
Method Detail |
---|
public void setBaselineTfFactors(float base, float min)
Sets the baseline and minimum function variables for baselineTf.
See Also: baselineTf(float)
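How base and min shape the curve can be seen in a minimal sketch of the baselineTf formula given in the Method Summary. BaselineTfSketch is a hypothetical stand-alone class, and returning 0 for a frequency of 0 is an assumption about what the "special case check for 0" does; the sample values are illustrative, not the library's defaults.

```java
class BaselineTfSketch {
    // Hypothetical transcription of: (x <= min) ? base : sqrt(x+(base**2)-min)
    static float baselineTf(float freq, float base, float min) {
        // Assumption: the documented "special case check for 0" returns 0.
        if (freq == 0.0f) {
            return 0.0f;
        }
        // Every frequency up to min scores the flat baseline value.
        if (freq <= min) {
            return base;
        }
        // Above min the score grows like a square root, continuous at freq == min.
        return (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // Illustrative factors only: base 1.5, min 2.
        for (float f : new float[] {0f, 1f, 2f, 6f}) {
            System.out.println("freq " + f + " -> tf " + baselineTf(f, 1.5f, 2.0f));
        }
    }
}
```

With base and min both 0, the sketch reduces to sqrt(freq) for every non-zero frequency.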
public void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
Sets the function variables for the hyperbolicTf functions.
Parameters:
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
See Also: hyperbolicTf(float)
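public void setLengthNormFactors(int min, int max, float steepness)
Sets the default function variables used by lengthNorm when no field specific variables have been set.
See Also: Similarity.lengthNorm(java.lang.String, int)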
public void setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps)
Sets the function variables used by lengthNorm for a specific named field.
Parameters:
field - field name
min - minimum value
max - maximum value
steepness - steepness of the curve
discountOverlaps - if true, numOverlapTokens will be subtracted from numTokens; if false then numOverlapTokens will be assumed to be 0 (see DefaultSimilarity.computeNorm(String, FieldInvertState) for details).
See Also: Similarity.lengthNorm(java.lang.String, int)
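One way to picture the relationship between the global factors and the per-field factors is a lookup that falls back to the defaults when a field has no settings of its own. The sketch below is an assumption about how such a fallback could be arranged, not the library's implementation: FieldFactorsSketch and Factors are hypothetical names, the discountOverlaps flag is left out for brevity, and the initial defaults (min 1, max 1, steepness 0.5) are simply the values this page names for the 1/sqrt(x) case.

```java
import java.util.HashMap;
import java.util.Map;

class FieldFactorsSketch {
    // Hypothetical holder for one set of lengthNorm factors.
    static final class Factors {
        final int min;
        final int max;
        final float steepness;

        Factors(int min, int max, float steepness) {
            this.min = min;
            this.max = max;
            this.steepness = steepness;
        }
    }

    // Illustrative starting values, not confirmed library defaults.
    private Factors defaults = new Factors(1, 1, 0.5f);
    private final Map<String, Factors> perField = new HashMap<>();

    // Global factors, used when a field has no settings of its own.
    void setLengthNormFactors(int min, int max, float steepness) {
        defaults = new Factors(min, max, steepness);
    }

    // Factors for one named field (discountOverlaps omitted in this sketch).
    void setLengthNormFactors(String field, int min, int max, float steepness) {
        perField.put(field, new Factors(min, max, steepness));
    }

    float computeLengthNorm(String fieldName, int numTerms) {
        Factors f = perField.getOrDefault(fieldName, defaults);
        float distance = Math.abs(numTerms - f.min) + Math.abs(numTerms - f.max) - (f.max - f.min);
        return (float) (1.0 / Math.sqrt(f.steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        FieldFactorsSketch s = new FieldFactorsSketch();
        s.setLengthNormFactors("title", 3, 5, 0.5f); // hypothetical field name
        System.out.println("title: " + s.computeLengthNorm("title", 4));
        System.out.println("body:  " + s.computeLengthNorm("body", 4));
    }
}
```

In this sketch a four-term "title" sits on its own plateau, while a four-term "body" falls back to the global factors and is penalised for its length.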
public float computeNorm(String fieldName, FieldInvertState state)
Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true by default or true for this specific field.
Overrides: computeNorm in class DefaultSimilarity
Parameters:
fieldName - field name
state - current processing state for this field
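The overlap handling described above can be sketched without the Lucene classes. In the example below, State is a hypothetical stand-in for FieldInvertState that carries only a token count, an overlap count and a boost, and lengthNorm is a placeholder 1/sqrt(x); both are assumptions made to keep the sketch self-contained.

```java
class ComputeNormSketch {
    // Hypothetical stand-in for the per-field indexing state.
    static final class State {
        final int length;      // total tokens seen in the field
        final int numOverlap;  // tokens that overlap a previous position
        final float boost;     // field boost

        State(int length, int numOverlap, float boost) {
            this.length = length;
            this.numOverlap = numOverlap;
            this.boost = boost;
        }
    }

    // Placeholder length norm; any lengthNorm function could be used here.
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    static float computeNorm(State state, boolean discountOverlaps) {
        // Overlap tokens are left out of the count only when discounting is on.
        int numTokens = discountOverlaps ? state.length - state.numOverlap : state.length;
        return state.boost * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        State state = new State(6, 2, 2.0f); // illustrative values
        System.out.println("discounting overlaps: " + computeNorm(state, true));
        System.out.println("counting overlaps:    " + computeNorm(state, false));
    }
}
```

Discounting overlaps shortens the effective field length, so with this placeholder norm the same field scores higher when discountOverlaps is true.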
public float computeLengthNorm(String fieldName, int numTerms)
Implemented as:
1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.
TODO: potential optimization is to just flat out return 1.0f if numTerms is between min and max.
See Also: setLengthNormFactors(int, int, float), An SVG visualization of this function
public float tf(int freq)
Delegates to baselineTf.
Overrides: tf in class Similarity
Parameters:
freq - the frequency of a term within a document
See Also: baselineTf(float)
public float baselineTf(float freq)
Implemented as:
(x <= min) ? base : sqrt(x+(base**2)-min)
...but with a special case check for 0.
This degrades to sqrt(x) when min and base are both 0.
See Also: setBaselineTfFactors(float, float), An SVG visualization of this function
public float hyperbolicTf(float freq)
tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))/(base**(x-xoffset)+base**-(x-xoffset)))+1)
This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.
See Also: setHyperbolicTfFactors(float, float, double, float), An SVG visualization of this function
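The hyperbolic formula is easier to read when written out as code. The following is a direct, stand-alone transcription of the expression above, using the parameter defaults listed for setHyperbolicTfFactors in its sample calls. HyperbolicTfSketch is a hypothetical class; it adds no zero check and no guard against overflow for extreme inputs, so it should be read as an illustration of the curve rather than the library's code.

```java
class HyperbolicTfSketch {
    // Transcription of:
    // tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))
    //                        /(base**(x-xoffset)+base**-(x-xoffset)))+1)
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        // Ranges over (-1, 1), like tanh, and equals 0 at the midpoint xoffset.
        double tanhLike = (up - down) / (up + down);
        return (float) (min + (max - min) / 2.0 * (tanhLike + 1.0));
    }

    public static void main(String[] args) {
        // Documented defaults: min 0.0, max 2.0, base 1.3, xoffset 10.0.
        for (float f : new float[] {1f, 5f, 10f, 20f, 100f}) {
            System.out.println("freq " + f + " -> tf " + hyperbolicTf(f, 0.0f, 2.0f, 1.3, 10.0f));
        }
    }
}
```

At freq equal to xoffset the sketch returns the midpoint between min and max, and for large frequencies it levels off at max, which is the "hard max" behaviour the method description refers to.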