org.apache.lucene.misc
Class SweetSpotSimilarity

java.lang.Object
  extended by org.apache.lucene.search.Similarity
      extended by org.apache.lucene.search.DefaultSimilarity
          extended by org.apache.lucene.misc.SweetSpotSimilarity
All Implemented Interfaces:
Serializable

public class SweetSpotSimilarity
extends DefaultSimilarity

A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.

For lengthNorm, a global min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following a 1/sqrt curve.
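Writing x for the number of terms and s for steepness, the lengthNorm formula (see computeLengthNorm) splits into three cases. This derivation assumes min <= max:

```latex
\text{lengthNorm}(x) = \frac{1}{\sqrt{s\,D(x) + 1}},
\qquad
D(x) = |x - \mathit{min}| + |x - \mathit{max}| - (\mathit{max} - \mathit{min})

D(x) =
\begin{cases}
2(\mathit{min} - x) & \text{if } x < \mathit{min} \\
0 & \text{if } \mathit{min} \le x \le \mathit{max} \\
2(x - \mathit{max}) & \text{if } x > \mathit{max}
\end{cases}
```

So every length on the plateau gets exactly 1/sqrt(0 + 1) = 1.0, and outside it the norm is 1/sqrt(2sd + 1), where d is the distance to the nearer end of the plateau.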

A per-field min/max can be specified if different fields have different sweet spots.

For tf, the baselineTf and hyperbolicTf functions are provided; subclasses can choose between them.
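How a subclass might choose between the two helpers can be sketched with a standalone stand-in class. TfHelpersSketch and HyperbolicTfChoice are hypothetical names. The baselineTf defaults of 0 are an assumption; the hyperbolic defaults come from the setHyperbolicTfFactors parameter docs:

```java
// Hypothetical stand-in for SweetSpotSimilarity's tf helpers (not the Lucene source).
class TfHelpersSketch {
    float tfBase = 0.0f, tfMin = 0.0f;      // assumed baselineTf defaults
    float hyperMin = 0.0f, hyperMax = 2.0f; // documented hyperbolicTf defaults
    double hyperBase = 1.3;
    float hyperXoffset = 10.0f;

    // (x <= min) ? base : sqrt(x + base^2 - min), with 0 mapped to 0 (assumed).
    float baselineTf(float freq) {
        if (freq == 0.0f) return 0.0f;
        return (freq <= tfMin) ? tfBase : (float) Math.sqrt(freq + tfBase * tfBase - tfMin);
    }

    // min + (max-min)/2 * (tanh((x - xoffset) * ln(base)) + 1)
    float hyperbolicTf(float freq) {
        double t = Math.tanh((freq - hyperXoffset) * Math.log(hyperBase));
        return (float) (hyperMin + (hyperMax - hyperMin) / 2.0 * (t + 1.0));
    }

    // Default choice, as documented: tf delegates to baselineTf.
    float tf(int freq) {
        return baselineTf(freq);
    }
}

// A subclass opts into the hyperbolic curve by overriding tf.
class HyperbolicTfChoice extends TfHelpersSketch {
    @Override
    float tf(int freq) {
        return hyperbolicTf(freq);
    }
}
```

Against the real class, the same choice would be a subclass of SweetSpotSimilarity whose tf(int) override returns hyperbolicTf(freq).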

See Also:
A Gnuplot file used to generate some of the visualizations referenced from each function, Serialized Form

Field Summary
 
Fields inherited from class org.apache.lucene.search.DefaultSimilarity
discountOverlaps
 
Fields inherited from class org.apache.lucene.search.Similarity
NO_DOC_ID_PROVIDED
 
Constructor Summary
SweetSpotSimilarity()
           
 
Method Summary
 float baselineTf(float freq)
          Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min), but with a special-case check for 0.
 float computeLengthNorm(String fieldName, int numTerms)
          Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
 float computeNorm(String fieldName, FieldInvertState state)
          Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true for this specific field or, when no field-specific value has been set, true by default.
 float hyperbolicTf(float freq)
          Uses a hyperbolic tangent function that allows for a hard max...
 void setBaselineTfFactors(float base, float min)
          Sets the baseline and minimum function variables for baselineTf
 void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
          Sets the function variables for the hyperbolicTf function
 void setLengthNormFactors(int min, int max, float steepness)
          Sets the default function variables used by lengthNorm when no field-specific variables have been set.
 void setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps)
          Sets the function variables used by lengthNorm for a specific named field.
 float tf(int freq)
          Delegates to baselineTf
 
Methods inherited from class org.apache.lucene.search.DefaultSimilarity
coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf
 
Methods inherited from class org.apache.lucene.search.Similarity
decodeNorm, decodeNormValue, encodeNorm, encodeNormValue, getDefault, getNormDecoder, idfExplain, idfExplain, idfExplain, lengthNorm, scorePayload, setDefault
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SweetSpotSimilarity

public SweetSpotSimilarity()
Method Detail

setBaselineTfFactors

public void setBaselineTfFactors(float base,
                                 float min)
Sets the baseline and minimum function variables for baselineTf

See Also:
baselineTf(float)

setHyperbolicTfFactors

public void setHyperbolicTfFactors(float min,
                                   float max,
                                   double base,
                                   float xoffset)
Sets the function variables for the hyperbolicTf function

Parameters:
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
See Also:
hyperbolicTf(float)

setLengthNormFactors

public void setLengthNormFactors(int min,
                                 int max,
                                 float steepness)
Sets the default function variables used by lengthNorm when no field-specific variables have been set.

See Also:
Similarity.lengthNorm(java.lang.String, int)

setLengthNormFactors

public void setLengthNormFactors(String field,
                                 int min,
                                 int max,
                                 float steepness,
                                 boolean discountOverlaps)
Sets the function variables used by lengthNorm for a specific named field.

Parameters:
field - field name
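min - minimum number of terms in the plateau (lengths from min to max all get a norm of 1.0)
max - maximum number of terms in the plateau
steepness - steepness of the curve, i.e. how quickly the norm falls off outside the plateau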
discountOverlaps - if true, numOverlapTokens will be subtracted from numTokens; if false then numOverlapTokens will be assumed to be 0 (see DefaultSimilarity.computeNorm(String, FieldInvertState) for details).
See Also:
Similarity.lengthNorm(java.lang.String, int)
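The fallback between field-specific and default settings can be pictured with a small standalone sketch. The class LengthNormFactors and its methods are hypothetical, not SweetSpotSimilarity's internals. It also assumes the three-argument setter leaves the default discountOverlaps untouched, since that value is inherited from DefaultSimilarity (setDiscountOverlaps):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-field lengthNorm factors with a global fallback.
class LengthNormFactors {
    // Immutable bundle of the four settings that can be given per field.
    static final class Factors {
        final int min;
        final int max;
        final float steepness;
        final boolean discountOverlaps;

        Factors(int min, int max, float steepness, boolean discountOverlaps) {
            this.min = min;
            this.max = max;
            this.steepness = steepness;
            this.discountOverlaps = discountOverlaps;
        }
    }

    private Factors defaults;
    private final Map<String, Factors> perField = new HashMap<>();

    LengthNormFactors(Factors defaults) {
        this.defaults = defaults;
    }

    // Mirrors setLengthNormFactors(int, int, float); keeps the default
    // discountOverlaps as-is (an assumption of this sketch).
    void setDefaults(int min, int max, float steepness) {
        defaults = new Factors(min, max, steepness, defaults.discountOverlaps);
    }

    // Mirrors setLengthNormFactors(String, int, int, float, boolean).
    void setFieldFactors(String field, int min, int max, float steepness, boolean discountOverlaps) {
        perField.put(field, new Factors(min, max, steepness, discountOverlaps));
    }

    // Field-specific settings win; otherwise fall back to the defaults.
    Factors factorsFor(String field) {
        return perField.getOrDefault(field, defaults);
    }
}
```

Changing the defaults afterwards affects only fields that never received their own factors.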

computeNorm

public float computeNorm(String fieldName,
                         FieldInvertState state)
Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true for this specific field or, when no field-specific value has been set, true by default.

Overrides:
computeNorm in class DefaultSimilarity
Parameters:
fieldName - field name
state - current processing state for this field
Returns:
the calculated float norm
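A minimal standalone sketch of the computeNorm arithmetic described above. It assumes the boost, token count, and overlap count are passed in directly, standing in for what FieldInvertState reports; ComputeNormSketch and its parameters are hypothetical:

```java
// Hypothetical standalone sketch of the documented computeNorm behavior.
class ComputeNormSketch {
    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        int offPlateau = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * offPlateau + 1.0));
    }

    // boost, length and numOverlap stand in for the field's invert state.
    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps,
                             int min, int max, float steepness) {
        // Overlap tokens are dropped from the count only when discounting is on.
        int numTokens = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTokens, min, max, steepness);
    }
}
```

For example, with min 5, max 10, and steepness 0.5, a field of 12 tokens of which 2 are overlaps lands on the plateau (10 tokens) when overlaps are discounted, so the norm equals the boost; without discounting, its 12 tokens fall outside the plateau and the norm drops to boost/sqrt(3).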

computeLengthNorm

public float computeLengthNorm(String fieldName,
                               int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ), where x is numTerms.

This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.

TODO: a potential optimization is to return 1.0f directly when numTerms is between min and max.

See Also:
setLengthNormFactors(int, int, float), An SVG visualization of this function
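A standalone Java sketch of the formula above (not the Lucene source), with x passed as numTerms and min, max, and steepness passed as parameters rather than looked up per field:

```java
// Standalone sketch of the documented lengthNorm formula:
// 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
class LengthNormSketch {
    static float computeLengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero inside [min, max]; twice the distance to the plateau outside it.
        int offPlateau = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * offPlateau + 1.0));
    }
}
```

Because the off-plateau term is symmetric, a length 2 terms below min and one 2 terms above max get the same norm. With min = max = 1 and steepness 0.5 the expression reduces to 1/sqrt(x); for example, computeLengthNorm(4, 1, 1, 0.5f) returns 0.5.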

tf

public float tf(int freq)
Delegates to baselineTf

Overrides:
tf in class Similarity
Parameters:
freq - the frequency of a term within a document
Returns:
a score factor based on a term's within-document frequency
See Also:
baselineTf(float)

baselineTf

public float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min), but with a special-case check for 0.

This degrades to sqrt(x) when min and base are both 0.

See Also:
setBaselineTfFactors(float, float), An SVG visualization of this function
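A standalone sketch of baselineTf, assuming the "special-case check for 0" means that a frequency of 0 returns 0. Here base and min are passed as parameters rather than stored by setBaselineTfFactors:

```java
// Standalone sketch of the documented baselineTf formula:
// (x <= min) ? base : sqrt(x + base**2 - min), with 0 mapped to 0.
class BaselineTfSketch {
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // assumed meaning of the special case for 0
        }
        // Flat at base up to min, then a shifted square root.
        return (freq <= min) ? base : (float) Math.sqrt(freq + base * base - min);
    }
}
```

At freq = min the square-root branch would give sqrt(min + base**2 - min) = base, so the curve is continuous where the flat part ends.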

hyperbolicTf

public float hyperbolicTf(float freq)
Uses a hyperbolic tangent function that allows for a hard max:

tf(x) = min + (max-min)/2 * (((base**(x-xoffset) - base**-(x-xoffset)) / (base**(x-xoffset) + base**-(x-xoffset))) + 1)

This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.

See Also:
setHyperbolicTfFactors(float, float, double, float), An SVG visualization of this function
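A standalone sketch of the hyperbolicTf formula with its parameters passed explicitly (the documented defaults are min 0.0, max 2.0, base 1.3, xoffset 10.0). It uses the identity (b^y - b^-y)/(b^y + b^-y) = tanh(y * ln b) and calls Math.tanh instead of evaluating the powers directly; that substitution is this sketch's own choice, made because the direct form divides infinity by infinity for large |x - xoffset|:

```java
// Standalone sketch of the documented hyperbolicTf formula (not the Lucene source).
class HyperbolicTfSketch {
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double y = freq - xoffset;
        // (b^y - b^-y) / (b^y + b^-y) == tanh(y * ln b)
        double t = Math.tanh(y * Math.log(base));
        // Map tanh's range [-1, 1] onto [min, max].
        return (float) (min + (max - min) / 2.0 * (t + 1.0));
    }
}
```

Because tanh stays within [-1, 1], the result always lies between min and max and passes through the midpoint (min+max)/2 at freq = xoffset; the upper bound max is the hard max the description refers to.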