java.lang.Object
  org.apache.lucene.search.similarities.Similarity
    org.apache.lucene.search.similarities.SimilarityBase
      org.apache.lucene.search.similarities.IBSimilarity

public class IBSimilarity
extends SimilarityBase
Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.
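The retrieval function is of the form $RSV(q, d) = \sum -x^q_w \log \mathrm{Prob}(X_w \ge t^d_w \mid \lambda_w)$, where
- $x^q_w$ is the query boost;
- $X_w$ is a random variable that counts the occurrences of word $w$;
- $t^d_w$ is the normalized term frequency;
- $\lambda_w$ is a parameter.
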
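As a rough illustration of the retrieval function, the sketch below sums the per-term contributions for the log-logistic case, where the tail probability is λ / (t + λ). The class and method names are hypothetical and not part of Lucene, and the parallel-array input is an assumption made to keep the example short.

```java
/** Illustrative sketch only; these names are hypothetical and not part of Lucene. */
final class RsvSketch {

    /** Tail probability Prob(X_w >= tfn | lambda) under the log-logistic model. */
    static double logLogisticTail(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    /**
     * RSV(q, d) = sum over matching terms of -boost * log Prob(X_w >= tfn | lambda).
     * The three arrays are parallel: index i describes the i-th matching query term.
     */
    static double rsv(double[] boost, double[] tfn, double[] lambda) {
        double sum = 0.0;
        for (int i = 0; i < boost.length; i++) {
            sum += -boost[i] * Math.log(logLogisticTail(tfn[i], lambda[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two matching terms with the same normalized frequency:
        // a rare term (small lambda) and a common one (larger lambda).
        double[] boost = {1.0, 1.0};
        double[] tfn = {3.0, 3.0};
        double[] lambda = {0.01, 0.5};
        System.out.println(rsv(boost, tfn, lambda));
    }
}
```

For the same normalized term frequency, a term with a smaller λ (a rarer term) contributes more to the score, and a term with a normalized frequency of zero contributes nothing.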
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
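- Distribution: Probabilistic distribution used to model term occurrence
  - DistributionLL: Log-logistic
  - DistributionSPL: Smoothed power-law
- Lambda: $\lambda_w$ parameter of the probability distribution
  - LambdaDF: $N_w/N$, the average number of documents where $w$ occurs
  - LambdaTTF: $F_w/N$, the average number of occurrences of $w$ in the collection
- Normalization: Term frequency normalization
  - Any supported DFR normalization (listed in DFRSimilarity)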
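The way the three components plug together can be sketched with simplified stand-ins. Everything below is hypothetical: Lucene's real Distribution, Lambda and Normalization are classes that read their inputs from a BasicStats object, and the plain-number signatures, the unsmoothed λ ratios and the fixed c = 1 in the H2 normalization are assumptions made for illustration.

```java
/**
 * Illustrative stand-ins only. The simplified signatures below are
 * assumptions made for this sketch and do not match Lucene's classes.
 */
final class IbComponentsSketch {

    interface Distribution { double score(double tfn, double lambda); }
    interface Lambda { double lambda(long docFreq, long totalTermFreq, long numDocs); }
    interface Normalization { double tfn(double freq, double docLen, double avgDocLen); }

    /** Log-logistic: -log Prob(X >= tfn | lambda), with Prob = lambda / (tfn + lambda). */
    static final Distribution LOG_LOGISTIC =
            (tfn, lambda) -> -Math.log(lambda / (tfn + lambda));

    /** N_w / N: fraction of documents that contain the term (no smoothing). */
    static final Lambda DOC_FREQ =
            (docFreq, totalTermFreq, numDocs) -> (double) docFreq / numDocs;

    /** F_w / N: average occurrences of the term per document (no smoothing). */
    static final Lambda TOTAL_TERM_FREQ =
            (docFreq, totalTermFreq, numDocs) -> (double) totalTermFreq / numDocs;

    /** DFR-style H2 normalization with c = 1: tf * log2(1 + avgDocLen / docLen). */
    static final Normalization H2 =
            (freq, docLen, avgDocLen) -> freq * (Math.log(1.0 + avgDocLen / docLen) / Math.log(2.0));

    private final Distribution distribution;
    private final Lambda lambda;
    private final Normalization normalization;

    IbComponentsSketch(Distribution distribution, Lambda lambda, Normalization normalization) {
        this.distribution = distribution;
        this.lambda = lambda;
        this.normalization = normalization;
    }

    /** Per-term score: boost * distribution(normalized tf, lambda). */
    double score(double boost, double freq, double docLen, double avgDocLen,
                 long docFreq, long totalTermFreq, long numDocs) {
        double tfn = normalization.tfn(freq, docLen, avgDocLen);
        double lam = lambda.lambda(docFreq, totalTermFreq, numDocs);
        return boost * distribution.score(tfn, lam);
    }

    public static void main(String[] args) {
        IbComponentsSketch ib = new IbComponentsSketch(LOG_LOGISTIC, DOC_FREQ, H2);
        // Term occurs twice in a document of average length; 10 of 100 documents contain it.
        System.out.println(ib.score(1.0, 2.0, 100.0, 100.0, 10, 25, 100));
    }
}
```

Swapping DOC_FREQ for TOTAL_TERM_FREQ, or H2 for another normalization, changes the model without touching the scoring method, which is the point of keeping the three components separate.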
     
See Also: DFRSimilarity

| Nested Class Summary |
|---|

| Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity |
|---|
| Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer |
| Field Summary | |
|---|---|
| protected Distribution | distribution - The probabilistic distribution used to model term occurrence. |
| protected Lambda | lambda - The lambda ($\lambda_w$) parameter. |
| protected Normalization | normalization - The term frequency normalization. |

| Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase |
|---|
| discountOverlaps |
| Constructor Summary | |
|---|---|
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) | Creates IBSimilarity from the three components. |
| Method Summary | |
|---|---|
| protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen) - Subclasses should implement this method to explain the score. |
| Distribution | getDistribution() - Returns the distribution. |
| Lambda | getLambda() - Returns the distribution's lambda parameter. |
| Normalization | getNormalization() - Returns the term frequency normalization. |
| protected float | score(BasicStats stats, float freq, float docLen) - Scores the document doc. |
| String | toString() - The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. |

| Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase |
|---|
| computeNorm, computeWeight, decodeNormValue, encodeNormValue, exactSimScorer, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, sloppySimScorer |

| Methods inherited from class org.apache.lucene.search.similarities.Similarity |
|---|
| coord, queryNorm |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
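
| Field Detail |
|---|

protected final Distribution distribution
The probabilistic distribution used to model term occurrence.

protected final Lambda lambda
The lambda ($\lambda_w$) parameter.

protected final Normalization normalization
The term frequency normalization.
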
| Constructor Detail | 
|---|
public IBSimilarity(Distribution distribution,
                    Lambda lambda,
                    Normalization normalization)
Creates IBSimilarity from the three components.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
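The "pass a no-op object instead of null" convention can be illustrated with a small stand-in. The interface and names below are hypothetical simplifications rather than Lucene's API, and the fail-fast null check is an assumption of this sketch: the documentation only says that null is not allowed, not how a null argument is handled.

```java
import java.util.Objects;

/** Illustrative sketch only; a simplified stand-in, not Lucene's Normalization class. */
final class NoNormalizationSketch {

    interface Normalization { double tfn(double freq, double docLen, double avgDocLen); }

    /** Pass-through component: returns the raw term frequency unchanged. */
    static final Normalization NO_NORMALIZATION = (freq, docLen, avgDocLen) -> freq;

    /**
     * Applies the given normalization. Rejecting null up front (an assumption
     * of this sketch) means later code never needs a null check: "no
     * normalization" is expressed by NO_NORMALIZATION, not by a missing object.
     */
    static double normalizedTf(Normalization normalization,
                               double freq, double docLen, double avgDocLen) {
        Objects.requireNonNull(normalization, "normalization must not be null");
        return normalization.tfn(freq, docLen, avgDocLen);
    }

    public static void main(String[] args) {
        // The document length is ignored by the pass-through component.
        System.out.println(normalizedTf(NO_NORMALIZATION, 5.0, 20.0, 100.0));
    }
}
```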
Parameters:
distribution - probabilistic distribution modeling term occurrence
lambda - distribution's $\lambda_w$ parameter
normalization - term frequency normalization

| Method Detail |
|---|

protected float score(BasicStats stats,
                      float freq,
                      float docLen)
Description copied from class: SimilarityBase
Scores the document doc.
Subclasses must apply their scoring formula in this method.
Specified by: score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

protected void explain(Explanation expl,
                       BasicStats stats,
                       int doc,
                       float freq,
                       float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Overrides: explain in class SimilarityBase
Parameters:
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.

public String toString()
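The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
Overrides: toString in class SimilarityBase

public Distribution getDistribution()
Returns the distribution.

public Lambda getLambda()
Returns the distribution's lambda parameter.

public Normalization getNormalization()
Returns the term frequency normalization.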