CarmelTopKTermPruningPolicy (Lucene 3.6.0 API)

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.lucene.index.pruning
Class CarmelTopKTermPruningPolicy

java.lang.Object
  org.apache.lucene.index.pruning.PruningPolicy
      org.apache.lucene.index.pruning.TermPruningPolicy
          org.apache.lucene.index.pruning.CarmelTopKTermPruningPolicy

public class CarmelTopKTermPruningPolicy
extends TermPruningPolicy
extends TermPruningPolicy

Pruning policy with a search quality parameterized guarantee - configuration of this policy allows to specify two parameters: k and ε such that:

For any OR query with r terms, the score of each of the top k results in the original index, should be "practically the same" as the score that document in the pruned index: the scores difference should not exceed r * ε.

See the following paper for more details about this method: Static index pruning for information retrieval systems, D. Carmel at al, ACM SIGIR 2001 .

The claim of this pruning technique is, quoting from the above paper:

Prune the index in such a way that a human "cannot distinguish the difference" between the results of a search engine whose index is pruned and one whose index is not pruned.

For indexes with a large number of terms this policy might be too slow. In such situations, the uniform pruning approach in CarmelUniformTermPruningPolicy will be faster, though it might produce inferior search quality, as that policy does not pose a theoretical guarantee on resulted search quality.

TODO implement also CarmelTermPruningDeltaTopPolicy

Nested Class Summary
`static class`	`CarmelTopKTermPruningPolicy.ByDocComparator`

Field Summary
`static float`	`DEFAULT_EPSILON` Default largest meaningless score difference
`static int`	`DEFAULT_R` Default number of query terms
`static int`	`DEFAULT_TOP_K` Default number of guaranteed top K scores

Fields inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
`fieldFlags, in`

Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy
`DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR`

Constructor Summary
`CarmelTopKTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags)` Constructor with default parameters
`CarmelTopKTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags, int k, float epsilon, int r, Similarity sim)` Constructor with specific settings

Method Summary
`void`	`initPositionsTerm(TermPositions tp, Term t)` Called when moving `TermPositions` to a new `Term`.
`boolean`	`pruneAllPositions(TermPositions termPositions, Term t)` Prune all postings per term (invoked once per term per doc)
`int`	`pruneSomePositions(int docNum, int[] positions, Term curTerm)` Prune some postings per term (invoked once per term per doc).
`boolean`	`pruneTermEnum(TermEnum te)` Pruning of all postings for a term (invoked once per term).
`int`	`pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv)` Pruning of individual terms in term vectors.

Methods inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
`pruneAllFieldPostings, prunePayload, pruneWholeTermVector`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_TOP_K

public static final int DEFAULT_TOP_K

Default number of guaranteed top K scores

See Also:: Constant Field Values

DEFAULT_R

public static final int DEFAULT_R

Default number of query terms

See Also:: Constant Field Values

DEFAULT_EPSILON

public static final float DEFAULT_EPSILON

Default largest meaningless score difference

See Also:: Constant Field Values

Constructor Detail

CarmelTopKTermPruningPolicy

public CarmelTopKTermPruningPolicy(IndexReader in,
                                   Map<String,Integer> fieldFlags)

Constructor with default parameters

See Also:: DEFAULT_TOP_K, DEFAULT_EPSILON, DEFAULT_R, DefaultSimilarity, CarmelTopKTermPruningPolicy(IndexReader, Map, int, float, int, Similarity)

CarmelTopKTermPruningPolicy

public CarmelTopKTermPruningPolicy(IndexReader in,
                                   Map<String,Integer> fieldFlags,
                                   int k,
                                   float epsilon,
                                   int r,
                                   Similarity sim)

Constructor with specific settings

Parameters:: in - reader for original index; k - number of guaranteed top scores. Each top K results in the pruned index is either also an original top K result or its original score is indistinguishable from some original top K result.; epsilon - largest meaningless score difference Results whose scores difference is smaller or equal to epsilon are considered indistinguishable.; r - maximal number of terms in a query for which search quaility in pruned index is guaranteed; sim - similarity to use when selecting top docs fir each index term. When null, DefaultSimilarity is used.

Method Detail