TermPruningPolicy (Lucene 3.6.0 API)

org.apache.lucene.index.pruning
Class TermPruningPolicy

java.lang.Object
  org.apache.lucene.index.pruning.PruningPolicy
      org.apache.lucene.index.pruning.TermPruningPolicy

Direct Known Subclasses: CarmelTopKTermPruningPolicy, CarmelUniformTermPruningPolicy, RIDFTermPruningPolicy, TFTermPruningPolicy

public abstract class TermPruningPolicy
extends PruningPolicy

Policy for producing a smaller index out of an input index by examining its terms and removing some or all of their data from the index, as follows:

all terms of a certain field - see pruneAllFieldPostings(String)
all data of a certain term - see pruneTermEnum(TermEnum)
all positions of a certain term in a certain document - see pruneAllPositions(TermPositions, Term)
some positions of a certain term in a certain document - see pruneSomePositions(int, int[], Term)
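
The four levels above run from coarse to fine. The following is a minimal, self-contained sketch of that ordering. It does not use any Lucene classes: `ToyPolicy`, `Posting` and `prune` are hypothetical names, and the flat list of postings is an assumed toy layout. Unlike the real callbacks, which are documented as invoked once per term or once per term per document, this sketch re-evaluates every level for each posting to stay short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PruningCascadeSketch {

    /** Hypothetical stand-in for a term pruning policy; NOT the Lucene class. */
    interface ToyPolicy {
        boolean pruneField(String field);                      // all terms of a field
        boolean pruneTerm(String field, String term);          // all data of a term
        boolean pruneAllPositions(String term, int doc);       // term within one document
        int[] keepPositions(String term, int doc, int[] pos);  // some positions only
    }

    /** One toy posting: a term of a field occurring in a document at some positions. */
    static final class Posting {
        final String field;
        final String term;
        final int doc;
        final int[] positions;

        Posting(String field, String term, int doc, int... positions) {
            this.field = field;
            this.term = term;
            this.doc = doc;
            this.positions = positions;
        }
    }

    /** Applies the four checks from coarsest to finest and returns the surviving postings. */
    static List<Posting> prune(List<Posting> input, ToyPolicy policy) {
        List<Posting> out = new ArrayList<Posting>();
        for (Posting p : input) {
            if (policy.pruneField(p.field)) continue;
            if (policy.pruneTerm(p.field, p.term)) continue;
            if (policy.pruneAllPositions(p.term, p.doc)) continue;
            int[] kept = policy.keepPositions(p.term, p.doc, p.positions);
            if (kept.length > 0) {
                out.add(new Posting(p.field, p.term, p.doc, kept));
            }
        }
        return out;
    }

    /** Prunes a small toy index with an arbitrary example policy. */
    static List<Posting> demo() {
        List<Posting> index = new ArrayList<Posting>();
        index.add(new Posting("debug", "x", 0, 0));
        index.add(new Posting("body", "the", 0, 0, 3));
        index.add(new Posting("body", "the", 1, 1));
        index.add(new Posting("body", "lucene", 0, 1, 7));
        index.add(new Posting("body", "lucene", 2, 4));
        index.add(new Posting("body", "index", 1, 2, 5, 9));

        ToyPolicy policy = new ToyPolicy() {
            public boolean pruneField(String field) { return field.equals("debug"); }
            public boolean pruneTerm(String field, String term) { return term.equals("the"); }
            public boolean pruneAllPositions(String term, int doc) {
                return term.equals("lucene") && doc == 2;
            }
            public int[] keepPositions(String term, int doc, int[] pos) {
                return Arrays.copyOf(pos, Math.min(1, pos.length)); // keep first position only
            }
        };
        return prune(index, policy);
    }
}
```

In this toy run, two postings survive, each reduced to its first position.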

For many types of queries, the pruned, smaller index returns top-N results nearly identical to those of the original index, but with better search performance.

Pruning of indexes is handy for producing small first-tier indexes that fit completely in RAM. The pruned index can be written out using IndexWriter.addIndexes(IndexReader...).

Interestingly, if the input index is optimized (i.e. contains no deletions), the index produced via IndexWriter.addIndexes(IndexReader...) preserves internal document IDs, so they stay in sync with the original index. This means that all other auxiliary information not necessary for first-tier processing, such as some stored fields, can also be removed and quickly retrieved on demand from the original index using the same internal document ID. See StorePruningPolicy for information about removing stored fields.
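
The value of synchronized document IDs can be illustrated without Lucene. The sketch below models each index as a plain list whose positions play the role of internal document IDs. `TwoTierLookupSketch` and its methods are hypothetical, and the "title" and "body" fields are invented sample data. It only shows the lookup pattern: search the small tier, then fetch the removed field from the original by the same ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoTierLookupSketch {

    /** Copies every document without the given stored field, keeping list order (and so doc IDs). */
    static List<Map<String, String>> withoutStoredField(List<Map<String, String>> full, String field) {
        List<Map<String, String>> pruned = new ArrayList<Map<String, String>>();
        for (Map<String, String> doc : full) {
            Map<String, String> copy = new HashMap<String, String>(doc);
            copy.remove(field);
            pruned.add(copy);
        }
        return pruned;
    }

    /** Toy "search": returns the ID of the first document with the given title, or -1. */
    static int findByTitle(List<Map<String, String>> tier, String title) {
        for (int docId = 0; docId < tier.size(); docId++) {
            if (title.equals(tier.get(docId).get("title"))) {
                return docId;
            }
        }
        return -1;
    }

    /** Retrieves a field from the original index using an ID obtained from the first tier. */
    static String fetchOnDemand(List<Map<String, String>> original, int docId, String field) {
        return original.get(docId).get(field);
    }

    /** Invented sample data standing in for the original (full) index. */
    static List<Map<String, String>> sampleIndex() {
        List<Map<String, String>> full = new ArrayList<Map<String, String>>();
        String[][] rows = {
            {"Indexing", "How documents are added."},
            {"Pruning", "How postings are removed."},
        };
        for (String[] row : rows) {
            Map<String, String> doc = new HashMap<String, String>();
            doc.put("title", row[0]);
            doc.put("body", row[1]);
            full.add(doc);
        }
        return full;
    }
}
```

The lookup is only valid because both lists keep the same order. If the original contained deleted documents that were skipped during copying, the IDs would drift apart, which is why the paragraph above requires an index without deletions.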

Please note that while this family of policies produces good results for term queries, it often leads to poor results for phrase queries (because postings are removed without considering whether they belong to an important phrase).
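
A small position-level example shows why phrase queries suffer. The sketch below is a toy model, not Lucene code: `PhraseLossSketch`, its methods, and the sample positions are all hypothetical. A term matches if it has any position left in the document, and a two-word phrase matches only if the second word occurs immediately after the first.

```java
public class PhraseLossSketch {

    /** A term query matches the document if the term still has at least one position. */
    static boolean matchesTerm(int[] positions) {
        return positions.length > 0;
    }

    /** A two-word phrase matches if some position of the second word follows the first directly. */
    static boolean matchesPhrase(int[] first, int[] second) {
        for (int a : first) {
            for (int b : second) {
                if (b == a + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Hypothetical pruning step: returns a copy of the positions without the dropped one. */
    static int[] withoutPosition(int[] positions, int dropped) {
        int count = 0;
        for (int p : positions) {
            if (p != dropped) count++;
        }
        int[] kept = new int[count];
        int i = 0;
        for (int p : positions) {
            if (p != dropped) kept[i++] = p;
        }
        return kept;
    }
}
```

For example, with "new" at positions {0, 5} and "york" at {1, 9}, the phrase "new york" matches. Dropping position 1 of "york" leaves the term query for "york" matching, but the phrase no longer does.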

The more aggressive the pruning policy, the smaller the resulting index: search performance increases, but recall decreases (i.e. search quality deteriorates).

See the following papers for a discussion of this problem and the proposed solutions to improve the quality of a pruned index (not implemented here):

Pruned query evaluation using pre-computed impacts, V. Anh et al, ACM SIGIR 2006

A document-centric approach to static index pruning in text retrieval systems, S. Buettcher et al, ACM SIGIR 2006

Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee, A. Ntoulas et al, ACM SIGIR 2007

Field Summary
`protected Map<String,Integer>`	`fieldFlags` Pruning operations to be conducted on fields.
`protected IndexReader`	`in`

Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy
`DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR`
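
Since `fieldFlags` maps a field name to an `Integer`, the inherited `DEL_*` constants are presumably combined as bit flags. The sketch below illustrates that reading without depending on Lucene. The numeric values are placeholders chosen for illustration (this page lists only the constant names), and `FieldFlagsSketch` and `hasFlag` are hypothetical. Whether `pruneAllFieldPostings(String)` consults the map in exactly this way is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldFlagsSketch {

    // Assumed bit values, for illustration only; not taken from this page.
    static final int DEL_POSTINGS = 0x01;
    static final int DEL_STORED   = 0x02;
    static final int DEL_VECTOR   = 0x04;
    static final int DEL_PAYLOADS = 0x08;
    static final int DEL_ALL      = 0xff;

    /** Returns true if the field has an entry and the given flag bit is set in it. */
    static boolean hasFlag(Map<String, Integer> fieldFlags, String field, int flag) {
        Integer flags = fieldFlags.get(field);
        return flags != null && (flags.intValue() & flag) != 0;
    }

    /** Example configuration: strip vectors and payloads from "body", everything from "debug". */
    static Map<String, Integer> sampleFlags() {
        Map<String, Integer> fieldFlags = new HashMap<String, Integer>();
        fieldFlags.put("body", DEL_VECTOR | DEL_PAYLOADS);
        fieldFlags.put("debug", DEL_ALL);
        return fieldFlags;
    }
}
```

Under these assumptions, a field missing from the map is left untouched, and `DEL_ALL` implies every individual flag.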

Constructor Summary
`protected`	`TermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags)` Construct a policy.

Method Summary
`abstract void`	`initPositionsTerm(TermPositions in, Term t)` Called when moving `TermPositions` to a new `Term`.
`boolean`	`pruneAllFieldPostings(String field)` Pruning of all postings for a field.
`abstract boolean`	`pruneAllPositions(TermPositions termPositions, Term t)` Prune all postings per term (invoked once per term per doc).
`boolean`	`prunePayload(TermPositions in, Term curTerm)` Called when checking for the presence of payload for the current term at a current position.
`abstract int`	`pruneSomePositions(int docNum, int[] positions, Term curTerm)` Prune some postings per term (invoked once per term per doc).
`abstract boolean`	`pruneTermEnum(TermEnum te)` Pruning of all postings for a term (invoked once per term).
`abstract int`	`pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector v)` Pruning of individual terms in term vectors.
`boolean`	`pruneWholeTermVector(int docNumber, String field)` Term vector pruning.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

fieldFlags

protected Map<String,Integer> fieldFlags

Pruning operations to be conducted on fields.

in

protected IndexReader in

Constructor Detail