org.apache.lucene.index.pruning
Class TFTermPruningPolicy

java.lang.Object
  extended by org.apache.lucene.index.pruning.PruningPolicy
      extended by org.apache.lucene.index.pruning.TermPruningPolicy
          extended by org.apache.lucene.index.pruning.TFTermPruningPolicy

public class TFTermPruningPolicy
extends TermPruningPolicy

Policy for producing a smaller index from an input index by removing postings data for terms whose in-document frequency is below a specified threshold.

A larger threshold value produces a smaller index. See TermPruningPolicy for size vs. performance considerations.

This implementation uses simple term frequency thresholds to remove all postings from documents where a given term occurs rarely (i.e. its TF in a document is smaller than the threshold).

Threshold values in this policy are expressed as absolute term frequencies (raw per-document occurrence counts).
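
The thresholding rule described above can be sketched in isolation. The following is a stand-alone model, not the Lucene class: the name TfThresholdSketch and its methods are hypothetical, and it assumes that the thresholds map is keyed by field name and that defThreshold applies to any field without an entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone model of the TF thresholding rule; it does not use Lucene.
class TfThresholdSketch {
    // Assumption: keys are field names, values are absolute term frequencies.
    private final Map<String, Integer> thresholds;
    private final int defThreshold;

    TfThresholdSketch(Map<String, Integer> thresholds, int defThreshold) {
        this.thresholds = thresholds;
        this.defThreshold = defThreshold;
    }

    // Returns the per-field threshold if one is configured, otherwise the default.
    int thresholdFor(String field) {
        Integer t = thresholds.get(field);
        return t != null ? t : defThreshold;
    }

    // A posting is a pruning candidate when the term's in-document
    // frequency is strictly smaller than the threshold for its field.
    boolean shouldPrune(String field, int termFreq) {
        return termFreq < thresholdFor(field);
    }

    public static void main(String[] args) {
        Map<String, Integer> thresholds = new HashMap<String, Integer>();
        thresholds.put("body", 3);
        TfThresholdSketch sketch = new TfThresholdSketch(thresholds, 2);
        System.out.println(sketch.shouldPrune("body", 2));  // true: 2 < 3
        System.out.println(sketch.shouldPrune("body", 3));  // false: 3 is not below 3
        System.out.println(sketch.shouldPrune("title", 1)); // true: default threshold is 2
    }
}
```

Because the comparison is strict, a threshold of 1 prunes nothing in this model (every indexed term occurs at least once), and each increment removes one more band of low-frequency postings.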


Field Summary
protected  int curThr
           
protected  int defThreshold
           
protected  Map<String,Integer> thresholds
           
 
Fields inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
fieldFlags, in
 
Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy
DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR
 
Constructor Summary
TFTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags, Map<String,Integer> thresholds, int defThreshold)
           
 
Method Summary
 void initPositionsTerm(TermPositions in, Term t)
          Called when moving TermPositions to a new Term.
 boolean pruneAllPositions(TermPositions termPositions, Term t)
          Prune all postings per term (invoked once per term per doc).
 int pruneSomePositions(int docNum, int[] positions, Term curTerm)
          Prune some postings per term (invoked once per term per doc).
 boolean pruneTermEnum(TermEnum te)
          Pruning of all postings for a term (invoked once per term).
 int pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv)
          Pruning of individual terms in term vectors.
 
Methods inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
pruneAllFieldPostings, prunePayload, pruneWholeTermVector
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

thresholds

protected Map<String,Integer> thresholds

defThreshold

protected int defThreshold

curThr

protected int curThr

Constructor Detail

TFTermPruningPolicy

public TFTermPruningPolicy(IndexReader in,
                           Map<String,Integer> fieldFlags,
                           Map<String,Integer> thresholds,
                           int defThreshold)

Method Detail

pruneTermEnum

public boolean pruneTermEnum(TermEnum te)
                      throws IOException
Description copied from class: TermPruningPolicy
Pruning of all postings for a term (invoked once per term).

Specified by:
pruneTermEnum in class TermPruningPolicy
Parameters:
te - positioned term enum.
Returns:
true if all postings for this term should be removed, false otherwise.
Throws:
IOException

initPositionsTerm

public void initPositionsTerm(TermPositions in,
                              Term t)
                       throws IOException
Description copied from class: TermPruningPolicy
Called when moving TermPositions to a new Term.

Specified by:
initPositionsTerm in class TermPruningPolicy
Parameters:
in - input term positions
t - current term
Throws:
IOException

pruneAllPositions

public boolean pruneAllPositions(TermPositions termPositions,
                                 Term t)
                          throws IOException
Description copied from class: TermPruningPolicy
Prune all postings per term (invoked once per term per doc).

Specified by:
pruneAllPositions in class TermPruningPolicy
Parameters:
termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term pointer (seek).
t - current term
Returns:
true if the current posting should be removed, false otherwise.
Throws:
IOException

pruneTermVectorTerms

public int pruneTermVectorTerms(int docNumber,
                                String field,
                                String[] terms,
                                int[] freqs,
                                TermFreqVector tfv)
                         throws IOException
Description copied from class: TermPruningPolicy
Pruning of individual terms in term vectors.

Specified by:
pruneTermVectorTerms in class TermPruningPolicy
Parameters:
docNumber - document number
field - field name
terms - array of terms
freqs - array of term frequencies
tfv - the original term frequency vector
Returns:
0 if no terms are to be removed, or a positive number indicating how many terms need to be removed. The same number of entries in the terms array must be set to null to indicate which terms to remove.
Throws:
IOException
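
The return contract above (null out the removed entries and return their count) can be illustrated with a small stand-alone sketch. The class and method names are hypothetical, and the rule of dropping terms whose frequency is below a single threshold is an assumption chosen to match this policy's description, not a copy of the Lucene implementation.

```java
// Hypothetical illustration of the "set to null and return the count" contract.
class TermVectorPruneSketch {
    // Nulls out every term whose frequency is below the threshold and
    // returns how many entries were nulled. The terms and freqs arrays
    // are parallel: freqs[i] is the frequency of terms[i].
    static int pruneTerms(String[] terms, int[] freqs, int threshold) {
        int removed = 0;
        for (int i = 0; i < terms.length; i++) {
            if (terms[i] != null && freqs[i] < threshold) {
                terms[i] = null; // marks this term for removal
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "index", "lucene"};
        int[] freqs = {1, 4, 2};
        int removed = pruneTerms(terms, freqs, 2);
        System.out.println(removed);          // 1
        System.out.println(terms[0] == null); // true: "apache" had frequency 1
    }
}
```

The returned count and the number of null entries have to agree, which is why the sketch increments the counter only at the point where it nulls an entry.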

pruneSomePositions

public int pruneSomePositions(int docNum,
                              int[] positions,
                              Term curTerm)
Description copied from class: TermPruningPolicy
Prune some postings per term (invoked once per term per doc).

Specified by:
pruneSomePositions in class TermPruningPolicy
Parameters:
docNum - current document number
positions - original term positions in the document (and indirectly term frequency)
curTerm - current term
Returns:
0 if no postings are to be removed, or a positive number indicating how many postings need to be removed. The same number of entries in the positions array must be set to -1 to indicate which positions to remove.
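
As an illustration of this return contract only, the sketch below marks positions with -1 and returns the count. The rule it applies (keep at most maxKept positions per document) is a hypothetical example and is not claimed to be what TFTermPruningPolicy does; the class and method names are likewise invented for the illustration.

```java
// Hypothetical illustration of the "set to -1 and return the count" contract.
class PositionPruneSketch {
    // Keeps the first maxKept positions and marks the rest with -1.
    // Returns the number of positions that were marked for removal.
    static int pruneAfter(int[] positions, int maxKept) {
        int removed = 0;
        for (int i = Math.max(0, maxKept); i < positions.length; i++) {
            if (positions[i] != -1) {
                positions[i] = -1; // marks this position for removal
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        int[] positions = {3, 17, 42, 58};
        int removed = pruneAfter(positions, 2);
        System.out.println(removed);      // 2
        System.out.println(positions[2]); // -1
    }
}
```

The length of the positions array is the term's frequency in the document, so a caller can derive the TF from it without a separate argument.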