java.lang.Object
  org.apache.lucene.index.pruning.PruningPolicy
    org.apache.lucene.index.pruning.TermPruningPolicy
      org.apache.lucene.index.pruning.TFTermPruningPolicy
public class TFTermPruningPolicy
extends TermPruningPolicy
Policy for producing a smaller index from an input index by removing postings data for terms whose in-document frequency is below a specified threshold.
A larger threshold value produces a smaller index.
See TermPruningPolicy for size vs. performance considerations.
This implementation uses simple term frequency thresholds to remove all postings from documents where a given term occurs rarely (i.e. its TF in a document is smaller than the threshold).
Threshold values in this policy are expressed as absolute term frequencies.
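The thresholding rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Lucene API: the class `TfThresholdSketch` and its methods are hypothetical, and looking thresholds up by field name with a fall-back to the default is an assumption about how `thresholds` and `defThreshold` interact, not something this page states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that models only the thresholding rule, not the real policy class.
public class TfThresholdSketch {
    private final Map<String, Integer> thresholds; // assumed: field name -> threshold
    private final int defThreshold;                // used when no entry matches

    public TfThresholdSketch(Map<String, Integer> thresholds, int defThreshold) {
        this.thresholds = thresholds;
        this.defThreshold = defThreshold;
    }

    // Assumption: thresholds are resolved per field, falling back to the default.
    public int thresholdFor(String field) {
        Integer thr = thresholds.get(field);
        return thr != null ? thr : defThreshold;
    }

    // A posting is dropped when the term's frequency in the document is below the threshold.
    public boolean shouldPrune(String field, int termFreq) {
        return termFreq < thresholdFor(field);
    }

    // Counts how many of the given in-document frequencies would survive pruning.
    public int countKept(String field, int[] termFreqs) {
        int kept = 0;
        for (int tf : termFreqs) {
            if (!shouldPrune(field, tf)) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> thresholds = new HashMap<>();
        thresholds.put("body", 3);
        TfThresholdSketch sketch = new TfThresholdSketch(thresholds, 2);
        int[] tfs = {1, 2, 3, 5};
        System.out.println(sketch.countKept("body", tfs));  // threshold 3: prints 2
        System.out.println(sketch.countKept("title", tfs)); // default threshold 2: prints 3
    }
}
```

With the same frequencies, the higher threshold keeps fewer postings, which is why a larger threshold yields a smaller index.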
| Field Summary | |
|---|---|
| protected int | curThr |
| protected int | defThreshold |
| protected Map<String,Integer> | thresholds |
| Fields inherited from class org.apache.lucene.index.pruning.TermPruningPolicy |
|---|
| fieldFlags, in |

| Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy |
|---|
| DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR |
| Constructor Summary |
|---|
| TFTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags, Map<String,Integer> thresholds, int defThreshold) |
| Method Summary | | |
|---|---|---|
| void | initPositionsTerm(TermPositions in, Term t) | Called when moving TermPositions to a new Term. |
| boolean | pruneAllPositions(TermPositions termPositions, Term t) | Prune all postings per term (invoked once per term per doc). |
| int | pruneSomePositions(int docNum, int[] positions, Term curTerm) | Prune some postings per term (invoked once per term per doc). |
| boolean | pruneTermEnum(TermEnum te) | Pruning of all postings for a term (invoked once per term). |
| int | pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv) | Pruning of individual terms in term vectors. |
| Methods inherited from class org.apache.lucene.index.pruning.TermPruningPolicy |
|---|
| pruneAllFieldPostings, prunePayload, pruneWholeTermVector |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected Map<String,Integer> thresholds
protected int defThreshold
protected int curThr
| Constructor Detail |
|---|
public TFTermPruningPolicy(IndexReader in,
Map<String,Integer> fieldFlags,
Map<String,Integer> thresholds,
int defThreshold)
| Method Detail |
|---|
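public boolean pruneTermEnum(TermEnum te)
                      throws IOException
Description copied from class: TermPruningPolicy
Pruning of all postings for a term (invoked once per term).
Overrides: pruneTermEnum in class TermPruningPolicy
Parameters:
te - positioned term enum.
Throws:
IOException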
public void initPositionsTerm(TermPositions in,
                              Term t)
                       throws IOException
Description copied from class: TermPruningPolicy
Called when moving TermPositions to a new Term.
Overrides: initPositionsTerm in class TermPruningPolicy
Parameters:
in - input term positions
t - current term
Throws:
IOException
public boolean pruneAllPositions(TermPositions termPositions,
                                 Term t)
                          throws IOException
Description copied from class: TermPruningPolicy
Prune all postings per term (invoked once per term per doc).
Overrides: pruneAllPositions in class TermPruningPolicy
Parameters:
termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term pointer (seek).
t - current term
Throws:
IOException
public int pruneTermVectorTerms(int docNumber,
                                String field,
                                String[] terms,
                                int[] freqs,
                                TermFreqVector tfv)
                         throws IOException
Description copied from class: TermPruningPolicy
Pruning of individual terms in term vectors.
Overrides: pruneTermVectorTerms in class TermPruningPolicy
Parameters:
docNumber - document number
field - field name
terms - array of terms
freqs - array of term frequencies
tfv - the original term frequency vector
Throws:
IOException
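As a rough illustration of frequency-based pruning over the parallel `terms` and `freqs` arrays, the following self-contained sketch drops entries whose frequency is below a threshold. The class and method names are hypothetical, and both the convention of marking a removed term by setting its slot to `null` and the choice to return the number of removed terms are assumptions made for this example; this page does not document what the `int` return value means.

```java
import java.util.Arrays;

// Hypothetical helper; it mirrors the terms/freqs parameters but is not the Lucene method.
public class TermVectorPruneSketch {

    // Assumption: a removed term is marked by nulling its slot,
    // and the return value is the number of terms removed.
    public static int pruneByFrequency(String[] terms, int[] freqs, int threshold) {
        int removed = 0;
        for (int i = 0; i < terms.length; i++) {
            if (terms[i] != null && freqs[i] < threshold) {
                terms[i] = null;
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "index", "lucene"};
        int[] freqs = {1, 4, 2};
        int removed = pruneByFrequency(terms, freqs, 2);
        // Prints: 1 [null, index, lucene]
        System.out.println(removed + " " + Arrays.toString(terms));
    }
}
```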
public int pruneSomePositions(int docNum,
                              int[] positions,
                              Term curTerm)
Description copied from class: TermPruningPolicy
Prune some postings per term (invoked once per term per doc).
Overrides: pruneSomePositions in class TermPruningPolicy
Parameters:
docNum - current document number
positions - original term positions in the document (and indirectly term frequency)
curTerm - current term