java.lang.Object
  org.apache.lucene.index.pruning.PruningPolicy
    org.apache.lucene.index.pruning.TermPruningPolicy
public abstract class TermPruningPolicy
Policy for producing a smaller index out of an input index by examining its terms and removing some or all of their data from the index, as follows:
- all postings of a certain field: see pruneAllFieldPostings(String)
- all postings of a certain term: see pruneTermEnum(TermEnum)
For many types of queries, the pruned, smaller index returns nearly identical top-N results compared with the original index, but with better search performance.
Pruning is handy for producing small first-tier indexes that fit
completely in RAM; such indexes can be written out using IndexWriter.addIndexes(IndexReader...).
Interestingly, if the input index is optimized (i.e. it contains no deletions),
then the index produced via IndexWriter.addIndexes(IndexReader...)
preserves internal document IDs, so they stay in sync with the original index.
This means that auxiliary information not needed for first-tier processing,
such as some stored fields, can also be removed and retrieved quickly on demand
from the original index using the same internal document ID. See
StorePruningPolicy
for information about removing stored fields.
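Why deletions matter for ID alignment can be seen with a toy model. The sketch below is not Lucene code: it assumes, as a simplification, that a merge drops deleted documents and packs the survivors densely, so any deletion shifts the IDs of later documents. Pruning itself removes postings, not documents, so with no deletions the mapping is the identity. The class DocIdRemap is hypothetical.

```java
/**
 * Hypothetical toy model (not Lucene code) of document renumbering during a
 * merge: deleted documents are dropped and the survivors are packed densely.
 */
final class DocIdRemap {

    /** Returns newIds[oldId], or -1 for a deleted document. */
    static int[] remap(boolean[] deleted) {
        int[] newIds = new int[deleted.length];
        int next = 0;
        for (int oldId = 0; oldId < deleted.length; oldId++) {
            // Surviving documents get the next free ID; deleted ones get none.
            newIds[oldId] = deleted[oldId] ? -1 : next++;
        }
        return newIds;
    }

    /** True only when every document keeps its original ID. */
    static boolean idsPreserved(boolean[] deleted) {
        int[] newIds = remap(deleted);
        for (int oldId = 0; oldId < newIds.length; oldId++) {
            if (newIds[oldId] != oldId) {
                return false;
            }
        }
        return true;
    }
}
```

For example, remap(new boolean[]{false, true, false, false}) yields [0, -1, 1, 2]: once document 1 is deleted, documents 2 and 3 no longer line up with the original index.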
Please note that while this family of policies produces good results for term queries, it often leads to poor results for phrase queries, because postings are removed without considering whether they belong to an important phrase.
More aggressive pruning policies produce smaller indexes: search performance increases, but recall decreases (i.e. search quality deteriorates).
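The size/recall trade-off can be illustrated on a single term. The sketch below is a toy, not one of the policies in this package: it assumes postings are a map from document ID to in-document term frequency and drops entries below an arbitrary minFreq threshold. It measures "recall" only as the fraction of originally matching documents still reachable through this term. The class ToyPostingsPruner and its threshold rule are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy pruner: keeps only postings whose term frequency reaches
 * a threshold. Raising the threshold shrinks the postings list and makes
 * documents that matched only through dropped postings unreachable.
 */
final class ToyPostingsPruner {

    /** Keeps only the (docId -> termFreq) entries with termFreq >= minFreq. */
    static Map<Integer, Integer> prune(Map<Integer, Integer> postings, int minFreq) {
        Map<Integer, Integer> kept = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
            if (e.getValue() >= minFreq) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Fraction of the originally matching documents still reachable after pruning. */
    static double recall(Map<Integer, Integer> original, Map<Integer, Integer> pruned) {
        return original.isEmpty() ? 1.0 : (double) pruned.size() / original.size();
    }
}
```

With postings {0=1, 3=5, 7=2, 9=8}, a threshold of 2 keeps 3 of 4 postings, while a threshold of 5 keeps only 2, halving this term's reach.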
See the following papers for a discussion of this problem and the
proposed solutions to improve the quality of a pruned index (not implemented
here):
Field Summary | |
---|---|
protected Map<String,Integer> | fieldFlags: Pruning operations to be conducted on fields. |
protected IndexReader | in: Input index reader. |
Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy |
---|
DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR |
Constructor Summary | |
---|---|
protected | TermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags): Construct a policy. |
Method Summary | |
---|---|
abstract void | initPositionsTerm(TermPositions in, Term t): Called when moving TermPositions to a new Term. |
boolean | pruneAllFieldPostings(String field): Pruning of all postings for a field. |
abstract boolean | pruneAllPositions(TermPositions termPositions, Term t): Prune all postings per term (invoked once per term per doc). |
boolean | prunePayload(TermPositions in, Term curTerm): Called when checking for the presence of a payload for the current term at the current position. |
abstract int | pruneSomePositions(int docNum, int[] positions, Term curTerm): Prune some postings per term (invoked once per term per doc). |
abstract boolean | pruneTermEnum(TermEnum te): Pruning of all postings for a term (invoked once per term). |
abstract int | pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector v): Pruning of individual terms in term vectors. |
boolean | pruneWholeTermVector(int docNumber, String field): Term vector pruning. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Map<String,Integer> fieldFlags
protected IndexReader in
Constructor Detail |
---|
protected TermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags)
Construct a policy.
Parameters:
in - input reader
fieldFlags - a map where keys are field names and values are bitwise-OR flags of operations to be performed (see PruningPolicy for more details).
Method Detail |
---|
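public boolean pruneWholeTermVector(int docNumber, String field) throws IOException
Term vector pruning.
Parameters:
docNumber - document number
field - field name
Returns:
true if the whole term vector for this field should be removed (as specified by the PruningPolicy.DEL_VECTOR flag).
Throws:
IOException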
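public boolean pruneAllFieldPostings(String field) throws IOException
Pruning of all postings for a field.
Parameters:
field - field name
Returns:
true if all postings of this field should be removed (as specified by the PruningPolicy.DEL_POSTINGS flag).
Throws:
IOException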
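The two non-abstract, field-level methods above are described in terms of the DEL_VECTOR and DEL_POSTINGS flags stored in fieldFlags. A minimal sketch, assuming they simply test the corresponding bit in the field's entry: the class FieldFlagChecks is hypothetical, and the numeric flag values are placeholders rather than the actual PruningPolicy constants.

```java
import java.util.Map;

/**
 * Hypothetical stand-in for the field-level checks: a field is affected when
 * its fieldFlags entry has the corresponding bit set. Flag values are assumed.
 */
final class FieldFlagChecks {
    static final int DEL_POSTINGS = 0x01; // assumed value, for illustration only
    static final int DEL_VECTOR = 0x04;   // assumed value, for illustration only

    private final Map<String, Integer> fieldFlags;

    FieldFlagChecks(Map<String, Integer> fieldFlags) {
        this.fieldFlags = fieldFlags;
    }

    /** Fields without an entry are never pruned. */
    private boolean hasFlag(String field, int flag) {
        Integer flags = fieldFlags.get(field);
        return flags != null && (flags & flag) != 0;
    }

    /** Analogue of pruneWholeTermVector; docNumber is unused in this sketch. */
    boolean pruneWholeTermVector(int docNumber, String field) {
        return hasFlag(field, DEL_VECTOR);
    }

    /** Analogue of pruneAllFieldPostings. */
    boolean pruneAllFieldPostings(String field) {
        return hasFlag(field, DEL_POSTINGS);
    }
}
```

Flags combine with bitwise OR: mapping "body" to DEL_POSTINGS | DEL_VECTOR prunes both postings and term vectors of that field, while a field with no entry is left untouched.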
public abstract void initPositionsTerm(TermPositions in, Term t) throws IOException
Called when moving TermPositions to a new Term.
Parameters:
in - input term positions
t - current term
Throws:
IOException
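public boolean prunePayload(TermPositions in, Term curTerm)
Called when checking for the presence of a payload for the current term at the current position.
Parameters:
in - positioned term positions
curTerm - current term associated with these positions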
public abstract int pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector v) throws IOException
Pruning of individual terms in term vectors.
Parameters:
docNumber - document number
field - field name
terms - array of terms
freqs - array of term frequencies
v - the original term frequency vector
Throws:
IOException
public abstract boolean pruneTermEnum(TermEnum te) throws IOException
Pruning of all postings for a term (invoked once per term).
Parameters:
te - positioned term enum
Throws:
IOException
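Because pruneTermEnum is invoked once per term with the enumeration already positioned, a concrete policy only needs the statistics of the current term. A minimal sketch of one possible rule, assuming a hypothetical TermInfo holder in place of Lucene's TermEnum (which exposes term() and docFreq()) and an arbitrary minDocFreq cutoff; this is not a policy shipped with Lucene.

```java
/**
 * Hypothetical sketch of a per-term pruning decision. Returning true means
 * "drop all postings of this term". The docFreq cutoff is illustrative only.
 */
final class DocFreqTermPruner {

    /** Minimal stand-in for what a positioned TermEnum exposes. */
    static final class TermInfo {
        final String text;
        final int docFreq;

        TermInfo(String text, int docFreq) {
            this.text = text;
            this.docFreq = docFreq;
        }
    }

    private final int minDocFreq;

    DocFreqTermPruner(int minDocFreq) {
        this.minDocFreq = minDocFreq;
    }

    /** Prunes the whole term when it occurs in fewer than minDocFreq documents. */
    boolean pruneTerm(TermInfo te) {
        return te.docFreq < minDocFreq;
    }
}
```

With minDocFreq = 2, a term seen in a single document (for example a typo) is dropped entirely, while a term seen in 40 documents is kept.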
public abstract boolean pruneAllPositions(TermPositions termPositions, Term t) throws IOException
Prune all postings per term (invoked once per term per doc).
Parameters:
termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that move either the document pointer (next, skipTo) or the term pointer (seek).
t - current term
Throws:
IOException
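initPositionsTerm and pruneAllPositions divide the work naturally: per-term state is prepared once when TermPositions moves to a new term, and the per-document decision then uses only values that can be read without advancing the positions. A minimal sketch under these assumptions: terms are plain (field, text) strings, the in-document frequency is passed in directly (standing in for a value a real implementation might read from freq()), and the per-field thresholds are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch of per-term initialization followed by per-document
 * decisions. Nothing here advances an enumeration; the caller supplies the
 * in-document frequency for the current document.
 */
final class PerTermPositionsPruner {
    private final Map<String, Integer> minFreqByField;
    private int currentMinFreq; // cached once per term

    PerTermPositionsPruner(Map<String, Integer> minFreqByField) {
        this.minFreqByField = minFreqByField;
    }

    /** Analogue of initPositionsTerm: called once when moving to a new term. */
    void initTerm(String field, String text) {
        currentMinFreq = minFreqByField.getOrDefault(field, 0);
    }

    /** Analogue of pruneAllPositions: called once per document for the current term. */
    boolean pruneAllPositions(int freqInDoc) {
        return freqInDoc < currentMinFreq;
    }
}
```

Caching the threshold in initTerm keeps the per-document call cheap, which matters because it runs once for every (term, document) pair.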
public abstract int pruneSomePositions(int docNum, int[] positions, Term curTerm)
Prune some postings per term (invoked once per term per doc).
Parameters:
docNum - current document number
positions - original term positions in the document (and, indirectly, the term frequency)
curTerm - current term