org.apache.lucene.misc
Class HighFreqTerms
java.lang.Object
org.apache.lucene.misc.HighFreqTerms
public class HighFreqTerms
- extends Object
The HighFreqTerms class extracts the top n most frequent terms
(by document frequency) from an existing Lucene index and reports their
document frequency. If used with the -t flag, it also reports their
total tf (total number of occurrences), in order of highest total tf.
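The difference between document frequency and total tf can be illustrated with a small sketch. It uses plain Java collections as a stand-in for an index and does not call Lucene; the class name FrequencySketch, its methods, and the sample documents are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: token lists stand in for indexed documents (an assumption, not Lucene code).
class FrequencySketch {

    // Document frequency: the number of documents containing the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Total tf: the number of occurrences of the term summed over all documents.
    static long totalTermFreq(List<List<String>> docs, String term) {
        long total = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Hypothetical three-document corpus used for the example.
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
                Arrays.asList("lucene", "index", "lucene", "lucene", "lucene"),
                Arrays.asList("index", "term"),
                Arrays.asList("index"));
    }
}
```

In this sample, "index" appears in three documents a total of three times, while "lucene" appears in only one document but four times, so the two terms rank differently depending on which measure is used.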
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULTnumTerms
public static final int DEFAULTnumTerms
- See Also:
- Constant Field Values
numTerms
public static int numTerms
HighFreqTerms
public HighFreqTerms()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
getHighFreqTerms
public static org.apache.lucene.misc.TermStats[] getHighFreqTerms(IndexReader reader,
int numTerms,
String field)
throws Exception
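- Parameters:
reader - the IndexReader to extract terms from
numTerms - the number of terms to return
field - the field whose terms are examined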
- Returns:
- TermStats[] ordered by terms with highest docFreq first.
- Throws:
Exception
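One common way to select the top numTerms entries by document frequency is a bounded min-heap. The sketch below shows that technique on an in-memory map; it is not taken from the Lucene source, and TopTermsSketch, TermCount, and topByDocFreq are hypothetical names standing in for the real classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustration only: a map of term -> docFreq stands in for the index (an assumption).
class TopTermsSketch {

    // Hypothetical stand-in for TermStats: a term and its document frequency.
    static final class TermCount {
        final String term;
        final int docFreq;

        TermCount(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    // Returns up to numTerms entries, highest docFreq first.
    static List<TermCount> topByDocFreq(Map<String, Integer> docFreqs, int numTerms) {
        List<TermCount> result = new ArrayList<>();
        if (numTerms <= 0) {
            return result;
        }
        // Min-heap on docFreq: the smallest of the current candidates sits at the head.
        PriorityQueue<TermCount> heap =
                new PriorityQueue<>(numTerms, (a, b) -> Integer.compare(a.docFreq, b.docFreq));
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(new TermCount(e.getKey(), e.getValue()));
            if (heap.size() > numTerms) {
                heap.poll(); // evict the candidate with the lowest docFreq
            }
        }
        // Polling yields ascending order, so reverse to put the highest docFreq first.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

The heap never holds more than numTerms + 1 entries, so memory stays bounded even when the number of distinct terms is large.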
sortByTotalTermFreq
public static org.apache.lucene.misc.TermStats[] sortByTotalTermFreq(IndexReader reader,
org.apache.lucene.misc.TermStats[] terms)
throws Exception
- Takes an array of TermStats. For each term, looks up the tf for each document
containing the term and stores the total in the output array of TermStats.
The output array is sorted by highest total tf.
- Parameters:
reader - the IndexReader used to look up term frequencies
terms - TermStats[]
- Returns:
- TermStats[]
- Throws:
Exception
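The re-ordering step described above, sorting by highest total tf, can be sketched with a descending comparator. This is a minimal illustration under stated assumptions: Stat is a hypothetical stand-in for TermStats with the totals already computed, and the code does not use Lucene.

```java
import java.util.Arrays;

// Illustration only: Stat is a hypothetical stand-in for TermStats.
class SortByTotalSketch {

    static final class Stat {
        final String term;
        final int docFreq;
        final long totalTermFreq;

        Stat(String term, int docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    // Returns a new array ordered by totalTermFreq, highest first; the input is left unchanged.
    static Stat[] sortByTotal(Stat[] stats) {
        Stat[] sorted = Arrays.copyOf(stats, stats.length);
        // Comparing b to a (rather than a to b) gives descending order.
        Arrays.sort(sorted, (a, b) -> Long.compare(b.totalTermFreq, a.totalTermFreq));
        return sorted;
    }
}
```

Copying the array before sorting keeps the original docFreq ordering available to the caller, which is a design choice of this sketch rather than a statement about the library's behavior.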
getTotalTermFreq
public static long getTotalTermFreq(IndexReader reader,
Term term)
throws Exception
- Throws:
Exception