org.apache.lucene.misc
Class HighFreqTerms

java.lang.Object
  extended by org.apache.lucene.misc.HighFreqTerms

public class HighFreqTerms
extends Object

HighFreqTerms extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency. If used with the -t flag, it also reports each term's total tf (total number of occurrences), in order of highest total tf.
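
The difference between document frequency and total tf can be illustrated on a toy corpus. The sketch below is not Lucene code and does not use an IndexReader. It assumes documents are already tokenized into string arrays, and the class and method names (FreqStatsSketch, docFreq, totalTermFreq) are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a toy in-memory corpus, not a Lucene index.
class FreqStatsSketch {

    // Document frequency: number of documents containing the term at least once.
    static int docFreq(List<String[]> docs, String term) {
        int count = 0;
        for (String[] doc : docs) {
            if (Arrays.asList(doc).contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Total tf: number of occurrences of the term summed over all documents.
    static long totalTermFreq(List<String[]> docs, String term) {
        long total = 0;
        for (String[] doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[] {"a", "a", "a", "a", "b"},
            new String[] {"b", "c"},
            new String[] {"b"});
        for (String term : new String[] {"a", "b", "c"}) {
            System.out.println(term + ": docFreq=" + docFreq(docs, term)
                + " totalTf=" + totalTermFreq(docs, term));
        }
    }
}
```

In this corpus "b" ranks first by document frequency while "a" ranks first by total tf, which is why the two orderings can differ.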


Field Summary
static int DEFAULTnumTerms
           
static int numTerms
           
 
Constructor Summary
HighFreqTerms()
           
 
Method Summary
static org.apache.lucene.misc.TermStats[] getHighFreqTerms(IndexReader reader, int numTerms, String field)
           
static long getTotalTermFreq(IndexReader reader, Term term)
           
static void main(String[] args)
           
static org.apache.lucene.misc.TermStats[] sortByTotalTermFreq(IndexReader reader, org.apache.lucene.misc.TermStats[] terms)
          Takes an array of TermStats.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULTnumTerms

public static final int DEFAULTnumTerms
See Also:
Constant Field Values

numTerms

public static int numTerms
Constructor Detail

HighFreqTerms

public HighFreqTerms()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

getHighFreqTerms

public static org.apache.lucene.misc.TermStats[] getHighFreqTerms(IndexReader reader,
                                                                  int numTerms,
                                                                  String field)
                                                           throws Exception
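Parameters:
reader - IndexReader for the index to examine
numTerms - number of terms to return
field - name of the field whose terms are examined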
Returns:
TermStats[] ordered by terms with highest docFreq first.
Throws:
Exception
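
One common way to select the numTerms entries with the highest docFreq is a bounded min-heap. The sketch below shows that technique on a plain map of term to document frequency. It is an assumption-laden illustration, not the implementation of getHighFreqTerms: the class TopTermsSketch, the record Stat, and the method topByDocFreq are hypothetical, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustration only: top-n selection with a bounded min-heap.
class TopTermsSketch {

    // Hypothetical stand-in for a per-term statistics record.
    static final class Stat {
        final String term;
        final int docFreq;

        Stat(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    // Returns at most numTerms entries, highest docFreq first.
    static List<Stat> topByDocFreq(Map<String, Integer> docFreqs, int numTerms) {
        List<Stat> result = new ArrayList<>();
        if (numTerms <= 0) {
            return result;
        }
        Comparator<Stat> byDocFreq = Comparator.comparingInt((Stat s) -> s.docFreq);
        // Min-heap: the head is always the smallest docFreq kept so far.
        PriorityQueue<Stat> heap = new PriorityQueue<>(byDocFreq);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.add(new Stat(e.getKey(), e.getValue()));
            if (heap.size() > numTerms) {
                heap.poll(); // evict the current smallest
            }
        }
        result.addAll(heap);
        result.sort(byDocFreq.reversed());
        return result;
    }
}
```

The heap never holds more than numTerms + 1 entries, so memory stays proportional to numTerms rather than to the number of distinct terms.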

sortByTotalTermFreq

public static org.apache.lucene.misc.TermStats[] sortByTotalTermFreq(IndexReader reader,
                                                                     org.apache.lucene.misc.TermStats[] terms)
                                                              throws Exception
Takes an array of TermStats. For each term, looks up the tf in each document containing the term and stores the total in the output array of TermStats. The output array is sorted by total tf, highest first.

Parameters:
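reader - IndexReader for the index to examine
terms - TermStats[] for which total tf is computed
Returns:
TermStats[] sorted by total tf, highest first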
Throws:
Exception
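
The sum-then-sort step described for this method can be sketched without an index. The example below assumes the per-document tf values for each term are already available in a map, which stands in for the lookups done through the reader. The class SortByTotalTfSketch and its Stat record are hypothetical and are not the TermStats class.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Illustration only: sum per-document tf values, then sort by the total.
class SortByTotalTfSketch {

    // Hypothetical stand-in for a per-term statistics record.
    static final class Stat {
        final String term;
        final int docFreq;
        final long totalTermFreq;

        Stat(String term, int docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    // perDocTf maps a term to its tf in each document that contains it.
    static Stat[] sortByTotalTermFreq(Map<String, int[]> perDocTf, Stat[] terms) {
        Stat[] out = new Stat[terms.length];
        for (int i = 0; i < terms.length; i++) {
            long total = 0;
            for (int tf : perDocTf.getOrDefault(terms[i].term, new int[0])) {
                total += tf;
            }
            out[i] = new Stat(terms[i].term, terms[i].docFreq, total);
        }
        // Highest total tf first; the input array is left untouched.
        Arrays.sort(out, Comparator.comparingLong((Stat s) -> s.totalTermFreq).reversed());
        return out;
    }
}
```

Returning a new array keeps the docFreq-ordered input available to the caller, so both orderings can be reported side by side.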

getTotalTermFreq

public static long getTotalTermFreq(IndexReader reader,
                                    Term term)
                             throws Exception
Throws:
Exception