org.apache.lucene.benchmark.utils
Class ExtractReuters
java.lang.Object
org.apache.lucene.benchmark.utils.ExtractReuters
public class ExtractReuters
- extends Object
Splits the Reuters SGML documents into simple text files, each containing the
Title, Date, Dateline, and Body of one document.
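The splitting described above can be illustrated with a minimal, self-contained sketch. This is not the ExtractReuters implementation: the class name `ReutersSplitSketch`, the method `extractFields`, the regex-based approach, and the sample article are all assumptions made for illustration. It only shows how the four fields might be pulled out of one SGML article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of org.apache.lucene.benchmark.utils.
public class ReutersSplitSketch {

    // Field order follows the class description: Title, Date, Dateline, Body.
    private static final String[] FIELDS = {"TITLE", "DATE", "DATELINE", "BODY"};

    /**
     * Returns the trimmed text of each field found in one SGML article,
     * keyed by tag name. Fields that are absent are simply omitted.
     */
    public static Map<String, String> extractFields(String article) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String tag : FIELDS) {
            // DOTALL lets '.' span line breaks, since a BODY usually covers many lines.
            Pattern p = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL);
            Matcher m = p.matcher(article);
            if (m.find()) {
                fields.put(tag, m.group(1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample article; real Reuters files carry more tags and attributes.
        String article = "<REUTERS>\n"
                + "<DATE>1-JAN-1987 10:00:00.00</DATE>\n"
                + "<TEXT>\n"
                + "<TITLE>EXAMPLE TITLE</TITLE>\n"
                + "<DATELINE>    EXAMPLE CITY, Jan 1 - </DATELINE>"
                + "<BODY>First line of the body.\nSecond line.\n</BODY></TEXT>\n"
                + "</REUTERS>";
        extractFields(article).forEach((tag, text) -> System.out.println(tag + ": " + text));
    }
}
```

With the actual class, the equivalent work is presumably driven by constructing `ExtractReuters` with the input and output directories and calling `extract()`; the sketch above only mirrors the per-article field extraction.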
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExtractReuters
public ExtractReuters(File reutersDir, File outputDir)
extract
public void extract()
extractFile
protected void extractFile(File sgmFile)
- Override this method to change what is extracted.
- Parameters:
sgmFile - the SGML file to extract
main
public static void main(String[] args)