org.apache.lucene.benchmark.byTask.feeds
Class TrecFTParser

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
      extended by org.apache.lucene.benchmark.byTask.feeds.TrecFTParser

public class TrecFTParser
extends TrecDocParser

Parser for the FT (Financial Times) documents in the TREC disks 4+5 collection format.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
TrecDocParser.ParsePathType
 
Field Summary
 
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
DEFAULT_PATH_TYPE
 
Constructor Summary
TrecFTParser()
           
 
Method Summary
 DocData parse(DocData docData, String name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
          Parses the text prepared in docBuf into a result DocData. No synchronization is required.
 
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
extract, pathType, stripTags, stripTags
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TrecFTParser

public TrecFTParser()

Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     TrecContentSource trecSrc,
                     StringBuilder docBuf,
                     TrecDocParser.ParsePathType pathType)
              throws IOException,
                     InterruptedException
Description copied from class: TrecDocParser
Parses the text prepared in docBuf into a result DocData. No synchronization is required.

Specified by:
parse in class TrecDocParser
Parameters:
docData - reusable result object
name - name to set on the result
trecSrc - the calling TREC content source
docBuf - text to parse
pathType - type of the parsed file, or null if unknown; parsers may use it to alter their behavior according to the file path type.
Throws:
IOException
InterruptedException
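
Example:
The following is a minimal, standalone sketch of the kind of work the parse contract above describes: filling a reusable result from the text in docBuf. It does not use or reproduce any Lucene class. The FtDocSketch class, its Result holder, and its helper methods are hypothetical, and the tag names (<DATE>, <HEADLINE>, <TEXT>) and sample values are assumptions made for illustration rather than details taken from this page.

```java
/** Standalone illustration only; not the Lucene implementation. */
final class FtDocSketch {

    /** Hypothetical reusable result, loosely analogous to DocData. */
    static final class Result {
        String name;
        String date;
        String title;
        String body;
    }

    /** Returns the trimmed text between startTag and endTag, or null if either tag is missing. */
    static String extract(CharSequence buf, String startTag, String endTag) {
        String s = buf.toString();
        int start = s.indexOf(startTag);
        if (start < 0) {
            return null;
        }
        start += startTag.length();
        int end = s.indexOf(endTag, start);
        if (end < 0) {
            return null;
        }
        return s.substring(start, end).trim();
    }

    /** Replaces anything that looks like a tag with a space, then collapses whitespace. */
    static String stripTags(CharSequence buf) {
        return buf.toString()
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /**
     * Fills the reusable result from docBuf. Only arguments and locals are
     * touched, so concurrent callers with their own Result and buffer need
     * no synchronization.
     */
    static Result parse(Result reuse, String name, StringBuilder docBuf) {
        reuse.name = name;
        reuse.date = extract(docBuf, "<DATE>", "</DATE>");            // assumed tag name
        reuse.title = extract(docBuf, "<HEADLINE>", "</HEADLINE>");   // assumed tag name
        reuse.body = stripTags(docBuf);
        return reuse;
    }

    public static void main(String[] args) {
        // Made-up sample document, not taken from the TREC collection.
        StringBuilder docBuf = new StringBuilder(
                "<DATE>920101</DATE>\n"
                + "<HEADLINE>Sample headline</HEADLINE>\n"
                + "<TEXT>Sample body text.</TEXT>");
        Result r = parse(new Result(), "doc-1", docBuf);
        System.out.println(r.name + " | " + r.date + " | " + r.title);
        System.out.println(r.body);
    }
}
```

In the sketch, the result object is passed in and returned, mirroring the "reusable result" role of the docData parameter. A missing tag yields null instead of an exception, which is one possible design choice and not a statement about how TrecFTParser behaves.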