org.apache.lucene.benchmark.byTask.feeds
Class TrecFBISParser

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
      extended by org.apache.lucene.benchmark.byTask.feeds.TrecFBISParser

public class TrecFBISParser
extends TrecDocParser

Parser for the FBIS documents in the TREC disks 4+5 collection format.
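As a rough illustration of how this parser might be selected, the sketch below builds a set of benchmark-style properties using only java.util.Properties. The property names content.source, trec.doc.parser and docs.dir are assumptions based on the usual TrecContentSource configuration and should be checked against the documentation for your Lucene version. The class FbisParserConfigSketch and the directory path are hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Lucene: collects properties that select TrecFBISParser. */
final class FbisParserConfigSketch {

    /** Builds properties pointing a TREC content source at FBIS files under docsDir. */
    static Properties fbisProperties(String docsDir) {
        Properties props = new Properties();
        // Assumed property names; verify them for the Lucene version in use.
        props.setProperty("content.source",
                "org.apache.lucene.benchmark.byTask.feeds.TrecContentSource");
        props.setProperty("trec.doc.parser",
                "org.apache.lucene.benchmark.byTask.feeds.TrecFBISParser");
        props.setProperty("docs.dir", docsDir);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder path; replace with the directory holding the FBIS files.
        Properties props = fbisProperties("/path/to/trec/fbis");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same key/value pairs could equally be written into a benchmark algorithm file; the Java form is used here only to keep the example self-contained.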


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
TrecDocParser.ParsePathType
 
Field Summary
 
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
DEFAULT_PATH_TYPE
 
Constructor Summary
TrecFBISParser()
           
 
Method Summary
 DocData parse(DocData docData, String name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
          Parses the text prepared in docBuf into a result DocData. No synchronization is required.
 
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
extract, pathType, stripTags, stripTags
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TrecFBISParser

public TrecFBISParser()

Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     TrecContentSource trecSrc,
                     StringBuilder docBuf,
                     TrecDocParser.ParsePathType pathType)
              throws IOException,
                     InterruptedException
Description copied from class: TrecDocParser
Parses the text prepared in docBuf into a result DocData. No synchronization is required.

Specified by:
parse in class TrecDocParser
Parameters:
docData - reusable result object
name - name to set on the result
trecSrc - the calling TREC content source
docBuf - text to parse
pathType - type of the parsed file, or null if unknown; parsers may use it to alter their behavior according to the file path type.
Throws:
IOException
InterruptedException
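
The parse contract described above (fill a reusable result from the text in docBuf and return it) can be sketched without any Lucene dependency. The following is a simplified stand-in, not the library's implementation: the nested SketchDoc class is a hypothetical substitute for DocData, the tag names HEADER, DATE1 and TI are assumptions about the FBIS layout, the sample document is invented, and the regex-based tag stripping is a simplification of whatever the inherited stripTags does.

```java
/** Hypothetical, stdlib-only sketch of a parse step for FBIS-style text. */
final class FbisParseSketch {

    /** Stand-in for a reusable result object; not Lucene's DocData. */
    static final class SketchDoc {
        String name;
        String title;
        String date;
        String body;

        void clear() {
            name = null;
            title = null;
            date = null;
            body = null;
        }
    }

    // Assumed tag names, used for illustration only.
    private static final String HEADER = "<HEADER>";
    private static final String HEADER_END = "</HEADER>";

    /** Returns the trimmed text between open and close if both occur before limit, else null. */
    static String extract(StringBuilder buf, String open, String close, int limit) {
        int start = buf.indexOf(open);
        if (start < 0 || start >= limit) {
            return null;
        }
        start += open.length();
        int end = buf.indexOf(close, start);
        if (end < 0 || end > limit) {
            return null;
        }
        return buf.substring(start, end).trim();
    }

    /** Replaces every tag from position 'from' onward with a space and collapses whitespace. */
    static String stripTags(CharSequence text, int from) {
        return text.subSequence(from, text.length()).toString()
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Fills the caller-supplied result; the caller owns doc and docBuf, so no locking is needed. */
    static SketchDoc parse(SketchDoc doc, String name, StringBuilder docBuf) {
        int bodyStart = 0;
        String date = null;
        String title = null;
        int headerStart = docBuf.indexOf(HEADER);
        if (headerStart >= 0) {
            int headerEnd = docBuf.indexOf(HEADER_END, headerStart);
            if (headerEnd >= 0) {
                date = extract(docBuf, "<DATE1>", "</DATE1>", headerEnd);
                title = extract(docBuf, "<TI>", "</TI>", headerEnd);
                bodyStart = headerEnd + HEADER_END.length(); // body begins after the header
            }
        }
        doc.clear();
        doc.name = name;
        doc.date = date;
        doc.title = title;
        doc.body = stripTags(docBuf, bodyStart);
        return doc;
    }

    public static void main(String[] args) {
        // Invented sample text, not taken from the TREC collection.
        StringBuilder buf = new StringBuilder(
                "<HEADER>\n<DATE1> 1 March 1994 </DATE1>\n<TI> Sample Title </TI>\n</HEADER>\n"
                + "<TEXT>\nBody text here.\n</TEXT>\n");
        SketchDoc doc = parse(new SketchDoc(), "sample-1", buf);
        System.out.println(doc.name + " | " + doc.date + " | " + doc.title + " | " + doc.body);
    }
}
```

Because the result object and the buffer are both supplied by the caller, each thread can keep its own pair and call parse without synchronization, which matches the note in the method description.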