org.apache.lucene.benchmark.byTask.feeds
Class TrecParserByPath

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
      extended by org.apache.lucene.benchmark.byTask.feeds.TrecParserByPath

public class TrecParserByPath
extends TrecDocParser

Parser for TREC documents that selects the parser to apply according to the source file's path, defaulting to TrecGov2Parser.
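
The path-based selection described above can be sketched in a self-contained way. This is a minimal illustration, not the Lucene implementation: the class PathDispatchSketch, the Kind enum, the Parser interface, and the directory-name lookup are all hypothetical names introduced here. The sketch assumes the collection kind is inferred from a directory name on the file's path, and that an unknown or null kind falls back to a GOV2 default, mirroring the documented default of TrecGov2Parser.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for path-based parser selection; not the Lucene source.
class PathDispatchSketch {

    // Hypothetical collection kinds, loosely modeled on a path-type enum.
    enum Kind { GOV2, FBIS, FT, FR94, LATIMES }

    // Assumed default, mirroring the documented fallback to the GOV2 parser.
    static final Kind DEFAULT_KIND = Kind.GOV2;

    // Hypothetical per-kind parser: maps a document name and raw text to a label.
    interface Parser {
        String parse(String name, StringBuilder docBuf);
    }

    static final Map<String, Kind> DIR_TO_KIND = new HashMap<>();
    static final Map<Kind, Parser> KIND_TO_PARSER = new HashMap<>();

    static {
        for (Kind k : Kind.values()) {
            // Assumption: a directory named after the kind (case-insensitive) marks the collection.
            DIR_TO_KIND.put(k.name().toLowerCase(Locale.ROOT), k);
            // Placeholder parser that only reports which kind handled the document.
            KIND_TO_PARSER.put(k, (name, buf) -> k + ":" + name + ":" + buf.length());
        }
    }

    // Walks up the parent directories and returns the first kind whose name matches.
    static Kind resolve(Path file) {
        for (Path p = file.getParent(); p != null; p = p.getParent()) {
            Path dir = p.getFileName();
            if (dir == null) {
                continue; // a filesystem root has no name component
            }
            Kind k = DIR_TO_KIND.get(dir.toString().toLowerCase(Locale.ROOT));
            if (k != null) {
                return k;
            }
        }
        return DEFAULT_KIND;
    }

    // Delegates to the parser registered for the kind; a null kind uses the default.
    static String parse(String name, StringBuilder docBuf, Kind kind) {
        Kind effective = (kind != null) ? kind : DEFAULT_KIND;
        return KIND_TO_PARSER.get(effective).parse(name, docBuf);
    }

    public static void main(String[] args) {
        Kind kind = resolve(Paths.get("data/FBIS/1996/fb396001"));
        System.out.println(parse("doc-1", new StringBuilder("<DOC>text</DOC>"), kind));
    }
}
```

Keeping the lookup (resolve) separate from the delegation (parse) loosely mirrors the split between the inherited pathType helper and this class's parse method, although the actual matching rules in Lucene may differ from this sketch.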


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
TrecDocParser.ParsePathType
 
Field Summary
 
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
DEFAULT_PATH_TYPE
 
Constructor Summary
TrecParserByPath()
           
 
Method Summary
 DocData parse(DocData docData, String name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
          Parses the text prepared in docBuf into a result DocData. No synchronization is required.
 
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.TrecDocParser
extract, pathType, stripTags, stripTags
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TrecParserByPath

public TrecParserByPath()
Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     TrecContentSource trecSrc,
                     StringBuilder docBuf,
                     TrecDocParser.ParsePathType pathType)
              throws IOException,
                     InterruptedException
Description copied from class: TrecDocParser
Parses the text prepared in docBuf into a result DocData. No synchronization is required.

Specified by:
parse in class TrecDocParser
Parameters:
docData - reusable result
name - name to set on the result
trecSrc - calling trec content source
docBuf - text to parse
pathType - type of the parsed file, or null if unknown; parsers may use it to alter their behavior according to the file path type.
Throws:
IOException
InterruptedException