org.apache.lucene.benchmark.byTask.feeds
Interface HTMLParser

All Known Implementing Classes:
DemoHTMLParser

public interface HTMLParser

HTML parsing interface for test purposes.


Method Summary
 DocData parse(DocData docData, String name, Date date, String title, Reader reader, DateFormat dateFormat)
          Parse the input Reader and return DocData.
 

Method Detail

parse

DocData parse(DocData docData,
              String name,
              Date date,
              String title,
              Reader reader,
              DateFormat dateFormat)
              throws IOException,
                     InterruptedException
Parse the input Reader and return DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.

Parameters:
docData - DocData instance reused to hold the result.
name - name of the result doc data.
date - date of the result doc data. If null, an attempt is made to set it from the parsed data.
title - title of the result doc data. If null, an attempt is made to set it from the parsed data.
reader - reader of the HTML text to parse.
dateFormat - date formatter to use for extracting the date.
Returns:
Parsed doc data.
Throws:
IOException
InterruptedException
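
Example (illustrative sketch only):
The contract above, where supplied values win and null values fall back to the parsed data, can be sketched as follows. To stay self-contained, this example declares its own stand-in DocData class and a copy of the HTMLParser interface rather than importing the Lucene benchmark classes. RegexHTMLParser, the public fields on DocData, and the `<meta name="date">` convention are assumptions made for illustration; this is not how DemoHTMLParser is implemented, and regular expressions are not a robust way to parse real-world HTML.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlParserSketch {

    /** Stand-in for the benchmark's DocData (assumption: plain public fields). */
    public static class DocData {
        public String name;
        public String title;
        public String body;
        public Date date;
    }

    /** Local copy of the documented signature, so the sketch compiles on its own. */
    public interface HTMLParser {
        DocData parse(DocData docData, String name, Date date, String title,
                      Reader reader, DateFormat dateFormat)
                throws IOException, InterruptedException;
    }

    /** Hypothetical, naive regex-based implementation. */
    public static class RegexHTMLParser implements HTMLParser {
        private static final Pattern TITLE = Pattern.compile(
                "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        // Assumed convention: the date lives in <meta name="date" content="...">.
        private static final Pattern META_DATE = Pattern.compile(
                "<meta\\s+name=\"date\"\\s+content=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

        @Override
        public DocData parse(DocData docData, String name, Date date, String title,
                             Reader reader, DateFormat dateFormat) throws IOException {
            // Read the whole input into memory.
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            String html = sb.toString();

            // Only fall back to parsed data when the caller passed null.
            if (title == null) {
                Matcher m = TITLE.matcher(html);
                if (m.find()) {
                    title = m.group(1).trim();
                }
            }
            if (date == null && dateFormat != null) {
                Matcher m = META_DATE.matcher(html);
                if (m.find()) {
                    try {
                        date = dateFormat.parse(m.group(1).trim());
                    } catch (ParseException e) {
                        // Unparseable date: leave it null.
                    }
                }
            }

            // Reuse the supplied DocData for the result.
            docData.name = name;
            docData.title = title;
            docData.date = date;
            docData.body = html
                    .replaceAll("(?is)<(head|script|style)[^>]*>.*?</\\1>", " ")
                    .replaceAll("<[^>]*>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
            return docData;
        }
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hello</title>"
                + "<meta name=\"date\" content=\"2020-01-02\"></head>"
                + "<body><p>Some <b>text</b></p></body></html>";
        HTMLParser parser = new RegexHTMLParser();
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        DocData reused = new DocData();
        DocData result = parser.parse(reused, "doc1", null, null, new StringReader(html), df);
        System.out.println(result.title + " | " + result.body);
    }
}
```

Passing a non-null title or date to parse makes the sketch skip the corresponding extraction step, and the returned object is the same DocData instance that was passed in.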