org.apache.nutch.parse
Interface ParseFilter

All Superinterfaces:
Configurable, FieldPluggable, Pluggable
All Known Implementing Classes:
CCParseFilter, HTMLLanguageParser, JSParseFilter, RelTagParser

public interface ParseFilter
extends FieldPluggable, Configurable

Extension point for DOM-based parsers. Implementations can add metadata to the parses produced by the html or tika plugins. All plugins that implement this extension point are run sequentially on the parse.
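
The sequential behaviour described above can be pictured with a minimal sketch. It does not use the Nutch classes: SketchParse, SketchFilter and SequentialFilterSketch are hypothetical stand-ins, the filter signature is reduced to two arguments, and stopping the chain on a null result is an assumption made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse result; NOT the Nutch Parse class.
final class SketchParse {
    final Map<String, String> metadata = new LinkedHashMap<>();
}

// Hypothetical stand-in for the extension point, reduced to two arguments.
interface SketchFilter {
    SketchParse filter(String url, SketchParse parse);
}

final class SequentialFilterSketch {
    // Runs every filter in list order; each filter receives the result
    // returned by the filter before it.
    static SketchParse runAll(List<SketchFilter> filters, String url, SketchParse parse) {
        SketchParse current = parse;
        for (SketchFilter f : filters) {
            current = f.filter(url, current);
            if (current == null) {
                // Assumption for this sketch: a null result ends the chain.
                break;
            }
        }
        return current;
    }
}
```

Because each filter sees the output of the previous one, a later filter in this sketch can read metadata that an earlier filter added.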


Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse, given the DOM tree of a page.
 
Methods inherited from interface org.apache.nutch.plugin.FieldPluggable
getFields
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
The name of the extension point.

Method Detail

filter

Parse filter(String url,
             WebPage page,
             Parse parse,
             HTMLMetaTags metaTags,
             DocumentFragment doc)
Adds metadata or otherwise modifies a parse, given the DOM tree of a page.
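
Since the method receives the page's DOM as a DocumentFragment, an implementation will usually walk that tree. The sketch below shows only such a walk, using the JDK's org.w3c.dom types; RelTagWalkSketch and collectTagLinks are hypothetical names, and attaching the collected values to the Parse or WebPage is left out because it depends on the Nutch API in use. Element names are compared case-insensitively on the assumption that some HTML parsers report them in upper case.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; a filter implementation could call it with the
// DocumentFragment it is given (a DocumentFragment is a Node).
final class RelTagWalkSketch {
    // Returns the href values of all <a rel="tag"> elements under root,
    // in document (depth-first) order.
    static List<String> collectTagLinks(Node root) {
        List<String> hrefs = new ArrayList<>();
        walk(root, hrefs);
        return hrefs;
    }

    private static void walk(Node node, List<String> hrefs) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            boolean isAnchor = "a".equalsIgnoreCase(el.getNodeName());
            boolean isTag = "tag".equalsIgnoreCase(el.getAttribute("rel"));
            String href = el.getAttribute("href"); // "" when the attribute is absent
            if (isAnchor && isTag && !href.isEmpty()) {
                hrefs.add(href);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), hrefs);
        }
    }
}
```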



Copyright © 2012 The Apache Software Foundation