org.apache.nutch.parse
Interface ParseFilter
- All Superinterfaces:
- Configurable, FieldPluggable, Pluggable
- All Known Implementing Classes:
- CCParseFilter, HTMLLanguageParser, JSParseFilter, RelTagParser
public interface ParseFilter
- extends FieldPluggable, Configurable
Extension point for DOM-based parsers. Permits plugins to add metadata to
the parses produced by the HTML or Tika parser plugins. All plugins that implement this extension
point are run sequentially on each parse.
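"Run sequentially" means the filters form a chain: each one receives the parse produced by the one before it. The sketch below illustrates that idea with simplified, self-contained types. It is not Nutch's API. `SimpleParse`, `SimpleParseFilter` and `FilterChain` are hypothetical stand-ins, and the `WebPage`, `HTMLMetaTags` and `DocumentFragment` arguments are left out for brevity.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.nutch.parse.Parse:
// only a title and a mutable metadata map are modeled.
class SimpleParse {
  final String title;
  final Map<String, String> metadata = new LinkedHashMap<>();
  SimpleParse(String title) { this.title = title; }
}

// Hypothetical, reduced filter contract (extra arguments omitted).
interface SimpleParseFilter {
  SimpleParse filter(String url, SimpleParse parse);
}

// Hypothetical runner: applies each filter in registration order,
// feeding the Parse returned by one filter into the next.
class FilterChain {
  private final List<SimpleParseFilter> filters = new ArrayList<>();

  FilterChain add(SimpleParseFilter f) {
    filters.add(f);
    return this;
  }

  SimpleParse run(String url, SimpleParse parse) {
    for (SimpleParseFilter f : filters) {
      parse = f.filter(url, parse);
    }
    return parse;
  }
}

class FilterChainDemo {
  public static void main(String[] args) {
    FilterChain chain = new FilterChain()
        // First filter: record the host name of the page URL.
        .add((url, p) -> { p.metadata.put("host", URI.create(url).getHost()); return p; })
        // Second filter: record the length of the parsed title.
        .add((url, p) -> { p.metadata.put("titleLength", String.valueOf(p.title.length())); return p; });
    SimpleParse result = chain.run("http://example.com/a.html", new SimpleParse("Hello"));
    System.out.println(result.metadata); // {host=example.com, titleLength=5}
  }
}
```

Because each filter returns a `Parse`, a later filter can see, and build on, metadata that an earlier filter added.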
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
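A plugin system uses the extension point name as a lookup key: implementations are registered under it and retrieved by it. The sketch below shows that pattern with a hypothetical `DemoParseFilter` interface and a hypothetical in-memory `ExtensionRegistry`. It does not reproduce Nutch's plugin repository. It also assumes that the ID is the interface's fully qualified class name, which is a common convention in Nutch but is not stated on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension-point interface. Interface fields are implicitly
// public static final, matching the "static final String" declaration above.
interface DemoParseFilter {
  // Assumption: the ID is the fully qualified interface name.
  String X_POINT_ID = DemoParseFilter.class.getName();

  String describe();
}

// Hypothetical registry: extensions are grouped by extension-point ID.
class ExtensionRegistry {
  private final Map<String, List<Object>> byPoint = new HashMap<>();

  void register(String pointId, Object extension) {
    byPoint.computeIfAbsent(pointId, k -> new ArrayList<>()).add(extension);
  }

  // Returns the extensions registered under pointId that are instances of type.
  <T> List<T> lookup(String pointId, Class<T> type) {
    List<T> result = new ArrayList<>();
    for (Object ext : byPoint.getOrDefault(pointId, List.of())) {
      if (type.isInstance(ext)) {
        result.add(type.cast(ext));
      }
    }
    return result;
  }
}

class RegistryDemo {
  public static void main(String[] args) {
    ExtensionRegistry registry = new ExtensionRegistry();
    registry.register(DemoParseFilter.X_POINT_ID, (DemoParseFilter) () -> "lang-detector");
    registry.register("SomeOtherPoint", "not a filter");
    for (DemoParseFilter f : registry.lookup(DemoParseFilter.X_POINT_ID, DemoParseFilter.class)) {
      System.out.println(f.describe()); // lang-detector
    }
  }
}
```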
filter
Parse filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Adds metadata to, or otherwise modifies, a parse, given
the DOM tree of a page, and returns the resulting Parse.
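To show what "given the DOM tree of a page" involves, here is a filter that walks an `org.w3c.dom.DocumentFragment` and records the text of every `<a rel="tag">` link. It is written in the spirit of `RelTagParser` but is not that class's code. `SketchParse` and `RelTagSketchFilter` are hypothetical stand-ins, the `WebPage` and `HTMLMetaTags` arguments are omitted, and `rel` is compared as a single value rather than as a space-separated list. The fragment is built from well-formed XHTML using only the JDK's XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for Parse: just a metadata map.
class SketchParse {
  final Map<String, String> metadata = new LinkedHashMap<>();
}

// Hypothetical filter: collects the text of <a rel="tag"> links.
class RelTagSketchFilter {
  SketchParse filter(String url, SketchParse parse, DocumentFragment doc) {
    List<String> tags = new ArrayList<>();
    collect(doc, tags);
    if (!tags.isEmpty()) {
      parse.metadata.put("tags", String.join(",", tags));
    }
    return parse;
  }

  // Depth-first traversal over every node in the fragment.
  private void collect(Node node, List<String> tags) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element el = (Element) node;
      // Simplification: rel must equal "tag" exactly (case-insensitive).
      if ("a".equalsIgnoreCase(el.getTagName()) && "tag".equalsIgnoreCase(el.getAttribute("rel"))) {
        tags.add(el.getTextContent().trim());
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      collect(children.item(i), tags);
    }
  }
}

class RelTagSketchDemo {
  // Parses well-formed XHTML and wraps a copy of its root element in a fragment.
  static DocumentFragment fragmentOf(String xhtml) throws Exception {
    Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    DocumentFragment frag = d.createDocumentFragment();
    frag.appendChild(d.getDocumentElement().cloneNode(true));
    return frag;
  }

  public static void main(String[] args) throws Exception {
    DocumentFragment doc = fragmentOf(
        "<html><body><a rel=\"tag\" href=\"/t/java\">java</a> <a href=\"/x\">x</a>"
        + " <a rel=\"tag\" href=\"/t/nutch\">nutch</a></body></html>");
    SketchParse parse = new RelTagSketchFilter().filter("http://example.com/", new SketchParse(), doc);
    System.out.println(parse.metadata.get("tags")); // java,nutch
  }
}
```

Manual traversal is used because `DocumentFragment` does not offer `getElementsByTagName`, which is defined only on `Document` and `Element`.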
Copyright © 2012 The Apache Software Foundation