org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- Configurable, FieldPluggable, Pluggable
- All Known Implementing Classes:
- AnchorIndexingFilter, BasicIndexingFilter, CCIndexingFilter, FeedIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter, SubcollectionIndexingFilter, TLDIndexingFilter
public interface IndexingFilter
- extends FieldPluggable, Configurable
Extension point for indexing. Lets plugins add metadata to the indexed
fields. All plugins found that implement this extension point are run
sequentially on the parse.
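To illustrate what "run sequentially" means here, the following is a minimal sketch of a filter chain. It does not use Nutch classes: `FilterChainSketch`, `Doc`, `Filter`, and `runAll` are hypothetical stand-ins, and the early return on a null document is an assumption about how a chain runner would honor the discard convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these types are part of Nutch.
class FilterChainSketch {

    /** Stand-in for the document being built: field name -> values. */
    static class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Stand-in for the filter contract: return the document, or null to discard it. */
    interface Filter {
        Doc filter(Doc doc, String url);
    }

    /** Runs each filter in order, feeding the output of one into the next. */
    static Doc runAll(List<Filter> filters, Doc doc, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                // Assumption: once a filter discards the document,
                // the remaining filters are skipped.
                return null;
            }
        }
        return doc;
    }
}
```

Because each filter receives the document produced by the previous one, the order in which plugins run can affect the fields a later filter sees.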
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
NutchDocument filter(NutchDocument doc,
String url,
WebPage page)
throws IndexingException
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Parameters:
doc
- document instance for collecting fields
url
- page url
page
-
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
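The contract of filter (add fields, return null to discard, throw on failure) can be sketched as follows. So that the example compiles without Nutch on the classpath, it uses hypothetical stand-ins rather than the real types: `SketchDocument`, `SketchPage`, `SketchIndexingException`, and `SketchIndexingFilter` only mirror the shape of the signature above, and `HostFieldFilter` with its "host" and "title" fields is an invented example, not one of the known implementing classes.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins that mirror the shape of the filter signature.
class IndexingFilterSketch {

    static class SketchDocument {
        private final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        List<String> get(String name) {
            return fields.getOrDefault(name, new ArrayList<>());
        }
    }

    static class SketchPage {
        final String title;

        SketchPage(String title) {
            this.title = title;
        }
    }

    static class SketchIndexingException extends Exception {
        SketchIndexingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    interface SketchIndexingFilter {
        SketchDocument filter(SketchDocument doc, String url, SketchPage page)
                throws SketchIndexingException;
    }

    /** Adds "host" and "title" fields; discards pages whose URL has no host. */
    static class HostFieldFilter implements SketchIndexingFilter {
        @Override
        public SketchDocument filter(SketchDocument doc, String url, SketchPage page)
                throws SketchIndexingException {
            String host;
            try {
                host = new URI(url).getHost();
            } catch (URISyntaxException e) {
                // A failure while filtering is reported as an exception.
                throw new SketchIndexingException("Cannot parse url: " + url, e);
            }
            if (host == null) {
                return null; // null means: do not index this document
            }
            doc.add("host", host);
            if (page != null && page.title != null) {
                doc.add("title", page.title);
            }
            return doc; // the same instance, now modified
        }
    }
}
```

The sketch returns the instance it was given, but per the Returns description a filter may also return a new document instance.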
Copyright © 2012 The Apache Software Foundation