org.apache.nutch.indexer.feed
Class FeedIndexingFilter
java.lang.Object
org.apache.nutch.indexer.feed.FeedIndexingFilter
- All Implemented Interfaces:
- Configurable, IndexingFilter, FieldPluggable, Pluggable
public class FeedIndexingFilter
- extends Object
- implements IndexingFilter
- Since:
- NUTCH-444
An IndexingFilter implementation that pulls the relevant extracted Metadata fields out of RSS feeds and into the index.
- Author:
- dogacan, mattmann
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
dateFormatStr
public static final String dateFormatStr
- See Also:
- Constant Field Values
FeedIndexingFilter
public FeedIndexingFilter()
filter
public NutchDocument filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Extracts the relevant metadata fields:
- FEED_AUTHOR
- FEED_TAGS
- FEED_PUBLISHED
- FEED_UPDATED
- FEED
and sends them to the indexer for inclusion in the Nutch index.
- Throws:
IndexingException
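The following is a minimal, self-contained sketch of the field-copying step described for filter. It does not use the Nutch API: Doc is a hypothetical stand-in for NutchDocument, a plain Map stands in for the parse metadata, and the literal key strings (for example "FEED_AUTHOR") are assumptions, since the actual metadata key values are not given on this page. The sketch also copies FEED_PUBLISHED and FEED_UPDATED as raw strings; the real filter may convert dates differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of copying feed metadata fields into a document.
 * Not Nutch code: Doc and the Map-based metadata are simplified stand-ins.
 */
final class FeedFieldCopySketch {

  // Field names listed in the Javadoc; using them verbatim as keys is an assumption.
  static final String[] FEED_FIELDS = {
      "FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED"
  };

  /** Stand-in for a NutchDocument: each field name maps to a list of values. */
  static final class Doc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
  }

  /**
   * Copies every listed feed field present in the metadata into the document,
   * keeping multi-valued fields (e.g. several tags) intact, and returns the
   * same document instance. Keys not in FEED_FIELDS are ignored.
   */
  static Doc filter(Doc doc, Map<String, List<String>> parseMeta) {
    for (String field : FEED_FIELDS) {
      List<String> values = parseMeta.get(field);
      if (values == null) {
        continue; // field absent for this feed entry: nothing to index
      }
      for (String value : values) {
        doc.add(field, value);
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    Map<String, List<String>> meta = new LinkedHashMap<>();
    meta.put("FEED_AUTHOR", List.of("alice"));
    meta.put("FEED_TAGS", List.of("nutch", "rss"));
    meta.put("UNRELATED", List.of("ignored"));

    Doc doc = filter(new Doc(), meta);
    System.out.println(doc.fields); // {FEED_AUTHOR=[alice], FEED_TAGS=[nutch, rss]}
  }
}
```

Returning the same document instance mirrors the filter signature above, which takes a NutchDocument and returns one.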
getConf
public Configuration getConf()
- Specified by:
getConf
in interface Configurable
- Returns:
- the Configuration object used to configure this IndexingFilter.
setConf
public void setConf(Configuration conf)
- Sets the Configuration object used to configure this IndexingFilter.
- Specified by:
setConf
in interface Configurable
- Parameters:
conf - The Configuration object used to configure this IndexingFilter.
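To illustrate the getConf/setConf contract, here is a hedged sketch built on hypothetical stand-ins: Config replaces Hadoop's Configuration (mirroring only its set(String, String) and get(String, String) methods), ConfigurableLike plays the role of the Configurable interface, and FeedFilterLike is an invented class, not part of Nutch. The sketch shows that getConf returns the same object that was passed to setConf.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the Configurable getConf/setConf contract; not Nutch or Hadoop code. */
final class ConfigurableSketch {

  /** Stand-in for Hadoop's Configuration: string keys mapped to string values. */
  static final class Config {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
      props.put(name, value);
    }

    String get(String name, String defaultValue) {
      return props.getOrDefault(name, defaultValue);
    }
  }

  /** Stand-in for the Configurable interface: a setter/getter pair. */
  interface ConfigurableLike {
    void setConf(Config conf);

    Config getConf();
  }

  /** Invented filter class that simply stores the configuration it is given. */
  static final class FeedFilterLike implements ConfigurableLike {
    private Config conf; // null until setConf is called

    @Override
    public void setConf(Config conf) {
      this.conf = conf;
    }

    @Override
    public Config getConf() {
      return conf;
    }
  }

  public static void main(String[] args) {
    Config conf = new Config();
    conf.set("example.key", "value"); // hypothetical property name

    FeedFilterLike filter = new FeedFilterLike();
    filter.setConf(conf);

    // The filter hands back the very object it was configured with.
    System.out.println(filter.getConf() == conf);                 // true
    System.out.println(filter.getConf().get("example.key", "?")); // value
    System.out.println(filter.getConf().get("missing.key", "?")); // ?
  }
}
```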
Copyright © 2012 The Apache Software Foundation