org.creativecommons.nutch
Class CCIndexingFilter

java.lang.Object
  extended by org.creativecommons.nutch.CCIndexingFilter
All Implemented Interfaces:
Configurable, IndexingFilter, FieldPluggable, Pluggable

public class CCIndexingFilter
extends Object
implements IndexingFilter

Adds basic searchable fields to a document.
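The interface list above amounts to a small contract: a filter must store and return its configuration (Configurable), declare the page fields it reads (FieldPluggable), and transform documents (IndexingFilter). The following sketch restates that contract so it compiles on its own. Configuration, NutchDocument, WebPage and IndexingException here are hypothetical, simplified stand-ins, not the real Hadoop/Nutch classes, and Pluggable is omitted for brevity. SkeletonIndexingFilter is a hypothetical pass-through filter, not CCIndexingFilter's actual code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins that mirror only the signatures listed on this page.
class Configuration {}

class NutchDocument {
    // Multi-valued fields: each name maps to a list of values.
    final Map<String, List<String>> fields = new HashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

class WebPage {
    enum Field { BASE_URL, METADATA }
}

class IndexingException extends Exception {
    IndexingException(String msg) { super(msg); }
}

interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

interface FieldPluggable {
    Collection<WebPage.Field> getFields();
}

// IndexingFilter combines both parents, matching "All Implemented Interfaces".
interface IndexingFilter extends Configurable, FieldPluggable {
    NutchDocument filter(NutchDocument doc, String url, WebPage page)
        throws IndexingException;
}

// Hypothetical minimal implementation: keeps its configuration,
// reads no page fields and passes every document through unchanged.
class SkeletonIndexingFilter implements IndexingFilter {
    private Configuration conf;

    public void setConf(Configuration conf) { this.conf = conf; }

    public Configuration getConf() { return conf; }

    public Collection<WebPage.Field> getFields() { return List.of(); }

    public NutchDocument filter(NutchDocument doc, String url, WebPage page) {
        return doc;
    }
}
```

A real filter would add values to doc inside filter() and list the WebPage fields it depends on in getFields().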


Field Summary
static String FIELD
          The name of the document field we use.
static org.slf4j.Logger LOG
           
 
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
 
Constructor Summary
CCIndexingFilter()
           
 
Method Summary
 void addUrlFeatures(NutchDocument doc, String urlString)
          Add the features represented by a license URL.
 NutchDocument filter(NutchDocument doc, String url, WebPage page)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
 Configuration getConf()
           
 Collection<WebPage.Field> getFields()
           
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG

FIELD

public static String FIELD
The name of the document field we use.

Constructor Detail

CCIndexingFilter

public CCIndexingFilter()
Method Detail

addUrlFeatures

public void addUrlFeatures(NutchDocument doc,
                           String urlString)
Add the features represented by a license URL. URLs are of the form "http://creativecommons.org/licenses/xx-xx/xx/xx", where each "xx" names a license feature.
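The URL form described for addUrlFeatures suggests a simple tokenization: take the path, split it at "/" and "-", drop the leading "licenses" segment, and record each remaining component as a feature. The sketch assumes exactly that, and also assumes that FIELD holds the value "cc" (the page only says it is "the name of the document field we use"). Doc is a hypothetical stand-in for NutchDocument, and malformed URLs are silently ignored, which may differ from the real class.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class LicenseUrlFeatures {
    // Assumed value of FIELD; not confirmed by this page.
    static final String FIELD = "cc";

    // Hypothetical stand-in for NutchDocument: a multi-valued field map.
    static final class Doc {
        final Map<String, List<String>> fields = new HashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Adds one FIELD value per path component after the first segment.
    static void addUrlFeatures(Doc doc, String urlString) {
        String path;
        try {
            path = URI.create(urlString).getPath();
        } catch (IllegalArgumentException e) {
            return; // malformed URL: add nothing
        }
        if (path == null) {
            return; // opaque URIs such as "mailto:x" have no path
        }
        StringTokenizer names = new StringTokenizer(path, "/-");
        if (names.hasMoreTokens()) {
            names.nextToken(); // discard the first segment, expected to be "licenses"
        }
        while (names.hasMoreTokens()) {
            doc.add(FIELD, names.nextToken());
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        addUrlFeatures(doc, "http://creativecommons.org/licenses/by-nc-sa/2.0/");
        System.out.println(doc.fields); // {cc=[by, nc, sa, 2.0]}
    }
}
```

Because every component becomes its own value of one multi-valued field, a search for a single feature such as "nc" would match any document whose license URL contains it.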


setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable
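getFields() declares which WebPage fields the filter reads. As we understand Nutch 2.x, the indexing job gathers these declarations from all active filters so the storage backend loads only the needed columns. The sketch uses a hypothetical Field enum and assumes a license-oriented filter needs BASE_URL and METADATA; the real WebPage.Field constants and CCIndexingFilter's actual selection may differ. fieldsToLoad is a hypothetical caller showing how declarations could be merged.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class FieldDeclarationSketch {
    // Hypothetical stand-in for WebPage.Field; real constant names may differ.
    enum Field { BASE_URL, CONTENT, METADATA, TITLE }

    // Built once; callers cannot modify the shared declaration.
    private static final Set<Field> FIELDS =
        Collections.unmodifiableSet(EnumSet.of(Field.BASE_URL, Field.METADATA));

    static Collection<Field> getFields() {
        return FIELDS;
    }

    // Union of the fields every active filter declared.
    static Set<Field> fieldsToLoad(Collection<Collection<Field>> declared) {
        Set<Field> union = EnumSet.noneOf(Field.class);
        for (Collection<Field> fs : declared) {
            union.addAll(fs);
        }
        return union;
    }

    public static void main(String[] args) {
        System.out.println(getFields()); // [BASE_URL, METADATA]
        System.out.println(fieldsToLoad(List.of(getFields(), EnumSet.of(Field.TITLE))));
    }
}
```

Returning a shared unmodifiable set avoids allocating a new collection on every call while still preventing a caller from altering the filter's declaration.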

filter

public NutchDocument filter(NutchDocument doc,
                            String url,
                            WebPage page)
                     throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.

Specified by:
filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
url - page URL
page - the WebPage being indexed
Returns:
modified (or a new) document instance, or null (meaning the document should be discarded)
Throws:
IndexingException
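The null-return convention for filter() matters when several indexing filters run in sequence. The sketch below assumes the filters are applied in order, each receiving the document returned by the previous one, and that a null result discards the document and skips the remaining filters. Doc and Filter are hypothetical simplifications (the WebPage argument and IndexingException are left out), and addHost and dropPdf are made-up example filters.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FilterChainSketch {
    // Hypothetical stand-in for NutchDocument.
    static final class Doc {
        final Map<String, String> fields = new HashMap<>();
    }

    // Simplified stand-in for IndexingFilter.filter(doc, url, page).
    interface Filter {
        Doc filter(Doc doc, String url);
    }

    // Applies filters in order; a null from any filter discards the document.
    static Doc runChain(List<Filter> filters, Doc doc, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // remaining filters are skipped
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Filter addHost = (doc, url) -> {
            doc.fields.put("host", URI.create(url).getHost());
            return doc;
        };
        Filter dropPdf = (doc, url) -> url.endsWith(".pdf") ? null : doc;
        List<Filter> chain = List.of(addHost, dropPdf);

        System.out.println(runChain(chain, new Doc(), "http://example.org/a.html").fields); // {host=example.org}
        System.out.println(runChain(chain, new Doc(), "http://example.org/b.pdf")); // null
    }
}
```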


Copyright © 2012 The Apache Software Foundation