org.creativecommons.nutch
Class CCParseFilter

java.lang.Object
  extended by org.creativecommons.nutch.CCParseFilter
All Implemented Interfaces:
Configurable, ParseFilter, FieldPluggable, Pluggable

public class CCParseFilter
extends Object
implements ParseFilter

Adds metadata identifying the Creative Commons license used, if any.


Nested Class Summary
static class CCParseFilter.Walker
          Walks the DOM tree, looking for RDF in comments and for licenses in anchors.
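As a hedged illustration of what a walker like this might do, here is a sketch that visits every node of a DOM tree and collects three things: comments that appear to contain RDF, anchors marked rel="license", and other anchors pointing at creativecommons.org/licenses/. It is not the plugin's code. The class LicenseWalkerSketch, its Result fields, the "<rdf:RDF" marker test, and the URL prefix are all assumptions. It also parses a well-formed XHTML string with the JDK's XML parser instead of receiving the DocumentFragment that Nutch passes to filter(...).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of a DOM walker; not the actual CCParseFilter.Walker. */
public class LicenseWalkerSketch {

    /** Findings from one walk; the field names are illustrative only. */
    public static final class Result {
        public final List<String> rdfComments = new ArrayList<>();
        public final List<String> relLicenses = new ArrayList<>();
        public final List<String> anchorLicenses = new ArrayList<>();
    }

    // Assumed prefix for Creative Commons license URLs.
    private static final String CC_PREFIX = "http://creativecommons.org/licenses/";

    /** Visits node and all of its descendants, depth first. */
    public static void walk(Node node, Result result) {
        if (node instanceof Comment) {
            String text = ((Comment) node).getData();
            // Assumption: an embedded RDF block is recognized by its rdf:RDF start tag.
            if (text.contains("<rdf:RDF")) {
                result.rdfComments.add(text.trim());
            }
        } else if (node instanceof Element) {
            Element el = (Element) node;
            if ("a".equalsIgnoreCase(el.getTagName()) && el.hasAttribute("href")) {
                String href = el.getAttribute("href");
                if ("license".equalsIgnoreCase(el.getAttribute("rel"))) {
                    result.relLicenses.add(href);      // explicit rel="license" link
                } else if (href.startsWith(CC_PREFIX)) {
                    result.anchorLicenses.add(href);   // plain link to a CC license page
                }
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, result);
        }
    }

    /** Parses well-formed XHTML with the JDK parser and walks the resulting tree. */
    public static Result walkXhtml(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(false); // keep comment nodes (the default)
            Node doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Result result = new Result();
            walk(doc, result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<!-- <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"></rdf:RDF> -->"
                + "<a rel=\"license\" href=\"http://creativecommons.org/licenses/by/2.0/\">CC BY</a>"
                + "<a href=\"http://creativecommons.org/licenses/by-nc/2.0/\">CC BY-NC</a>"
                + "</body></html>";
        Result r = walkXhtml(page);
        System.out.println(r.rdfComments.size()); // 1
        System.out.println(r.relLicenses);        // [http://creativecommons.org/licenses/by/2.0/]
        System.out.println(r.anchorLicenses);     // [http://creativecommons.org/licenses/by-nc/2.0/]
    }
}
```

The walk recurses over getFirstChild()/getNextSibling() and does not depend on element names. Because of that, the same visitor finds comments and anchors wherever they appear in the page.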
 
Field Summary
static org.slf4j.Logger LOG
           
 
Fields inherited from interface org.apache.nutch.parse.ParseFilter
X_POINT_ID
 
Constructor Summary
CCParseFilter()
           
 
Method Summary
 Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
 Configuration getConf()
           
 Collection<WebPage.Field> getFields()
           
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

CCParseFilter

public CCParseFilter()

Method Detail

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable

filter

public Parse filter(String url,
                    WebPage page,
                    Parse parse,
                    HTMLMetaTags metaTags,
                    DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.

Specified by:
filter in interface ParseFilter
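The filter(...) description above says only that the filter adds metadata to the parse. One plausible shape for that step is to choose a single license from the walker's findings and record both its URL and where it was found. The following is a sketch under assumptions, not the plugin's implementation. The class LicenseMetadataSketch, the method recordLicense, the key names License-Url and License-Location, the RDF, then rel, then anchor priority, and the use of a plain Map in place of Nutch's parse metadata are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the metadata step a filter(...) call might perform. */
public class LicenseMetadataSketch {

    // Hypothetical metadata keys; the real key names are not given on this page.
    public static final String LICENSE_URL = "License-Url";
    public static final String LICENSE_LOCATION = "License-Location";

    /**
     * Picks at most one license and records it in parseMeta. Assumed priority:
     * RDF license, then a rel="license" anchor, then a plain anchor.
     * Any argument may be null when that kind of license was not found.
     * Returns the chosen URL, or null if the page carries no license.
     */
    public static String recordLicense(String rdfLicense, String relLicense,
                                       String anchorLicense, Map<String, String> parseMeta) {
        String url = null;
        String location = null;
        if (rdfLicense != null) {
            url = rdfLicense;
            location = "rdf";
        } else if (relLicense != null) {
            url = relLicense;
            location = "rel";
        } else if (anchorLicense != null) {
            url = anchorLicense;
            location = "a";
        }
        if (url != null) {
            parseMeta.put(LICENSE_URL, url);
            parseMeta.put(LICENSE_LOCATION, location);
        }
        return url; // unlicensed pages leave parseMeta untouched
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        recordLicense(null, "http://creativecommons.org/licenses/by/2.0/",
                "http://creativecommons.org/licenses/by-nc/2.0/", meta);
        System.out.println(meta);
        // {License-Url=http://creativecommons.org/licenses/by/2.0/, License-Location=rel}
    }
}
```

Recording the location alongside the URL lets a downstream consumer tell a machine-readable RDF declaration from a bare link. Whether the real plugin does this is not stated on this page.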


Copyright © 2012 The Apache Software Foundation