org.creativecommons.nutch
Class CCParseFilter
java.lang.Object
org.creativecommons.nutch.CCParseFilter
- All Implemented Interfaces:
- Configurable, ParseFilter, FieldPluggable, Pluggable
public class CCParseFilter
- extends Object
- implements ParseFilter
Adds metadata identifying the Creative Commons license used, if any.
Nested Class Summary
static class CCParseFilter.Walker
- Walks the DOM tree, looking for RDF in comments and licenses in anchors.
Field Summary
static org.slf4j.Logger LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
CCParseFilter
public CCParseFilter()
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
getFields
public Collection<WebPage.Field> getFields()
- Specified by: getFields in interface FieldPluggable
filter
public Parse filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Adds metadata or otherwise modifies a parse of an HTML document, given
the DOM tree of a page.
- Specified by: filter in interface ParseFilter
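Example (illustrative only): CCParseFilter.Walker is described as walking the DOM tree, looking for RDF in comments and licenses in anchors. The sketch below shows that general idea using only the JDK's org.w3c.dom API. It is not the plugin's code: the class name LicenseWalkSketch, the LICENSE_PREFIX constant, the "<rdf:RDF" substring check, and the sample page are all assumptions made for illustration, and no Nutch classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LicenseWalkSketch {

    // Assumed prefix for recognizing a license link; not taken from the plugin.
    static final String LICENSE_PREFIX = "http://creativecommons.org/licenses/";

    /**
     * Depth-first walk of a DOM tree. Collects href values of anchors that
     * point at a license URL, and the text of comments that contain RDF.
     */
    static void walk(Node node, List<String> anchors, List<String> rdfComments) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("a".equalsIgnoreCase(el.getTagName())) {
                // getAttribute returns "" when the attribute is absent.
                String href = el.getAttribute("href");
                if (href.startsWith(LICENSE_PREFIX)) {
                    anchors.add(href);
                }
            }
        } else if (node.getNodeType() == Node.COMMENT_NODE) {
            String text = node.getNodeValue();
            // Assumed heuristic: a comment carrying an rdf:RDF element.
            if (text.contains("<rdf:RDF")) {
                rdfComments.add(text.trim());
            }
        }
        // Recurse into children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, anchors, rdfComments);
        }
    }

    /** Parses a well-formed XHTML string into a DOM document. */
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page with one RDF comment and one license anchor.
        String page = "<html><body>"
            + "<!-- <rdf:RDF><License/></rdf:RDF> -->"
            + "<a href=\"http://creativecommons.org/licenses/by/3.0/\">CC BY</a>"
            + "<a href=\"http://example.com/\">other</a>"
            + "</body></html>";
        List<String> anchors = new ArrayList<>();
        List<String> rdfComments = new ArrayList<>();
        walk(parse(page), anchors, rdfComments);
        System.out.println("license anchors: " + anchors);
        System.out.println("RDF comments found: " + rdfComments.size());
    }
}
```

In this sketch the results are collected into plain lists. A real ParseFilter would instead record what it finds as metadata on the page or parse, and the details of how CCParseFilter does that are not shown on this page.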
Copyright © 2012 The Apache Software Foundation