org.apache.nutch.parse.html
Class HtmlParser
java.lang.Object
org.apache.nutch.parse.html.HtmlParser
- All Implemented Interfaces: Configurable, Parser, FieldPluggable, Pluggable
public class HtmlParser
- extends Object
- implements Parser
Field Summary
static org.slf4j.Logger LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
HtmlParser
public HtmlParser()
getParse
public Parse getParse(String url, WebPage page)
- Description copied from interface: Parser
Parses the content held in the given WebPage instance.
- Specified by: getParse in interface Parser
- Parameters:
url - the page's URL
page - the WebPage whose content is parsed
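The call shape of getParse can be sketched as follows. This is only an illustration under stated assumptions: PageStub and ParseStub are hypothetical stand-ins, not the Nutch WebPage and Parse classes, and the title extraction is a toy that says nothing about how HtmlParser itself works. What it shows is the contract documented above: the caller passes the page's URL together with an object holding the fetched content and gets a parse result back.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins, NOT the Nutch classes: they only mirror the
// getParse(String url, WebPage page) call shape documented above.
class GetParseSketch {

    /** Stand-in for WebPage: holds the raw fetched bytes. */
    static final class PageStub {
        final byte[] content;
        PageStub(byte[] content) { this.content = content; }
    }

    /** Stand-in for Parse: holds what the toy parser extracted. */
    static final class ParseStub {
        final String url;
        final String title;
        ParseStub(String url, String title) { this.url = url; this.title = title; }
    }

    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Toy parser: decodes the page content and pulls out the title, if any. */
    static ParseStub getParse(String url, PageStub page) {
        String html = new String(page.content, StandardCharsets.UTF_8);
        Matcher m = TITLE.matcher(html);
        String title = m.find() ? m.group(1).trim() : "";
        return new ParseStub(url, title);
    }

    public static void main(String[] args) {
        byte[] html = "<html><head><title> Hello </title></head><body>Hi</body></html>"
            .getBytes(StandardCharsets.UTF_8);
        ParseStub parse = getParse("http://example.com/", new PageStub(html));
        System.out.println(parse.url + " -> " + parse.title);
    }
}
```

With the real class, the same two arguments would be a URL string and a populated WebPage, and the parser would normally be configured through setConf before the first call.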
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
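This page does not say what setConf does with the configuration it receives. A common pattern for a Configurable class is to keep the object so that getConf can return it and to read any settings once at that point. The sketch below illustrates that pattern only; java.util.Properties stands in for the Configuration type, and the class name and the property key sketch.default.encoding are invented for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in: java.util.Properties plays the role of the
// Configuration object, and the property key is invented for this sketch.
class ConfigurableSketch {
    private Properties conf;
    private String defaultEncoding = "utf-8";

    /** Stores the configuration and reads the settings this object needs. */
    public void setConf(Properties conf) {
        this.conf = conf;
        this.defaultEncoding = conf.getProperty("sketch.default.encoding", "utf-8");
    }

    /** Returns the same configuration object that was passed to setConf. */
    public Properties getConf() {
        return conf;
    }

    String defaultEncoding() {
        return defaultEncoding;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.default.encoding", "windows-1252");

        ConfigurableSketch parser = new ConfigurableSketch();
        parser.setConf(conf);

        System.out.println(parser.getConf() == conf);
        System.out.println(parser.defaultEncoding());
    }
}
```

Reading settings inside setConf keeps later calls free of repeated lookups, at the cost that changes made to the configuration afterwards are not picked up until setConf is called again.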
getFields
public Collection<WebPage.Field> getFields()
- Specified by: getFields in interface FieldPluggable
main
public static void main(String[] args) throws Exception
- Throws: Exception
Copyright © 2012 The Apache Software Foundation