org.apache.nutch.parse
Class ParseUtil
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.parse.ParseUtil
- All Implemented Interfaces:
- Configurable
public class ParseUtil
- extends Configured
A utility class with helper methods for parsing, such as iterating
through a preferred list of Parsers to obtain
Parse objects.
- Author:
- mattmann, Jérôme Charron, Sébastien Le Callonnec
Field Summary
static org.slf4j.Logger LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
ParseUtil
public ParseUtil(Configuration conf)
- Parameters:
conf -
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Overrides:
getConf in class Configured
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
- Overrides:
setConf in class Configured
parse
public Parse parse(String url,
WebPage page)
throws ParserNotFound,
ParseException
- Performs a parse by iterating through a List of preferred
Parsers
until one parses successfully and a Parse object is
returned. If no parse succeeds, a message is logged at the
WARNING level, and an empty parse is returned.
- Throws:
ParserNotFound - If there is no suitable parser found.
ParseException - If there is an error parsing.
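The fallback behaviour described for parse can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: ParserChainSketch, parseWithFallback, and the use of Function and Optional as stand-ins for Parser and Parse are all hypothetical names chosen for illustration, and IllegalStateException merely stands in for ParserNotFound.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical stand-in for illustration only; NOT the Nutch API.
final class ParserChainSketch {
    private static final Logger LOG = Logger.getLogger("ParserChainSketch");

    /**
     * Tries each parser in preference order and returns the first
     * non-empty result. A "parser" here is just a function from raw
     * content to an optional parsed string (an assumption of this sketch).
     */
    static Optional<String> parseWithFallback(
            List<Function<String, Optional<String>>> preferredParsers,
            String content) {
        if (preferredParsers.isEmpty()) {
            // Stand-in for the "no suitable parser found" case.
            throw new IllegalStateException("no suitable parser found");
        }
        for (Function<String, Optional<String>> parser : preferredParsers) {
            Optional<String> result = parser.apply(content);
            if (result.isPresent()) {
                return result; // first successful parse wins
            }
        }
        // Every parser failed: log a warning and return an empty result.
        LOG.warning("Unable to parse content with any preferred parser");
        return Optional.empty();
    }
}
```

In this sketch the order of the list encodes the preference, so a caller puts the most specific parser first and more general ones after it.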
process
public URLWebPage process(String key,
WebPage page)
- Parses the given web page and stores the parsed content within the page.
Returns a URL/WebPage pair (URLWebPage) if a meta-redirect is discovered.
- Parameters:
key -
page -
- Returns:
- newly-discovered webpage (via a meta-redirect)
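The contract of process (store the parsed content on the page, and hand back a URL/page pair only when a redirect is found) can be sketched with toy types. Everything below is hypothetical: Page and UrlPage are not the Nutch WebPage and URLWebPage classes, the "REDIRECT:" marker is an invented stand-in for a real meta-redirect, and returning null when no redirect is found is an assumption of this sketch rather than something the documentation above states.

```java
// Hypothetical stand-ins for illustration only; NOT the Nutch classes.
final class ProcessSketch {

    /** Minimal mutable page: raw content in, parsed text stored back on it. */
    static final class Page {
        final String content;
        String parsedText; // filled in by process()

        Page(String content) {
            this.content = content;
        }
    }

    /** A URL together with its page, mirroring the idea of a URL/page pair. */
    static final class UrlPage {
        final String url;
        final Page page;

        UrlPage(String url, Page page) {
            this.url = url;
            this.page = page;
        }
    }

    // Invented marker that plays the role of a meta-redirect in this sketch.
    static final String REDIRECT_PREFIX = "REDIRECT:";

    /**
     * Stores parsed text on the given page. If the content announces a
     * redirect, returns the newly discovered URL paired with a fresh page;
     * otherwise returns null (an assumption of this sketch).
     */
    static UrlPage process(String key, Page page) {
        if (page.content.startsWith(REDIRECT_PREFIX)) {
            String target = page.content.substring(REDIRECT_PREFIX.length()).trim();
            page.parsedText = "";
            return new UrlPage(target, new Page(""));
        }
        page.parsedText = page.content.trim();
        return null;
    }
}
```

A caller of such a method would typically check the return value and, when it is present, queue the returned URL for fetching alongside the original key.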
Copyright © 2012 The Apache Software Foundation