org.apache.nutch.parse
Class ParseUtil
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.parse.ParseUtil
- All Implemented Interfaces:
- Configurable
public class ParseUtil
extends Configured
A utility class containing methods that simplify parsing, such as iterating
through a preferred list of Parsers to obtain Parse objects.
- Author:
- mattmann, Jérôme Charron, Sébastien Le Callonnec
Field Summary
Modifier and Type | Field
static org.slf4j.Logger | LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
ParseUtil
public ParseUtil(Configuration conf)
- Parameters:
conf - the configuration used to set up this utility
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Overrides:
getConf in class Configured
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
- Overrides:
setConf in class Configured
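ParseUtil follows Hadoop's Configured/Configurable pattern: the constructor receives a Configuration, and both getConf and setConf are overridden. The sketch below is a minimal, dependency-free illustration of that pattern, not Nutch's actual code. SimpleConf, SimpleConfigurable, SimpleConfigured, ParseUtilSketch and the key example.parse.timeout are all hypothetical stand-ins; the sketch assumes, as Hadoop's Configured does, that the superclass constructor delegates to setConf.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfiguredSketch {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
    static class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        int getInt(String key, int defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Integer.parseInt(v.trim());
        }
    }

    // Mirrors the shape of the Configurable interface.
    interface SimpleConfigurable {
        void setConf(SimpleConf conf);
        SimpleConf getConf();
    }

    // Mirrors Configured: the constructor delegates to setConf.
    static class SimpleConfigured implements SimpleConfigurable {
        private SimpleConf conf;
        SimpleConfigured(SimpleConf conf) { setConf(conf); }
        public void setConf(SimpleConf conf) { this.conf = conf; }
        public SimpleConf getConf() { return conf; }
    }

    // Hypothetical ParseUtil-like subclass that overrides both methods so
    // derived settings are recomputed whenever the configuration changes.
    static class ParseUtilSketch extends SimpleConfigured {
        // No field initializers here: they would run after super(conf)
        // returns and overwrite the values assigned inside setConf.
        private SimpleConf conf;
        private int maxParseTimeSeconds;

        ParseUtilSketch(SimpleConf conf) { super(conf); }

        @Override
        public void setConf(SimpleConf conf) {
            this.conf = conf;
            // "example.parse.timeout" and the default of 30 are illustrative only.
            this.maxParseTimeSeconds = conf.getInt("example.parse.timeout", 30);
        }

        @Override
        public SimpleConf getConf() { return conf; }

        int maxParseTimeSeconds() { return maxParseTimeSeconds; }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("example.parse.timeout", "10");
        ParseUtilSketch util = new ParseUtilSketch(conf);
        System.out.println(util.maxParseTimeSeconds()); // 10

        SimpleConf other = new SimpleConf();
        util.setConf(other);
        System.out.println(util.maxParseTimeSeconds()); // 30 (the default)
        System.out.println(util.getConf() == other);    // true
    }
}
```

Because setConf runs inside the superclass constructor, any field it assigns should be declared without an initializer in the subclass; an initializer would execute after super(conf) and silently reset the value.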
parse
public Parse parse(String url,
WebPage page)
throws ParserNotFound,
ParseException
- Performs a parse by iterating through a list of preferred Parsers until a
successful parse is performed and a Parse object is returned. If the parse is
unsuccessful, a message is logged at the WARNING level and an empty parse is
returned.
- Throws:
ParserNotFound - if no suitable parser is found.
ParseException - if an error occurs while parsing.
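The fallback strategy described for parse can be sketched as follows. This is a hedged, self-contained approximation rather than Nutch's implementation: ParseResult, SimpleParser, NoParserFoundException and SimpleParseException are hypothetical analogues of Parse, Parser, ParserNotFound and ParseException, and java.util.logging stands in for the SLF4J logger so the example has no dependencies. It also assumes that an empty preferred list is what signals that no suitable parser exists.

```java
import java.util.List;
import java.util.logging.Logger;

public class ParseFallbackSketch {

    // Hypothetical, simplified stand-in for a Parse result.
    static final class ParseResult {
        private final String text;
        private final boolean success;
        ParseResult(String text, boolean success) { this.text = text; this.success = success; }
        String text() { return text; }
        boolean success() { return success; }
        static ParseResult empty() { return new ParseResult("", false); }
    }

    // Hypothetical analogues of ParserNotFound and ParseException.
    static class NoParserFoundException extends Exception {
        NoParserFoundException(String msg) { super(msg); }
    }

    static class SimpleParseException extends Exception {
        SimpleParseException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for a Parser plugin.
    interface SimpleParser {
        ParseResult parse(String url, String content) throws SimpleParseException;
    }

    private static final Logger LOG = Logger.getLogger(ParseFallbackSketch.class.getName());

    // Tries each parser in preference order; the first successful result wins.
    static ParseResult parse(String url, String content, List<SimpleParser> preferred)
            throws NoParserFoundException, SimpleParseException {
        if (preferred.isEmpty()) {
            throw new NoParserFoundException("No suitable parser found for " + url);
        }
        for (SimpleParser parser : preferred) {
            // A SimpleParseException thrown by a parser propagates to the caller.
            ParseResult result = parser.parse(url, content);
            if (result != null && result.success()) {
                return result;
            }
        }
        // Every parser ran but none succeeded: warn and return an empty parse.
        LOG.warning("Unable to successfully parse content of " + url);
        return ParseResult.empty();
    }

    public static void main(String[] args) throws Exception {
        SimpleParser unsuccessful = (u, c) -> new ParseResult("", false);
        SimpleParser upper = (u, c) -> new ParseResult(c.toUpperCase(), true);

        // The first parser fails softly, so the second one is used.
        System.out.println(parse("http://example.com/", "hello", List.of(unsuccessful, upper)).text()); // HELLO
        // No parser succeeds: a warning is logged and an empty parse is returned.
        System.out.println(parse("http://example.com/", "hello", List.of(unsuccessful)).success()); // false
    }
}
```

Keeping the preference order in a plain list makes the policy easy to reason about: earlier parsers always get the first chance, and later ones act only as fallbacks.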
process
public URLWebPage process(String key,
WebPage page)
- Parses the given web page and stores the parsed content within the page.
Returns a URL/page pair (a URLWebPage) if a meta-redirect is discovered.
- Parameters:
key -
page - the web page to parse; the parsed content is stored in it
- Returns:
the newly discovered web page (via a meta-redirect)
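A simplified sketch of the contract described for process: parse the page in place, and hand back a URL/page pair only when a meta-redirect is found. Page, UrlPage and the regular-expression refresh detector are hypothetical and far cruder than real HTML parsing; returning null when there is no redirect is an assumption of this sketch, and the key is accepted but not interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessSketch {

    // Hypothetical, simplified stand-in for the WebPage record.
    static final class Page {
        String content;     // raw fetched content
        String parsedText;  // filled in by process()
        Page(String content) { this.content = content; }
    }

    // Hypothetical stand-in for URLWebPage: a URL paired with a page.
    static final class UrlPage {
        final String url;
        final Page page;
        UrlPage(String url, Page page) { this.url = url; this.page = page; }
    }

    // Naive detector for <meta http-equiv="refresh" content="N; url=...">.
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta\\s+http-equiv=[\"']refresh[\"']\\s+content=[\"']\\s*\\d+\\s*;\\s*url=([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Parses the page in place; returns the redirect target paired with a
    // new, not-yet-fetched page, or null when no meta-redirect is present.
    static UrlPage process(String key, Page page) {
        // "key" identifies the page; this simplified sketch does not use it.
        // Store the parsed output inside the page itself (here: tags stripped).
        page.parsedText = page.content.replaceAll("<[^>]*>", "").trim();

        Matcher m = META_REFRESH.matcher(page.content);
        if (!m.find()) {
            return null;
        }
        String target = m.group(1).trim();
        return new UrlPage(target, new Page(""));
    }

    public static void main(String[] args) {
        Page plain = new Page("<html><body>Hello</body></html>");
        System.out.println(process("http://example.com/", plain)); // null
        System.out.println(plain.parsedText);                      // Hello

        Page redirecting = new Page(
            "<html><head><meta http-equiv=\"refresh\" content=\"0; url=http://example.com/new\"></head></html>");
        UrlPage next = process("http://example.com/old", redirecting);
        System.out.println(next.url); // http://example.com/new
    }
}
```

Returning the redirect target as a separate object, rather than following it immediately, leaves the caller free to schedule the new page like any other discovered URL.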
Copyright © 2012 The Apache Software Foundation