org.apache.nutch.parse
Class ParserJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.util.NutchTool
org.apache.nutch.parse.ParserJob
- All Implemented Interfaces:
- Configurable, Tool
public class ParserJob
- extends NutchTool
- implements Tool
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
SKIP_TRUNCATED
public static final String SKIP_TRUNCATED
- See Also:
- Constant Field Values
ParserJob
public ParserJob()
ParserJob
public ParserJob(Configuration conf)
isTruncated
public static boolean isTruncated(String url,
WebPage page)
- Checks if the page's content is truncated.
- Parameters:
url
page
- Returns:
true if the page is truncated; false if it is not, or if truncation could not be determined.
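The three-way outcome described above (truncated, not truncated, undeterminable) can be illustrated with a minimal, self-contained sketch. It assumes that truncation is detected by comparing a declared Content-Length header against the number of bytes actually stored; this page does not state how isTruncated decides, so treat that as an assumption. The class TruncationSketch, the method looksTruncated, and the plain Map of headers are hypothetical stand-ins, not part of the Nutch API.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not the Nutch implementation of ParserJob.isTruncated.
class TruncationSketch {

    /**
     * Returns true only when the declared length is known and exceeds the
     * stored content. Every undeterminable case yields false, mirroring the
     * documented return contract.
     */
    static boolean looksTruncated(Map<String, String> headers, ByteBuffer content) {
        if (content == null) {
            return false; // nothing fetched, so nothing can be determined
        }
        String declared = headers.get("Content-Length");
        if (declared == null || declared.trim().isEmpty()) {
            return false; // no declared length to compare against
        }
        int declaredSize;
        try {
            declaredSize = Integer.parseInt(declared.trim());
        } catch (NumberFormatException e) {
            return false; // unparsable header: cannot be determined
        }
        // Assumption: fewer stored bytes than declared means truncated content.
        return declaredSize > content.remaining();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Length", "10");
        System.out.println(looksTruncated(headers, ByteBuffer.wrap(new byte[4])));
        System.out.println(looksTruncated(headers, ByteBuffer.wrap(new byte[10])));
    }
}
```

Returning false for the undeterminable cases means a caller that skips truncated pages (compare the SKIP_TRUNCATED field) would still parse pages whose status is unknown.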
getFields
public Collection<WebPage.Field> getFields(Job job)
getConf
public Configuration getConf()
- Specified by:
getConf
in interface Configurable
- Overrides:
getConf
in class Configured
setConf
public void setConf(Configuration conf)
- Specified by:
setConf
in interface Configurable
- Overrides:
setConf
in class Configured
run
public Map<String,Object> run(Map<String,Object> args)
throws Exception
- Description copied from class:
NutchTool
- Runs the tool, using a map of arguments.
May return results, or null.
- Specified by:
run
in class NutchTool
- Throws:
Exception
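Because the map-based run method may return null, callers need to guard the result before reading from it. The sketch below shows one way to do that. The MapTool interface, the runSafely helper, and the "batchId" key are hypothetical names introduced for illustration; only the method shape (a map in, a map or null out, possibly throwing Exception) is taken from the signature above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side helper; not part of the Nutch API.
class RunResultSketch {

    /** Stand-in with the same shape as the map-based run method. */
    interface MapTool {
        Map<String, Object> run(Map<String, Object> args) throws Exception;
    }

    /**
     * Runs the tool and normalises a null result to an empty map, so callers
     * can read results without a null check. Checked exceptions are wrapped
     * to keep the call site short; that is a choice of this sketch.
     */
    static Map<String, Object> runSafely(MapTool tool, Map<String, Object> args) {
        try {
            Map<String, Object> results = tool.run(args);
            return results == null ? Collections.<String, Object>emptyMap() : results;
        } catch (Exception e) {
            throw new IllegalStateException("tool failed", e);
        }
    }

    public static void main(String[] unused) {
        Map<String, Object> args = new HashMap<>();
        args.put("batchId", "example-batch"); // hypothetical argument key

        MapTool silent = in -> null;             // a tool that reports nothing
        MapTool echo = in -> new HashMap<>(in);  // a tool that returns its input

        System.out.println(runSafely(silent, args).isEmpty());
        System.out.println(runSafely(echo, args).get("batchId"));
    }
}
```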
parse
public int parse(String batchId,
boolean shouldResume,
boolean force)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Specified by:
run
in interface Tool
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2012 The Apache Software Foundation