org.apache.nutch.parse
Class ParserJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.util.NutchTool
          extended by org.apache.nutch.parse.ParserJob
All Implemented Interfaces:
Configurable, Tool

public class ParserJob
extends NutchTool
implements Tool


Nested Class Summary
static class ParserJob.ParserMapper
           
 
Field Summary
static org.slf4j.Logger LOG
           
static String SKIP_TRUNCATED
           
 
Fields inherited from class org.apache.nutch.util.NutchTool
currentJob, currentJobNum, numJobs, results, status
 
Constructor Summary
ParserJob()
           
ParserJob(Configuration conf)
           
 
Method Summary
 Configuration getConf()
           
 Collection<WebPage.Field> getFields(Job job)
           
static boolean isTruncated(String url, WebPage page)
          Checks if the page's content is truncated.
static void main(String[] args)
           
 int parse(String batchId, boolean shouldResume, boolean force)
           
 Map<String,Object> run(Map<String,Object> args)
          Runs the tool, using a map of arguments.
 int run(String[] args)
           
 void setConf(Configuration conf)
           
 
Methods inherited from class org.apache.nutch.util.NutchTool
getProgress, getStatus, killJob, stopJob
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG

SKIP_TRUNCATED

public static final String SKIP_TRUNCATED
See Also:
Constant Field Values
Constructor Detail

ParserJob

public ParserJob()

ParserJob

public ParserJob(Configuration conf)
Method Detail

isTruncated

public static boolean isTruncated(String url,
                                  WebPage page)
Checks if the page's content is truncated.

Parameters:
url - the URL of the page
page - the WebPage whose content is checked
Returns:
true if the page is truncated; false if it is not, or if it could not be determined.
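
This page does not say how truncation is detected. As a minimal sketch of one plausible approach, the following compares the size declared in a Content-Length header with the number of bytes actually stored, and answers false whenever the comparison cannot be made. The class TruncationSketch, the method looksTruncated, and the plain Map of headers are hypothetical stand-ins for illustration; they are not part of the Nutch API and do not use WebPage.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical helper, not part of Nutch: illustrates a
// "true only when truncation is certain" contract.
class TruncationSketch {

    // Assumption: truncation is judged by comparing the declared
    // Content-Length with the bytes actually held in the buffer.
    static boolean looksTruncated(ByteBuffer content, Map<String, String> headers) {
        if (content == null || headers == null) {
            return false; // nothing to compare: cannot be determined
        }
        String declared = headers.get("Content-Length");
        if (declared == null || declared.trim().isEmpty()) {
            return false; // no declared size: cannot be determined
        }
        int declaredSize;
        try {
            declaredSize = Integer.parseInt(declared.trim());
        } catch (NumberFormatException e) {
            return false; // malformed header: cannot be determined
        }
        // Truncated only if fewer bytes are stored than were declared.
        return declaredSize > content.limit();
    }
}
```

Under these assumptions, a 4-byte buffer with a declared length of 10 is reported as truncated, while a missing or malformed header yields false, matching the "could not be determined" case in the return description above.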

getFields

public Collection<WebPage.Field> getFields(Job job)

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable
Overrides:
getConf in class Configured

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
Overrides:
setConf in class Configured

run

public Map<String,Object> run(Map<String,Object> args)
                       throws Exception
Description copied from class: NutchTool
Runs the tool, using a map of arguments. May return results, or null.

Specified by:
run in class NutchTool
Throws:
Exception

parse

public int parse(String batchId,
                 boolean shouldResume,
                 boolean force)
          throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface Tool
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2012 The Apache Software Foundation