org.apache.nutch.crawl
Class WebTableReader

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.util.NutchTool
          extended by org.apache.nutch.crawl.WebTableReader
All Implemented Interfaces:
Configurable, Tool

public class WebTableReader
extends NutchTool
implements Tool

Displays information about the entries of the webtable.


Nested Class Summary
static class WebTableReader.WebTableRegexMapper
          Filters the entries from the table based on a regex
static class WebTableReader.WebTableStatCombiner
           
static class WebTableReader.WebTableStatMapper
           
static class WebTableReader.WebTableStatReducer
           
 
Field Summary
static org.slf4j.Logger LOG
           
 
Fields inherited from class org.apache.nutch.util.NutchTool
currentJob, currentJobNum, numJobs, results, status
 
Constructor Summary
WebTableReader()
           
 
Method Summary
static void main(String[] args)
           
 void processDumpJob(String output, Configuration config, String regex, boolean content, boolean headers, boolean links, boolean text)
           
 void processStatJob(boolean sort)
           
 Map<String,Object> run(Map<String,Object> args)
          Runs the tool, using a map of arguments.
 int run(String[] args)
           
 
Methods inherited from class org.apache.nutch.util.NutchTool
getProgress, getStatus, killJob, stopJob
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

WebTableReader

public WebTableReader()
Method Detail

processStatJob

public void processStatJob(boolean sort)
                    throws Exception
Throws:
Exception

processDumpJob

public void processDumpJob(String output,
                           Configuration config,
                           String regex,
                           boolean content,
                           boolean headers,
                           boolean links,
                           boolean text)
                    throws IOException,
                           ClassNotFoundException,
                           InterruptedException
Throws:
IOException
ClassNotFoundException
InterruptedException
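
The `regex` parameter, together with the nested class WebTableReader.WebTableRegexMapper ("Filters the entries from the table based on a regex"), suggests that a dump keeps only the entries whose key matches a pattern. The sketch below illustrates that idea with plain `java.util.regex`; it is not the Nutch implementation. The class name `RegexFilterSketch`, the method `filterKeys`, the sample URLs, and the choice of whole-string matching (`matches()` rather than `find()`) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Nutch: keeps only keys that match a regex.
final class RegexFilterSketch {

    // Returns the keys that match the pattern in full, preserving input order.
    // Assumption: whole-string matching; the real mapper may behave differently.
    static List<String> filterKeys(List<String> keys, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (pattern.matcher(key).matches()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "http://example.org/",
                "http://example.org/docs/a.html",
                "http://example.com/index.html");
        // Keep only the entries under example.org.
        for (String key : filterKeys(keys, "http://example\\.org/.*")) {
            System.out.println(key);
        }
    }
}
```

Compiling the pattern once and reusing it for every key avoids recompiling the expression per entry, which matters when the same filter is applied to a large table.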

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface Tool
Throws:
Exception
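
Because this `run` returns an `int` and may throw, a caller typically turns the result into a process exit status. The sketch below shows that pattern with a hypothetical stand-in interface, since running the real class requires a configured Nutch and Hadoop environment. The names `ArgTool`, `ExitCodeSketch`, and `runTool` are invented here, and treating 0 as success is an assumption borrowed from the usual exit-code convention; this page does not document the values `WebTableReader` returns.

```java
// Hypothetical stand-in mirroring the shape of run(String[]) shown above.
interface ArgTool {
    int run(String[] args) throws Exception;
}

final class ExitCodeSketch {

    // Runs the tool and maps any exception to a non-zero status.
    // Assumption: 0 means success, anything else means failure.
    static int runTool(ArgTool tool, String[] args) {
        try {
            return tool.run(args);
        } catch (Exception e) {
            System.err.println("Tool failed: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        // Toy tool: succeeds only when at least one argument is given.
        ArgTool tool = a -> a.length > 0 ? 0 : -1;
        System.exit(runTool(tool, args));
    }
}
```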

run

public Map<String,Object> run(Map<String,Object> args)
                       throws Exception
Description copied from class: NutchTool
Runs the tool, using a map of arguments. May return results, or null.

Specified by:
run in class NutchTool
Throws:
Exception
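
The description above says the map-based `run` "may return results, or null", so a caller should be prepared for both outcomes. The sketch below shows that calling convention with a hypothetical stand-in interface rather than the real class. The names `MapTool`, `EchoTool`, `MapToolDemo`, `describe`, and the argument key `"example.key"` are invented for illustration and are not Nutch argument names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map-based run(...) signature shown above.
interface MapTool {
    Map<String, Object> run(Map<String, Object> args) throws Exception;
}

// Toy implementation: returns null when given no arguments, else a result map.
final class EchoTool implements MapTool {
    @Override
    public Map<String, Object> run(Map<String, Object> args) {
        if (args == null || args.isEmpty()) {
            return null; // the contract allows "no results"
        }
        Map<String, Object> results = new HashMap<>();
        results.put("argCount", args.size());
        return results;
    }
}

final class MapToolDemo {

    // Calls the tool and treats a null return as "no results".
    static String describe(MapTool tool, Map<String, Object> args) {
        try {
            Map<String, Object> results = tool.run(args);
            return results == null ? "no results" : "results: " + results;
        } catch (Exception e) {
            throw new IllegalStateException("tool failed", e);
        }
    }

    public static void main(String[] argv) {
        Map<String, Object> args = new HashMap<>();
        args.put("example.key", Boolean.TRUE); // made-up key, for illustration only
        System.out.println(describe(new EchoTool(), args));
        System.out.println(describe(new EchoTool(), new HashMap<>()));
    }
}
```

Checking for null before reading the returned map keeps the caller correct for tools that produce no results.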


Copyright © 2012 The Apache Software Foundation