org.apache.nutch.fetcher
Class FetcherJob.FetcherMapper
java.lang.Object
  org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>
    org.apache.gora.mapreduce.GoraMapper<String,WebPage,IntWritable,FetchEntry>
      org.apache.nutch.fetcher.FetcherJob.FetcherMapper
- Enclosing class:
- FetcherJob
public static class FetcherJob.FetcherMapper
- extends org.apache.gora.mapreduce.GoraMapper<String,WebPage,IntWritable,FetchEntry>
Mapper class for Fetcher.
This class reads the random integer written by GeneratorJob as its key
and emits the original key and value through a FetchEntry instance.
Combined with PartitionUrlByHost, this approach keeps Fetcher polite
while also randomizing the key order. If one host has far more URLs in
your table than the other hosts, FetcherReducer does not get stuck on
that host but processes URLs from other hosts as well.
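The effect described above can be illustrated without Hadoop or Gora. The following is a minimal sketch, not the Nutch implementation: the class RandomKeySketch, its Entry type (a hypothetical stand-in for FetchEntry), the key range, the seed, and the hash-based partition function are all assumptions made for illustration. It pairs each URL with a random integer key, sorts by that key, and derives the partition from the host alone, so URLs from a large host end up interleaved with URLs from smaller hosts while each host still maps to a single partition.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class RandomKeySketch {

    /** Hypothetical stand-in for FetchEntry: keeps the original URL next to the random key. */
    static final class Entry {
        final int randomKey;
        final String url;

        Entry(int randomKey, String url) {
            this.randomKey = randomKey;
            this.url = url;
        }
    }

    /** Pairs each URL with a random integer key (seeded so that runs are repeatable). */
    static List<Entry> rekey(List<String> urls, long seed) {
        Random random = new Random(seed);
        List<Entry> entries = new ArrayList<>();
        for (String url : urls) {
            // The bound 65536 is an arbitrary choice for this sketch.
            entries.add(new Entry(random.nextInt(65536), url));
        }
        return entries;
    }

    /** Picks a partition from the host alone, so all URLs of one host share a partition. */
    static int partitionByHost(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Mask the sign bit so the result is never negative.
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Returns a copy of the entries ordered by random key instead of by URL. */
    static List<Entry> sortByRandomKey(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.randomKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            urls.add("http://big.example/page" + i);
        }
        urls.add("http://small.example/index");
        urls.add("http://tiny.example/index");

        // Print random key, partition, and URL in random-key order.
        for (Entry e : sortByRandomKey(rekey(urls, 42L))) {
            System.out.println(e.randomKey + "\t" + partitionByHost(e.url, 4) + "\t" + e.url);
        }
    }
}
```

Sorting by URL would group all big.example pages together. Sorting by the random key spreads them among the other hosts, while the host-only partition function still sends every URL of a given host to the same partition.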
Methods inherited from class org.apache.gora.mapreduce.GoraMapper
initMapperJob, initMapperJob, initMapperJob, initMapperJob, initMapperJob
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
FetcherJob.FetcherMapper
public FetcherJob.FetcherMapper()
setup
protected void setup(Mapper.Context context)
- Overrides:
setup in class Mapper<String,WebPage,IntWritable,FetchEntry>
map
protected void map(String key, WebPage page, Mapper.Context context)
    throws IOException, InterruptedException
- Overrides:
map in class Mapper<String,WebPage,IntWritable,FetchEntry>
- Throws:
IOException
InterruptedException
Copyright © 2012 The Apache Software Foundation