Package org.apache.nutch.crawl

Crawl control code.


Interface Summary
FetchSchedule This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals.
 

Class Summary
AbstractFetchSchedule This class provides common methods for implementations of FetchSchedule.
AdaptiveFetchSchedule This class implements an adaptive re-fetch algorithm.
Crawler  
CrawlStatus  
DbUpdateMapper  
DbUpdateReducer  
DbUpdaterJob  
DefaultFetchSchedule This class implements the default re-fetch schedule.
FetchScheduleFactory Creates and caches a FetchSchedule implementation.
GeneratorJob  
GeneratorJob.SelectorEntry  
GeneratorJob.SelectorEntryComparator  
GeneratorMapper  
GeneratorReducer Reduce class for generate. The #reduce() method writes a random integer to all generated URLs.
InjectorJob This class takes a flat file of URLs and adds them to the set of pages to be crawled.
InjectorJob.InjectorMapper  
InjectorJob.UrlMapper  
MD5Signature Default implementation of a page signature.
NutchWritable  
Signature  
SignatureComparator  
SignatureFactory Factory class that instantiates a Signature implementation according to the current Configuration.
TextProfileSignature An implementation of a page signature.
URLPartitioner Partitions URLs by host, domain name, or IP, depending on the value of the parameter 'partition.url.mode', which can be 'byHost', 'byDomain', or 'byIP'.
URLPartitioner.FetchEntryPartitioner  
URLPartitioner.SelectorEntryPartitioner  
URLWebPage  
UrlWithScore A writable comparable container for a URL with score.
UrlWithScore.UrlOnlyPartitioner A partitioner by {url}.
UrlWithScore.UrlScoreComparator Compares by {url,score}.
UrlWithScore.UrlScoreComparator.UrlOnlyComparator Compares by {url}.
WebTableReader Displays information about the entries of the webtable.
WebTableReader.WebTableRegexMapper Filters the entries from the table based on a regex.
WebTableReader.WebTableStatCombiner  
WebTableReader.WebTableStatMapper  
WebTableReader.WebTableStatReducer  
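
The summary lists MD5Signature as the default page signature but does not show what such a signature looks like. The following is a minimal sketch, assuming a signature is simply an MD5 digest of the page's raw content bytes that can be compared to detect identical pages. The class PageSignatureSketch and its methods are hypothetical names for this example and are not part of the Nutch API; the actual MD5Signature class may compute its digest differently.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Nutch class.
final class PageSignatureSketch {

    // Computes a 16-byte MD5 digest over the given content bytes.
    static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Two pages are treated as identical when their digests match.
    static boolean sameContent(byte[] a, byte[] b) {
        return Arrays.equals(md5(a), md5(b));
    }
}
```

A digest of this kind changes when any byte of the content changes, so it only detects exact duplicates.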
 

Package org.apache.nutch.crawl Description

Crawl control code.
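
To illustrate the three 'partition.url.mode' values listed for URLPartitioner, the following is a minimal sketch of how a partition key could be derived from a URL and mapped to a partition number. It is not the Nutch implementation: the class PartitionKeySketch, its method names, and the naive "last two host labels" rule for the domain are assumptions made for this example. Only the mode names 'byHost', 'byDomain', and 'byIP' come from the class summary.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not a Nutch class.
final class PartitionKeySketch {

    // Returns the string that would be hashed for the given mode.
    static String partitionKey(String url, String mode) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return url; // no host component: fall back to the whole URL
        }
        switch (mode) {
            case "byHost":
                return host;
            case "byDomain": {
                // Naive assumption: the domain is the last two host labels.
                String[] labels = host.split("\\.");
                int n = labels.length;
                return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
            }
            case "byIP":
                try {
                    // May perform a DNS lookup for non-literal hosts.
                    return InetAddress.getByName(host).getHostAddress();
                } catch (UnknownHostException e) {
                    throw new UncheckedIOException(e);
                }
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    // Maps a key to a partition index in [0, numPartitions).
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this sketch, URLs that share a key always land in the same partition, which is the property a per-host or per-domain partitioner needs. The two-label domain rule is wrong for suffixes such as "co.uk"; a real implementation would need a public-suffix list.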



Copyright © 2012 The Apache Software Foundation