Index (apache-nutch 2.0 API)

A B C D E F G H I J K L M N O P R S T U V W X Y Z _

A

abort(String, String) - Method in class org.apache.nutch.api.impl.RAMJobManager
abort(String, String) - Method in interface org.apache.nutch.api.JobManager
AbstractFetchSchedule - Class in org.apache.nutch.crawl: This class provides common methods for implementations of FetchSchedule.
AbstractFetchSchedule() - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
AbstractFetchSchedule(Configuration) - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
AbstractTestbedHandler - Class in org.apache.nutch.tools.proxy
AbstractTestbedHandler() - Constructor for class org.apache.nutch.tools.proxy.AbstractTestbedHandler
accept - Variable in class org.apache.nutch.protocol.http.api.HttpBase: The "Accept" request header value.
accept() - Method in class org.apache.nutch.urlfilter.api.RegexRule: Returns whether this rule is used for filtering in or out.
acceptLanguage - Variable in class org.apache.nutch.protocol.http.api.HttpBase: The "Accept-Language" request header value.
ACCESS_DENIED - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Access denied - authorization required, but missing/incorrect.
AdaptiveFetchSchedule - Class in org.apache.nutch.crawl: This class implements an adaptive re-fetch algorithm.
AdaptiveFetchSchedule() - Constructor for class org.apache.nutch.crawl.AdaptiveFetchSchedule
add(String, String) - Method in class org.apache.nutch.indexer.NutchDocument
add(String, String) - Method in class org.apache.nutch.metadata.Metadata: Add a metadata name/value mapping.
add(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
add(byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
add(byte[], byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
add(E) - Method in class org.apache.nutch.util.Histogram
add(E, float) - Method in class org.apache.nutch.util.Histogram
add(Histogram<E>) - Method in class org.apache.nutch.util.Histogram
addAttribute(String, String) - Method in class org.apache.nutch.plugin.Extension: Adds an attribute; only used until model creation at plugin system startup.
addClassToConf(Configuration, Class<? extends NutchIndexWriter>) - Static method in class org.apache.nutch.indexer.NutchIndexWriterFactory
addClue(String, String, int) - Method in class org.apache.nutch.util.EncodingDetector
addClue(String, String) - Method in class org.apache.nutch.util.EncodingDetector
addDependency(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds a dependency
addExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an exported library with a path relative to the plugin directory.
addExtension(Extension) - Method in class org.apache.nutch.plugin.ExtensionPoint: Install a corresponding extension to this extension point.
addExtension(Extension) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an extension.
addExtensionPoint(ExtensionPoint) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an extension point.
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
addMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper: Add metadata.
addMyHeader(HttpServletResponse, String, String) - Method in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
addNotExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds a non-exported library with a path relative to the plugin directory.
addPatternBackward(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Adds any necessary nodes to the trie so that the given String can be decoded in reverse and the first character is represented by a terminal node.
addPatternForward(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Adds any necessary nodes to the trie so that the given String can be decoded and the last character is represented by a terminal node.
addTiming(String, String, long) - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
addToArgs(Utf8) - Method in class org.apache.nutch.storage.ParseStatus
addToArgs(Utf8) - Method in class org.apache.nutch.storage.ProtocolStatus
addUrlFeatures(NutchDocument, String) - Method in class org.creativecommons.nutch.CCIndexingFilter: Add the features represented by a license URL.
AdminResource - Class in org.apache.nutch.api
AdminResource() - Constructor for class org.apache.nutch.api.AdminResource
ALL_BATCH_ID_STR - Static variable in interface org.apache.nutch.metadata.Nutch
ALL_CRAWL_ID - Static variable in interface org.apache.nutch.metadata.Nutch
AnchorIndexingFilter - Class in org.apache.nutch.indexer.anchor: Indexing filter that indexes all inbound anchor text for a document.
AnchorIndexingFilter() - Constructor for class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
APIInfoResource - Class in org.apache.nutch.api
APIInfoResource() - Constructor for class org.apache.nutch.api.APIInfoResource
append(Node) - Method in class org.apache.nutch.parse.html.DOMBuilder: Append a node to the current container.
APPLICATION_NAME - Static variable in interface org.apache.nutch.metadata.Office
ArcInputFormat - Class in org.apache.nutch.tools.arc: An input format that reads arc files.
ArcInputFormat() - Constructor for class org.apache.nutch.tools.arc.ArcInputFormat
ArcRecordReader - Class in org.apache.nutch.tools.arc: The ArcRecordReader class provides a record reader which reads records from arc files.
ArcRecordReader(Configuration, FileSplit) - Constructor for class org.apache.nutch.tools.arc.ArcRecordReader: Constructor that sets the configuration and file split.
ARG_BATCH - Static variable in interface org.apache.nutch.metadata.Nutch: Batch id to select.
ARG_CLASS - Static variable in interface org.apache.nutch.metadata.Nutch: Class to run as a NutchTool.
ARG_CRAWL - Static variable in interface org.apache.nutch.metadata.Nutch: Crawl id to use.
ARG_CURTIME - Static variable in interface org.apache.nutch.metadata.Nutch: The notion of current time.
ARG_DEPTH - Static variable in interface org.apache.nutch.metadata.Nutch: Depth (number of cycles) of a crawl.
ARG_FILTER - Static variable in interface org.apache.nutch.metadata.Nutch: Apply URLFilters.
ARG_FORCE - Static variable in interface org.apache.nutch.metadata.Nutch: Force processing even if there are locks or inconsistencies.
ARG_NORMALIZE - Static variable in interface org.apache.nutch.metadata.Nutch: Apply URLNormalizers.
ARG_NUMTASKS - Static variable in interface org.apache.nutch.metadata.Nutch: Number of fetcher tasks.
ARG_RESUME - Static variable in interface org.apache.nutch.metadata.Nutch: Resume previously aborted op.
ARG_SEEDDIR - Static variable in interface org.apache.nutch.metadata.Nutch: A path to a directory containing a list of seed URLs.
ARG_SEEDLIST - Static variable in interface org.apache.nutch.metadata.Nutch: Whitespace-separated list of seed URLs.
ARG_SOLR - Static variable in interface org.apache.nutch.metadata.Nutch: Solr URL.
ARG_SORT - Static variable in interface org.apache.nutch.metadata.Nutch: Sort statistics.
ARG_THREADS - Static variable in interface org.apache.nutch.metadata.Nutch: Number of fetcher threads (per map task).
ARG_TOPN - Static variable in interface org.apache.nutch.metadata.Nutch: Generate topN scoring URLs.
args - Variable in class org.apache.nutch.api.JobStatus
ARGS - Static variable in interface org.apache.nutch.api.Params
attrName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
AUTHOR - Static variable in interface org.apache.nutch.metadata.Office
autoDetectClues(WebPage, boolean) - Method in class org.apache.nutch.util.EncodingDetector
AutomatonURLFilter - Class in org.apache.nutch.urlfilter.automaton: RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
AutomatonURLFilter() - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
AutomatonURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
autoResolveContentType(String, String, byte[]) - Method in class org.apache.nutch.util.MimeUtil: A facade interface for trying all the possible mime type resolution strategies available within Tika.
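
The TrieStringMatcher.addPatternForward(String) entry in this section describes adding nodes so that a String can be decoded and its last character is represented by a terminal node. The following is a minimal sketch of that idea, not the Nutch implementation: the class ForwardTrieSketch, its Node type, and the matchesPrefix helper are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal trie; an illustration only, not Nutch's TrieStringMatcher. */
final class ForwardTrieSketch {

    /** One trie node: child links keyed by character, plus a terminal flag. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    /** Adds any missing nodes for the pattern and marks the node of its last character as terminal. */
    void addPatternForward(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.computeIfAbsent(pattern.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    /** Returns true if any added pattern is a prefix of the input (hypothetical helper). */
    boolean matchesPrefix(String input) {
        Node node = root;
        if (node.terminal) {
            return true; // the empty pattern was added
        }
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false; // no pattern continues with this character
            }
            if (node.terminal) {
                return true; // reached the last character of an added pattern
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ForwardTrieSketch trie = new ForwardTrieSketch();
        trie.addPatternForward("http://");
        System.out.println(trie.matchesPrefix("http://example.org/")); // prints true
        System.out.println(trie.matchesPrefix("ftp://example.org/"));  // prints false
    }
}
```

A backward variant, as described for addPatternBackward(String), could walk the pattern from its last character to its first, so that the terminal node represents the first character.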

B

BasicIndexingFilter - Class in org.apache.nutch.indexer.basic: Adds basic searchable fields to a document.
BasicIndexingFilter() - Constructor for class org.apache.nutch.indexer.basic.BasicIndexingFilter
BasicURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.basic: Converts URLs to a normal form.
BasicURLNormalizer() - Constructor for class org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
BATCH_ID - Static variable in class org.apache.nutch.crawl.GeneratorJob
batchId - Variable in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
Benchmark - Class in org.apache.nutch.tools
Benchmark() - Constructor for class org.apache.nutch.tools.Benchmark
benchmark(int, int, int, int, long, String) - Method in class org.apache.nutch.tools.Benchmark
Benchmark.BenchmarkResults - Class in org.apache.nutch.tools
Benchmark.BenchmarkResults() - Constructor for class org.apache.nutch.tools.Benchmark.BenchmarkResults
binarySearch(byte[][], byte[], int, int, RawComparator<byte[]>) - Static method in class org.apache.nutch.util.Bytes: Binary search for keys in indexes.
BLOCKED - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Thread was blocked http.max.delays times during fetching.
BlockedException - Exception in org.apache.nutch.protocol.http.api
BlockedException(String) - Constructor for exception org.apache.nutch.protocol.http.api.BlockedException
BOOST_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
BUFFER_SIZE - Static variable in class org.apache.nutch.protocol.http.api.HttpBase
Bytes - Class in org.apache.nutch.util: Utility class that handles byte arrays, conversions to/from other types, comparisons, hash code generation, manufacturing keys for HashMaps or HashSets, etc.
Bytes() - Constructor for class org.apache.nutch.util.Bytes
Bytes.ByteArrayComparator - Class in org.apache.nutch.util: Byte array comparator class.
Bytes.ByteArrayComparator() - Constructor for class org.apache.nutch.util.Bytes.ByteArrayComparator: Constructor
BYTES_COMPARATOR - Static variable in class org.apache.nutch.util.Bytes: Pass this to TreeMaps where byte [] are keys.
BYTES_RAWCOMPARATOR - Static variable in class org.apache.nutch.util.Bytes: Use for comparing byte arrays, byte-by-byte.
bytesToVint(byte[]) - Static method in class org.apache.nutch.util.Bytes
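
The BYTES_COMPARATOR entry in this section says to pass it to TreeMaps where byte[] are keys. Java arrays do not implement Comparable and compare by identity, so a sorted map needs an explicit comparator. The sketch below illustrates this with a hand-written comparator; the class ByteKeySketch and its members are hypothetical, and treating each byte as unsigned is an assumption of this sketch, not a statement about the Nutch Bytes class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical illustration of byte[] keys in a TreeMap; not Nutch's Bytes class. */
final class ByteKeySketch {

    /** Lexicographic order; each byte is treated as unsigned (an assumption of this sketch). */
    static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared positions are equal: the shorter array sorts first.
        return a.length - b.length;
    };

    /** Encodes a string as UTF-8 bytes. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> map = new TreeMap<>(LEXICOGRAPHIC);
        map.put(utf8("b"), "second");
        map.put(utf8("a"), "first");
        // A different array instance with equal contents still finds the entry.
        System.out.println(map.get(utf8("a")));          // prints first
        System.out.println(map.firstEntry().getValue()); // prints first
    }
}
```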

C

CACHING_FORBIDDEN_ALL - Static variable in interface org.apache.nutch.metadata.Nutch: Don't show either original forbidden content or summaries.
CACHING_FORBIDDEN_CONTENT - Static variable in interface org.apache.nutch.metadata.Nutch: Don't show original forbidden content, but show summaries.
CACHING_FORBIDDEN_KEY - Static variable in interface org.apache.nutch.metadata.Nutch: Sites may request that search engines don't provide access to cached documents.
CACHING_FORBIDDEN_KEY_UTF8 - Static variable in interface org.apache.nutch.metadata.Nutch
CACHING_FORBIDDEN_NONE - Static variable in interface org.apache.nutch.metadata.Nutch: Show both original forbidden content and summaries (default).
calculate(WebPage) - Method in class org.apache.nutch.crawl.MD5Signature
calculate(WebPage) - Method in class org.apache.nutch.crawl.Signature
calculate(WebPage) - Method in class org.apache.nutch.crawl.TextProfileSignature
calculateLastFetchTime(WebPage) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: This method returns the last fetch time of the CrawlDatum.
calculateLastFetchTime(WebPage) - Method in interface org.apache.nutch.crawl.FetchSchedule: Calculates last fetch time of the given CrawlDatum.
canStop() - Method in class org.apache.nutch.api.NutchServer
CCIndexingFilter - Class in org.creativecommons.nutch: Adds basic searchable fields to a document.
CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
CCParseFilter - Class in org.creativecommons.nutch: Adds metadata identifying the Creative Commons license used, if any.
CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
CCParseFilter.Walker - Class in org.creativecommons.nutch: Walks DOM tree, looking for RDF in comments and licenses in anchors.
cdata(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of cdata.
CHAR_ENCODING_FOR_CONVERSION - Static variable in interface org.apache.nutch.metadata.Nutch
CHARACTER_COUNT - Static variable in interface org.apache.nutch.metadata.Office
characters(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of character data.
charactersRaw(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: If available, when the disable-output-escaping attribute is used, output raw text without escaping.
CHARSET_UTF8 - Static variable in class org.apache.nutch.parse.feed.FeedParser
CHECK_BLOCKING - Static variable in interface org.apache.nutch.protocol.Protocol: Property name.
CHECK_ROBOTS - Static variable in interface org.apache.nutch.protocol.Protocol: Property name.
checkClientTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
checkMark(WebPage) - Method in enum org.apache.nutch.storage.Mark
checkServerTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
childLen - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
children - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
childrenList - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
chooseRepr(String, String, boolean) - Static method in class org.apache.nutch.util.URLUtil: Given two urls, a src and a destination of a redirect, it returns the representative url.
CircularDependencyException - Exception in org.apache.nutch.plugin: CircularDependencyException will be thrown if a circular dependency is detected.
CircularDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
CircularDependencyException(String) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
cleanMimeType(String) - Static method in class org.apache.nutch.util.MimeUtil: Cleans a MimeType name by removing out the actual MimeType, from a string of the form:
cleanup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
cleanup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatReducer
cleanup(Mapper<String, WebPage, String, NutchDocument>.Context) - Method in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
cleanup(Reducer<Text, SolrDeleteDuplicates.SolrRecord, Text, SolrDeleteDuplicates.SolrRecord>.Context) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
clear() - Method in class org.apache.nutch.metadata.Metadata: Remove all mappings from metadata.
clearClues() - Method in class org.apache.nutch.util.EncodingDetector: Clears all clues.
Client - Class in org.apache.nutch.protocol.ftp: Client.java encapsulates the functionality necessary for Nutch to get a directory listing and retrieve files from an FTP server.
Client() - Constructor for class org.apache.nutch.protocol.ftp.Client
close() - Method in class org.apache.nutch.api.DbReader
close() - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
close() - Method in class org.apache.nutch.host.HostDb
close() - Method in interface org.apache.nutch.indexer.NutchIndexWriter
close() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
close() - Method in class org.apache.nutch.indexer.solr.SolrWriter
close() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Closes the record reader resources.
close() - Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
closeReaders(SequenceFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils: Closes a group of SequenceFile readers.
closeReaders(MapFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils: Closes a group of MapFile readers.
CMD - Static variable in interface org.apache.nutch.api.Params
CollectionManager - Class in org.apache.nutch.collection
CollectionManager(Configuration) - Constructor for class org.apache.nutch.collection.CollectionManager
CollectionManager() - Constructor for class org.apache.nutch.collection.CollectionManager: Used for testing
CommandRunner - Class in org.apache.nutch.util
CommandRunner() - Constructor for class org.apache.nutch.util.CommandRunner
comment(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Report an XML comment anywhere in the document.
COMMENTS - Static variable in interface org.apache.nutch.metadata.Office
COMMIT_SIZE - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
compare(byte[], byte[]) - Static method in class org.apache.nutch.crawl.SignatureComparator
compare(UrlWithScore, UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator
compare(UrlWithScore, UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator.UrlOnlyComparator
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator.UrlOnlyComparator
compare(byte[], byte[]) - Method in class org.apache.nutch.util.Bytes.ByteArrayComparator
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.util.Bytes.ByteArrayComparator
compareTo(GeneratorJob.SelectorEntry) - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
compareTo(UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore
compareTo(byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
compareTo(byte[], int, int, byte[], int, int) - Static method in class org.apache.nutch.util.Bytes: Lexicographically compares two arrays.
compareTo(TrieStringMatcher.TrieNode) - Method in class org.apache.nutch.util.TrieStringMatcher.TrieNode
conf - Variable in class org.apache.nutch.plugin.Plugin
conf - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
CONF_ID - Static variable in interface org.apache.nutch.api.Params
confId - Variable in class org.apache.nutch.api.JobStatus
ConfManager - Interface in org.apache.nutch.api
confMgr - Static variable in class org.apache.nutch.api.NutchApp
ConfResource - Class in org.apache.nutch.api
ConfResource() - Constructor for class org.apache.nutch.api.ConfResource
contains(String) - Method in class org.apache.nutch.storage.Host
Content - Class in org.apache.nutch.protocol
Content() - Constructor for class org.apache.nutch.protocol.Content
Content(String, String, byte[], String, Metadata, Configuration) - Constructor for class org.apache.nutch.protocol.Content
Content(String, String, byte[], String, Metadata, MimeUtil) - Constructor for class org.apache.nutch.protocol.Content
CONTENT_DISPOSITION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_ENCODING - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LANGUAGE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LENGTH - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_MD5 - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_TYPE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_TYPE_UTF8 - Static variable in class org.apache.nutch.util.EncodingDetector
CONTRIBUTOR - Static variable in interface org.apache.nutch.metadata.DublinCore: An entity responsible for making contributions to the content of the resource.
COVERAGE - Static variable in interface org.apache.nutch.metadata.DublinCore: The extent or scope of the content of the resource.
CRAWL_ID - Static variable in interface org.apache.nutch.api.Params
CRAWL_ID_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
CRAWLDB_ADDITIONS_ALLOWED - Static variable in class org.apache.nutch.crawl.DbUpdateReducer
Crawler - Class in org.apache.nutch.crawl
Crawler() - Constructor for class org.apache.nutch.crawl.Crawler
CrawlStatus - Class in org.apache.nutch.crawl
CrawlStatus() - Constructor for class org.apache.nutch.crawl.CrawlStatus
create(String, Map<String, String>, boolean) - Method in interface org.apache.nutch.api.ConfManager
create(Map<String, Object>) - Method in class org.apache.nutch.api.ConfResource
create(String, Map<String, String>, boolean) - Method in class org.apache.nutch.api.impl.RAMConfManager
create(String, JobManager.JobType, String, Map<String, Object>) - Method in class org.apache.nutch.api.impl.RAMJobManager
create(String, JobManager.JobType, String, Map<String, Object>) - Method in interface org.apache.nutch.api.JobManager
create(Map<String, Object>) - Method in class org.apache.nutch.api.JobResource
create() - Static method in class org.apache.nutch.util.NutchConfiguration: Create a Configuration for Nutch.
create(boolean, Properties) - Static method in class org.apache.nutch.util.NutchConfiguration: Create a Configuration from supplied properties.
createInboundRoot() - Method in class org.apache.nutch.api.NutchApp: Creates a root Restlet that will receive all incoming calls.
createIndexJob(Configuration, String, String) - Method in class org.apache.nutch.indexer.IndexerJob
createKey() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Creates a new instance of the Text object for the key.
createLockFile(FileSystem, Path, boolean) - Static method in class org.apache.nutch.util.LockUtil: Create a lock file.
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase: Creates a new RegexRule.
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
createSocket(String, int, InetAddress, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory: Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createSocket(Socket, String, int, boolean) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createSubCollection(String, String) - Method in class org.apache.nutch.collection.CollectionManager: Create a new subcollection.
createValue() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Creates a new instance of the BytesWritable object for the value.
createWebStore(Configuration, Class<K>, Class<V>) - Static method in class org.apache.nutch.storage.StorageUtils: Creates a store for the given persistentClass.
CreativeCommons - Interface in org.apache.nutch.metadata: A collection of Creative Commons properties names.
CREATOR - Static variable in interface org.apache.nutch.metadata.DublinCore: An entity primarily responsible for making the content of the resource.
currentJob - Variable in class org.apache.nutch.util.NutchTool
currentJobNum - Variable in class org.apache.nutch.util.NutchTool
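
The CircularDependencyException entry in this section states that the exception is thrown when a circular dependency is detected. One common way to detect such a cycle is a depth-first search that tracks the ids on the current path. The sketch below shows that technique under stated assumptions: the class DependencyCycleSketch, the hasCycle method, and the plugin ids are hypothetical, and this is not claimed to be how the Nutch plugin system performs the check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical cycle check over a dependency map; not the Nutch plugin code. */
final class DependencyCycleSketch {

    /** Returns true if the graph (id to the ids it depends on) contains a cycle. */
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();   // ids fully explored, known to be cycle-free
        Set<String> onPath = new HashSet<>(); // ids on the current depth-first path
        for (String id : deps.keySet()) {
            if (visit(id, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String id, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (done.contains(id)) {
            return false;
        }
        if (!onPath.add(id)) {
            return true; // id is already on the current path: a cycle
        }
        List<String> next = deps.getOrDefault(id, Collections.<String>emptyList());
        for (String dep : next) {
            if (visit(dep, deps, done, onPath)) {
                return true;
            }
        }
        onPath.remove(id);
        done.add(id);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("plugin-a", Arrays.asList("plugin-b"));
        deps.put("plugin-b", Arrays.asList("plugin-c"));
        System.out.println(hasCycle(deps)); // prints false
        deps.put("plugin-c", Arrays.asList("plugin-a"));
        System.out.println(hasCycle(deps)); // prints true
    }
}
```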

D

DATE - Static variable in interface org.apache.nutch.metadata.DublinCore: A date associated with an event in the life cycle of the resource.
dateFormatStr - Static variable in class org.apache.nutch.indexer.feed.FeedIndexingFilter
DbReader - Class in org.apache.nutch.api
DbReader(Configuration, String) - Constructor for class org.apache.nutch.api.DbReader
DbResource - Class in org.apache.nutch.api
DbResource() - Constructor for class org.apache.nutch.api.DbResource
DbUpdateMapper - Class in org.apache.nutch.crawl
DbUpdateMapper() - Constructor for class org.apache.nutch.crawl.DbUpdateMapper
DbUpdateReducer - Class in org.apache.nutch.crawl
DbUpdateReducer() - Constructor for class org.apache.nutch.crawl.DbUpdateReducer
DbUpdaterJob - Class in org.apache.nutch.crawl
DbUpdaterJob() - Constructor for class org.apache.nutch.crawl.DbUpdaterJob
DbUpdaterJob(Configuration) - Constructor for class org.apache.nutch.crawl.DbUpdaterJob
debug - Variable in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
dedup(String) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
DEFAULT_BOOST - Static variable in class org.apache.nutch.util.domain.DomainSuffix
DEFAULT_CONF - Static variable in class org.apache.nutch.api.ConfResource
DEFAULT_DELAY - Static variable in class org.apache.nutch.tools.proxy.DelayHandler
DEFAULT_FILE_NAME - Static variable in class org.apache.nutch.collection.CollectionManager
DEFAULT_HOSTDB_CONCURRENCY_LEVEL - Static variable in class org.apache.nutch.host.HostDb
DEFAULT_LRU_SIZE - Static variable in class org.apache.nutch.host.HostDb
DEFAULT_PLUGIN - Static variable in class org.apache.nutch.parse.ParserFactory: Wildcard for default plugins.
DEFAULT_STATUS - Static variable in class org.apache.nutch.util.domain.DomainSuffix
DefaultFetchSchedule - Class in org.apache.nutch.crawl: This class implements the default re-fetch schedule.
DefaultFetchSchedule() - Constructor for class org.apache.nutch.crawl.DefaultFetchSchedule
defaultInterval - Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
deflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns a deflated copy of the input array.
DeflateUtils - Class in org.apache.nutch.util: A collection of utility methods for working on deflated data.
DeflateUtils() - Constructor for class org.apache.nutch.util.DeflateUtils
DelayHandler - Class in org.apache.nutch.tools.proxy
DelayHandler(int) - Constructor for class org.apache.nutch.tools.proxy.DelayHandler
delete(String) - Method in interface org.apache.nutch.api.ConfManager
delete(String) - Method in class org.apache.nutch.api.impl.RAMConfManager
deleteMeta(String) - Method in class org.apache.nutch.scoring.ScoreDatum
deleteSubCollection(String) - Method in class org.apache.nutch.collection.CollectionManager: Delete named subcollection
DESCR - Static variable in class org.apache.nutch.api.AdminResource
DESCR - Static variable in class org.apache.nutch.api.ConfResource
DESCR - Static variable in class org.apache.nutch.api.DbResource
DESCR - Static variable in class org.apache.nutch.api.JobResource
DESCRIPTION - Static variable in interface org.apache.nutch.metadata.DublinCore: An account of the content of the resource.
DIGEST_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
DIR_NAME - Static variable in class org.apache.nutch.protocol.Content
disconnect() - Method in class org.apache.nutch.protocol.ftp.Client: Closes the connection to the FTP server and restores connection parameters to the default values.
distributeScoreToOutlinks(String, WebPage, Collection<ScoreDatum>, int) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
distributeScoreToOutlinks(String, WebPage, Collection<ScoreDatum>, int) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Get cash on hand, divide it by the number of outlinks and apply.
distributeScoreToOutlinks(String, WebPage, Collection<ScoreDatum>, int) - Method in interface org.apache.nutch.scoring.ScoringFilter: Distribute score value from the current page to all its outlinked pages.
distributeScoreToOutlinks(String, WebPage, Collection<ScoreDatum>, int) - Method in class org.apache.nutch.scoring.ScoringFilters
distributeScoreToOutlinks(String, WebPage, Collection<ScoreDatum>, int) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
DmozParser - Class in org.apache.nutch.tools: Utility that converts DMOZ RDF into a flat file of URLs to be injected.
DmozParser() - Constructor for class org.apache.nutch.tools.DmozParser
doFilter(ServletRequest, ServletResponse, FilterChain) - Method in class org.apache.nutch.tools.proxy.LogDebugHandler
doInit() - Method in class org.apache.nutch.api.DbResource
DomainStatistics - Class in org.apache.nutch.util.domain: Extracts some very basic statistics about domains from the crawldb
DomainStatistics() - Constructor for class org.apache.nutch.util.domain.DomainStatistics
DomainStatistics.DomainStatisticsCombiner - Class in org.apache.nutch.util.domain
DomainStatistics.DomainStatisticsCombiner() - Constructor for class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
DomainStatistics.DomainStatisticsMapper - Class in org.apache.nutch.util.domain
DomainStatistics.DomainStatisticsMapper() - Constructor for class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
DomainStatistics.DomainStatisticsReducer - Class in org.apache.nutch.util.domain
DomainStatistics.DomainStatisticsReducer() - Constructor for class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsReducer
DomainStatistics.MyCounter - Enum in org.apache.nutch.util.domain
DomainSuffix - Class in org.apache.nutch.util.domain: This class represents the last part of the host name, which is operated by authorities, not individuals.
DomainSuffix(String, DomainSuffix.Status, float) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
DomainSuffix(String) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
DomainSuffix.Status - Enum in org.apache.nutch.util.domain: Enumeration of the status of the tld.
DomainSuffixes - Class in org.apache.nutch.util.domain: Storage class for DomainSuffix objects. Note: this class is a singleton.
DomainURLFilter - Class in org.apache.nutch.urlfilter.domain: Filters URLs based on a file containing domain suffixes, domain names, and hostnames.
DomainURLFilter() - Constructor for class org.apache.nutch.urlfilter.domain.DomainURLFilter: Default constructor.
DomainURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.domain.DomainURLFilter: Constructor that specifies the domain file to use.
DOMBuilder - Class in org.apache.nutch.parse.html: This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMBuilder(Document, Node) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMBuilder(Document, DocumentFragment) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMBuilder(Document) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMContentUtils - Class in org.apache.nutch.parse.html: A collection of methods for extracting content from DOM trees.
DOMContentUtils(Configuration) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils
DOMContentUtils - Class in org.apache.nutch.parse.tika: A collection of methods for extracting content from DOM trees.
DOMContentUtils(Configuration) - Constructor for class org.apache.nutch.parse.tika.DOMContentUtils
DOMContentUtils.LinkParams - Class in org.apache.nutch.parse.html
DOMContentUtils.LinkParams(String, String, int) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
DomUtil - Class in org.apache.nutch.util
DomUtil() - Constructor for class org.apache.nutch.util.DomUtil
DublinCore - Interface in org.apache.nutch.metadata: A collection of Dublin Core metadata names.
DummySSLProtocolSocketFactory - Class in org.apache.nutch.protocol.httpclient
DummySSLProtocolSocketFactory() - Constructor for class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory: Constructor for DummySSLProtocolSocketFactory.
DummyX509TrustManager - Class in org.apache.nutch.protocol.httpclient
DummyX509TrustManager(KeyStore) - Constructor for class org.apache.nutch.protocol.httpclient.DummyX509TrustManager: Constructor for DummyX509TrustManager.

E

elapsedTime(long, long) - Static method in class org.apache.nutch.util.TimingUtil: Calculate the elapsed time between two times specified in milliseconds.
elName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
EMPTY_BYTE_ARRAY - Static variable in class org.apache.nutch.util.Bytes: An empty instance.
EmptyRobotRules - Class in org.apache.nutch.protocol
EmptyRobotRules() - Constructor for class org.apache.nutch.protocol.EmptyRobotRules
encode(String) - Static method in class org.apache.nutch.html.Entities
EncodingDetector - Class in org.apache.nutch.util: A simple class for detecting character encodings.
EncodingDetector(Configuration) - Constructor for class org.apache.nutch.util.EncodingDetector
endCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of a CDATA section.
endDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of the end of a document.
endDTD() - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of DTD declarations.
endElement(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of the end of an element.
endEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of an entity.
endPrefixMapping(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: End the scope of a prefix-URI mapping.
Entities - Class in org.apache.nutch.html
Entities() - Constructor for class org.apache.nutch.html.Entities
entityReference(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of an entityReference.
equals(Object) - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
equals(Object) - Method in class org.apache.nutch.metadata.Metadata
equals(Object) - Method in class org.apache.nutch.parse.Outlink
equals(Object) - Method in class org.apache.nutch.protocol.Content
equals(Object) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
equals(byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
ESTIMATED_HEAP_TAX - Static variable in class org.apache.nutch.util.Bytes: Estimate of size cost to pay beyond payload in jvm for instance of byte [].
evaluate() - Method in class org.apache.nutch.util.CommandRunner
EXCEPTION - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Unspecified exception occurred.
exec() - Method in class org.apache.nutch.util.CommandRunner
execute() - Method in class org.apache.nutch.api.AdminResource
Extension - Class in org.apache.nutch.plugin: An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of Publisher.
Extension(PluginDescriptor, String, String, String, Configuration, PluginRepository) - Constructor for class org.apache.nutch.plugin.Extension
ExtensionPoint - Class in org.apache.nutch.plugin: The ExtensionPoint provides meta information about an extension point.
ExtensionPoint(String, String, String) - Constructor for class org.apache.nutch.plugin.ExtensionPoint: Constructor
ExtParser - Class in org.apache.nutch.parse.ext: A wrapper that invokes an external command to do the real parsing job.
ExtParser() - Constructor for class org.apache.nutch.parse.ext.ExtParser
extractText(InputStream, String, List) - Method in class org.apache.nutch.parse.zip.ZipTextExtractor
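
The elapsedTime(long, long) entry above describes computing the time between two millisecond timestamps. The following is a minimal standalone sketch of that idea, not the TimingUtil implementation: the class name ElapsedTimeSketch, the method name elapsed, the HH:mm:ss output format, and the exception on a reversed range are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of the Nutch API.
final class ElapsedTimeSketch {

    // Formats the span between two millisecond timestamps as HH:mm:ss.
    // The output format is an assumption; sub-second remainders are dropped.
    static String elapsed(long startMillis, long endMillis) {
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end precedes start");
        }
        long totalSeconds = (endMillis - startMillis) / 1000;
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        // 3,723,000 ms = 1 hour, 2 minutes, 3 seconds
        System.out.println(elapsed(0L, 3_723_000L)); // prints 01:02:03
    }
}
```

Hours are not wrapped at 24 in this sketch, so long-running jobs simply show a larger hour count.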

F

FAILED - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: General failure.
FAILED - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Content was not retrieved.
FAILED_EXCEPTION - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: Parsing failed.
FAILED_INVALID_FORMAT - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: Parsing failed.
FAILED_MISSING_CONTENT - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: Parsing failed.
FAILED_MISSING_PARTS - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: Parsing failed.
FAILED_TRUNCATED - Static variable in interface org.apache.nutch.parse.ParseStatusCodes: Parsing failed.
FakeHandler - Class in org.apache.nutch.tools.proxy
FakeHandler(FakeHandler.Mode, FakeHandler.Mode, int, int, int, int) - Constructor for class org.apache.nutch.tools.proxy.FakeHandler: Create fake pages.
FakeHandler.Mode - Enum in org.apache.nutch.tools.proxy: Create links to hosts generated from a pool of numHosts/numPages random names.
Feed - Interface in org.apache.nutch.metadata: A collection of Feed property names extracted by the ROME library.
FEED - Static variable in interface org.apache.nutch.metadata.Feed
FEED_AUTHOR - Static variable in interface org.apache.nutch.metadata.Feed
FEED_PUBLISHED - Static variable in interface org.apache.nutch.metadata.Feed
FEED_TAGS - Static variable in interface org.apache.nutch.metadata.Feed
FEED_UPDATED - Static variable in interface org.apache.nutch.metadata.Feed
FeedIndexingFilter - Class in org.apache.nutch.indexer.feed
FeedIndexingFilter() - Constructor for class org.apache.nutch.indexer.feed.FeedIndexingFilter
FeedParser - Class in org.apache.nutch.parse.feed
FeedParser() - Constructor for class org.apache.nutch.parse.feed.FeedParser
fetch(String, int, boolean, int) - Method in class org.apache.nutch.fetcher.FetcherJob: Run fetcher.
FETCH_STATUS_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
FETCH_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
FetchEntry - Class in org.apache.nutch.fetcher
FetchEntry() - Constructor for class org.apache.nutch.fetcher.FetchEntry
FetchEntry(Configuration, String, WebPage) - Constructor for class org.apache.nutch.fetcher.FetchEntry
FetcherJob - Class in org.apache.nutch.fetcher: Multi-threaded fetcher.
FetcherJob() - Constructor for class org.apache.nutch.fetcher.FetcherJob
FetcherJob(Configuration) - Constructor for class org.apache.nutch.fetcher.FetcherJob
FetcherJob.FetcherMapper - Class in org.apache.nutch.fetcher: Mapper class for Fetcher.
FetcherJob.FetcherMapper() - Constructor for class org.apache.nutch.fetcher.FetcherJob.FetcherMapper
FetcherReducer - Class in org.apache.nutch.fetcher
FetcherReducer() - Constructor for class org.apache.nutch.fetcher.FetcherReducer
FetchSchedule - Interface in org.apache.nutch.crawl: This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals.
FetchScheduleFactory - Class in org.apache.nutch.crawl: Creates and caches a FetchSchedule implementation.
FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter: The name of the document field we use.
FIELD_NAME - Static variable in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter: Doc field name
FieldPluggable - Interface in org.apache.nutch.plugin
File - Class in org.apache.nutch.protocol.file: File.java deals with file: scheme.
File() - Constructor for class org.apache.nutch.protocol.file.File
FileError - Exception in org.apache.nutch.protocol.file: Thrown for File error codes.
FileError(int) - Constructor for exception org.apache.nutch.protocol.file.FileError
FileException - Exception in org.apache.nutch.protocol.file
FileException() - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(String) - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
fileLen - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
FileResponse - Class in org.apache.nutch.protocol.file: FileResponse.java mimics file replies as http response.
FileResponse(URL, WebPage, File, Configuration) - Constructor for class org.apache.nutch.protocol.file.FileResponse
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser: Scan the HTML document looking at possible indications of content language.
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
filter(String) - Method in class org.apache.nutch.collection.Subcollection: Simple "indexOf" currentFilter for matching patterns.
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter: Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
filter(NutchDocument, String, WebPage) - Method in interface org.apache.nutch.indexer.IndexingFilter: Adds fields or otherwise modifies the document that will be indexed for a parse.
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.IndexingFilters: Run all defined filters.
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
filter(NutchDocument, String, WebPage) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.microformats.reltag.RelTagParser
filter(String) - Method in interface org.apache.nutch.net.URLFilter
filter(String) - Method in class org.apache.nutch.net.URLFilters: Run all defined filters.
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.js.JSParseFilter
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in interface org.apache.nutch.parse.ParseFilter: Adds metadata or otherwise modifies a parse, given the DOM tree of a page.
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.ParseFilters: Run all defined filters.
filter(String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter(String) - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
filter(String) - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
filter(String) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
filter(String) - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
filter(NutchDocument, String, WebPage) - Method in class org.creativecommons.nutch.CCIndexingFilter
filter(String, WebPage, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter: Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
finalize() - Method in class org.apache.nutch.plugin.Plugin
finalize() - Method in class org.apache.nutch.plugin.PluginRepository
finalize() - Method in class org.apache.nutch.protocol.ftp.Ftp
findAuthentication(Metadata) - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
FORCE - Static variable in interface org.apache.nutch.api.Params
forceRefetch(String, WebPage, boolean) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
forceRefetch(String, WebPage, boolean) - Method in interface org.apache.nutch.crawl.FetchSchedule: This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
FORMAT - Static variable in interface org.apache.nutch.metadata.DublinCore: Typically, Format may include the media-type or dimensions of the resource.
format - Static variable in class org.apache.nutch.net.protocols.HttpDateFormat
forName(String) - Method in class org.apache.nutch.util.MimeUtil: A facade interface to Tika's underlying MimeTypes.forName(String) method.
fromHexString(String) - Static method in class org.apache.nutch.util.StringUtil: Convert a String containing consecutive (no inside whitespace) hexadecimal digits into a corresponding byte array.
FSUtils - Class in org.apache.nutch.util: Utility methods for common filesystem operations.
FSUtils() - Constructor for class org.apache.nutch.util.FSUtils
Ftp - Class in org.apache.nutch.protocol.ftp: Ftp.java deals with ftp: scheme.
Ftp() - Constructor for class org.apache.nutch.protocol.ftp.Ftp
FtpError - Exception in org.apache.nutch.protocol.ftp: Thrown for Ftp error codes.
FtpError(int) - Constructor for exception org.apache.nutch.protocol.ftp.FtpError
FtpException - Exception in org.apache.nutch.protocol.ftp: Superclass for important exceptions thrown during FTP talk, that must be handled with care.
FtpException() - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(String) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpExceptionBadSystResponse - Exception in org.apache.nutch.protocol.ftp: Exception indicating bad reply of SYST command.
FtpExceptionCanNotHaveDataConnection - Exception in org.apache.nutch.protocol.ftp: Exception indicating failure of opening data connection.
FtpExceptionControlClosedByForcedDataClose - Exception in org.apache.nutch.protocol.ftp: Exception indicating control channel is closed by server end, due to forced closure of data channel at client (our) end.
FtpExceptionUnknownForcedDataClose - Exception in org.apache.nutch.protocol.ftp: Exception indicating unrecognizable reply from server after forced closure of data channel by client (our) side.
FtpResponse - Class in org.apache.nutch.protocol.ftp: FtpResponse.java mimics ftp replies as http response.
FtpResponse(URL, WebPage, Ftp, Configuration) - Constructor for class org.apache.nutch.protocol.ftp.FtpResponse
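
The fromHexString(String) entry above describes converting a string of consecutive hexadecimal digits into a byte array. The sketch below shows one way such a conversion can be written; it is not the StringUtil implementation. The class name HexSketch, the method name fromHex, and the choice to throw IllegalArgumentException on malformed input are assumptions for illustration.

```java
// Hypothetical illustration only; not part of the Nutch API.
final class HexSketch {

    // Converts consecutive hex digits (no whitespace) into bytes,
    // two digits per byte, most significant nibble first.
    static byte[] fromHex(String text) {
        if (text.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[text.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for a character that is not a digit in radix 16.
            int hi = Character.digit(text.charAt(2 * i), 16);
            int lo = Character.digit(text.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("0aFF");
        // 0x0a is 10; 0xFF is -1 as a signed Java byte
        System.out.println(bytes[0] + " " + bytes[1]); // prints 10 -1
    }
}
```

Upper- and lower-case digits are both accepted because Character.digit is case-insensitive for radix 16.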

G

generate(long, long, boolean, boolean) - Method in class org.apache.nutch.crawl.GeneratorJob: Mark URLs ready for fetching.
GENERATE_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
GENERATE_UPDATE_CRAWLDB - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_COUNT_MODE - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_COUNT_VALUE_DOMAIN - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_COUNT_VALUE_HOST - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_COUNT_VALUE_IP - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_CUR_TIME - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_DELAY - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_FILTER - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_MAX_COUNT - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_MIN_SCORE - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_NORMALISE - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_RANDOM_SEED - Static variable in class org.apache.nutch.crawl.GeneratorJob
GENERATOR_TOP_N - Static variable in class org.apache.nutch.crawl.GeneratorJob
GeneratorJob - Class in org.apache.nutch.crawl
GeneratorJob() - Constructor for class org.apache.nutch.crawl.GeneratorJob
GeneratorJob(Configuration) - Constructor for class org.apache.nutch.crawl.GeneratorJob
GeneratorJob.SelectorEntry - Class in org.apache.nutch.crawl
GeneratorJob.SelectorEntry() - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
GeneratorJob.SelectorEntry(String, float) - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
GeneratorJob.SelectorEntryComparator - Class in org.apache.nutch.crawl
GeneratorJob.SelectorEntryComparator() - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntryComparator
GeneratorMapper - Class in org.apache.nutch.crawl
GeneratorMapper() - Constructor for class org.apache.nutch.crawl.GeneratorMapper
GeneratorReducer - Class in org.apache.nutch.crawl: Reduce class for generate. The #reduce() method writes a random integer to all generated URLs.
GeneratorReducer() - Constructor for class org.apache.nutch.crawl.GeneratorReducer
generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Use WebPage.getScore().
generatorSortValue(String, WebPage, float) - Method in interface org.apache.nutch.scoring.ScoringFilter: This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a sort value for Generate.
generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
GenericWritableConfigurable - Class in org.apache.nutch.util: A generic Writable wrapper that can inject Configuration into Configurables.
GenericWritableConfigurable() - Constructor for class org.apache.nutch.util.GenericWritableConfigurable
get(String) - Method in interface org.apache.nutch.api.ConfManager
get(Variant) - Method in class org.apache.nutch.api.DbResource
get(String) - Method in class org.apache.nutch.api.impl.RAMConfManager
get(String, String) - Method in class org.apache.nutch.api.impl.RAMJobManager
get(String, String) - Method in interface org.apache.nutch.api.JobManager
get(String) - Method in class org.apache.nutch.host.HostDb
get(String) - Method in class org.apache.nutch.metadata.Metadata: Get the value associated to a metadata name.
get(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
get(Configuration) - Static method in class org.apache.nutch.plugin.PluginRepository
get(int) - Method in class org.apache.nutch.storage.Host
get(int) - Method in class org.apache.nutch.storage.ParseStatus
get(int) - Method in class org.apache.nutch.storage.ProtocolStatus
get(int) - Method in class org.apache.nutch.storage.WebPage
get(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes: Return the DomainSuffix object for the extension. If the extension is a top-level domain, the returned object will be an instance of TopLevelDomain.
get(Configuration) - Static method in class org.apache.nutch.util.ObjectCache
getAccept() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getAcceptedIssuers() - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
getAcceptLanguage() - Method in class org.apache.nutch.protocol.http.api.HttpBase: Value of "Accept-Language" request header sent by Nutch.
getAliases() - Method in class org.apache.nutch.parse.ParsePluginList
getAll() - Method in class org.apache.nutch.collection.CollectionManager: Returns all collections
getAnchor() - Method in class org.apache.nutch.parse.Outlink
getAnchor() - Method in class org.apache.nutch.scoring.ScoreDatum
getArg(ParseStatus, int) - Static method in class org.apache.nutch.parse.ParseStatusUtils
getArgs() - Method in class org.apache.nutch.storage.ParseStatus
getArgs() - Method in class org.apache.nutch.storage.ProtocolStatus
getAsMap(String) - Method in interface org.apache.nutch.api.ConfManager
getAsMap(String) - Method in class org.apache.nutch.api.impl.RAMConfManager
getAttribute(String) - Method in class org.apache.nutch.plugin.Extension: Returns an attribute value that is set up in the manifest file and is defined by the extension point XML schema.
getAuthentication(String, Configuration) - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: This method is responsible for providing Basic authentication information.
getBase(Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: If Node contains a BASE tag then its HREF is returned.
getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getBaseUrl() - Method in class org.apache.nutch.protocol.Content: The base url for relative links contained in the content.
getBaseUrl() - Method in class org.apache.nutch.storage.WebPage
getBasicPattern() - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Provides a pattern which can be used by an outside resource to determine if this class can provide credentials based on simple header information.
getBlackListString() - Method in class org.apache.nutch.collection.Subcollection: Returns blacklist String
getBoost() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getBoost() - Method in class org.apache.nutch.util.domain.DomainSuffix
getByHostName(String) - Method in class org.apache.nutch.host.HostDb
getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns a cached classloader for a plugin.
getClazz() - Method in class org.apache.nutch.plugin.Extension: Returns the full class name of the extension point implementation
getCode() - Method in interface org.apache.nutch.net.protocols.Response: Returns the response code.
getCode(int) - Method in exception org.apache.nutch.protocol.file.FileError
getCode() - Method in class org.apache.nutch.protocol.file.FileResponse: Returns the response code.
getCode(int) - Method in exception org.apache.nutch.protocol.ftp.FtpError
getCode() - Method in class org.apache.nutch.protocol.ftp.FtpResponse: Returns the response code.
getCode() - Method in class org.apache.nutch.protocol.http.HttpResponse
getCode() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getCode() - Method in class org.apache.nutch.storage.ProtocolStatus
getCollectionManager(Configuration) - Static method in class org.apache.nutch.collection.CollectionManager
getCommand() - Method in class org.apache.nutch.util.CommandRunner
getConf() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
getConf() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
getConf() - Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
getConf() - Method in class org.apache.nutch.crawl.URLPartitioner
getConf() - Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
getConf() - Method in class org.apache.nutch.host.HostDbUpdateJob
getConf() - Method in class org.apache.nutch.host.HostInjectorJob
getConf() - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
getConf() - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
getConf() - Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
getConf() - Method in class org.apache.nutch.parse.ext.ExtParser
getConf() - Method in class org.apache.nutch.parse.feed.FeedParser
getConf() - Method in class org.apache.nutch.parse.html.HtmlParser
getConf() - Method in class org.apache.nutch.parse.js.JSParseFilter
getConf() - Method in class org.apache.nutch.parse.ParserChecker
getConf() - Method in class org.apache.nutch.parse.ParserJob
getConf() - Method in class org.apache.nutch.parse.ParseUtil
getConf() - Method in class org.apache.nutch.parse.swf.SWFParser
getConf() - Method in class org.apache.nutch.parse.tika.TikaParser
getConf() - Method in class org.apache.nutch.parse.zip.ZipParser
getConf() - Method in class org.apache.nutch.protocol.file.File
getConf() - Method in class org.apache.nutch.protocol.ftp.Ftp
getConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getConf() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
getConf() - Method in class org.apache.nutch.protocol.sftp.Sftp
getConf() - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
getConf() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
getConf() - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
getConf() - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
getConf() - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
getConf() - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
getConf() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
getConf() - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
getConf() - Method in class org.apache.nutch.util.domain.DomainStatistics
getConf() - Method in class org.apache.nutch.util.GenericWritableConfigurable
getConf() - Method in class org.creativecommons.nutch.CCIndexingFilter
getConf() - Method in class org.creativecommons.nutch.CCParseFilter
getContent() - Method in interface org.apache.nutch.net.protocols.Response: Returns the full content of the response.
getContent() - Method in class org.apache.nutch.protocol.Content: The binary content retrieved.
getContent() - Method in class org.apache.nutch.protocol.file.FileResponse
getContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
getContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
getContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
getContent() - Method in class org.apache.nutch.storage.WebPage
getContentType() - Method in exception org.apache.nutch.parse.ParserNotFound
getContentType() - Method in class org.apache.nutch.protocol.Content: The media type of the retrieved content.
getContentType() - Method in class org.apache.nutch.storage.WebPage
getCopyMap() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getCount(E) - Method in class org.apache.nutch.util.Histogram
getCountryName() - Method in class org.apache.nutch.util.domain.TopLevelDomain: Returns the country name if the TLD is a country code TLD.
getCrawlDelay() - Method in class org.apache.nutch.protocol.EmptyRobotRules
getCrawlDelay(HttpBase, URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getCrawlDelay() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Get Crawl-Delay, in milliseconds.
getCrawlDelay() - Method in interface org.apache.nutch.protocol.RobotRules: Get Crawl-Delay, in milliseconds.
getCredentials() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication: Gets the credentials generated by the HttpAuthentication object.
getCredentials() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Gets the Basic credentials generated by this HttpBasicAuthentication object
getCurrentKey() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
getCurrentNode() - Method in class org.apache.nutch.parse.html.DOMBuilder: Get the node currently being processed.
getCurrentValue() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
getDataStoreClass(Configuration) - Static method in class org.apache.nutch.storage.StorageUtils
getDatum() - Method in class org.apache.nutch.crawl.URLWebPage
getDefaultConfig() - Static method in class org.apache.nutch.parse.tika.TikaConfig: Provides a default configuration (TikaConfig).
getDefaultConfig(Parser) - Static method in class org.apache.nutch.parse.tika.TikaConfig: Deprecated. This method will be removed in Apache Tika 1.0
getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of plugin ids.
getDescriptor() - Method in class org.apache.nutch.plugin.Extension: Returns the plugin descriptor.
getDescriptor() - Method in class org.apache.nutch.plugin.Plugin: Returns the plugin descriptor
getDocBegin() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getDocumentMeta() - Method in class org.apache.nutch.indexer.NutchDocument
getDom(InputStream) - Static method in class org.apache.nutch.util.DomUtil: Returns the parsed DOM tree, or null if any error occurs.
getDomain() - Method in class org.apache.nutch.util.domain.DomainSuffix
getDomainName(URL) - Static method in class org.apache.nutch.util.URLUtil: Returns the domain name of the url.
getDomainName(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the domain name of the url.
getDomainSuffix(URL) - Static method in class org.apache.nutch.util.URLUtil: Returns the DomainSuffix corresponding to the last public part of the hostname
getDomainSuffix(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the DomainSuffix corresponding to the last public part of the hostname
getEmptyParse(Exception, Configuration) - Static method in class org.apache.nutch.parse.ParseStatusUtils
getEmptyParse(int, String, Configuration) - Static method in class org.apache.nutch.parse.ParseStatusUtils
getExitValue() - Method in class org.apache.nutch.util.CommandRunner
getExpireTime() - Method in class org.apache.nutch.protocol.EmptyRobotRules
getExpireTime() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Get expire time
getExpireTime() - Method in interface org.apache.nutch.protocol.RobotRules: Get expire time
getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of exported libraries as URLs.
getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension: Return an instance of the extension implementation.
getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository: Returns an extension point identified by an extension point id.
getExtensions(String) - Method in class org.apache.nutch.parse.ParserFactory: Finds the best-suited parse plugin for a given contentType.
getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns an array of extensions that listen to this extension point.
getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of extensions.
getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of extension points.
getFetchInterval() - Method in class org.apache.nutch.storage.WebPage
getFetchSchedule(Configuration) - Static method in class org.apache.nutch.crawl.FetchScheduleFactory: Return the FetchSchedule implementation.
getFetchTime() - Method in class org.apache.nutch.storage.WebPage
getFieldNames() - Method in class org.apache.nutch.indexer.NutchDocument
getFields() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
getFields() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
getFields() - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
getFields() - Method in interface org.apache.nutch.crawl.FetchSchedule
getFields() - Method in class org.apache.nutch.crawl.MD5Signature
getFields() - Method in class org.apache.nutch.crawl.Signature
getFields(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory
getFields() - Method in class org.apache.nutch.crawl.TextProfileSignature
getFields(Job) - Method in class org.apache.nutch.fetcher.FetcherJob
getFields() - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
getFields() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
getFields() - Method in class org.apache.nutch.indexer.IndexingFilters
getFields() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
getFields() - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
getFields() - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
getFields() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
getFields() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
getFields() - Method in class org.apache.nutch.parse.html.HtmlParser
getFields() - Method in class org.apache.nutch.parse.js.JSParseFilter
getFields() - Method in class org.apache.nutch.parse.ParseFilters
getFields() - Method in class org.apache.nutch.parse.ParserFactory
getFields(Job) - Method in class org.apache.nutch.parse.ParserJob
getFields() - Method in class org.apache.nutch.parse.tika.TikaParser
getFields() - Method in interface org.apache.nutch.plugin.FieldPluggable
getFields() - Method in class org.apache.nutch.protocol.file.File
getFields() - Method in class org.apache.nutch.protocol.ftp.Ftp
getFields() - Method in class org.apache.nutch.protocol.http.Http
getFields() - Method in class org.apache.nutch.protocol.httpclient.Http
getFields() - Method in class org.apache.nutch.protocol.ProtocolFactory
getFields() - Method in class org.apache.nutch.protocol.sftp.Sftp
getFields() - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
getFields() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
getFields() - Method in class org.apache.nutch.scoring.ScoringFilters
getFields() - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
getFields() - Method in class org.creativecommons.nutch.CCIndexingFilter
getFields() - Method in class org.creativecommons.nutch.CCParseFilter
getFieldValue(String) - Method in class org.apache.nutch.indexer.NutchDocument
getFieldValues(String) - Method in class org.apache.nutch.indexer.NutchDocument
getFirst() - Method in class org.apache.nutch.util.Pair
getFParsePluginsFile() - Method in class org.apache.nutch.parse.ParsePluginsReader
getFromHeaders(Utf8) - Method in class org.apache.nutch.storage.WebPage
getFromInlinks(Utf8) - Method in class org.apache.nutch.storage.Host
getFromInlinks(Utf8) - Method in class org.apache.nutch.storage.WebPage
getFromMarkers(Utf8) - Method in class org.apache.nutch.storage.WebPage
getFromMetadata(Utf8) - Method in class org.apache.nutch.storage.Host
getFromMetadata(Utf8) - Method in class org.apache.nutch.storage.WebPage
getFromOutlinks(Utf8) - Method in class org.apache.nutch.storage.Host
getFromOutlinks(Utf8) - Method in class org.apache.nutch.storage.WebPage
getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags: Returns all collected values of the general meta tags.
getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.file.FileResponse: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.ftp.FtpResponse: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.http.HttpResponse
getHeader(String) - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getHeaders() - Method in interface org.apache.nutch.net.protocols.Response: Returns all the headers.
getHeaders() - Method in class org.apache.nutch.protocol.http.HttpResponse
getHeaders() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getHeaders() - Method in class org.apache.nutch.storage.WebPage
getHost(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the lowercased hostname for the url or null if the url is not well formed.
getHostSegments(URL) - Static method in class org.apache.nutch.util.URLUtil: Partitions the hostname of the url by "."
getHostSegments(String) - Static method in class org.apache.nutch.util.URLUtil: Partitions the hostname of the url by "."
getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags: Returns all collected values of the "http-equiv" meta tags.
getId() - Method in class org.apache.nutch.collection.Subcollection
getId() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getId() - Method in class org.apache.nutch.plugin.Extension: Return the unique id of the extension.
getId() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns the unique id of the extension point.
getIndex() - Method in enum org.apache.nutch.storage.Host.Field
getIndex() - Method in enum org.apache.nutch.storage.ParseStatus.Field
getIndex() - Method in enum org.apache.nutch.storage.ProtocolStatus.Field
getIndex() - Method in enum org.apache.nutch.storage.WebPage.Field
getInlinks() - Method in class org.apache.nutch.storage.Host
getInlinks() - Method in class org.apache.nutch.storage.WebPage
getInstance(Configuration) - Static method in class org.apache.nutch.indexer.solr.SolrMappingReader
getInstance() - Static method in class org.apache.nutch.util.domain.DomainSuffixes: Singleton instance, lazy instantiation
getInt(String, int) - Method in class org.apache.nutch.storage.Host
getIP_Header() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getKey() - Method in class org.apache.nutch.fetcher.FetchEntry
getKeyMap() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getKeys() - Method in class org.apache.nutch.util.Histogram
getLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus
getLength() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getLocations() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getLong(String, long) - Method in class org.apache.nutch.storage.Host
getMajorCode() - Method in class org.apache.nutch.storage.ParseStatus
getMarkers() - Method in class org.apache.nutch.storage.WebPage
getMaxContent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getMessage(ParseStatus) - Static method in class org.apache.nutch.parse.ParseStatusUtils: A convenience method.
getMessage(ProtocolStatus) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
getMeta(String) - Method in class org.apache.nutch.metadata.MetaWrapper: Get metadata.
getMeta(String) - Method in class org.apache.nutch.scoring.ScoreDatum
getMetadata() - Method in class org.apache.nutch.metadata.MetaWrapper: Get all metadata.
getMetadata() - Method in class org.apache.nutch.protocol.Content: Other protocol-specific data.
getMetadata() - Method in class org.apache.nutch.storage.Host
getMetadata() - Method in class org.apache.nutch.storage.WebPage
getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.html.HTMLMetaProcessor: Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.tika.HTMLMetaProcessor: Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getMetaValues(String) - Method in class org.apache.nutch.metadata.MetaWrapper: Get multiple metadata.
getMimeRepository() - Method in class org.apache.nutch.parse.tika.TikaConfig
getMimeType(String) - Method in class org.apache.nutch.util.MimeUtil: Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
getMimeType(File) - Method in class org.apache.nutch.util.MimeUtil: Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
getMinorCode() - Method in class org.apache.nutch.storage.ParseStatus
getModifiedTime() - Method in class org.apache.nutch.storage.WebPage
getName() - Method in class org.apache.nutch.collection.Subcollection
getName(byte) - Static method in class org.apache.nutch.crawl.CrawlStatus
getName() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns the name of the extension point.
getName() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the name of the plugin.
getName(int) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
getName() - Method in enum org.apache.nutch.storage.Host.Field
getName() - Method in enum org.apache.nutch.storage.ParseStatus.Field
getName() - Method in enum org.apache.nutch.storage.ProtocolStatus.Field
getName() - Method in enum org.apache.nutch.storage.WebPage.Field
getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNormalizedName(String) - Static method in class org.apache.nutch.metadata.SpellCheckedMetadata: Get the normalized name of metadata attribute name.
getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of libraries as URLs that are not exported by the plugin.
getNutchIndexWriters(Configuration) - Static method in class org.apache.nutch.indexer.NutchIndexWriterFactory
getObject(String) - Method in class org.apache.nutch.util.ObjectCache
getOutlinks(URL, ArrayList<Outlink>, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getOutlinks(String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor: Extracts Outlink from given plain text.
getOutlinks(String, String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor: Extracts Outlink from given plain text and adds anchor to the extracted Outlinks
getOutlinks() - Method in class org.apache.nutch.parse.Parse
getOutlinks(URL, ArrayList, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils: This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getOutlinks() - Method in class org.apache.nutch.storage.Host
getOutlinks() - Method in class org.apache.nutch.storage.WebPage
getPage(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the page for the url.
getParse(Content) - Method in class org.apache.nutch.parse.ext.ExtParser
getParse(Content) - Method in class org.apache.nutch.parse.feed.FeedParser: Parses the given feed, then extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
getParse(String, WebPage) - Method in class org.apache.nutch.parse.html.HtmlParser
getParse(String, WebPage) - Method in class org.apache.nutch.parse.js.JSParseFilter
getParse(String, WebPage) - Method in interface org.apache.nutch.parse.Parser: This method parses content in WebPage instance
getParse(Content) - Method in class org.apache.nutch.parse.swf.SWFParser
getParse(String, WebPage) - Method in class org.apache.nutch.parse.tika.TikaParser
getParse(Content) - Method in class org.apache.nutch.parse.zip.ZipParser
getParser(String) - Method in class org.apache.nutch.parse.tika.TikaConfig: Returns the parser instance configured for the given MIME type.
getParserById(String) - Method in class org.apache.nutch.parse.ParserFactory: Function returns a Parser instance with the specified extId, representing its extension ID.
getParsers(String, String) - Method in class org.apache.nutch.parse.ParserFactory: Function returns an array of Parsers for a given content type.
getParsers() - Method in class org.apache.nutch.parse.tika.TikaConfig
getParseStatus() - Method in class org.apache.nutch.parse.Parse
getParseStatus() - Method in class org.apache.nutch.storage.WebPage
getPartition(IntWritable, FetchEntry, int) - Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
getPartition(String, int) - Method in class org.apache.nutch.crawl.URLPartitioner
getPartition(GeneratorJob.SelectorEntry, WebPage, int) - Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
getPartition(UrlWithScore, NutchWritable, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlOnlyPartitioner
getPassAllFilter() - Static method in class org.apache.nutch.util.HadoopFSUtil: Returns PathFilter that passes all paths through.
getPassDirectoriesFilter(FileSystem) - Static method in class org.apache.nutch.util.HadoopFSUtil: Returns PathFilter that passes directories through.
getPaths(FileStatus[]) - Static method in class org.apache.nutch.util.HadoopFSUtil: Turns an array of FileStatus into an array of Paths.
getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the fully qualified name of the class which implements the abstract Plugin class.
getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository: Returns the descriptor of one plugin identified by a plugin id.
getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository: Returns all registered plugin descriptors.
getPluginFolder(String) - Method in class org.apache.nutch.plugin.PluginManifestParser: Return the named plugin folder.
getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the unique identifier of the plug-in or null.
getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository: Returns an instance of a plugin.
getPluginList(String) - Method in class org.apache.nutch.parse.ParsePluginList
getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the directory path of the plugin.
getPos() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns the current position in the file.
getPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage
getPrevSignature() - Method in class org.apache.nutch.storage.WebPage
getProgress() - Method in class org.apache.nutch.crawl.Crawler
getProgress() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
getProgress() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns the percentage of progress in processing the file.
getProgress() - Method in class org.apache.nutch.util.NutchTool: Returns relative progress of the tool, a float in range [0,1].
getProtocol(String) - Method in class org.apache.nutch.protocol.ProtocolFactory: Returns the appropriate Protocol implementation for a url.
getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.file.File
getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.ftp.Ftp
getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getProtocolOutput(String, WebPage) - Method in interface org.apache.nutch.protocol.Protocol: Returns the Content for a fetchlist entry.
getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.sftp.Sftp
getProtocolStatus() - Method in class org.apache.nutch.storage.WebPage
getProviderName() - Method in class org.apache.nutch.plugin.PluginDescriptor
getProxyHost() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getProxyPort() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getRealm() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication: Gets the realm used by the HttpAuthentication object during creation.
getRealm() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Gets the realm attribute of the HttpBasicAuthentication object.
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.tools.arc.ArcInputFormat: Returns the RecordReader for reading the arc file.
getRecordWriter(TaskAttemptContext) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getReprUrl() - Method in class org.apache.nutch.storage.WebPage
getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an I18N'd resource string.
getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.http.Http
getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.httpclient.Http: Fetches the url with a configured HTTP client and gets the response.
getRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage
getReversedHost(String) - Static method in class org.apache.nutch.util.TableUtil: Given a reversed url, returns the reversed host, e.g. "com.foo.bar:http:8983/to/index.html?a=b" -> "com.foo.bar"
getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.file.File
getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.ftp.Ftp
getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getRobotRules(String, WebPage) - Method in interface org.apache.nutch.protocol.Protocol: Retrieve robot rules applicable for this url.
getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.sftp.Sftp
getRobotRulesSet(HttpBase, String) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getRootNode() - Method in class org.apache.nutch.parse.html.DOMBuilder: Get the root node of the DOM being created.
getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase: Returns the name of the file of rules to use for a particular implementation.
getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter: Rules specified as a config property will override rules specified as a config file.
getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter: Rules specified as a config property will override rules specified as a config file.
getRuns() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns a path to the xml schema of an extension point.
getSchema() - Method in class org.apache.nutch.storage.Host
getSchema() - Method in class org.apache.nutch.storage.ParseStatus
getSchema() - Method in class org.apache.nutch.storage.ProtocolStatus
getSchema() - Method in class org.apache.nutch.storage.WebPage
getScopedRules() - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
getScore() - Method in class org.apache.nutch.crawl.UrlWithScore
getScore() - Method in class org.apache.nutch.indexer.NutchDocument
getScore() - Method in class org.apache.nutch.scoring.ScoreDatum
getScore() - Method in class org.apache.nutch.storage.WebPage
getSecond() - Method in class org.apache.nutch.util.Pair
getSignature(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory: Return the default Signature implementation.
getSignature() - Method in class org.apache.nutch.storage.WebPage
getSplits(JobContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
getStages() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
getStatus() - Method in class org.apache.nutch.crawl.Crawler
getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
getStatus() - Method in class org.apache.nutch.storage.WebPage
getStatus() - Method in class org.apache.nutch.util.domain.DomainSuffix
getStatus() - Method in class org.apache.nutch.util.NutchTool: Returns current status of the running tool.
getSubColection(String) - Method in class org.apache.nutch.collection.CollectionManager: Returns named subcollection
getSubCollections(String) - Method in class org.apache.nutch.collection.CollectionManager: Returns the names of the collections the url is part of
getSystemName() - Method in class org.apache.nutch.protocol.ftp.Client: Fetches the system type name from the server and returns the string.
getTargetPoint() - Method in class org.apache.nutch.plugin.Extension: Returns the Id of the extension point, that is implemented by this extension.
getText(StringBuilder, Node, boolean) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method takes a StringBuilder and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuilder.
getText(StringBuilder, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This is a convenience method, equivalent to getText(sb, node, false).
getText() - Method in class org.apache.nutch.parse.Parse
getText(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils: This is a convenience method, equivalent to getText(sb, node, false).
getText() - Method in class org.apache.nutch.storage.WebPage
getThrownError() - Method in class org.apache.nutch.util.CommandRunner
getTikaConfig() - Method in class org.apache.nutch.parse.tika.TikaParser
getTimeout() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getTimeout() - Method in class org.apache.nutch.util.CommandRunner
getTitle(StringBuilder, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method takes a StringBuilder and a DOM Node, and will append the content text found beneath the first title node to the StringBuilder.
getTitle() - Method in class org.apache.nutch.parse.Parse
getTitle(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils: This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
getTitle() - Method in class org.apache.nutch.storage.WebPage
getToUrl() - Method in class org.apache.nutch.parse.Outlink
getTstamp() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getType() - Method in class org.apache.nutch.util.domain.TopLevelDomain
getTypes() - Method in class org.apache.nutch.crawl.NutchWritable
getUniqueKey() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getUrl() - Method in class org.apache.nutch.crawl.URLWebPage
getUrl() - Method in class org.apache.nutch.crawl.UrlWithScore
getUrl() - Method in interface org.apache.nutch.net.protocols.Response: Returns the URL used to retrieve this response.
getUrl() - Method in exception org.apache.nutch.parse.ParserNotFound
getUrl() - Method in class org.apache.nutch.protocol.Content: The url fetched.
getUrl() - Method in class org.apache.nutch.protocol.http.HttpResponse
getUrl() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getUrl() - Method in exception org.apache.nutch.protocol.ProtocolNotFound
getUrl() - Method in class org.apache.nutch.scoring.ScoreDatum
getUseHttp11() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getUserAgent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getUUID(Configuration) - Static method in class org.apache.nutch.util.NutchConfiguration: Retrieve a Nutch UUID of this configuration object, or null if the configuration was created elsewhere.
getValue(String, String) - Method in class org.apache.nutch.storage.Host
getValue(E) - Method in class org.apache.nutch.util.Histogram
getValues(String) - Method in class org.apache.nutch.metadata.Metadata: Get the values associated to a metadata name.
getValues(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
getVersion() - Method in class org.apache.nutch.plugin.PluginDescriptor
getWaitForExit() - Method in class org.apache.nutch.util.CommandRunner
getWebPage() - Method in class org.apache.nutch.fetcher.FetchEntry
getWebPage() - Method in class org.apache.nutch.util.WebPageWritable
getWhiteList() - Method in class org.apache.nutch.collection.Subcollection: Returns whitelist
getWhiteListString() - Method in class org.apache.nutch.collection.Subcollection: Returns whitelist String
getWriter() - Method in class org.apache.nutch.parse.html.DOMBuilder: Return null since there is no Writer for this class.
GONE - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Resource is gone.
guessEncoding(WebPage, String) - Method in class org.apache.nutch.util.EncodingDetector: Guess the encoding with the previously specified list of clues.
GZIPUtils - Class in org.apache.nutch.util: A collection of utility methods for working on GZIPed data.
GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils
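
The getReversedHost(String) entry above describes a reversed-URL form in which the host labels appear in reverse order before the first colon. The following is a minimal sketch of that extraction, written without any Nutch dependency. The class and method names are hypothetical, and this is not the TableUtil source; it only illustrates the "com.foo.bar:http:8983/to/index.html?a=b" -> "com.foo.bar" mapping quoted in the entry.

```java
public class ReversedHostSketch {

    // Hypothetical stand-in for the behaviour described for getReversedHost:
    // everything before the first ':' is taken as the reversed host.
    static String reversedHost(String reversedUrl) {
        int colon = reversedUrl.indexOf(':');
        // Assumption: with no ':' present, the whole string is treated as the host.
        return colon < 0 ? reversedUrl : reversedUrl.substring(0, colon);
    }

    // Hypothetical helper: flips the label order, so "com.foo.bar" becomes "bar.foo.com".
    static String unreverseHost(String reversedHost) {
        String[] labels = reversedHost.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String reversed = "com.foo.bar:http:8983/to/index.html?a=b";
        String host = reversedHost(reversed);
        System.out.println(host);                // prints "com.foo.bar"
        System.out.println(unreverseHost(host)); // prints "bar.foo.com"
    }
}
```

Keeping the most significant label first means that rows for the same domain sort next to each other when the reversed URL is used as a key, which is the usual motivation for this layout.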

H

HadoopFSUtil - Class in org.apache.nutch.util
HadoopFSUtil() - Constructor for class org.apache.nutch.util.HadoopFSUtil
handle(String, HttpServletRequest, HttpServletResponse, int) - Method in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.DelayHandler
handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.FakeHandler
handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.LogDebugHandler
handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.NotFoundHandler
hasCopy(String) - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
hashCode() - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
hashCode() - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
hashCode(byte[]) - Static method in class org.apache.nutch.util.Bytes
hashCode(byte[], int) - Static method in class org.apache.nutch.util.Bytes
hasNext() - Method in class org.apache.nutch.util.NodeWalker: Returns true if there are more nodes on the current stack.
head(byte[], int) - Static method in class org.apache.nutch.util.Bytes
Histogram<E> - Class in org.apache.nutch.util
Histogram() - Constructor for class org.apache.nutch.util.Histogram
Host - Class in org.apache.nutch.storage
Host() - Constructor for class org.apache.nutch.storage.Host
Host(StateManager) - Constructor for class org.apache.nutch.storage.Host
Host.Field - Enum in org.apache.nutch.storage
HostDb - Class in org.apache.nutch.host: A caching wrapper for the host datastore.
HostDb(Configuration) - Constructor for class org.apache.nutch.host.HostDb
HOSTDB_CONCURRENCY_LEVEL - Static variable in class org.apache.nutch.host.HostDb
HOSTDB_LRU_SIZE - Static variable in class org.apache.nutch.host.HostDb
HostDbReader - Class in org.apache.nutch.host: Display entries from the hostDB.
HostDbReader() - Constructor for class org.apache.nutch.host.HostDbReader
HostDbUpdateJob - Class in org.apache.nutch.host: Scans the web table and creates host entries for each unique host.
HostDbUpdateJob() - Constructor for class org.apache.nutch.host.HostDbUpdateJob
HostDbUpdateJob(Configuration) - Constructor for class org.apache.nutch.host.HostDbUpdateJob
HostDbUpdateJob.Mapper - Class in org.apache.nutch.host: Maps each WebPage to a host key.
HostDbUpdateJob.Mapper() - Constructor for class org.apache.nutch.host.HostDbUpdateJob.Mapper
HostDbUpdateReducer - Class in org.apache.nutch.host: Combines all WebPages with the same host key to create a Host object, with some statistics.
HostDbUpdateReducer() - Constructor for class org.apache.nutch.host.HostDbUpdateReducer
HostInjectorJob - Class in org.apache.nutch.host: Creates or updates an existing host table from a text file.
The files contain one host name per line, optionally followed by custom metadata separated by tabs, with each metadata key separated from its corresponding value by '='.
HostInjectorJob() - Constructor for class org.apache.nutch.host.HostInjectorJob
HostInjectorJob(Configuration) - Constructor for class org.apache.nutch.host.HostInjectorJob
HostInjectorJob.UrlMapper - Class in org.apache.nutch.host
HostInjectorJob.UrlMapper() - Constructor for class org.apache.nutch.host.HostInjectorJob.UrlMapper
HTMLLanguageParser - Class in org.apache.nutch.analysis.lang: Adds metadata identifying the language of the document, if found. We could also run statistical analysis here, but we'd miss all other formats.
HTMLLanguageParser() - Constructor for class org.apache.nutch.analysis.lang.HTMLLanguageParser
HTMLMetaProcessor - Class in org.apache.nutch.parse.html: Class for parsing META Directives from DOM trees.
HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.html.HTMLMetaProcessor
HTMLMetaProcessor - Class in org.apache.nutch.parse.tika: Class for parsing META Directives from DOM trees.
HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.tika.HTMLMetaProcessor
HTMLMetaTags - Class in org.apache.nutch.parse: This class holds the information about HTML "meta" tags extracted from a page.
HTMLMetaTags() - Constructor for class org.apache.nutch.parse.HTMLMetaTags
HTMLPARSEFILTER_ORDER - Static variable in class org.apache.nutch.parse.ParseFilters
HtmlParser - Class in org.apache.nutch.parse.html
HtmlParser() - Constructor for class org.apache.nutch.parse.html.HtmlParser
Http - Class in org.apache.nutch.protocol.http
Http() - Constructor for class org.apache.nutch.protocol.http.Http
Http - Class in org.apache.nutch.protocol.httpclient: This class is a protocol plugin that configures an HTTP client for Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
Http() - Constructor for class org.apache.nutch.protocol.httpclient.Http: Constructs this plugin.
HttpAuthentication - Interface in org.apache.nutch.protocol.httpclient: The base level of services required for Http Authentication
HttpAuthenticationException - Exception in org.apache.nutch.protocol.httpclient: Can be used to identify problems during creation of Authentication objects.
HttpAuthenticationException() - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with null as its detail message.
HttpAuthenticationException(String) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified detail message.
HttpAuthenticationException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified message and cause.
HttpAuthenticationException(Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified cause and detail message from given cause if it is not null.
HttpAuthenticationFactory - Class in org.apache.nutch.protocol.httpclient: Provides the Http protocol implementation with the ability to authenticate when prompted.
HttpAuthenticationFactory(Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
HttpBase - Class in org.apache.nutch.protocol.http.api
HttpBase() - Constructor for class org.apache.nutch.protocol.http.api.HttpBase: Creates a new instance of HttpBase
HttpBase(Logger) - Constructor for class org.apache.nutch.protocol.http.api.HttpBase: Creates a new instance of HttpBase
HttpBasicAuthentication - Class in org.apache.nutch.protocol.httpclient: Implementation of RFC 2617 Basic Authentication.
HttpBasicAuthentication(String, Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Construct an HttpBasicAuthentication for the given challenge parameters.
HttpDateFormat - Class in org.apache.nutch.net.protocols: Class to handle HTTP dates.
HttpDateFormat() - Constructor for class org.apache.nutch.net.protocols.HttpDateFormat
HttpException - Exception in org.apache.nutch.protocol.http.api
HttpException() - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(String) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpHeaders - Interface in org.apache.nutch.metadata: A collection of HTTP header names.
HttpResponse - Class in org.apache.nutch.protocol.http: An HTTP response.
HttpResponse(HttpBase, URL, WebPage) - Constructor for class org.apache.nutch.protocol.http.HttpResponse
HttpResponse - Class in org.apache.nutch.protocol.httpclient: An HTTP response.
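
The HttpBasicAuthentication entry above refers to RFC 2617 Basic Authentication. As a rough illustration of what that scheme puts on the wire, the sketch below builds an Authorization header value from a user name and password. The class and method names are hypothetical and are not part of the Nutch API; the ISO-8859-1 encoding is an assumption, since RFC 2617 does not fix a charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of the Nutch API.
final class BasicAuthHeaderSketch {

    /** Builds an Authorization header value for the Basic scheme: "Basic " + base64(user:password). */
    static String basicCredentials(String user, String password) {
        String userPass = user + ":" + password;
        // Assumption: ISO-8859-1 bytes; for plain ASCII credentials the choice makes no difference.
        String encoded = Base64.getEncoder()
                .encodeToString(userPass.getBytes(StandardCharsets.ISO_8859_1));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Credentials from the worked example in RFC 2617, section 2.
        System.out.println(basicCredentials("Aladdin", "open sesame"));
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

Base64 is an encoding, not encryption, so a header built this way is only as private as the transport carrying it.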

I

id - Variable in class org.apache.nutch.api.JobStatus
ID_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
IDENTIFIER - Static variable in interface org.apache.nutch.metadata.DublinCore: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.
IdentityPageReducer - Class in org.apache.nutch.util
IdentityPageReducer() - Constructor for class org.apache.nutch.util.IdentityPageReducer
ignorableWhitespace(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of ignorable whitespace in element content.
in - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
incrementBytes(byte[], long) - Static method in class org.apache.nutch.util.Bytes: Bytewise binary increment/decrement of the long contained in a byte array by the given amount.
index(String, WebPage) - Method in class org.apache.nutch.indexer.IndexUtil: Index a webpage.
IndexerJob - Class in org.apache.nutch.indexer
IndexerJob() - Constructor for class org.apache.nutch.indexer.IndexerJob
IndexerJob.IndexerMapper - Class in org.apache.nutch.indexer
IndexerJob.IndexerMapper() - Constructor for class org.apache.nutch.indexer.IndexerJob.IndexerMapper
IndexerOutputFormat - Class in org.apache.nutch.indexer
IndexerOutputFormat() - Constructor for class org.apache.nutch.indexer.IndexerOutputFormat
indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Dampen the boost value by scorePower.
indexerScore(String, NutchDocument, WebPage, float) - Method in interface org.apache.nutch.scoring.ScoringFilter: This method calculates a Lucene document boost.
indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.ScoringFilters
indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
IndexingException - Exception in org.apache.nutch.indexer
IndexingException() - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(String) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(String, Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingFilter - Interface in org.apache.nutch.indexer: Extension point for indexing.
INDEXINGFILTER_ORDER - Static variable in class org.apache.nutch.indexer.IndexingFilters
IndexingFilters - Class in org.apache.nutch.indexer: Creates and caches IndexingFilter implementing plugins.
IndexingFilters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexingFilters
indexUtil - Variable in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
IndexUtil - Class in org.apache.nutch.indexer: Utility to create an indexed document from a webpage.
IndexUtil(Configuration) - Constructor for class org.apache.nutch.indexer.IndexUtil
inflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array.
inflateBestEffort(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array.
inflateBestEffort(byte[], int) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array, truncated to sizeLimit bytes, if necessary.
init() - Method in class org.apache.nutch.collection.CollectionManager
init(FilterConfig) - Method in class org.apache.nutch.tools.proxy.LogDebugHandler
initialize(Element) - Method in class org.apache.nutch.collection.Subcollection: Initialize Subcollection from DOM element
initialize(InputSplit, TaskAttemptContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
initializeSchedule(String, WebPage) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: Initialize fetch schedule related data.
initializeSchedule(String, WebPage) - Method in interface org.apache.nutch.crawl.FetchSchedule: Initialize fetch schedule related data.
initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
initialScore(String, WebPage) - Method in interface org.apache.nutch.scoring.ScoringFilter: Set an initial score for newly discovered pages.
initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a new initial score, used when adding newly discovered pages.
initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, boolean) - Static method in class org.apache.nutch.storage.StorageUtils
initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>) - Static method in class org.apache.nutch.storage.StorageUtils
initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Class<? extends Partitioner<K, V>>) - Static method in class org.apache.nutch.storage.StorageUtils
initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Class<? extends Partitioner<K, V>>, boolean) - Static method in class org.apache.nutch.storage.StorageUtils
initReducerJob(Job, Class<? extends GoraReducer<K, V, String, WebPage>>) - Static method in class org.apache.nutch.storage.StorageUtils
inject(Path) - Method in class org.apache.nutch.crawl.InjectorJob
inject(Path) - Method in class org.apache.nutch.host.HostInjectorJob
injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
injectedScore(String, WebPage) - Method in interface org.apache.nutch.scoring.ScoringFilter: Set an initial score for newly injected pages.
injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a new initial score, used when injecting new pages.
injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
InjectorJob - Class in org.apache.nutch.crawl: This class takes a flat file of URLs and adds them to the set of pages to be crawled.
InjectorJob() - Constructor for class org.apache.nutch.crawl.InjectorJob
InjectorJob(Configuration) - Constructor for class org.apache.nutch.crawl.InjectorJob
InjectorJob.InjectorMapper - Class in org.apache.nutch.crawl
InjectorJob.InjectorMapper() - Constructor for class org.apache.nutch.crawl.InjectorJob.InjectorMapper
InjectorJob.UrlMapper - Class in org.apache.nutch.crawl
InjectorJob.UrlMapper() - Constructor for class org.apache.nutch.crawl.InjectorJob.UrlMapper
inlinks - Variable in class org.apache.nutch.storage.Host
IP_ADDRESS - Static variable in interface org.apache.nutch.metadata.HttpHeaders
ip_header - Variable in class org.apache.nutch.protocol.http.api.HttpBase: The "_ip" request header value.
isAllowed(URL) - Method in class org.apache.nutch.protocol.EmptyRobotRules
isAllowed(HttpBase, URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
isAllowed(URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
isAllowed(String) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Method in interface org.apache.nutch.protocol.RobotRules: Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
isClientTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
isDomainSuffix(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes: Returns whether the extension is a registered domain entry.
isEmpty(String) - Static method in class org.apache.nutch.util.StringUtil: Checks if a string is empty (i.e., is null or empty).
isIgnoreCase() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
isMagic(byte[]) - Static method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns true if the byte array passed matches the gzip header magic number.
isModeAccept() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
isMultiValued(String) - Method in class org.apache.nutch.metadata.Metadata: Returns true if named value is multivalued.
isRemoteVerificationEnabled() - Method in class org.apache.nutch.protocol.ftp.Client: Return whether or not verification of the remote host participating in data connections is enabled.
isRunning() - Method in class org.apache.nutch.api.NutchServer
isSameDomainName(URL, URL) - Static method in class org.apache.nutch.util.URLUtil: Returns whether the given urls have the same domain name.
isSameDomainName(String, String) - Static method in class org.apache.nutch.util.URLUtil: Returns whether the given urls have the same domain name.
isServerTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
isSuccess(ParseStatus) - Static method in class org.apache.nutch.parse.ParseStatusUtils
isTruncated(String, WebPage) - Static method in class org.apache.nutch.parse.ParserJob: Checks if the page's content is truncated.
isWhiteSpace(char) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Returns whether the specified ch conforms to the XML 1.0 definition of whitespace.
isWhiteSpace(char[], int, int) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
isWhiteSpace(StringBuffer) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
isWhiteSpace(String) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
iterateOnSplits(byte[], byte[], int) - Static method in class org.apache.nutch.util.Bytes: Iterate over keys within the passed inclusive range.
iterator(String[], String, String, String) - Method in class org.apache.nutch.api.DbReader
iterator() - Method in class org.apache.nutch.indexer.NutchDocument: Iterate over all fields.
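
The isWhiteSpace entries above refer to the XML 1.0 definition of whitespace, which covers exactly four characters: space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA). The sketch below shows one way such a check could look. The class and method names are hypothetical, and it is not the XMLCharacterRecognizer implementation.

```java
// Hypothetical helper for illustration only; not the Nutch XMLCharacterRecognizer.
final class XmlWhitespaceSketch {

    /** True if ch is one of the four XML 1.0 whitespace characters. */
    static boolean isXmlWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    /**
     * True if every character of s is XML 1.0 whitespace.
     * Assumption of this sketch: an empty sequence counts as whitespace.
     */
    static boolean isXmlWhitespace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isXmlWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isXmlWhitespace(" \t\r\n")); // true
        System.out.println(isXmlWhitespace(" a "));     // false
    }
}
```

This is narrower than Character.isWhitespace in the JDK, which accepts further characters such as form feed.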

J

JOB_CMD_ABORT - Static variable in interface org.apache.nutch.api.Params
JOB_CMD_GET - Static variable in interface org.apache.nutch.api.Params
JOB_CMD_STOP - Static variable in interface org.apache.nutch.api.Params
JOB_ID - Static variable in interface org.apache.nutch.api.Params
JOB_TYPE - Static variable in interface org.apache.nutch.api.Params
JobManager - Interface in org.apache.nutch.api
JobManager.JobType - Enum in org.apache.nutch.api
jobMgr - Static variable in class org.apache.nutch.api.NutchApp
JobResource - Class in org.apache.nutch.api
JobResource() - Constructor for class org.apache.nutch.api.JobResource
JobStatus - Class in org.apache.nutch.api
JobStatus(String, JobManager.JobType, String, Map<String, Object>, JobStatus.State, String) - Constructor for class org.apache.nutch.api.JobStatus
JobStatus.State - Enum in org.apache.nutch.api
JSParseFilter - Class in org.apache.nutch.parse.js: This class is a heuristic link extractor for JavaScript files and code snippets.
JSParseFilter() - Constructor for class org.apache.nutch.parse.js.JSParseFilter

K

KEYWORDS - Static variable in interface org.apache.nutch.metadata.Office
killJob() - Method in class org.apache.nutch.crawl.Crawler
killJob() - Method in class org.apache.nutch.util.NutchTool: Kill the job immediately.

L

LANGUAGE - Static variable in interface org.apache.nutch.metadata.DublinCore: A language of the intellectual content of the resource.
LanguageIndexingFilter - Class in org.apache.nutch.analysis.lang: An IndexingFilter that adds a lang (language) field to the document.
LanguageIndexingFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageIndexingFilter: Constructs a new Language Indexing Filter.
LAST_AUTHOR - Static variable in interface org.apache.nutch.metadata.Office
LAST_MODIFIED - Static variable in interface org.apache.nutch.metadata.HttpHeaders
LAST_PRINTED - Static variable in interface org.apache.nutch.metadata.Office
LAST_SAVED - Static variable in interface org.apache.nutch.metadata.Office
leftPad(String, int) - Static method in class org.apache.nutch.util.StringUtil: Returns a copy of s padded with leading spaces so that its length is length.
LICENSE_LOCATION - Static variable in interface org.apache.nutch.metadata.CreativeCommons
LICENSE_URL - Static variable in interface org.apache.nutch.metadata.CreativeCommons
LinkAnalysisScoringFilter - Class in org.apache.nutch.scoring.link
LinkAnalysisScoringFilter() - Constructor for class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
list() - Method in interface org.apache.nutch.api.ConfManager
list() - Method in class org.apache.nutch.api.impl.RAMConfManager
list(String, JobStatus.State) - Method in class org.apache.nutch.api.impl.RAMJobManager
list(String, JobStatus.State) - Method in interface org.apache.nutch.api.JobManager
LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
LockUtil - Class in org.apache.nutch.util: Utility methods for handling application-level locking.
LockUtil() - Constructor for class org.apache.nutch.util.LockUtil
LOG - Static variable in class org.apache.nutch.analysis.lang.HTMLLanguageParser
LOG - Static variable in class org.apache.nutch.crawl.DbUpdateMapper
LOG - Static variable in class org.apache.nutch.crawl.DbUpdateReducer
LOG - Static variable in class org.apache.nutch.crawl.DbUpdaterJob
LOG - Static variable in class org.apache.nutch.crawl.FetchScheduleFactory
LOG - Static variable in class org.apache.nutch.crawl.GeneratorJob
LOG - Static variable in class org.apache.nutch.crawl.InjectorJob
LOG - Static variable in class org.apache.nutch.crawl.WebTableReader
LOG - Static variable in class org.apache.nutch.fetcher.FetcherJob
LOG - Static variable in class org.apache.nutch.fetcher.FetcherReducer
LOG - Static variable in class org.apache.nutch.host.HostDb
LOG - Static variable in class org.apache.nutch.host.HostDbReader
LOG - Static variable in class org.apache.nutch.host.HostDbUpdateJob
LOG - Static variable in class org.apache.nutch.host.HostInjectorJob
LOG - Static variable in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
LOG - Static variable in class org.apache.nutch.indexer.basic.BasicIndexingFilter
LOG - Static variable in class org.apache.nutch.indexer.IndexerJob
LOG - Static variable in class org.apache.nutch.indexer.IndexingFilters
LOG - Static variable in class org.apache.nutch.indexer.more.MoreIndexingFilter
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrIndexerJob
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrMappingReader
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrWriter
LOG - Static variable in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter: Logger
LOG - Static variable in class org.apache.nutch.indexer.tld.TLDIndexingFilter
LOG - Static variable in class org.apache.nutch.microformats.reltag.RelTagParser
LOG - Static variable in class org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
LOG - Static variable in class org.apache.nutch.net.URLNormalizers
LOG - Static variable in class org.apache.nutch.parse.ext.ExtParser
LOG - Static variable in class org.apache.nutch.parse.feed.FeedParser
LOG - Static variable in class org.apache.nutch.parse.html.HtmlParser
LOG - Static variable in class org.apache.nutch.parse.js.JSParseFilter
LOG - Static variable in class org.apache.nutch.parse.ParsePluginsReader
LOG - Static variable in class org.apache.nutch.parse.ParserChecker
LOG - Static variable in class org.apache.nutch.parse.ParserFactory
LOG - Static variable in class org.apache.nutch.parse.ParserJob
LOG - Static variable in class org.apache.nutch.parse.ParseUtil
LOG - Static variable in class org.apache.nutch.parse.swf.SWFParser
LOG - Static variable in class org.apache.nutch.parse.tika.TikaParser
LOG - Static variable in class org.apache.nutch.parse.zip.ZipTextExtractor
LOG - Static variable in class org.apache.nutch.plugin.PluginDescriptor
LOG - Static variable in class org.apache.nutch.plugin.PluginManifestParser
LOG - Static variable in class org.apache.nutch.plugin.PluginRepository
LOG - Static variable in class org.apache.nutch.protocol.file.File
LOG - Static variable in class org.apache.nutch.protocol.ftp.Ftp
LOG - Static variable in class org.apache.nutch.protocol.http.api.RobotRulesParser
LOG - Static variable in class org.apache.nutch.protocol.http.Http
LOG - Static variable in class org.apache.nutch.protocol.httpclient.Http
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
LOG - Static variable in class org.apache.nutch.protocol.ProtocolFactory
LOG - Static variable in class org.apache.nutch.tools.arc.ArcRecordReader
LOG - Static variable in class org.apache.nutch.tools.DmozParser
LOG - Static variable in class org.apache.nutch.tools.ResolveUrls
LOG - Static variable in class org.apache.nutch.util.EncodingDetector
LOG - Static variable in class org.creativecommons.nutch.CCIndexingFilter
LOG - Static variable in class org.creativecommons.nutch.CCParseFilter
logConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
LogDebugHandler - Class in org.apache.nutch.tools.proxy
LogDebugHandler() - Constructor for class org.apache.nutch.tools.proxy.LogDebugHandler
login(String, String) - Method in class org.apache.nutch.protocol.ftp.Client: Login to the FTP server using the provided username and password.
logout() - Method in class org.apache.nutch.protocol.ftp.Client: Logout of the FTP server by sending the QUIT command.
longestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher: Returns the longest prefix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher: Returns the longest suffix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
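
The longestMatch entries above describe finding the longest stored prefix (or suffix, or pattern) of an input string using a trie. The following is a minimal sketch of the prefix case only. The class is hypothetical, uses a HashMap per node for simplicity, and does not reproduce the PrefixStringMatcher or TrieStringMatcher internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of longest-prefix matching with a character trie; not the Nutch classes.
final class LongestPrefixSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored prefix ends at this node
    }

    private final Node root = new Node();

    LongestPrefixSketch(String... prefixes) {
        for (String p : prefixes) {
            Node n = root;
            for (int i = 0; i < p.length(); i++) {
                n = n.children.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
    }

    /** Returns the longest stored prefix that input starts with, or null if there is none. */
    String longestMatch(String input) {
        Node n = root;
        String best = root.terminal ? "" : null; // covers a stored empty prefix
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) {
                break; // no stored prefix continues with this character
            }
            if (n.terminal) {
                best = input.substring(0, i + 1);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LongestPrefixSketch m = new LongestPrefixSketch("ab", "abcd", "x");
        System.out.println(m.longestMatch("abcde")); // abcd
        System.out.println(m.longestMatch("zzz"));   // null
    }
}
```

A suffix matcher can reuse the same structure by inserting and looking up reversed strings.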




M

m_currentNode - Variable in class org.apache.nutch.parse.html.DOMBuilder: Current node
m_doc - Variable in class org.apache.nutch.parse.html.DOMBuilder: Root document
m_docFrag - Variable in class org.apache.nutch.parse.html.DOMBuilder: First node of document fragment or null if not a DocumentFragment
m_elemStack - Variable in class org.apache.nutch.parse.html.DOMBuilder: Vector of element nodes
m_inCData - Variable in class org.apache.nutch.parse.html.DOMBuilder: Flag indicating that we are processing a CData section
main(String[]) - Static method in class org.apache.nutch.api.NutchServer
main(String[]) - Static method in class org.apache.nutch.crawl.Crawler
main(String[]) - Static method in class org.apache.nutch.crawl.DbUpdaterJob
main(String[]) - Static method in class org.apache.nutch.crawl.GeneratorJob
main(String[]) - Static method in class org.apache.nutch.crawl.InjectorJob
main(String[]) - Static method in class org.apache.nutch.crawl.WebTableReader
main(String[]) - Static method in class org.apache.nutch.fetcher.FetcherJob
main(String[]) - Static method in class org.apache.nutch.host.HostDbReader
main(String[]) - Static method in class org.apache.nutch.host.HostDbUpdateJob
main(String[]) - Static method in class org.apache.nutch.host.HostInjectorJob
main(String[]) - Static method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
main(String[]) - Static method in class org.apache.nutch.indexer.solr.SolrIndexerJob
main(String[]) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
main(String[]) - Static method in class org.apache.nutch.net.URLFilterChecker
main(String[]) - Static method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer: Spits out patterns and substitutions that are in the configuration file.
main(String[]) - Static method in class org.apache.nutch.net.URLNormalizerChecker
main(String[]) - Static method in class org.apache.nutch.parse.feed.FeedParser: Runs a command line version of this Parser.
main(String[]) - Static method in class org.apache.nutch.parse.html.HtmlParser
main(String[]) - Static method in class org.apache.nutch.parse.js.JSParseFilter
main(String[]) - Static method in class org.apache.nutch.parse.ParsePluginsReader: Tests parsing of the parse-plugins.xml file.
main(String[]) - Static method in class org.apache.nutch.parse.ParserChecker
main(String[]) - Static method in class org.apache.nutch.parse.ParserJob
main(String[]) - Static method in class org.apache.nutch.parse.swf.SWFParser: Arguments are: 0.
main(String[]) - Static method in class org.apache.nutch.parse.tika.TikaParser
main(String[]) - Static method in class org.apache.nutch.plugin.PluginRepository: Loads all necessary dependencies for a selected plugin, and then runs one of the classes' main() method.
main(String[]) - Static method in class org.apache.nutch.protocol.Content
main(String[]) - Static method in class org.apache.nutch.protocol.file.File: For debugging.
main(String[]) - Static method in class org.apache.nutch.protocol.ftp.Ftp: For debugging.
main(HttpBase, String[]) - Static method in class org.apache.nutch.protocol.http.api.HttpBase
main(String[]) - Static method in class org.apache.nutch.protocol.http.api.RobotRulesParser: command-line main for testing
main(String[]) - Static method in class org.apache.nutch.protocol.http.Http
main(String[]) - Static method in class org.apache.nutch.protocol.httpclient.Http: Main method.
main(String[]) - Static method in class org.apache.nutch.storage.WebTableCreator
main(String[]) - Static method in class org.apache.nutch.tools.Benchmark
main(String[]) - Static method in class org.apache.nutch.tools.DmozParser: Command-line access.
main(String[]) - Static method in class org.apache.nutch.tools.proxy.TestbedProxy
main(String[]) - Static method in class org.apache.nutch.tools.ResolveUrls: Runs the resolve urls tool.
main(RegexURLFilterBase, String[]) - Static method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase: Filter the standard input using a RegexURLFilterBase.
main(String[]) - Static method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
main(String[]) - Static method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
main(String[]) - Static method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
main(String[]) - Static method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
main(String[]) - Static method in class org.apache.nutch.util.CommandRunner
main(String[]) - Static method in class org.apache.nutch.util.domain.DomainStatistics
main(String[]) - Static method in class org.apache.nutch.util.PrefixStringMatcher
main(String[]) - Static method in class org.apache.nutch.util.StringUtil
main(String[]) - Static method in class org.apache.nutch.util.SuffixStringMatcher
main(String[]) - Static method in class org.apache.nutch.util.URLUtil: For testing
majorCodes - Static variable in interface org.apache.nutch.parse.ParseStatusCodes
makeStatus(int) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
makeStatus(int, String) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
makeStatus(int, URL) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
map(String, WebPage, Mapper<String, WebPage, UrlWithScore, NutchWritable>.Context) - Method in class org.apache.nutch.crawl.DbUpdateMapper
map(String, WebPage, Mapper<String, WebPage, GeneratorJob.SelectorEntry, WebPage>.Context) - Method in class org.apache.nutch.crawl.GeneratorMapper
map(String, WebPage, Mapper<String, WebPage, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.InjectorJob.InjectorMapper
map(LongWritable, Text, Mapper<LongWritable, Text, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.InjectorJob.UrlMapper
map(String, WebPage, Mapper<String, WebPage, Text, Text>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableRegexMapper
map(String, WebPage, Mapper<String, WebPage, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
map(String, WebPage, Mapper<String, WebPage, IntWritable, FetchEntry>.Context) - Method in class org.apache.nutch.fetcher.FetcherJob.FetcherMapper
map(String, WebPage, Mapper<String, WebPage, Text, WebPage>.Context) - Method in class org.apache.nutch.host.HostDbUpdateJob.Mapper
map(LongWritable, Text, Mapper<LongWritable, Text, String, Host>.Context) - Method in class org.apache.nutch.host.HostInjectorJob.UrlMapper
map(String, WebPage, Mapper<String, WebPage, String, NutchDocument>.Context) - Method in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
map(String, WebPage, Mapper<String, WebPage, String, WebPage>.Context) - Method in class org.apache.nutch.parse.ParserJob.ParserMapper
map(String, WebPage, Mapper<String, WebPage, Text, LongWritable>.Context) - Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
mapCopyKey(String) - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
mapKey(String) - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
mapKey(byte[]) - Static method in class org.apache.nutch.util.Bytes
mapKey(byte[], int) - Static method in class org.apache.nutch.util.Bytes
MAPPING_FILE - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
Mark - Enum in org.apache.nutch.storage
match(String) - Method in class org.apache.nutch.urlfilter.api.RegexRule: Checks if a url matches this rule.
matchChar(TrieStringMatcher.TrieNode, String, int) - Method in class org.apache.nutch.util.TrieStringMatcher: Returns the next TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
matches(String) - Method in class org.apache.nutch.util.PrefixStringMatcher: Returns true if the given String is matched by a prefix in the trie
matches(String) - Method in class org.apache.nutch.util.SuffixStringMatcher: Returns true if the given String is matched by a suffix in the trie
matches(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Returns true if the given String is matched by a pattern in the trie
maxContent - Variable in class org.apache.nutch.protocol.http.api.HttpBase: The length limit for downloaded content, in bytes.
maxInterval - Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
MD5Signature - Class in org.apache.nutch.crawl: Default implementation of a page signature.
MD5Signature() - Constructor for class org.apache.nutch.crawl.MD5Signature
Metadata - Class in org.apache.nutch.metadata: A multi-valued metadata container.
Metadata() - Constructor for class org.apache.nutch.metadata.Metadata: Constructs a new, empty metadata.
metadata - Variable in class org.apache.nutch.storage.Host
MetaWrapper - Class in org.apache.nutch.metadata: This is a simple decorator that adds metadata to any Writable-s that can be serialized by NutchWritable.
MetaWrapper() - Constructor for class org.apache.nutch.metadata.MetaWrapper
MetaWrapper(Writable, Configuration) - Constructor for class org.apache.nutch.metadata.MetaWrapper
MetaWrapper(Metadata, Writable, Configuration) - Constructor for class org.apache.nutch.metadata.MetaWrapper
MimeUtil - Class in org.apache.nutch.util
MimeUtil(Configuration) - Constructor for class org.apache.nutch.util.MimeUtil
MIN_CONFIDENCE_KEY - Static variable in class org.apache.nutch.util.EncodingDetector
minorCodes - Static variable in class org.apache.nutch.parse.ParseStatusUtils
MissingDependencyException - Exception in org.apache.nutch.plugin: MissingDependencyException will be thrown if a plugin dependency cannot be found.
MissingDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.MissingDependencyException
MissingDependencyException(String) - Constructor for exception org.apache.nutch.plugin.MissingDependencyException
MODIFIED - Static variable in interface org.apache.nutch.metadata.DublinCore: Date on which the resource was changed.
MoreIndexingFilter - Class in org.apache.nutch.indexer.more: Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be displayed by more.jsp (called by search.jsp).
MoreIndexingFilter() - Constructor for class org.apache.nutch.indexer.more.MoreIndexingFilter
MOVED - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes: Resource has moved permanently.
msg - Variable in class org.apache.nutch.api.JobStatus



N

names() - 
Method in class org.apache.nutch.metadata.Metadata
Returns an array of the names contained in the metadata.
newInstance(StateManager) - 
Method in class org.apache.nutch.storage.Host
 
newInstance(StateManager) - 
Method in class org.apache.nutch.storage.ParseStatus
 
newInstance(StateManager) - 
Method in class org.apache.nutch.storage.ProtocolStatus
 
newInstance(StateManager) - 
Method in class org.apache.nutch.storage.WebPage
 
next(Text, BytesWritable) - 
Method in class org.apache.nutch.tools.arc.ArcRecordReader
Returns true if the next record in the split is read into the key and 
 value pair.
nextKeyValue() - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
 
nextNode() - 
Method in class org.apache.nutch.util.NodeWalker
Returns the next Node on the stack and pushes all of its
 children onto the stack, allowing us to walk the node tree without the
 use of recursion.
NO_THRESHOLD - 
Static variable in class org.apache.nutch.util.EncodingDetector
 
nodeChar - 
Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
NodeWalker - Class in org.apache.nutch.util
A utility class that allows the walking of any DOM tree using a stack 
 instead of recursion.
NodeWalker(Node) - 
Constructor for class org.apache.nutch.util.NodeWalker
Starts the Node tree from the root node.
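The stack-based traversal that NodeWalker describes can be illustrated with a minimal sketch. This is not the Nutch source: the class StackWalkSketch and its methods walk and parse are hypothetical names, and the code only assumes the standard org.w3c.dom API shipped with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration of walking a DOM tree with a stack instead of recursion.
public class StackWalkSketch {

    /** Returns the node names in pre-order (document order) using an explicit stack. */
    static List<String> walk(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            names.add(current.getNodeName());
            // Push children in reverse so that the first child is popped first.
            NodeList children = current.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return names;
    }

    /** Parses an XML string into a DOM document; wraps checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b><c/></b><d/></a>");
        System.out.println(walk(doc.getDocumentElement())); // prints [a, b, c, d]
    }
}
```

Because the pending nodes live on a heap-allocated stack rather than the call stack, a very deep document cannot cause a StackOverflowError in this style of traversal.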
normalize(String, String) - 
Method in class org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
 
normalize(String, String) - 
Method in interface org.apache.nutch.net.URLNormalizer
 
normalize(String, String) - 
Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
 
normalize(String, String) - 
Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
 
normalize(String, String) - 
Method in class org.apache.nutch.net.URLNormalizers
Normalize
normalize() - 
Method in class org.apache.nutch.util.Histogram
 
NOTFETCHING - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Not fetching.
NOTFOUND - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Resource was not found.
NotFoundHandler - Class in org.apache.nutch.tools.proxy
 
NotFoundHandler() - 
Constructor for class org.apache.nutch.tools.proxy.NotFoundHandler
 
NOTMODIFIED - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Unchanged since the last fetch.
NOTPARSED - 
Static variable in interface org.apache.nutch.parse.ParseStatusCodes
Parsing was not performed.
numJobs - 
Variable in class org.apache.nutch.util.NutchTool
 
Nutch - Interface in org.apache.nutch.metadata
A collection of Nutch internal metadata constants.
NutchApp - Class in org.apache.nutch.api
 
NutchApp() - 
Constructor for class org.apache.nutch.api.NutchApp
 
NutchConfiguration - Class in org.apache.nutch.util
Utility to create Hadoop Configurations that include Nutch-specific
 resources.
NutchDocument - Class in org.apache.nutch.indexer
A NutchDocument is the unit of indexing.
NutchDocument() - 
Constructor for class org.apache.nutch.indexer.NutchDocument
 
nutchFetchIntervalMDName - 
Static variable in class org.apache.nutch.crawl.InjectorJob
metadata key reserved for setting a custom fetchInterval for a specific URL
NutchIndexWriter - Interface in org.apache.nutch.indexer
 
NutchIndexWriterFactory - Class in org.apache.nutch.indexer
 
NutchIndexWriterFactory() - 
Constructor for class org.apache.nutch.indexer.NutchIndexWriterFactory
 
NutchJob - Class in org.apache.nutch.util
A Job for Nutch jobs.
NutchJob(Configuration) - 
Constructor for class org.apache.nutch.util.NutchJob
 
NutchJob(Configuration, String) - 
Constructor for class org.apache.nutch.util.NutchJob
 
NutchJobConf - Class in org.apache.nutch.util
A JobConf for Nutch jobs.
NutchJobConf(Configuration) - 
Constructor for class org.apache.nutch.util.NutchJobConf
 
nutchScoreMDName - 
Static variable in class org.apache.nutch.crawl.InjectorJob
metadata key reserved for setting a custom score for a specific URL
NutchServer - Class in org.apache.nutch.api
 
NutchServer(int) - 
Constructor for class org.apache.nutch.api.NutchServer
 
NutchTool - Class in org.apache.nutch.util
 
NutchTool() - 
Constructor for class org.apache.nutch.util.NutchTool
 
NutchWritable - Class in org.apache.nutch.crawl
 
NutchWritable() - 
Constructor for class org.apache.nutch.crawl.NutchWritable
 
NutchWritable(Writable) - 
Constructor for class org.apache.nutch.crawl.NutchWritable
 



O

ObjectCache - Class in org.apache.nutch.util
 
Office - Interface in org.apache.nutch.metadata
A collection of "Office" documents properties names.
open(TaskAttemptContext, String) - 
Method in interface org.apache.nutch.indexer.NutchIndexWriter
 
open(TaskAttemptContext, String) - 
Method in class org.apache.nutch.indexer.solr.SolrWriter
 
OPICScoringFilter - Class in org.apache.nutch.scoring.opic
This plugin implements a variant of an Online Page Importance Computation
 (OPIC) score, described in this paper:
 Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003),
 Adaptive On-Line Page Importance Computation.
OPICScoringFilter() - 
Constructor for class org.apache.nutch.scoring.opic.OPICScoringFilter
 
org.apache.nutch.analysis.lang - package org.apache.nutch.analysis.lang
Text document language identifier.
org.apache.nutch.api - package org.apache.nutch.api
 
org.apache.nutch.api.impl - package org.apache.nutch.api.impl
 
org.apache.nutch.collection - package org.apache.nutch.collection

Subcollection is a subset of an index.
org.apache.nutch.crawl - package org.apache.nutch.crawl
Crawl control code.
org.apache.nutch.fetcher - package org.apache.nutch.fetcher
The Nutch robot.
org.apache.nutch.host - package org.apache.nutch.host
 
org.apache.nutch.html - package org.apache.nutch.html
 
org.apache.nutch.indexer - package org.apache.nutch.indexer
Maintain Lucene full-text indexes.
org.apache.nutch.indexer.anchor - package org.apache.nutch.indexer.anchor
An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.basic - package org.apache.nutch.indexer.basic
A basic indexing plugin.
org.apache.nutch.indexer.feed - package org.apache.nutch.indexer.feed
 
org.apache.nutch.indexer.more - package org.apache.nutch.indexer.more
A more indexing plugin.
org.apache.nutch.indexer.solr - package org.apache.nutch.indexer.solr
 
org.apache.nutch.indexer.subcollection - package org.apache.nutch.indexer.subcollection
 
org.apache.nutch.indexer.tld - package org.apache.nutch.indexer.tld
Top Level Domain Indexing plugin.
org.apache.nutch.metadata - package org.apache.nutch.metadata
A multi-valued Metadata container, and a set
of constant fields for Nutch Metadata.
org.apache.nutch.microformats.reltag - package org.apache.nutch.microformats.reltag

A microformats Rel-Tag
Parser/Indexer/Querier plugin.
org.apache.nutch.net - package org.apache.nutch.net
 
org.apache.nutch.net.protocols - package org.apache.nutch.net.protocols
 
org.apache.nutch.net.urlnormalizer.basic - package org.apache.nutch.net.urlnormalizer.basic
 
org.apache.nutch.net.urlnormalizer.pass - package org.apache.nutch.net.urlnormalizer.pass
 
org.apache.nutch.net.urlnormalizer.regex - package org.apache.nutch.net.urlnormalizer.regex
 
org.apache.nutch.parse - package org.apache.nutch.parse
 
org.apache.nutch.parse.ext - package org.apache.nutch.parse.ext
 
org.apache.nutch.parse.feed - package org.apache.nutch.parse.feed
 
org.apache.nutch.parse.html - package org.apache.nutch.parse.html
An HTML document parsing plugin.
org.apache.nutch.parse.js - package org.apache.nutch.parse.js
 
org.apache.nutch.parse.swf - package org.apache.nutch.parse.swf
 
org.apache.nutch.parse.tika - package org.apache.nutch.parse.tika
 
org.apache.nutch.parse.zip - package org.apache.nutch.parse.zip
 
org.apache.nutch.plugin - package org.apache.nutch.plugin
The Nutch Plugin System.
org.apache.nutch.protocol - package org.apache.nutch.protocol
 
org.apache.nutch.protocol.file - package org.apache.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp - package org.apache.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http - package org.apache.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api - package org.apache.nutch.protocol.http.api
Common API used by HTTP plugins (http,
httpclient)
org.apache.nutch.protocol.httpclient - package org.apache.nutch.protocol.httpclient
Protocol plugin which supports retrieving documents via the HTTP and
HTTPS protocols, optionally with Basic, Digest and NTLM authentication
schemes for web server as well as proxy server.
org.apache.nutch.protocol.sftp - package org.apache.nutch.protocol.sftp
Protocol plugin which supports retrieving documents via the sftp protocol.
org.apache.nutch.scoring - package org.apache.nutch.scoring
 
org.apache.nutch.scoring.link - package org.apache.nutch.scoring.link
 
org.apache.nutch.scoring.opic - package org.apache.nutch.scoring.opic
 
org.apache.nutch.scoring.tld - package org.apache.nutch.scoring.tld
Top Level Domain Scoring plugin.
org.apache.nutch.storage - package org.apache.nutch.storage
 
org.apache.nutch.tools - package org.apache.nutch.tools
 
org.apache.nutch.tools.arc - package org.apache.nutch.tools.arc
 
org.apache.nutch.tools.proxy - package org.apache.nutch.tools.proxy
 
org.apache.nutch.urlfilter.api - package org.apache.nutch.urlfilter.api
 
org.apache.nutch.urlfilter.automaton - package org.apache.nutch.urlfilter.automaton

A url filter plugin based on
dk.brics.automaton Finite-State
Automata for Java™.
org.apache.nutch.urlfilter.domain - package org.apache.nutch.urlfilter.domain
A url filter plugin that filters by domain.
org.apache.nutch.urlfilter.prefix - package org.apache.nutch.urlfilter.prefix
A url filter plugin.
org.apache.nutch.urlfilter.regex - package org.apache.nutch.urlfilter.regex
A url filter plugin.
org.apache.nutch.urlfilter.suffix - package org.apache.nutch.urlfilter.suffix
 
org.apache.nutch.urlfilter.validator - package org.apache.nutch.urlfilter.validator
A url filter plugin that validates given urls.
org.apache.nutch.util - package org.apache.nutch.util
 
org.apache.nutch.util.domain - package org.apache.nutch.util.domain
 org.apache.nutch.util.domain
org.creativecommons.nutch - package org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.
ORIGINAL_CHAR_ENCODING - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
Outlink - Class in org.apache.nutch.parse
 
Outlink() - 
Constructor for class org.apache.nutch.parse.Outlink
 
Outlink(String, String) - 
Constructor for class org.apache.nutch.parse.Outlink
 
OutlinkExtractor - Class in org.apache.nutch.parse
Extracts Outlinks
 / URLs from plain text using regular expressions.
OutlinkExtractor() - 
Constructor for class org.apache.nutch.parse.OutlinkExtractor
 
outlinks - 
Variable in class org.apache.nutch.storage.Host
 



P

padHead(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
 
padTail(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
 
PAGE_COUNT - 
Static variable in interface org.apache.nutch.metadata.Office
 
Pair<F,S> - Class in org.apache.nutch.util
 
Pair(F, S) - 
Constructor for class org.apache.nutch.util.Pair
 
Params - Interface in org.apache.nutch.api
 
parse(InputStream) - 
Method in class org.apache.nutch.collection.CollectionManager
 
Parse - Class in org.apache.nutch.parse
 
Parse() - 
Constructor for class org.apache.nutch.parse.Parse
 
Parse(String, String, Outlink[], ParseStatus) - 
Constructor for class org.apache.nutch.parse.Parse
 
parse(Configuration) - 
Method in class org.apache.nutch.parse.ParsePluginsReader
Reads the parse-plugins.xml file and returns the
 ParsePluginList defined by it.
parse(String, boolean, boolean) - 
Method in class org.apache.nutch.parse.ParserJob
 
parse(String, WebPage) - 
Method in class org.apache.nutch.parse.ParseUtil
Performs a parse by iterating through a List of preferred Parsers
 until a successful parse is performed and a Parse object is
 returned.
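The "try each preferred parser until one succeeds" behaviour described for ParseUtil.parse can be sketched as follows. This is only an illustration of the pattern, not the Nutch implementation: SimpleParser and parseWithFallback are hypothetical names, and a plain String stands in for a Parse object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of a parser fallback chain.
public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser plugin; not a Nutch interface. */
    interface SimpleParser {
        /** Returns the parsed text, or empty if this parser cannot handle the content. */
        Optional<String> tryParse(String content);
    }

    /** Tries the parsers in preference order and returns the first successful result. */
    static Optional<String> parseWithFallback(List<SimpleParser> preferred, String content) {
        for (SimpleParser parser : preferred) {
            try {
                Optional<String> result = parser.tryParse(content);
                if (result.isPresent()) {
                    return result; // first success wins
                }
            } catch (RuntimeException e) {
                // A failing parser must not stop the chain; fall through to the next one.
            }
        }
        return Optional.empty(); // no parser succeeded
    }

    public static void main(String[] args) {
        SimpleParser failing = c -> { throw new IllegalStateException("cannot parse"); };
        SimpleParser declining = c -> Optional.empty();
        SimpleParser plainText = c -> Optional.of(c.trim());
        List<SimpleParser> preferred = Arrays.asList(failing, declining, plainText);
        System.out.println(parseWithFallback(preferred, "  hello  ").orElse("<no parser succeeded>"));
    }
}
```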
PARSE_KEY - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
parseCharacterEncoding(Utf8) - 
Static method in class org.apache.nutch.util.EncodingDetector
Parse the character encoding from the specified content type header.
parseDmozFile(File, int, boolean, int, Pattern) - 
Method in class org.apache.nutch.tools.DmozParser
Iterate through all the items in this structured DMOZ file.
ParseException - Exception in org.apache.nutch.parse
 
ParseException() - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(String) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(String, Throwable) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(Throwable) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseFilter - Interface in org.apache.nutch.parse
Extension point for DOM-based parsers.
ParseFilters - Class in org.apache.nutch.parse
Creates and caches ParseFilter implementing plugins.
ParseFilters(Configuration) - 
Constructor for class org.apache.nutch.parse.ParseFilters
 
parseList(ArrayList, String) - 
Method in class org.apache.nutch.collection.Subcollection
Creates a list of patterns from a chunk of text; patterns are separated by
 newlines.
parsePluginFolder(String[]) - 
Method in class org.apache.nutch.plugin.PluginManifestParser
Returns a list of all found plugin descriptors.
ParsePluginList - Class in org.apache.nutch.parse
This class represents a natural ordering for which parsing plugin should get
 called for a particular mimeType.
ParsePluginsReader - Class in org.apache.nutch.parse
A reader to load the information stored in the
 $NUTCH_HOME/conf/parse-plugins.xml file.
ParsePluginsReader() - 
Constructor for class org.apache.nutch.parse.ParsePluginsReader
Constructs a new ParsePluginsReader
Parser - Interface in org.apache.nutch.parse
A parser for content generated by a Protocol
 implementation.
ParserChecker - Class in org.apache.nutch.parse
Parser checker, useful for testing parser.
ParserChecker() - 
Constructor for class org.apache.nutch.parse.ParserChecker
 
ParserFactory - Class in org.apache.nutch.parse
Creates and caches Parser plugins.
ParserFactory(Configuration) - 
Constructor for class org.apache.nutch.parse.ParserFactory
 
ParserJob - Class in org.apache.nutch.parse
 
ParserJob() - 
Constructor for class org.apache.nutch.parse.ParserJob
 
ParserJob(Configuration) - 
Constructor for class org.apache.nutch.parse.ParserJob
 
ParserJob.ParserMapper - Class in org.apache.nutch.parse
 
ParserJob.ParserMapper() - 
Constructor for class org.apache.nutch.parse.ParserJob.ParserMapper
 
ParserNotFound - Exception in org.apache.nutch.parse
 
ParserNotFound(String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParserNotFound(String, String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParserNotFound(String, String, String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParseStatus - Class in org.apache.nutch.storage
 
ParseStatus() - 
Constructor for class org.apache.nutch.storage.ParseStatus
 
ParseStatus(StateManager) - 
Constructor for class org.apache.nutch.storage.ParseStatus
 
ParseStatus.Field - Enum in org.apache.nutch.storage
 
ParseStatusCodes - Interface in org.apache.nutch.parse
 
ParseStatusUtils - Class in org.apache.nutch.parse
 
ParseStatusUtils() - 
Constructor for class org.apache.nutch.parse.ParseStatusUtils
 
ParseUtil - Class in org.apache.nutch.parse
A utility class with methods for common parsing tasks, such
 as iterating through a preferred list of Parsers to obtain
 Parse objects.
ParseUtil(Configuration) - 
Constructor for class org.apache.nutch.parse.ParseUtil
 
PARTITION_MODE_DOMAIN - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_HOST - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_IP - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_KEY - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_URL_SEED - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PassURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.pass
This URLNormalizer doesn't change urls.
PassURLNormalizer() - 
Constructor for class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
 
PATH - 
Static variable in class org.apache.nutch.api.AdminResource
 
PATH - 
Static variable in class org.apache.nutch.api.ConfResource
 
PATH - 
Static variable in class org.apache.nutch.api.DbResource
 
PATH - 
Static variable in class org.apache.nutch.api.JobResource
 
PERM_REFRESH_TIME - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
Pluggable - Interface in org.apache.nutch.plugin
Defines the capability of a class to be plugged into Nutch.
Plugin - Class in org.apache.nutch.plugin
A nutch-plugin is a container for a set of custom logic that provides
 extensions to the nutch core functionality or to another plugin that provides an
 API for extending.
Plugin(PluginDescriptor, Configuration) - 
Constructor for class org.apache.nutch.plugin.Plugin
Constructor
PluginClassLoader - Class in org.apache.nutch.plugin
The PluginClassLoader contains only classes of the runtime
 libraries set up in the plugin manifest file and exported libraries of
 required plugins.
PluginClassLoader(URL[], ClassLoader) - 
Constructor for class org.apache.nutch.plugin.PluginClassLoader
Constructor
PluginDescriptor - Class in org.apache.nutch.plugin
The PluginDescriptor provides access to all meta information of
 a nutch-plugin, as well as to the internationalizable resources and the
 plugin's own classloader.
PluginDescriptor(String, String, String, String, String, String, Configuration) - 
Constructor for class org.apache.nutch.plugin.PluginDescriptor
Constructor
PluginManifestParser - Class in org.apache.nutch.plugin
The PluginManifestParser just parses the manifest files
 in all plugin directories.
PluginManifestParser(Configuration, PluginRepository) - 
Constructor for class org.apache.nutch.plugin.PluginManifestParser
 
PluginRepository - Class in org.apache.nutch.plugin
The plugin repository is a registry of all plugins.
PluginRepository(Configuration) - 
Constructor for class org.apache.nutch.plugin.PluginRepository
 
PluginRuntimeException - Exception in org.apache.nutch.plugin
PluginRuntimeException will be thrown when an exception occurs in the
 plugin management.
PluginRuntimeException(Throwable) - 
Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
 
PluginRuntimeException(String) - 
Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
 
pos - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
PrefixStringMatcher - Class in org.apache.nutch.util
A class for efficiently matching Strings against a set
 of prefixes.
PrefixStringMatcher(String[]) - 
Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match
 Strings with any prefix in the supplied array.
PrefixStringMatcher(Collection) - 
Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match
 Strings with any prefix in the supplied Collection.
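Prefix matching of this kind is commonly built on a trie, and the index also lists a TrieStringMatcher.TrieNode class. The sketch below shows the general technique only; PrefixTrieSketch and its members are hypothetical names and do not reproduce the Nutch classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of matching strings against a set of prefixes with a trie.
public class PrefixTrieSketch {

    private static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean terminal; // true if one of the stored prefixes ends at this node
    }

    private final TrieNode root = new TrieNode();

    /** Builds the trie from the supplied prefixes. */
    public PrefixTrieSketch(String[] prefixes) {
        for (String prefix : prefixes) {
            TrieNode node = root;
            for (int i = 0; i < prefix.length(); i++) {
                node = node.children.computeIfAbsent(prefix.charAt(i), c -> new TrieNode());
            }
            node.terminal = true;
        }
    }

    /** Returns true if the input starts with any stored prefix. */
    public boolean matches(String input) {
        TrieNode node = root;
        if (node.terminal) {
            return true; // the empty prefix matches everything
        }
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false; // no stored prefix continues with this character
            }
            if (node.terminal) {
                return true; // a complete stored prefix has been consumed
            }
        }
        return false; // input ended before any stored prefix was completed
    }

    public static void main(String[] args) {
        PrefixTrieSketch matcher = new PrefixTrieSketch(new String[] {"http://", "ftp://"});
        System.out.println(matcher.matches("http://example.org/")); // prints true
        System.out.println(matcher.matches("mailto:someone"));      // prints false
    }
}
```

A lookup costs at most one map access per input character, regardless of how many prefixes are stored, which is the main reason to prefer a trie over testing each prefix with String.startsWith.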
PrefixURLFilter - Class in org.apache.nutch.urlfilter.prefix
Filters URLs based on a file of URL prefixes.
PrefixURLFilter() - 
Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
PrefixURLFilter(String) - 
Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
PrintCommandListener - Class in org.apache.nutch.protocol.ftp
This is a support class for logging all ftp command/reply traffic.
PrintCommandListener(Logger) - 
Constructor for class org.apache.nutch.protocol.ftp.PrintCommandListener
 
process(String, WebPage) - 
Method in class org.apache.nutch.parse.ParseUtil
Parses given web page and stores parsed content within page.
processDeflateEncoded(byte[], URL) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
processDumpJob(String, Configuration, String, boolean, boolean, boolean, boolean) - 
Method in class org.apache.nutch.crawl.WebTableReader
 
processGzipEncoded(byte[], URL) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
processingInstruction(String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a processing instruction.
processStatJob(boolean) - 
Method in class org.apache.nutch.crawl.WebTableReader
 
PROP_NAME - 
Static variable in interface org.apache.nutch.api.Params
 
PROP_VALUE - 
Static variable in interface org.apache.nutch.api.Params
 
PROPS - 
Static variable in interface org.apache.nutch.api.Params
 
PROTO_NOT_FOUND - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
This protocol was not found.
PROTO_STATUS_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
Protocol - Interface in org.apache.nutch.protocol
A retriever of url content.
PROTOCOL_REDIR - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
protocolCommandSent(ProtocolCommandEvent) - 
Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
ProtocolException - Exception in org.apache.nutch.net.protocols
Deprecated. Use org.apache.nutch.protocol.ProtocolException instead.
ProtocolException() - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(String) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(String, Throwable) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(Throwable) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException - Exception in org.apache.nutch.protocol
 
ProtocolException() - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String, Throwable) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(Throwable) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolFactory - Class in org.apache.nutch.protocol
Creates and caches Protocol plugins.
ProtocolFactory(Configuration) - 
Constructor for class org.apache.nutch.protocol.ProtocolFactory
 
ProtocolNotFound - Exception in org.apache.nutch.protocol
 
ProtocolNotFound(String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolNotFound(String, String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolOutput - Class in org.apache.nutch.protocol
Simple aggregate to pass from protocol plugins both content and
 protocol status.
ProtocolOutput(Content, ProtocolStatus) - 
Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
ProtocolOutput(Content) - 
Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
protocolReplyReceived(ProtocolCommandEvent) - 
Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
ProtocolStatus - Class in org.apache.nutch.storage
 
ProtocolStatus() - 
Constructor for class org.apache.nutch.storage.ProtocolStatus
 
ProtocolStatus(StateManager) - 
Constructor for class org.apache.nutch.storage.ProtocolStatus
 
ProtocolStatus.Field - Enum in org.apache.nutch.storage
 
ProtocolStatusCodes - Interface in org.apache.nutch.protocol
 
ProtocolStatusUtils - Class in org.apache.nutch.protocol
 
ProtocolStatusUtils() - 
Constructor for class org.apache.nutch.protocol.ProtocolStatusUtils
 
proxyHost - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The proxy hostname.
proxyPort - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The proxy port.
PUBLISHER - 
Static variable in interface org.apache.nutch.metadata.DublinCore
An entity responsible for making the resource available.
put(String, Host) - 
Method in class org.apache.nutch.host.HostDb
 
put(int, Object) - 
Method in class org.apache.nutch.storage.Host
 
put(int, Object) - 
Method in class org.apache.nutch.storage.ParseStatus
 
put(int, Object) - 
Method in class org.apache.nutch.storage.ProtocolStatus
 
put(int, Object) - 
Method in class org.apache.nutch.storage.WebPage
 
putByte(byte[], int, byte) - 
Static method in class org.apache.nutch.util.Bytes
Write a single byte out to the specified byte array position.
putBytes(byte[], int, byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Put bytes at the specified byte array position.
putDouble(byte[], int, double) - 
Static method in class org.apache.nutch.util.Bytes
 
putFloat(byte[], int, float) - 
Static method in class org.apache.nutch.util.Bytes
 
putInt(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Put an int value out to the specified byte array position.
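As a rough illustration of what writing an int at a byte array position involves, here is a minimal sketch. It assumes big-endian byte order and returns the offset just past the written value; both are assumptions, since the index entry states neither the byte order nor the return value. PutIntSketch and toInt are hypothetical names.

```java
// Hypothetical illustration of writing an int into a byte array at a given offset.
public class PutIntSketch {

    /**
     * Writes val into bytes starting at offset, most significant byte first
     * (big-endian is an assumption of this sketch).
     * Returns the offset immediately after the four written bytes.
     */
    static int putInt(byte[] bytes, int offset, int val) {
        if (offset < 0 || bytes.length - offset < Integer.BYTES) {
            throw new IllegalArgumentException(
                "not enough room to write an int at offset " + offset);
        }
        // Fill from the last byte backwards, shifting the value down each time.
        for (int i = offset + Integer.BYTES - 1; i > offset; i--) {
            bytes[i] = (byte) val;
            val >>>= 8;
        }
        bytes[offset] = (byte) val;
        return offset + Integer.BYTES;
    }

    /** Reads back a big-endian int, the inverse of putInt in this sketch. */
    static int toInt(byte[] bytes, int offset) {
        int n = 0;
        for (int i = offset; i < offset + Integer.BYTES; i++) {
            n = (n << 8) | (bytes[i] & 0xFF);
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[6];
        int next = putInt(buffer, 1, 0x01020304);
        System.out.println(next);             // prints 5
        System.out.println(toInt(buffer, 1)); // prints 16909060
    }
}
```

The same loop generalises to the long and short variants by replacing Integer.BYTES with Long.BYTES or Short.BYTES.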
putLong(byte[], int, long) - 
Static method in class org.apache.nutch.util.Bytes
Put a long value out to the specified byte array position.
putMark(WebPage, Utf8) - 
Method in enum org.apache.nutch.storage.Mark
 
putMark(WebPage, String) - 
Method in enum org.apache.nutch.storage.Mark
 
putShort(byte[], int, short) - 
Static method in class org.apache.nutch.util.Bytes
Put a short value out to the specified byte array position.
putToHeaders(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
putToInlinks(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.Host
 
putToInlinks(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
putToMarkers(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
putToMetadata(Utf8, ByteBuffer) - 
Method in class org.apache.nutch.storage.Host
 
putToMetadata(Utf8, ByteBuffer) - 
Method in class org.apache.nutch.storage.WebPage
 
putToOutlinks(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.Host
 
putToOutlinks(Utf8, Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 



R

RAMConfManager - Class in org.apache.nutch.api.impl
 
RAMConfManager() - 
Constructor for class org.apache.nutch.api.impl.RAMConfManager
 
RAMJobManager - Class in org.apache.nutch.api.impl
 
RAMJobManager() - 
Constructor for class org.apache.nutch.api.impl.RAMJobManager
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.Outlink
 
read(DataInput) - 
Static method in class org.apache.nutch.protocol.Content
 
readByteArray(DataInput) - 
Static method in class org.apache.nutch.util.Bytes
Read byte-array written with a WritableUtils.vint prefix.
readByteArrayThrowsRuntime(DataInput) - 
Static method in class org.apache.nutch.util.Bytes
Read byte-array written with a WritableUtils.vint prefix.
readConfiguration(Reader) - 
Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
readFields(DataInput) - 
Method in class org.apache.nutch.fetcher.FetchEntry
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
readFields(DataInput) - 
Method in class org.apache.nutch.metadata.Metadata
 
readFields(DataInput) - 
Method in class org.apache.nutch.metadata.MetaWrapper
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.Outlink
 
readFields(DataInput) - 
Method in class org.apache.nutch.protocol.Content
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
readFields(DataInput) - 
Method in class org.apache.nutch.util.GenericWritableConfigurable
 
readFields(DataInput) - 
Method in class org.apache.nutch.util.WebPageWritable
 
readSolrDocument(SolrDocument) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
readVLong(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Reads a zero-compressed encoded long from the byte array at the given offset and returns it.
recordJobStatus(String, Job, Map<String, Object>) - 
Static method in class org.apache.nutch.util.ToolUtil
 
REDIR_EXCEEDED - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Too many redirects.
REDIRECT_DISCOVERED - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
reduce(UrlWithScore, Iterable<NutchWritable>, Reducer<UrlWithScore, NutchWritable, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.DbUpdateReducer
 
reduce(GeneratorJob.SelectorEntry, Iterable<WebPage>, Reducer<GeneratorJob.SelectorEntry, WebPage, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.GeneratorReducer
 
reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
 
reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatReducer
 
reduce(Text, Iterable<WebPage>, Reducer<Text, WebPage, String, Host>.Context) - 
Method in class org.apache.nutch.host.HostDbUpdateReducer
 
reduce(Text, Iterable<SolrDeleteDuplicates.SolrRecord>, Reducer<Text, SolrDeleteDuplicates.SolrRecord, Text, SolrDeleteDuplicates.SolrRecord>.Context) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
 
reduce(Text, Iterable<LongWritable>, Reducer<Text, LongWritable, LongWritable, Text>.Context) - 
Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsReducer
 
reduce(String, Iterable<WebPage>, Reducer<String, WebPage, String, WebPage>.Context) - 
Method in class org.apache.nutch.util.IdentityPageReducer
 
regexNormalize(String, String) - 
Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
This function does the replacements by iterating through all the regex
 patterns.
RegexRule - Class in org.apache.nutch.urlfilter.api
A generic regular expression rule.
RegexRule(boolean, String) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexRule
Constructs a new regular expression rule.
RegexURLFilter - Class in org.apache.nutch.urlfilter.regex
Filters URLs based on a file of regular expressions using the
 Java Regex implementation.
RegexURLFilter() - 
Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
RegexURLFilter(String) - 
Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
RegexURLFilterBase - Class in org.apache.nutch.urlfilter.api
Generic URL filter based on
 regular expressions.
RegexURLFilterBase() - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new empty RegexURLFilterBase
RegexURLFilterBase(File) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new RegexURLFilter and initializes it with a file of rules.
RegexURLFilterBase(String) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new RegexURLFilter and initializes it with a list of rules.
RegexURLFilterBase(Reader) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new RegexURLFilter and initializes it with a Reader of rules.
RegexURLNormalizer - Class in org.apache.nutch.net.urlnormalizer.regex
Allows users to do regex substitutions on all/any URLs that are encountered,
 which is useful for stripping session IDs from URLs.
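A minimal sketch of regex-based URL normalization follows, applying an ordered list of pattern/substitution rules to a URL. The two rules, the Rule class, and the normalize method are hypothetical examples chosen for illustration; they are not the rules shipped with Nutch, and the real class reads its rules from a configuration file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of normalizing URLs with ordered regex substitutions.
public class RegexNormalizeSketch {

    /** One substitution rule: a compiled pattern and its replacement text. */
    private static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.substitution = substitution;
        }
    }

    private static final List<Rule> RULES = new ArrayList<>();

    static {
        // Example rule (assumption): drop a ";jsessionid=..." path parameter.
        RULES.add(new Rule(";jsessionid=[^?#]*", ""));
        // Example rule (assumption): turn an HTML-escaped ampersand back into "&".
        RULES.add(new Rule("&amp;", "&"));
    }

    /** Applies every rule in order; each rule sees the output of the previous one. */
    static String normalize(String url) {
        String result = url;
        for (Rule rule : RULES) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints http://example.org/page?x=1
        System.out.println(normalize("http://example.org/page;jsessionid=ABC123?x=1"));
    }
}
```

Rule order matters in this design, because a later pattern is matched against the already rewritten URL rather than the original input.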
RegexURLNormalizer() - 
Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
The default constructor, which is called from UrlNormalizerFactory
 (normalizerClass.newInstance()) in the method getNormalizer().
RegexURLNormalizer(Configuration) - 
Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
 
RegexURLNormalizer(Configuration, String) - 
Constructor for class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
Constructor which can be passed the file name, so it doesn't look in the
 configuration files for it.
REL_TAG - 
Static variable in class org.apache.nutch.microformats.reltag.RelTagParser
 
RELATION - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A reference to a related resource.
RelTagIndexingFilter - Class in org.apache.nutch.microformats.reltag
An IndexingFilter that adds tag
 field(s) to the document.
RelTagIndexingFilter() - 
Constructor for class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
 
RelTagParser - Class in org.apache.nutch.microformats.reltag
Adds microformat rel-tags of document if found.
RelTagParser() - 
Constructor for class org.apache.nutch.microformats.reltag.RelTagParser
 
remove() - 
Method in class org.apache.nutch.api.ConfResource
 
remove(String) - 
Method in class org.apache.nutch.metadata.Metadata
Remove a metadata and all its associated values.
remove(String) - 
Method in class org.apache.nutch.metadata.SpellCheckedMetadata
 
removeField(String) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
removeFromHeaders(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
removeFromInlinks(Utf8) - 
Method in class org.apache.nutch.storage.Host
 
removeFromInlinks(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
removeFromMarkers(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
removeFromMetadata(Utf8) - 
Method in class org.apache.nutch.storage.Host
 
removeFromMetadata(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
removeFromOutlinks(Utf8) - 
Method in class org.apache.nutch.storage.Host
 
removeFromOutlinks(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
removeLockFile(FileSystem, Path) - 
Static method in class org.apache.nutch.util.LockUtil
Remove lock file.
removeMark(WebPage) - 
Method in enum org.apache.nutch.storage.Mark
 
removeMarkIfExist(WebPage) - 
Method in enum org.apache.nutch.storage.Mark
Remove the mark only if the mark is present on the page.
replace(FileSystem, Path, Path, boolean) - 
Static method in class org.apache.nutch.util.FSUtils
Replaces the current path with the new path and if set removes the old
 path.
REPR_URL_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
reset() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets all boolean values to false.
resolveEncodingAlias(String) - 
Static method in class org.apache.nutch.util.EncodingDetector
 
ResolveUrls - Class in org.apache.nutch.tools
A simple tool that will spin up multiple threads to resolve urls to ip
 addresses.
ResolveUrls(String) - 
Constructor for class org.apache.nutch.tools.ResolveUrls
Create a new ResolveUrls with a file from the local file system.
ResolveUrls(String, int) - 
Constructor for class org.apache.nutch.tools.ResolveUrls
Create a new ResolveUrls with a urls file and a number of threads for the
 Thread pool.
resolveUrls() - 
Method in class org.apache.nutch.tools.ResolveUrls
Creates a thread pool for resolving urls.
Response - Interface in org.apache.nutch.net.protocols
A response interface.
result - 
Variable in class org.apache.nutch.api.JobStatus
 
results - 
Variable in class org.apache.nutch.util.NutchTool
 
RESUME_KEY - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
retrieve() - 
Method in class org.apache.nutch.api.APIInfoResource
 
retrieve() - 
Method in class org.apache.nutch.api.ConfResource
 
retrieve() - 
Method in class org.apache.nutch.api.JobResource
 
retrieveFile(String, OutputStream, int) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
retrieveList(String, List, int, FTPFileEntryParser) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
RETRY - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Temporary failure.
reverseHost(String) - 
Static method in class org.apache.nutch.util.TableUtil
 
reverseUrl(String) - 
Static method in class org.apache.nutch.util.TableUtil
Reverses a url's domain.
reverseUrl(URL) - 
Static method in class org.apache.nutch.util.TableUtil
Reverses a url's domain.
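A rough sketch of what reversing a domain can look like, limited to the host name. This is an assumption-laden illustration and not the TableUtil code: the class ReverseHostSketch is hypothetical, and the exact string format that reverseUrl and reverseHost produce (scheme, port, and path handling) is not described in this index.

```java
// Illustrative only: reverses the dot-separated labels of a host name.
final class ReverseHostSketch {

    static String reverseHost(String host) {
        String[] labels = host.split("\\.");
        StringBuilder reversed = new StringBuilder(host.length());
        // Walk the labels from last to first, re-joining them with dots.
        for (int i = labels.length - 1; i >= 0; i--) {
            reversed.append(labels[i]);
            if (i > 0) {
                reversed.append('.');
            }
        }
        return reversed.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseHost("www.example.com"));
    }
}
```

Reversed host names sort so that pages from the same domain end up adjacent, which is one common reason for storing keys in this form.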
REVISION_NUMBER - 
Static variable in interface org.apache.nutch.metadata.Office
 
rightPad(String, int) - 
Static method in class org.apache.nutch.util.StringUtil
Returns a copy of s padded with trailing spaces so
 that its length is length.
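The padding behavior described for rightPad can be sketched as follows. This is a stand-alone illustration, not the StringUtil source; the class name RightPadSketch is hypothetical, and returning an over-long input unchanged is an assumption.

```java
// Illustrative only: pads a string with trailing spaces up to a target length.
final class RightPadSketch {

    static String rightPad(String s, int length) {
        StringBuilder padded = new StringBuilder(s);
        // Append spaces until the target length is reached; longer inputs are left as-is (assumed).
        while (padded.length() < length) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + rightPad("ab", 5) + "]");
    }
}
```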
RIGHTS - 
Static variable in interface org.apache.nutch.metadata.DublinCore
Information about rights held in and over the resource.
RobotRules - Interface in org.apache.nutch.protocol
This class holds the rules which were parsed from a robots.txt file, and can
 test paths against those rules.
RobotRulesParser - Class in org.apache.nutch.protocol.http.api
This class handles the parsing of robots.txt files.
RobotRulesParser(Configuration) - 
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser
 
RobotRulesParser.RobotRuleSet - Class in org.apache.nutch.protocol.http.api
This class holds the rules which were parsed from a robots.txt
 file, and can test paths against those rules.
RobotRulesParser.RobotRuleSet() - 
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
 
ROBOTS_DENIED - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Access denied by robots.txt rules.
root - 
Variable in class org.apache.nutch.util.TrieStringMatcher
 
RULES - 
Static variable in class org.apache.nutch.protocol.EmptyRobotRules
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.crawl.Crawler
 
run(String[]) - 
Method in class org.apache.nutch.crawl.Crawler
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.crawl.DbUpdaterJob
 
run(String[]) - 
Method in class org.apache.nutch.crawl.DbUpdaterJob
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.crawl.GeneratorJob
 
run(String[]) - 
Method in class org.apache.nutch.crawl.GeneratorJob
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.crawl.InjectorJob
 
run(String[]) - 
Method in class org.apache.nutch.crawl.InjectorJob
 
run(String[]) - 
Method in class org.apache.nutch.crawl.WebTableReader
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.crawl.WebTableReader
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.fetcher.FetcherJob
 
run(String[]) - 
Method in class org.apache.nutch.fetcher.FetcherJob
 
run(Reducer<IntWritable, FetchEntry, String, WebPage>.Context) - 
Method in class org.apache.nutch.fetcher.FetcherReducer
 
run(String[]) - 
Method in class org.apache.nutch.host.HostDbReader
 
run(String[]) - 
Method in class org.apache.nutch.host.HostDbUpdateJob
 
run(String[]) - 
Method in class org.apache.nutch.host.HostInjectorJob
 
run(String[]) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.indexer.solr.SolrIndexerJob
 
run(String[]) - 
Method in class org.apache.nutch.indexer.solr.SolrIndexerJob
 
run(String[]) - 
Method in class org.apache.nutch.parse.ParserChecker
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.parse.ParserJob
 
run(String[]) - 
Method in class org.apache.nutch.parse.ParserJob
 
run(String[]) - 
Method in class org.apache.nutch.tools.Benchmark
 
run(String[]) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 
run(Map<String, Object>) - 
Method in class org.apache.nutch.util.NutchTool
Runs the tool, using a map of arguments.



S

save() - 
Method in class org.apache.nutch.collection.CollectionManager
Saves collections into a file.
saveDom(OutputStream, Element) - 
Static method in class org.apache.nutch.util.DomUtil
Saves the DOM into an OutputStream.
SCOPE_CRAWLDB - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when updating the CrawlDb with new URLs.
SCOPE_DEFAULT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Default scope.
SCOPE_FETCHER - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by FetcherJob when processing
 redirect URLs.
SCOPE_GENERATE_HOST_COUNT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by GeneratorJob.
SCOPE_INJECT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by InjectorJob.
SCOPE_LINKDB - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when updating the LinkDb with new URLs.
SCOPE_OUTLINK - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when constructing new Outlink instances.
SCOPE_PARTITION - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by URLPartitioner.
SCORE_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
ScoreDatum - Class in org.apache.nutch.scoring
 
ScoreDatum() - 
Constructor for class org.apache.nutch.scoring.ScoreDatum
 
ScoreDatum(float, String, String) - 
Constructor for class org.apache.nutch.scoring.ScoreDatum
 
ScoringFilter - Interface in org.apache.nutch.scoring
A contract defining behavior of scoring plugins.
ScoringFilterException - Exception in org.apache.nutch.scoring
Specialized exception for errors during scoring.
ScoringFilterException() - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(String) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(String, Throwable) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(Throwable) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilters - Class in org.apache.nutch.scoring
Creates and caches ScoringFilter implementing plugins.
ScoringFilters(Configuration) - 
Constructor for class org.apache.nutch.scoring.ScoringFilters
 
SECONDS_PER_DAY - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
 
SEGMENT_NAME_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
sendNoOp() - 
Method in class org.apache.nutch.protocol.ftp.Client
Sends a NOOP command to the FTP server.
server - 
Static variable in class org.apache.nutch.api.NutchApp
 
SERVER_URL - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
set(String, String) - 
Method in class org.apache.nutch.metadata.Metadata
Set metadata name/value.
set(String, String) - 
Method in class org.apache.nutch.metadata.SpellCheckedMetadata
 
setAll(Properties) - 
Method in class org.apache.nutch.metadata.Metadata
Copy All key-value pairs from properties.
setBaseHref(URL) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the baseHref.
setBaseUrl(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
setBlackList(String) - 
Method in class org.apache.nutch.collection.Subcollection
Set contents of blacklist from String
setClazz(String) - 
Method in class org.apache.nutch.plugin.Extension
Sets the Class that implements the concrete extension and is only used until
 model creation at system start up.
setCode(int) - 
Method in class org.apache.nutch.storage.ProtocolStatus
 
setCommand(String) - 
Method in class org.apache.nutch.util.CommandRunner
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.URLPartitioner
 
setConf(Configuration) - 
Method in class org.apache.nutch.host.HostDbUpdateJob
 
setConf(Configuration) - 
Method in class org.apache.nutch.host.HostInjectorJob
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.feed.FeedIndexingFilter
Sets the Configuration object used to configure this
 IndexingFilter.
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.microformats.reltag.RelTagParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
 
setConf(Configuration) - 
Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ext.ExtParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.feed.FeedParser
Sets the Configuration object for this Parser.
setConf(Configuration) - 
Method in class org.apache.nutch.parse.html.DOMContentUtils
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.html.HtmlParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.js.JSParseFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ParserChecker
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ParserJob
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ParseUtil
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.swf.SWFParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.tika.DOMContentUtils
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.tika.TikaParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.zip.ZipParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.file.File
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.Http
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.Http
Reads the configuration from the Nutch configuration files and sets the
 configuration.
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.sftp.Sftp
 
setConf(Configuration) - 
Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
Sets the configuration.
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.validator.UrlValidator
 
setConf(Configuration) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 
setConf(Configuration) - 
Method in class org.apache.nutch.util.GenericWritableConfigurable
 
setConf(Configuration) - 
Method in class org.creativecommons.nutch.CCIndexingFilter
 
setConf(Configuration) - 
Method in class org.creativecommons.nutch.CCParseFilter
 
setContent(byte[]) - 
Method in class org.apache.nutch.protocol.Content
 
setContent(Content) - 
Method in class org.apache.nutch.protocol.ProtocolOutput
 
setContent(ByteBuffer) - 
Method in class org.apache.nutch.storage.WebPage
 
setContentType(String) - 
Method in class org.apache.nutch.protocol.Content
 
setContentType(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
setCrawlDelay(long) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
Set Crawl-Delay, in milliseconds
setDataTimeout(int) - 
Method in class org.apache.nutch.protocol.ftp.Client
Sets the timeout in milliseconds to use for data connection.
setDatum(WebPage) - 
Method in class org.apache.nutch.crawl.URLWebPage
 
setDescriptor(PluginDescriptor) - 
Method in class org.apache.nutch.plugin.Extension
Sets the plugin descriptor and is only used until model creation at system
 start up.
setDocumentLocator(Locator) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive an object for locating the origin of SAX document events.
setExpireTime(long) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setFetchInterval(int) - 
Method in class org.apache.nutch.storage.WebPage
 
setFetchSchedule(String, WebPage, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
Sets the fetchInterval and fetchTime on a
 successfully fetched page.
setFetchSchedule(String, WebPage, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
 
setFetchSchedule(String, WebPage, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.DefaultFetchSchedule
 
setFetchSchedule(String, WebPage, long, long, long, long, int) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
Sets the fetchInterval and fetchTime on a
 successfully fetched page.
setFetchTime(long) - 
Method in class org.apache.nutch.storage.WebPage
 
setFileType(int) - 
Method in class org.apache.nutch.protocol.ftp.Client
Sets the file type to be transferred.
setFilterFromPath(boolean) - 
Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
setFollowTalk(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set followTalk
setFParsePluginsFile(String) - 
Method in class org.apache.nutch.parse.ParsePluginsReader
 
setId(String) - 
Method in class org.apache.nutch.plugin.Extension
Sets the unique extension Id and is only used until model creation at
 system start up.
setIDAttribute(String, Element) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Set an ID string to node association in the ID table.
setIgnoreCase(boolean) - 
Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
setInputStream(InputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setKeepConnection(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set keepConnection
setLastModified(long) - 
Method in class org.apache.nutch.storage.ProtocolStatus
 
setMajorCode(int) - 
Method in class org.apache.nutch.storage.ParseStatus
 
setMaxContentLength(int) - 
Method in class org.apache.nutch.protocol.file.File
Set the point at which content is truncated.
setMaxContentLength(int) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set the point at which content is truncated.
setMeta(String, String) - 
Method in class org.apache.nutch.metadata.MetaWrapper
Set metadata.
setMeta(String, byte[]) - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
setMetadata(Metadata) - 
Method in class org.apache.nutch.protocol.Content
Other protocol-specific data.
setMinorCode(int) - 
Method in class org.apache.nutch.storage.ParseStatus
 
setModeAccept(boolean) - 
Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
setModifiedTime(long) - 
Method in class org.apache.nutch.storage.WebPage
 
setNoCache() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noCache to true.
setNoFollow() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noFollow to true.
setNoIndex() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noIndex to true.
setObject(String, Object) - 
Method in class org.apache.nutch.util.ObjectCache
 
setOutlinks(Outlink[]) - 
Method in class org.apache.nutch.parse.Parse
 
setPageGoneSchedule(String, WebPage, long, long, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method specifies how to schedule refetching of pages
 marked as GONE.
setPageGoneSchedule(String, WebPage, long, long, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method specifies how to schedule refetching of pages
 marked as GONE.
setPageRetrySchedule(String, WebPage, long, long, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method adjusts the fetch schedule if fetching needs to be
 re-tried due to transient errors.
setPageRetrySchedule(String, WebPage, long, long, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method adjusts the fetch schedule if fetching needs to be
 re-tried due to transient errors.
setParseStatus(ParseStatus) - 
Method in class org.apache.nutch.parse.Parse
 
setParseStatus(ParseStatus) - 
Method in class org.apache.nutch.storage.WebPage
 
setPrevFetchTime(long) - 
Method in class org.apache.nutch.storage.WebPage
 
setPrevSignature(ByteBuffer) - 
Method in class org.apache.nutch.storage.WebPage
 
setProperty(String, String, String) - 
Method in interface org.apache.nutch.api.ConfManager
 
setProperty(String, String, String) - 
Method in class org.apache.nutch.api.impl.RAMConfManager
 
setProtocolStatus(ProtocolStatus) - 
Method in class org.apache.nutch.storage.WebPage
 
setRefresh(boolean) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets refresh to the supplied value.
setRefreshHref(URL) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshHref.
setRefreshTime(int) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshTime.
setRemoteVerificationEnabled(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Client
Enable or disable verification that the remote host taking part
 in a data connection is the same as the host to which the control
 connection is attached.
setReprUrl(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
setRetriesSinceFetch(int) - 
Method in class org.apache.nutch.storage.WebPage
 
setScore(FloatWritable) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
setScore(float) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
setScore(float) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
setScore(float) - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
setScore(float) - 
Method in class org.apache.nutch.storage.WebPage
 
setSignature(ByteBuffer) - 
Method in class org.apache.nutch.storage.WebPage
 
setStatus(ProtocolStatus) - 
Method in class org.apache.nutch.protocol.ProtocolOutput
 
setStatus(int) - 
Method in class org.apache.nutch.storage.WebPage
 
setStdErrorStream(OutputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setStdOutputStream(OutputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setText(String) - 
Method in class org.apache.nutch.parse.Parse
 
setText(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
setTimeout(int) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set the timeout.
setTimeout(int) - 
Method in class org.apache.nutch.util.CommandRunner
 
setTitle(String) - 
Method in class org.apache.nutch.parse.Parse
 
setTitle(Utf8) - 
Method in class org.apache.nutch.storage.WebPage
 
setup(Mapper<String, WebPage, UrlWithScore, NutchWritable>.Context) - 
Method in class org.apache.nutch.crawl.DbUpdateMapper
 
setup(Reducer<UrlWithScore, NutchWritable, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.DbUpdateReducer
 
setup(Mapper<String, WebPage, GeneratorJob.SelectorEntry, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.GeneratorMapper
 
setup(Reducer<GeneratorJob.SelectorEntry, WebPage, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.GeneratorReducer
 
setup(Mapper<String, WebPage, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.InjectorJob.InjectorMapper
 
setup(Mapper<LongWritable, Text, String, WebPage>.Context) - 
Method in class org.apache.nutch.crawl.InjectorJob.UrlMapper
 
setup(Mapper<String, WebPage, Text, Text>.Context) - 
Method in class org.apache.nutch.crawl.WebTableReader.WebTableRegexMapper
 
setup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
 
setup(Mapper<String, WebPage, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
 
setup(Mapper<String, WebPage, IntWritable, FetchEntry>.Context) - 
Method in class org.apache.nutch.fetcher.FetcherJob.FetcherMapper
 
setup(Mapper<LongWritable, Text, String, Host>.Context) - 
Method in class org.apache.nutch.host.HostInjectorJob.UrlMapper
 
setup(Mapper<String, WebPage, String, NutchDocument>.Context) - 
Method in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
 
setup(Reducer<Text, SolrDeleteDuplicates.SolrRecord, Text, SolrDeleteDuplicates.SolrRecord>.Context) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
setup(Mapper<String, WebPage, String, WebPage>.Context) - 
Method in class org.apache.nutch.parse.ParserJob.ParserMapper
 
setup(Mapper<String, WebPage, Text, LongWritable>.Context) - 
Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
 
setUrl(String) - 
Method in class org.apache.nutch.crawl.URLWebPage
 
setUrl(Text) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
setUrl(String) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
setUrl(String) - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
setWaitForExit(boolean) - 
Method in class org.apache.nutch.util.CommandRunner
 
setWebPage(WebPage) - 
Method in class org.apache.nutch.util.WebPageWritable
 
setWhiteList(ArrayList) - 
Method in class org.apache.nutch.collection.Subcollection
 
setWhiteList(String) - 
Method in class org.apache.nutch.collection.Subcollection
Set contents of whitelist from String
Sftp - Class in org.apache.nutch.protocol.sftp
This class uses the Jsch package to fetch content using the Sftp protocol.
Sftp() - 
Constructor for class org.apache.nutch.protocol.sftp.Sftp
 
shortestMatch(String) - 
Method in class org.apache.nutch.util.PrefixStringMatcher
Returns the shortest prefix of input that is matched,
 or null if no match exists.
shortestMatch(String) - 
Method in class org.apache.nutch.util.SuffixStringMatcher
Returns the shortest suffix of input that is matched,
 or null if no match exists.
shortestMatch(String) - 
Method in class org.apache.nutch.util.TrieStringMatcher
Returns the shortest substring of input that is
 matched by a pattern in the trie, or null if no match
 exists.
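The shortest-match contract for prefixes can be illustrated with a plain set lookup. Note the swapped-in technique: the classes above are described as trie-based, while this sketch simply tests successively longer prefixes against a Set. The class name ShortestPrefixSketch and its method signature are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: finds the shortest known prefix of an input string without a trie.
final class ShortestPrefixSketch {

    // Returns the shortest non-empty prefix of input contained in prefixes, or null if none matches.
    static String shortestMatch(Set<String> prefixes, String input) {
        for (int end = 1; end <= input.length(); end++) {
            String candidate = input.substring(0, end);
            if (prefixes.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> prefixes = new HashSet<>(Arrays.asList("ab", "abc", "x"));
        System.out.println(shortestMatch(prefixes, "abcdef"));
        System.out.println(shortestMatch(prefixes, "zzz"));
    }
}
```

A trie reaches the same answer while walking the input once and stopping at the first terminal node, which avoids creating a substring per candidate length.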
shouldFetch(String, WebPage, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method provides information whether the page is suitable for
 selection in the current fetchlist.
shouldFetch(String, WebPage, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method provides information whether the page is suitable for
 selection in the current fetchlist.
shouldProcess(Utf8, Utf8) - 
Static method in class org.apache.nutch.util.NutchJob
 
shutDown() - 
Method in class org.apache.nutch.plugin.Plugin
Shutdown the plugin.
Signature - Class in org.apache.nutch.crawl
 
Signature() - 
Constructor for class org.apache.nutch.crawl.Signature
 
SIGNATURE_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
SignatureComparator - Class in org.apache.nutch.crawl
 
SignatureComparator() - 
Constructor for class org.apache.nutch.crawl.SignatureComparator
 
SignatureFactory - Class in org.apache.nutch.crawl
Factory class, which instantiates a Signature implementation according to the
 current Configuration configuration.
size() - 
Method in class org.apache.nutch.metadata.Metadata
Returns the number of metadata names in this metadata.
SIZEOF_BOOLEAN - 
Static variable in class org.apache.nutch.util.Bytes
Size of boolean in bytes
SIZEOF_BYTE - 
Static variable in class org.apache.nutch.util.Bytes
Size of byte in bytes
SIZEOF_CHAR - 
Static variable in class org.apache.nutch.util.Bytes
Size of char in bytes
SIZEOF_DOUBLE - 
Static variable in class org.apache.nutch.util.Bytes
Size of double in bytes
SIZEOF_FLOAT - 
Static variable in class org.apache.nutch.util.Bytes
Size of float in bytes
SIZEOF_INT - 
Static variable in class org.apache.nutch.util.Bytes
Size of int in bytes
SIZEOF_LONG - 
Static variable in class org.apache.nutch.util.Bytes
Size of long in bytes
SIZEOF_SHORT - 
Static variable in class org.apache.nutch.util.Bytes
Size of short in bytes
skip(DataInput) - 
Static method in class org.apache.nutch.parse.Outlink
Skips over one Outlink in the input.
SKIP_TRUNCATED - 
Static variable in class org.apache.nutch.parse.ParserJob
 
skipChildren() - 
Method in class org.apache.nutch.util.NodeWalker
Skips over and removes from the node stack the children of the last
 node.
skippedEntity(String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a skipped entity.
SOLR_PREFIX - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
SolrConstants - Interface in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates - Class in org.apache.nutch.indexer.solr
Utility class for deleting duplicate documents from a solr index.
SolrDeleteDuplicates() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
SolrDeleteDuplicates.SolrInputFormat - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrInputFormat() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
 
SolrDeleteDuplicates.SolrInputSplit - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrInputSplit() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
SolrDeleteDuplicates.SolrInputSplit(int, int) - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
SolrDeleteDuplicates.SolrRecord - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrRecord() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
SolrDeleteDuplicates.SolrRecord(String, float, long) - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
SolrDeleteDuplicates.SolrRecordReader - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrRecordReader(SolrDocumentList, int) - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
 
SolrIndexerJob - Class in org.apache.nutch.indexer.solr
 
SolrIndexerJob() - 
Constructor for class org.apache.nutch.indexer.solr.SolrIndexerJob
 
SolrMappingReader - Class in org.apache.nutch.indexer.solr
 
SolrMappingReader(Configuration) - 
Constructor for class org.apache.nutch.indexer.solr.SolrMappingReader
 
SolrWriter - Class in org.apache.nutch.indexer.solr
 
SolrWriter() - 
Constructor for class org.apache.nutch.indexer.solr.SolrWriter
 
sortByValue() - 
Method in class org.apache.nutch.util.Histogram
 
sortInverseByValue() - 
Method in class org.apache.nutch.util.Histogram
 
SOURCE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A reference to a resource from which the present resource is derived.
SpellCheckedMetadata - Class in org.apache.nutch.metadata
A decorator to Metadata that adds spellchecking capabilities to property
 names.
SpellCheckedMetadata() - 
Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
 
split(byte[], byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Split passed range.
splitEnd - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
splitLen - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
splitStart - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
start() - 
Method in class org.apache.nutch.api.NutchServer
 
startCDATA() - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of a CDATA section.
startDocument() - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of a document.
startDTD(String, String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of DTD declarations, if any.
started - 
Static variable in class org.apache.nutch.api.NutchApp
 
startElement(String, String, String, Attributes) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of an element.
startEntity(String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the beginning of an entity.
startPrefixMapping(String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Begin the scope of a prefix-URI Namespace mapping.
startsWith(byte[], byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Return true if the byte array on the right is a prefix of the byte array
 on the left.
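The prefix test described for startsWith(byte[], byte[]) can be illustrated with a minimal sketch. This is not the Nutch implementation: the class name PrefixCheckSketch is hypothetical, and returning false for null arguments is an assumption made for the example.

```java
// Hypothetical stand-alone sketch; not the org.apache.nutch.util.Bytes source.
final class PrefixCheckSketch {

    /** Returns true if {@code prefix} equals the leading bytes of {@code bytes}. */
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        // Assumption: null input or an over-long prefix simply yields false.
        if (bytes == null || prefix == null || prefix.length > bytes.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] left = {1, 2, 3};
        System.out.println(startsWith(left, new byte[] {1, 2})); // prints "true"
        System.out.println(startsWith(left, new byte[] {2, 3})); // prints "false"
    }
}
```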
startUp() - 
Method in class org.apache.nutch.plugin.Plugin
Will be invoked when the plugin starts up.
STAT_COUNTERS - 
Static variable in interface org.apache.nutch.metadata.Nutch
Counters.
STAT_JOBS - 
Static variable in interface org.apache.nutch.metadata.Nutch
Jobs.
STAT_MESSAGE - 
Static variable in interface org.apache.nutch.metadata.Nutch
Status / result message.
STAT_PHASE - 
Static variable in interface org.apache.nutch.metadata.Nutch
Phase of processing.
STAT_PROGRESS - 
Static variable in interface org.apache.nutch.metadata.Nutch
Progress (float).
state - 
Variable in class org.apache.nutch.api.JobStatus
 
status - 
Variable in class org.apache.nutch.util.NutchTool
 
STATUS_BLOCKED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_FAILED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_FETCHED - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Page was successfully fetched.
STATUS_GONE - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Page no longer exists.
STATUS_GONE - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_MODIFIED - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
Page is known to have been modified since our last visit.
STATUS_NOTFETCHING - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_NOTFOUND - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_NOTMODIFIED - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Fetching successful - page is not modified.
STATUS_NOTMODIFIED - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
Page is known to remain unmodified since our last visit.
STATUS_NOTMODIFIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_REDIR_EXCEEDED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_REDIR_PERM - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Page permanently redirects to other page.
STATUS_REDIR_TEMP - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Page temporarily redirects to other page.
STATUS_RETRY - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Fetching unsuccessful, needs to be retried (transient errors).
STATUS_RETRY - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_ROBOTS_DENIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_SUCCESS - 
Static variable in class org.apache.nutch.parse.ParseStatusUtils
 
STATUS_SUCCESS - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
STATUS_UNFETCHED - 
Static variable in class org.apache.nutch.crawl.CrawlStatus
Page was not fetched yet.
STATUS_UNKNOWN - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
It is unknown whether page was changed since our last visit.
STATUS_WOULDBLOCK - 
Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
 
stop(String, String) - 
Method in class org.apache.nutch.api.impl.RAMJobManager
 
stop(String, String) - 
Method in interface org.apache.nutch.api.JobManager
 
stop(boolean) - 
Method in class org.apache.nutch.api.NutchServer
 
stopJob() - 
Method in class org.apache.nutch.crawl.Crawler
 
stopJob() - 
Method in class org.apache.nutch.util.NutchTool
Stop the job with the possibility to resume.
StorageUtils - Class in org.apache.nutch.storage
 
StorageUtils() - 
Constructor for class org.apache.nutch.storage.StorageUtils
 
store - 
Variable in class org.apache.nutch.indexer.IndexerJob.IndexerMapper
 
StringUtil - Class in org.apache.nutch.util
A collection of String processing utility methods.
StringUtil() - 
Constructor for class org.apache.nutch.util.StringUtil
 
stripNonCharCodepoints(String) - 
Static method in class org.apache.nutch.indexer.solr.SolrWriter
 
Subcollection - Class in org.apache.nutch.collection
Subcollection represents a subset of the index. URL patterns define
 whether a particular page (URL) is part of the Subcollection.
Subcollection(String, String, Configuration) - 
Constructor for class org.apache.nutch.collection.Subcollection
Public constructor.
Subcollection(Configuration) - 
Constructor for class org.apache.nutch.collection.Subcollection
 
SubcollectionIndexingFilter - Class in org.apache.nutch.indexer.subcollection
 
SubcollectionIndexingFilter() - 
Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
 
SubcollectionIndexingFilter(Configuration) - 
Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
 
SUBJECT - 
Static variable in interface org.apache.nutch.metadata.DublinCore
The topic of the content of the resource.
SUCCESS - 
Static variable in interface org.apache.nutch.parse.ParseStatusCodes
Parsing succeeded.
SUCCESS - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Content was retrieved without errors.
SUCCESS_OK - 
Static variable in interface org.apache.nutch.parse.ParseStatusCodes
 
SUCCESS_REDIRECT - 
Static variable in interface org.apache.nutch.parse.ParseStatusCodes
Parsed content contains a directive to redirect to another URL.
SuffixStringMatcher - Class in org.apache.nutch.util
A class for efficiently matching Strings against a set
 of suffixes.
SuffixStringMatcher(String[]) - 
Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match
 Strings with any suffix in the supplied array.
SuffixStringMatcher(Collection) - 
Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match
 Strings with any suffix in the supplied
 Collection.
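One common way to match a string against a set of suffixes is a trie keyed on reversed characters. The sketch below illustrates that technique under stated assumptions: the class SuffixTrieSketch and its matches method are hypothetical names, and the code is not taken from SuffixStringMatcher or TrieStringMatcher.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of suffix matching with a reversed-character trie.
final class SuffixTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored suffix ends at this node
    }

    private final Node root = new Node();

    SuffixTrieSketch(String... suffixes) {
        for (String suffix : suffixes) {
            Node node = root;
            // Insert each suffix back to front so lookups can walk the input from its end.
            for (int i = suffix.length() - 1; i >= 0; i--) {
                node = node.children.computeIfAbsent(suffix.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Returns true if {@code input} ends with any stored suffix. */
    boolean matches(String input) {
        Node node = root;
        if (node.terminal) {
            return true; // an empty suffix matches every input
        }
        for (int i = input.length() - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false;
            }
            if (node.terminal) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SuffixTrieSketch matcher = new SuffixTrieSketch(".gif", ".jpg");
        System.out.println(matcher.matches("http://example.com/a.gif"));  // prints "true"
        System.out.println(matcher.matches("http://example.com/a.html")); // prints "false"
    }
}
```

Each lookup touches at most as many nodes as the longest stored suffix has characters, regardless of how many suffixes are stored.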
SuffixURLFilter - Class in org.apache.nutch.urlfilter.suffix
Filters URLs based on a file of URL suffixes.
SuffixURLFilter() - 
Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
SuffixURLFilter(Reader) - 
Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
 
SWFParser - Class in org.apache.nutch.parse.swf
Parser for Flash SWF files.
SWFParser() - 
Constructor for class org.apache.nutch.parse.swf.SWFParser
 



T

TableUtil - Class in org.apache.nutch.util
 
TableUtil() - 
Constructor for class org.apache.nutch.util.TableUtil
 
TAG_BLACKLIST - 
Static variable in class org.apache.nutch.collection.Subcollection
 
TAG_COLLECTION - 
Static variable in class org.apache.nutch.collection.Subcollection
 
TAG_COLLECTIONS - 
Static variable in class org.apache.nutch.collection.Subcollection
 
TAG_ID - 
Static variable in class org.apache.nutch.collection.Subcollection
 
TAG_NAME - 
Static variable in class org.apache.nutch.collection.Subcollection
 
TAG_WHITELIST - 
Static variable in class org.apache.nutch.collection.Subcollection
 
tail(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
 
TEMP_MOVED - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Resource has moved temporarily.
TEMPLATE - 
Static variable in interface org.apache.nutch.metadata.Office
 
terminal - 
Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
TestbedProxy - Class in org.apache.nutch.tools.proxy
 
TestbedProxy() - 
Constructor for class org.apache.nutch.tools.proxy.TestbedProxy
 
TEXT_PLAIN_CONTENT_TYPE - 
Static variable in class org.apache.nutch.parse.feed.FeedParser
 
TextProfileSignature - Class in org.apache.nutch.crawl
An implementation of a page signature.
TextProfileSignature() - 
Constructor for class org.apache.nutch.crawl.TextProfileSignature
 
THREADS_KEY - 
Static variable in class org.apache.nutch.fetcher.FetcherJob
 
TikaConfig - Class in org.apache.nutch.parse.tika
Parse xml config file.
TikaConfig(String) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(File) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(URL) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(InputStream) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(InputStream, Parser) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
Deprecated. This method will be removed in Apache Tika 1.0
TikaConfig(Document) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(Document, Parser) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
Deprecated. This method will be removed in Apache Tika 1.0
TikaConfig(Element) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig() - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
 
TikaConfig(Element, Parser) - 
Constructor for class org.apache.nutch.parse.tika.TikaConfig
Deprecated. This method will be removed in Apache Tika 1.0
TikaParser - Class in org.apache.nutch.parse.tika
Wrapper for Tika parsers.
TikaParser() - 
Constructor for class org.apache.nutch.parse.tika.TikaParser
 
timeout - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The network timeout in milliseconds.
TIMESTAMP_FIELD - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
TimingUtil - Class in org.apache.nutch.util
 
TimingUtil() - 
Constructor for class org.apache.nutch.util.TimingUtil
 
TITLE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A name given to the resource.
TLDIndexingFilter - Class in org.apache.nutch.indexer.tld
Adds the top-level domain extensions to the index.
TLDIndexingFilter() - 
Constructor for class org.apache.nutch.indexer.tld.TLDIndexingFilter
 
TLDScoringFilter - Class in org.apache.nutch.scoring.tld
Scoring filter to boost tlds.
TLDScoringFilter() - 
Constructor for class org.apache.nutch.scoring.tld.TLDScoringFilter
 
toArgMap(Object...) - 
Static method in class org.apache.nutch.util.ToolUtil
 
toBinaryFromHex(byte) - 
Static method in class org.apache.nutch.util.Bytes
Takes an ASCII digit in the range A-F0-9 and returns the corresponding
 integer/ordinal value.
toBoolean(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Reverses Bytes.toBytes(boolean)
toByteArrays(String[]) - 
Static method in class org.apache.nutch.util.Bytes
 
toByteArrays(String) - 
Static method in class org.apache.nutch.util.Bytes
 
toByteArrays(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
 
toBytes(ByteBuffer) - 
Static method in class org.apache.nutch.util.Bytes
Returns a new byte array, copied from the passed ByteBuffer.
toBytes(String) - 
Static method in class org.apache.nutch.util.Bytes
Converts a string to a UTF-8 byte array.
toBytes(boolean) - 
Static method in class org.apache.nutch.util.Bytes
Convert a boolean to a byte array.
toBytes(long) - 
Static method in class org.apache.nutch.util.Bytes
Convert a long value to a byte array using big-endian.
toBytes(float) - 
Static method in class org.apache.nutch.util.Bytes
 
toBytes(double) - 
Static method in class org.apache.nutch.util.Bytes
Serialize a double as the IEEE 754 double format output.
toBytes(int) - 
Static method in class org.apache.nutch.util.Bytes
Convert an int value to a byte array
toBytes(short) - 
Static method in class org.apache.nutch.util.Bytes
Convert a short value to a byte array of Bytes.SIZEOF_SHORT bytes
 long.
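The big-endian layout mentioned for toBytes(long) places the most significant byte first. A minimal round-trip sketch follows. BigEndianSketch is a hypothetical class written for illustration and is not the Bytes source.

```java
// Hypothetical sketch of big-endian long encoding and decoding.
final class BigEndianSketch {

    /** Encodes {@code value} into 8 bytes, most significant byte first. */
    static byte[] toBytes(long value) {
        byte[] out = new byte[Long.BYTES];
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            out[i] = (byte) value; // keep the lowest 8 bits
            value >>>= 8;          // shift the next byte into position
        }
        return out;
    }

    /** Decodes 8 big-endian bytes back into a long. */
    static long toLong(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected " + Long.BYTES + " bytes");
        }
        long value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF); // mask to avoid sign extension
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(258L);      // 258 = 0x0102
        System.out.println(encoded[6]);      // prints "1"
        System.out.println(encoded[7]);      // prints "2"
        System.out.println(toLong(encoded)); // prints "258"
    }
}
```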
toBytesBinary(String) - 
Static method in class org.apache.nutch.util.Bytes
 
toContent() - 
Method in class org.apache.nutch.protocol.file.FileResponse
 
toContent() - 
Method in class org.apache.nutch.protocol.ftp.FtpResponse
 
toDate(String) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toDouble(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
 
toDouble(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
 
toFloat(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Presumes float encoded as IEEE 754 floating-point "single format"
toFloat(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Presumes float encoded as IEEE 754 floating-point "single format"
toHexString(byte[]) - 
Static method in class org.apache.nutch.util.StringUtil
Convenience call for StringUtil.toHexString(byte[], String, int), where
 sep = null; lineLen = Integer.MAX_VALUE.
toHexString(byte[], String, int) - 
Static method in class org.apache.nutch.util.StringUtil
Get a text representation of a byte[] as hexadecimal String, where each
 pair of hexadecimal digits corresponds to consecutive bytes in the array.
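The mapping from bytes to pairs of hexadecimal digits can be sketched as below. This is an illustration, not the StringUtil code: the class HexSketch is hypothetical, lowercase digits are an assumption, and the lineLen wrapping of the three-argument method is deliberately left out.

```java
// Hypothetical sketch: each byte becomes two hex digits, optionally separated.
final class HexSketch {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Renders {@code bytes} as hex pairs; {@code sep} may be null for no separator. */
    static String toHexString(byte[] bytes, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && sep != null) {
                sb.append(sep);
            }
            sb.append(DIGITS[(bytes[i] >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[bytes[i] & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0x0f, (byte) 0xa0, 0x01};
        System.out.println(toHexString(data, ":"));  // prints "0f:a0:01"
        System.out.println(toHexString(data, null)); // prints "0fa001"
    }
}
```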
toInt(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to an int value
toInt(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to an int value
toInt(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to an int value
toLong(String) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toLong(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a long value.
toLong(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a long value.
toLong(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a long value.
tool - 
Variable in class org.apache.nutch.api.JobStatus
 
ToolUtil - Class in org.apache.nutch.util
 
ToolUtil() - 
Constructor for class org.apache.nutch.util.ToolUtil
 
TopLevelDomain - Class in org.apache.nutch.util.domain
(From wikipedia) A top-level domain (TLD) is the last part of an 
 Internet domain name; that is, the letters which follow the final 
 dot of any domain name.
TopLevelDomain(String, TopLevelDomain.Type, DomainSuffix.Status, float) - 
Constructor for class org.apache.nutch.util.domain.TopLevelDomain
 
TopLevelDomain(String, DomainSuffix.Status, float, String) - 
Constructor for class org.apache.nutch.util.domain.TopLevelDomain
 
TopLevelDomain.Type - Enum in org.apache.nutch.util.domain
 
toShort(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a short value
toShort(byte[], int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a short value
toShort(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Converts a byte array to a short value
toString() - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
toString() - 
Method in class org.apache.nutch.fetcher.FetchEntry
 
toString() - 
Method in class org.apache.nutch.metadata.Metadata
 
toString(Date) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
Get the HTTP format of the specified date.
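HTTP headers conventionally carry dates in the fixed-width RFC 1123 form, expressed in GMT. The sketch below shows one way to produce that form with java.time. It assumes this is the format meant here, and HttpDateSketch is a hypothetical class, not the HttpDateFormat source.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch of formatting a Date as an HTTP date string.
final class HttpDateSketch {

    // Assumption: fixed two-digit day and a literal "GMT" zone label.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    static String format(Date date) {
        return HTTP_DATE.format(date.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // prints "Thu, 01 Jan 1970 00:00:00 GMT"
    }
}
```

Pinning the locale to Locale.US keeps day and month names in English regardless of the default locale of the JVM.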
toString(Calendar) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString(long) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString() - 
Method in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
toString() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
 
toString() - 
Method in class org.apache.nutch.parse.Outlink
 
toString(ParseStatus) - 
Static method in class org.apache.nutch.parse.ParseStatusUtils
 
toString() - 
Method in class org.apache.nutch.protocol.Content
 
toString() - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
 
toString(ProtocolStatus) - 
Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
 
toString() - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
toString() - 
Method in enum org.apache.nutch.storage.Host.Field
 
toString() - 
Method in enum org.apache.nutch.storage.ParseStatus.Field
 
toString() - 
Method in enum org.apache.nutch.storage.ProtocolStatus.Field
 
toString() - 
Method in enum org.apache.nutch.storage.WebPage.Field
 
toString() - 
Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
 
toString(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
 
toString(byte[], String, byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Joins two byte arrays together using a separator.
toString(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
This method will convert utf8 encoded bytes into a string.
toString() - 
Method in class org.apache.nutch.util.domain.DomainSuffix
 
toString(List<E>) - 
Method in class org.apache.nutch.util.Histogram
 
toString(Utf8) - 
Static method in class org.apache.nutch.util.TableUtil
Convert given Utf8 instance to String
toStringArray(Collection<WebPage.Field>) - 
Static method in class org.apache.nutch.storage.StorageUtils
 
toStringBinary(byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Write a printable representation of a byte array.
toStringBinary(byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Write a printable representation of a byte array.
TrieStringMatcher - Class in org.apache.nutch.util
TrieStringMatcher is a base class for simple tree-based string
 matching.
TrieStringMatcher() - 
Constructor for class org.apache.nutch.util.TrieStringMatcher
 
TrieStringMatcher.TrieNode - Class in org.apache.nutch.util
Node class for the character tree.
type - 
Variable in class org.apache.nutch.api.JobStatus
 
TYPE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
The nature or genre of the content of the resource.



U

unreverseHost(String) - 
Static method in class org.apache.nutch.util.TableUtil
 
unreverseUrl(String) - 
Static method in class org.apache.nutch.util.TableUtil
 
unzip(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[], int) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array, truncated to
 sizeLimit bytes, if necessary.
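A size-limited, best-effort gunzip can be sketched with java.util.zip. The example assumes that "best effort" means keeping whatever was decompressed before an error occurred. GunzipSketch and its zip helper are hypothetical and are not the GZIPUtils source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a truncating, error-tolerant gunzip.
final class GunzipSketch {

    /** Decompresses at most {@code sizeLimit} bytes; returns partial output on error. */
    static byte[] unzipBestEffort(byte[] gzipped, int sizeLimit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[4096];
            while (out.size() < sizeLimit) {
                int want = Math.min(buf.length, sizeLimit - out.size());
                int n = in.read(buf, 0, want);
                if (n < 0) {
                    break; // end of the compressed stream
                }
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Assumed best-effort behavior: keep what was read before the failure.
        }
        return out.toByteArray();
    }

    /** Helper for the demo only: gzip a byte array in memory. */
    static byte[] zip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = zip(new byte[] {10, 20, 30, 40, 50, 60});
        System.out.println(unzipBestEffort(gz, 4).length);   // prints "4"
        System.out.println(unzipBestEffort(gz, 100).length); // prints "6"
    }
}
```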
update(Map<String, Object>) - 
Method in class org.apache.nutch.api.ConfResource
 
updateHosts(boolean) - 
Method in class org.apache.nutch.host.HostDbUpdateJob
 
updateScore(String, WebPage, List<ScoreDatum>) - 
Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
 
updateScore(String, WebPage, List<ScoreDatum>) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
Increase the score by a sum of inlinked scores.
updateScore(String, WebPage, List<ScoreDatum>) - 
Method in interface org.apache.nutch.scoring.ScoringFilter
This method calculates a new score during table update, based on the values contributed
 by inlinked pages.
updateScore(String, WebPage, List<ScoreDatum>) - 
Method in class org.apache.nutch.scoring.ScoringFilters
 
updateScore(String, WebPage, List<ScoreDatum>) - 
Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
 
URL_FIELD - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
URLFilter - Interface in org.apache.nutch.net
Interface used to limit which URLs enter Nutch.
URLFILTER_AUTOMATON_FILE - 
Static variable in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
 
URLFILTER_AUTOMATON_RULES - 
Static variable in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
 
URLFILTER_ORDER - 
Static variable in class org.apache.nutch.net.URLFilters
 
URLFILTER_REGEX_FILE - 
Static variable in class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
URLFILTER_REGEX_RULES - 
Static variable in class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
URLFilterChecker - Class in org.apache.nutch.net
Checks one given filter or all filters.
URLFilterChecker(Configuration) - 
Constructor for class org.apache.nutch.net.URLFilterChecker
 
URLFilterException - Exception in org.apache.nutch.net
 
URLFilterException() - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(String) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(String, Throwable) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(Throwable) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilters - Class in org.apache.nutch.net
Creates and caches URLFilter implementing plugins.
URLFilters(Configuration) - 
Constructor for class org.apache.nutch.net.URLFilters
 
URLNormalizer - Interface in org.apache.nutch.net
Interface used to convert URLs to normal form and optionally perform substitutions
URLNormalizerChecker - Class in org.apache.nutch.net
Checks one given normalizer or all normalizers.
URLNormalizerChecker(Configuration) - 
Constructor for class org.apache.nutch.net.URLNormalizerChecker
 
URLNormalizers - Class in org.apache.nutch.net
This class uses a "chained filter" pattern to run defined normalizers.
URLNormalizers(Configuration, String) - 
Constructor for class org.apache.nutch.net.URLNormalizers
 
URLPartitioner - Class in org.apache.nutch.crawl
Partitions URLs by host, domain name or IP, depending on the value of the
 parameter 'partition.url.mode', which can be 'byHost', 'byDomain' or 'byIP'.
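Partitioning by host means that every URL from the same host lands in the same partition. The sketch below illustrates only the 'byHost' idea with a plain hash. HostPartitionSketch is a hypothetical class, and the hashing scheme is an assumption, not the URLPartitioner logic.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch of host-based partitioning.
final class HostPartitionSketch {

    /** Maps a URL to a partition in [0, numPartitions) based on its host. */
    static int partitionByHost(String url, int numPartitions) {
        // URI.create throws IllegalArgumentException for malformed input.
        String host = URI.create(url).getHost();
        // Assumption: fall back to the whole URL when no host can be extracted.
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int first = partitionByHost("http://example.com/a", 8);
        int second = partitionByHost("http://example.com/b?x=1", 8);
        System.out.println(first == second); // prints "true"
    }
}
```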
URLPartitioner() - 
Constructor for class org.apache.nutch.crawl.URLPartitioner
 
URLPartitioner.FetchEntryPartitioner - Class in org.apache.nutch.crawl
 
URLPartitioner.FetchEntryPartitioner() - 
Constructor for class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
 
URLPartitioner.SelectorEntryPartitioner - Class in org.apache.nutch.crawl
 
URLPartitioner.SelectorEntryPartitioner() - 
Constructor for class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
 
URLUtil - Class in org.apache.nutch.util
Utility class for URL analysis
URLUtil() - 
Constructor for class org.apache.nutch.util.URLUtil
 
UrlValidator - Class in org.apache.nutch.urlfilter.validator
Validates URLs.
UrlValidator() - 
Constructor for class org.apache.nutch.urlfilter.validator.UrlValidator
 
URLWebPage - Class in org.apache.nutch.crawl
 
URLWebPage(String, WebPage) - 
Constructor for class org.apache.nutch.crawl.URLWebPage
 
UrlWithScore - Class in org.apache.nutch.crawl
A writable comparable container for a URL with score.
UrlWithScore() - 
Constructor for class org.apache.nutch.crawl.UrlWithScore
Creates instance with empty url and zero score.
UrlWithScore(Text, FloatWritable) - 
Constructor for class org.apache.nutch.crawl.UrlWithScore
Creates instance with provided writables.
UrlWithScore(String, float) - 
Constructor for class org.apache.nutch.crawl.UrlWithScore
Creates instance with provided non-writable types.
UrlWithScore.UrlOnlyPartitioner - Class in org.apache.nutch.crawl
A partitioner by {url}.
UrlWithScore.UrlOnlyPartitioner() - 
Constructor for class org.apache.nutch.crawl.UrlWithScore.UrlOnlyPartitioner
 
UrlWithScore.UrlScoreComparator - Class in org.apache.nutch.crawl
Compares by {url,score}.
UrlWithScore.UrlScoreComparator() - 
Constructor for class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator
 
UrlWithScore.UrlScoreComparator.UrlOnlyComparator - Class in org.apache.nutch.crawl
Compares by {url}.
UrlWithScore.UrlScoreComparator.UrlOnlyComparator() - 
Constructor for class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator.UrlOnlyComparator
 
useHttp11 - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
Do we use HTTP/1.1?
useProxy - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
Indicates if a proxy is used
useProxy() - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
userAgent - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The Nutch 'User-Agent' request header
UTF8_ENCODING - 
Static variable in class org.apache.nutch.util.Bytes
When we encode strings, we always specify UTF8 encoding
UUID_KEY - 
Static variable in class org.apache.nutch.util.NutchConfiguration
 



V

valueOf(String) - 
Static method in enum org.apache.nutch.api.JobManager.JobType
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.api.JobStatus.State
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.storage.Host.Field
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.storage.Mark
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.storage.ParseStatus.Field
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.storage.ProtocolStatus.Field
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.storage.WebPage.Field
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.tools.proxy.FakeHandler.Mode
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
Returns the enum constant of this type with the specified name.
values() - 
Static method in enum org.apache.nutch.api.JobManager.JobType
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.api.JobStatus.State
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.storage.Host.Field
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.storage.Mark
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.storage.ParseStatus.Field
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.storage.ProtocolStatus.Field
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.storage.WebPage.Field
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.tools.proxy.FakeHandler.Mode
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
Returns an array containing the constants of this enum type, in
the order they are declared.
VERSION - 
Static variable in class org.apache.nutch.indexer.NutchDocument
 
vintToBytes(long) - 
Static method in class org.apache.nutch.util.Bytes
 



W

waitForCompletion(boolean) - 
Method in class org.apache.nutch.util.NutchJob
 
walk(Node, URL, WebPage, Configuration) - 
Static method in class org.creativecommons.nutch.CCParseFilter.Walker
Scan the document adding attributes to metadata.
WebPage - Class in org.apache.nutch.storage
 
WebPage() - 
Constructor for class org.apache.nutch.storage.WebPage
 
WebPage(StateManager) - 
Constructor for class org.apache.nutch.storage.WebPage
 
WebPage.Field - Enum in org.apache.nutch.storage
 
WebPageWritable - Class in org.apache.nutch.util
 
WebPageWritable() - 
Constructor for class org.apache.nutch.util.WebPageWritable
 
WebPageWritable(Configuration, WebPage) - 
Constructor for class org.apache.nutch.util.WebPageWritable
 
WebTableCreator - Class in org.apache.nutch.storage
 
WebTableCreator() - 
Constructor for class org.apache.nutch.storage.WebTableCreator
 
WebTableReader - Class in org.apache.nutch.crawl
Displays information about the entries of the webtable
WebTableReader() - 
Constructor for class org.apache.nutch.crawl.WebTableReader
 
WebTableReader.WebTableRegexMapper - Class in org.apache.nutch.crawl
Filters the entries from the table based on a regex
WebTableReader.WebTableRegexMapper() - 
Constructor for class org.apache.nutch.crawl.WebTableReader.WebTableRegexMapper
 
WebTableReader.WebTableStatCombiner - Class in org.apache.nutch.crawl
 
WebTableReader.WebTableStatCombiner() - 
Constructor for class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
 
WebTableReader.WebTableStatMapper - Class in org.apache.nutch.crawl
 
WebTableReader.WebTableStatMapper() - 
Constructor for class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
 
WebTableReader.WebTableStatReducer - Class in org.apache.nutch.crawl
 
WebTableReader.WebTableStatReducer() - 
Constructor for class org.apache.nutch.crawl.WebTableReader.WebTableStatReducer
 
WORD_COUNT - 
Static variable in interface org.apache.nutch.metadata.Office
 
WORK_TYPE - 
Static variable in interface org.apache.nutch.metadata.CreativeCommons
 
WOULDBLOCK - 
Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
Request was refused by protocol plugins, because it would block.
WRITABLE_GENERATE_TIME_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
WRITABLE_PROTO_STATUS_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
WRITABLE_REPR_URL_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.UrlWithScore
 
write(DataOutput) - 
Method in class org.apache.nutch.fetcher.FetchEntry
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
write(NutchDocument) - 
Method in interface org.apache.nutch.indexer.NutchIndexWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
write(NutchDocument) - 
Method in class org.apache.nutch.indexer.solr.SolrWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.metadata.Metadata
 
write(DataOutput) - 
Method in class org.apache.nutch.metadata.MetaWrapper
 
write(DataOutput) - 
Method in class org.apache.nutch.parse.Outlink
 
write(DataOutput) - 
Method in class org.apache.nutch.protocol.Content
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.ScoreDatum
 
write(DataOutput) - 
Method in class org.apache.nutch.util.WebPageWritable
 
writeByteArray(DataOutput, byte[]) - 
Static method in class org.apache.nutch.util.Bytes
Write byte-array with a WritableUtils.vint prefix.
writeByteArray(DataOutput, byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Write byte-array to out with a vint length prefix.
writeByteArray(byte[], int, byte[], int, int) - 
Static method in class org.apache.nutch.util.Bytes
Write byte-array from src to tgt with a vint length prefix.
WWW_AUTHENTICATE - 
Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
The HTTP Authentication (WWW-Authenticate) header which is returned by a webserver requiring authentication.



X

X_POINT_ID - 
Static variable in interface org.apache.nutch.indexer.IndexingFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.net.URLFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.net.URLNormalizer
 
X_POINT_ID - 
Static variable in interface org.apache.nutch.parse.ParseFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.parse.Parser
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.protocol.Protocol
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.scoring.ScoringFilter
The name of the extension point.
XMLCharacterRecognizer - Class in org.apache.nutch.parse.html
Class used to verify whether the specified ch conforms to the XML 1.0 definition of whitespace.
XMLCharacterRecognizer() - 
Constructor for class org.apache.nutch.parse.html.XMLCharacterRecognizer
 



Y

YES_VAL - 
Static variable in class org.apache.nutch.util.TableUtil
 



Z

zip(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gzipped copy of the input array.
ZipParser - Class in org.apache.nutch.parse.zip
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter.
ZipParser() - 
Constructor for class org.apache.nutch.parse.zip.ZipParser
Creates a new instance of ZipParser.
ZipTextExtractor - Class in org.apache.nutch.parse.zip
 
ZipTextExtractor(Configuration) - 
Constructor for class org.apache.nutch.parse.zip.ZipTextExtractor
Creates a new instance of ZipTextExtractor.



_

__openPassiveDataConnection(int, String) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
_ALL_FIELDS - 
Static variable in class org.apache.nutch.storage.Host
 
_ALL_FIELDS - 
Static variable in class org.apache.nutch.storage.ParseStatus
 
_ALL_FIELDS - 
Static variable in class org.apache.nutch.storage.ProtocolStatus
 
_ALL_FIELDS - 
Static variable in class org.apache.nutch.storage.WebPage
 
_compare(byte[], int, int, byte[], int, int) - 
Static method in class org.apache.nutch.crawl.SignatureComparator
 
_SCHEMA - 
Static variable in class org.apache.nutch.storage.Host
 
_SCHEMA - 
Static variable in class org.apache.nutch.storage.ParseStatus
 
_SCHEMA - 
Static variable in class org.apache.nutch.storage.ProtocolStatus
 
_SCHEMA - 
Static variable in class org.apache.nutch.storage.WebPage
 


Copyright © 2012 The Apache Software Foundation