org.apache.nutch.metadata
Interface Nutch

All Known Implementing Classes:
Metadata, SpellCheckedMetadata

public interface Nutch

A collection of Nutch internal metadata constants.

Author:
Chris Mattmann, Jérôme Charron

Field Summary
static String ALL_BATCH_ID_STR
           
static org.apache.avro.util.Utf8 ALL_CRAWL_ID
           
static String ARG_BATCH
          Batch id to select.
static String ARG_CLASS
          Class to run as a NutchTool.
static String ARG_CRAWL
          Crawl id to use.
static String ARG_CURTIME
          The notion of current time.
static String ARG_DEPTH
          Depth (number of cycles) of a crawl.
static String ARG_FILTER
          Apply URLFilters.
static String ARG_FORCE
          Force processing even if there are locks or inconsistencies.
static String ARG_NORMALIZE
          Apply URLNormalizers.
static String ARG_NUMTASKS
          Number of fetcher tasks.
static String ARG_RESUME
          Resume a previously aborted operation.
static String ARG_SEEDDIR
          Path to a directory containing a list of seed URLs.
static String ARG_SEEDLIST
          Whitespace-separated list of seed URLs.
static String ARG_SOLR
          Solr URL.
static String ARG_SORT
          Sort statistics.
static String ARG_THREADS
          Number of fetcher threads (per map task).
static String ARG_TOPN
          Generate the topN highest-scoring URLs.
static String CACHING_FORBIDDEN_ALL
          Don't show either original forbidden content or summaries.
static String CACHING_FORBIDDEN_CONTENT
          Don't show original forbidden content, but show summaries.
static String CACHING_FORBIDDEN_KEY
          Sites may request that search engines don't provide access to cached documents.
static org.apache.avro.util.Utf8 CACHING_FORBIDDEN_KEY_UTF8
           
static String CACHING_FORBIDDEN_NONE
          Show both original forbidden content and summaries (default).
static String CHAR_ENCODING_FOR_CONVERSION
           
static String CRAWL_ID_KEY
           
static String FETCH_STATUS_KEY
           
static String FETCH_TIME_KEY
           
static String GENERATE_TIME_KEY
           
static String ORIGINAL_CHAR_ENCODING
           
static String PROTO_STATUS_KEY
           
static String REPR_URL_KEY
           
static String SCORE_KEY
           
static String SEGMENT_NAME_KEY
           
static String SIGNATURE_KEY
           
static String STAT_COUNTERS
          Counters.
static String STAT_JOBS
          Jobs.
static String STAT_MESSAGE
          Status / result message.
static String STAT_PHASE
          Phase of processing.
static String STAT_PROGRESS
          Progress (float).
static Text WRITABLE_GENERATE_TIME_KEY
           
static Text WRITABLE_PROTO_STATUS_KEY
           
static Text WRITABLE_REPR_URL_KEY
           
 

Field Detail

ORIGINAL_CHAR_ENCODING

static final String ORIGINAL_CHAR_ENCODING
See Also:
Constant Field Values

CHAR_ENCODING_FOR_CONVERSION

static final String CHAR_ENCODING_FOR_CONVERSION
See Also:
Constant Field Values

SIGNATURE_KEY

static final String SIGNATURE_KEY
See Also:
Constant Field Values

SEGMENT_NAME_KEY

static final String SEGMENT_NAME_KEY
See Also:
Constant Field Values

SCORE_KEY

static final String SCORE_KEY
See Also:
Constant Field Values

GENERATE_TIME_KEY

static final String GENERATE_TIME_KEY
See Also:
Constant Field Values

WRITABLE_GENERATE_TIME_KEY

static final Text WRITABLE_GENERATE_TIME_KEY

PROTO_STATUS_KEY

static final String PROTO_STATUS_KEY
See Also:
Constant Field Values

WRITABLE_PROTO_STATUS_KEY

static final Text WRITABLE_PROTO_STATUS_KEY

FETCH_TIME_KEY

static final String FETCH_TIME_KEY
See Also:
Constant Field Values

FETCH_STATUS_KEY

static final String FETCH_STATUS_KEY
See Also:
Constant Field Values

CACHING_FORBIDDEN_KEY

static final String CACHING_FORBIDDEN_KEY
Sites may request that search engines don't provide access to cached documents.

See Also:
Constant Field Values

CACHING_FORBIDDEN_KEY_UTF8

static final org.apache.avro.util.Utf8 CACHING_FORBIDDEN_KEY_UTF8

CACHING_FORBIDDEN_NONE

static final String CACHING_FORBIDDEN_NONE
Show both original forbidden content and summaries (default).

See Also:
Constant Field Values

CACHING_FORBIDDEN_ALL

static final String CACHING_FORBIDDEN_ALL
Don't show either original forbidden content or summaries.

See Also:
Constant Field Values

CACHING_FORBIDDEN_CONTENT

static final String CACHING_FORBIDDEN_CONTENT
Don't show original forbidden content, but show summaries.

See Also:
Constant Field Values
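
The three CACHING_FORBIDDEN_* values describe a small policy: NONE allows both cached content and summaries, CONTENT hides the original content but keeps summaries, and ALL hides both. The sketch below shows one way a caller might interpret such a value. It is an illustration only: the class `CachingPolicySketch`, its method names, and the literal string values are assumptions (the actual values are listed under Constant Field Values), and a plain `Map` stands in for whatever metadata container the caller uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: interpreting a caching policy stored in page metadata. */
class CachingPolicySketch {
    // Local stand-ins for the Nutch constants. The literal values are
    // assumptions; see "Constant Field Values" for the real ones.
    static final String CACHING_FORBIDDEN_KEY = "caching.forbidden";
    static final String CACHING_FORBIDDEN_NONE = "none";
    static final String CACHING_FORBIDDEN_ALL = "all";
    static final String CACHING_FORBIDDEN_CONTENT = "content";

    /** Returns the stored policy, falling back to NONE (the documented default). */
    static String policyOf(Map<String, String> metadata) {
        return metadata.getOrDefault(CACHING_FORBIDDEN_KEY, CACHING_FORBIDDEN_NONE);
    }

    /** Cached original content may be shown only when the policy is NONE. */
    static boolean mayShowContent(Map<String, String> metadata) {
        return CACHING_FORBIDDEN_NONE.equals(policyOf(metadata));
    }

    /** Summaries may be shown unless the policy is ALL. */
    static boolean mayShowSummary(Map<String, String> metadata) {
        return !CACHING_FORBIDDEN_ALL.equals(policyOf(metadata));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CACHING_FORBIDDEN_KEY, CACHING_FORBIDDEN_CONTENT);
        System.out.println("show content: " + mayShowContent(metadata));
        System.out.println("show summary: " + mayShowSummary(metadata));
    }
}
```

Treating a missing key as NONE follows the "(default)" note on CACHING_FORBIDDEN_NONE. How unrecognized values should be handled is a design choice this page does not specify; the sketch hides content but still shows summaries in that case.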

REPR_URL_KEY

static final String REPR_URL_KEY
See Also:
Constant Field Values

WRITABLE_REPR_URL_KEY

static final Text WRITABLE_REPR_URL_KEY

ALL_BATCH_ID_STR

static final String ALL_BATCH_ID_STR
See Also:
Constant Field Values

ALL_CRAWL_ID

static final org.apache.avro.util.Utf8 ALL_CRAWL_ID

CRAWL_ID_KEY

static final String CRAWL_ID_KEY
See Also:
Constant Field Values

ARG_BATCH

static final String ARG_BATCH
Batch id to select.

See Also:
Constant Field Values

ARG_CRAWL

static final String ARG_CRAWL
Crawl id to use.

See Also:
Constant Field Values

ARG_RESUME

static final String ARG_RESUME
Resume a previously aborted operation.

See Also:
Constant Field Values

ARG_FORCE

static final String ARG_FORCE
Force processing even if there are locks or inconsistencies.

See Also:
Constant Field Values

ARG_SORT

static final String ARG_SORT
Sort statistics.

See Also:
Constant Field Values

ARG_SOLR

static final String ARG_SOLR
Solr URL.

See Also:
Constant Field Values

ARG_THREADS

static final String ARG_THREADS
Number of fetcher threads (per map task).

See Also:
Constant Field Values

ARG_NUMTASKS

static final String ARG_NUMTASKS
Number of fetcher tasks.

See Also:
Constant Field Values

ARG_TOPN

static final String ARG_TOPN
Generate the topN highest-scoring URLs.

See Also:
Constant Field Values

ARG_CURTIME

static final String ARG_CURTIME
The notion of current time.

See Also:
Constant Field Values

ARG_FILTER

static final String ARG_FILTER
Apply URLFilters.

See Also:
Constant Field Values

ARG_NORMALIZE

static final String ARG_NORMALIZE
Apply URLNormalizers.

See Also:
Constant Field Values

ARG_SEEDLIST

static final String ARG_SEEDLIST
Whitespace-separated list of seed URLs.

See Also:
Constant Field Values

ARG_SEEDDIR

static final String ARG_SEEDDIR
Path to a directory containing a list of seed URLs.

See Also:
Constant Field Values

ARG_CLASS

static final String ARG_CLASS
Class to run as a NutchTool.

See Also:
Constant Field Values

ARG_DEPTH

static final String ARG_DEPTH
Depth (number of cycles) of a crawl.

See Also:
Constant Field Values
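
The ARG_* constants read as keys for a tool's argument set. The following is a minimal sketch of building and reading such a set, assuming the arguments travel in a `Map<String, Object>` keyed by these constants; this page does not confirm that calling convention. The class `ToolArgsSketch`, its helper methods, and the literal key strings are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: assembling tool arguments keyed by ARG_* style constants. */
class ToolArgsSketch {
    // Placeholder keys. The real string values are listed under
    // "Constant Field Values" and may differ from these.
    static final String ARG_CRAWL = "crawl";
    static final String ARG_THREADS = "threads";
    static final String ARG_TOPN = "topN";
    static final String ARG_FILTER = "filter";
    static final String ARG_NORMALIZE = "normalize";

    /** Builds an argument map for a hypothetical crawl step. */
    static Map<String, Object> crawlArgs(String crawlId, int threads, long topN) {
        Map<String, Object> args = new LinkedHashMap<>();
        args.put(ARG_CRAWL, crawlId);      // crawl id to use
        args.put(ARG_THREADS, threads);    // fetcher threads per map task
        args.put(ARG_TOPN, topN);          // limit to the topN scoring URLs
        args.put(ARG_FILTER, true);        // apply URLFilters
        args.put(ARG_NORMALIZE, true);     // apply URLNormalizers
        return args;
    }

    /** Reads an integer argument, using the fallback if it is absent or not numeric. */
    static int intArg(Map<String, Object> args, String key, int fallback) {
        Object value = args.get(key);
        return (value instanceof Number) ? ((Number) value).intValue() : fallback;
    }

    public static void main(String[] unused) {
        Map<String, Object> args = crawlArgs("demo-crawl", 10, 1000L);
        System.out.println(args);
        System.out.println("threads = " + intArg(args, ARG_THREADS, 1));
    }
}
```

Referring to keys through named constants rather than repeating string literals keeps the producer and consumer of the map in agreement, which is the usual reason for an interface of constants like this one.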

STAT_MESSAGE

static final String STAT_MESSAGE
Status / result message.

See Also:
Constant Field Values

STAT_PHASE

static final String STAT_PHASE
Phase of processing.

See Also:
Constant Field Values

STAT_PROGRESS

static final String STAT_PROGRESS
Progress (float).

See Also:
Constant Field Values

STAT_JOBS

static final String STAT_JOBS
Jobs.

See Also:
Constant Field Values

STAT_COUNTERS

static final String STAT_COUNTERS
Counters.

See Also:
Constant Field Values
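
The STAT_* constants name the entries of a status report: a message, a processing phase, a float progress value, jobs, and counters. Below is a small sketch of formatting such a report, assuming it arrives as a `Map<String, Object>` and that progress is a fraction between 0 and 1. Neither assumption is stated on this page, and the class `StatusSketch` and the key strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: summarizing a status map keyed by STAT_* style constants. */
class StatusSketch {
    // Placeholder keys; the real values are under "Constant Field Values".
    static final String STAT_PHASE = "phase";
    static final String STAT_PROGRESS = "progress";

    /** Formats the phase and progress as, for example, "fetch 50%". */
    static String describe(Map<String, Object> status) {
        Object phase = status.getOrDefault(STAT_PHASE, "unknown");
        Object raw = status.get(STAT_PROGRESS);
        // Progress is documented as a float; treat anything else as 0.
        float progress = (raw instanceof Number) ? ((Number) raw).floatValue() : 0.0f;
        // Assumes progress is a fraction in [0, 1].
        int percent = Math.round(progress * 100);
        return phase + " " + percent + "%";
    }

    public static void main(String[] args) {
        Map<String, Object> status = new HashMap<>();
        status.put(STAT_PHASE, "fetch");
        status.put(STAT_PROGRESS, 0.5f);
        System.out.println(describe(status));
    }
}
```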


Copyright © 2012 The Apache Software Foundation