org.apache.nutch.protocol.http.api
Class HttpBase

java.lang.Object
  extended by org.apache.nutch.protocol.http.api.HttpBase
All Implemented Interfaces:
Configurable, FieldPluggable, Pluggable, Protocol
Direct Known Subclasses:
Http, Http

public abstract class HttpBase
extends Object
implements Protocol

Author:
Jérôme Charron

Field Summary
protected  String accept
          The "Accept" request header value.
protected  String acceptLanguage
          The "Accept-Language" request header value.
static int BUFFER_SIZE
           
protected  boolean ip_header
          The "_ip" request header value.
protected  int maxContent
          The length limit for downloaded content, in bytes.
protected  String proxyHost
          The proxy hostname.
protected  int proxyPort
          The proxy port.
protected  int timeout
          The network timeout, in milliseconds.
protected  boolean useHttp11
          Whether HTTP/1.1 is used.
protected  boolean useProxy
          Indicates whether a proxy is used.
protected  String userAgent
          The Nutch "User-Agent" request header value.
 
Fields inherited from interface org.apache.nutch.protocol.Protocol
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID
 
Constructor Summary
HttpBase()
          Creates a new instance of HttpBase
HttpBase(org.slf4j.Logger logger)
          Creates a new instance of HttpBase
 
Method Summary
 String getAccept()
           
 String getAcceptLanguage()
          Value of "Accept-Language" request header sent by Nutch.
 Configuration getConf()
           
 boolean getIP_Header()
           
 int getMaxContent()
           
 ProtocolOutput getProtocolOutput(String url, WebPage page)
          Returns the Content for a fetchlist entry.
 String getProxyHost()
           
 int getProxyPort()
           
protected abstract  Response getResponse(URL url, WebPage page, boolean followRedirects)
           
 RobotRules getRobotRules(String url, WebPage page)
          Retrieve robot rules applicable for this url.
 int getTimeout()
           
 boolean getUseHttp11()
           
 String getUserAgent()
           
protected  void logConf()
           
protected static void main(HttpBase http, String[] args)
           
 byte[] processDeflateEncoded(byte[] compressed, URL url)
           
 byte[] processGzipEncoded(byte[] compressed, URL url)
           
 void setConf(Configuration conf)
           
 boolean useProxy()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.nutch.plugin.FieldPluggable
getFields
 

Field Detail

BUFFER_SIZE

public static final int BUFFER_SIZE
See Also:
Constant Field Values

proxyHost

protected String proxyHost
The proxy hostname.


proxyPort

protected int proxyPort
The proxy port.


useProxy

protected boolean useProxy
Indicates whether a proxy is used.


timeout

protected int timeout
The network timeout, in milliseconds.


maxContent

protected int maxContent
The length limit for downloaded content, in bytes.


userAgent

protected String userAgent
The Nutch "User-Agent" request header value.


acceptLanguage

protected String acceptLanguage
The "Accept-Language" request header value.


accept

protected String accept
The "Accept" request header value.


ip_header

protected boolean ip_header
The "_ip" request header value.


useHttp11

protected boolean useHttp11
Whether HTTP/1.1 is used.
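
Example (illustrative only): the proxy fields above could map onto the standard java.net API roughly as follows. This is a minimal sketch, not code taken from HttpBase; the class ProxySettingsSketch, the method toProxy, and the host name proxy.example.org are hypothetical names introduced for illustration.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of Nutch: shows one way that values like
// useProxy, proxyHost and proxyPort could be turned into a java.net.Proxy.
final class ProxySettingsSketch {

    // Returns a direct connection when no proxy is configured,
    // otherwise an HTTP proxy pointing at proxyHost:proxyPort.
    static Proxy toProxy(boolean useProxy, String proxyHost, int proxyPort) {
        if (!useProxy || proxyHost == null || proxyHost.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // createUnresolved avoids a DNS lookup at configuration time.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    public static void main(String[] args) {
        System.out.println(toProxy(false, null, 0).type());
        System.out.println(toProxy(true, "proxy.example.org", 8080).address());
    }
}
```

A Proxy built this way can be passed to URL.openConnection(Proxy). A timeout in milliseconds, like the timeout field, fits URLConnection.setConnectTimeout and setReadTimeout, which also take milliseconds.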

Constructor Detail

HttpBase

public HttpBase()
Creates a new instance of HttpBase


HttpBase

public HttpBase(org.slf4j.Logger logger)
Creates a new instance of HttpBase

Method Detail

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

getProtocolOutput

public ProtocolOutput getProtocolOutput(String url,
                                        WebPage page)
Description copied from interface: Protocol
Returns the Content for a fetchlist entry.

Specified by:
getProtocolOutput in interface Protocol

getProxyHost

public String getProxyHost()

getProxyPort

public int getProxyPort()

useProxy

public boolean useProxy()

getTimeout

public int getTimeout()

getMaxContent

public int getMaxContent()

getUserAgent

public String getUserAgent()

getAcceptLanguage

public String getAcceptLanguage()
Value of "Accept-Language" request header sent by Nutch.

Returns:
The value of the "Accept-Language" header.

getAccept

public String getAccept()

getUseHttp11

public boolean getUseHttp11()

getIP_Header

public boolean getIP_Header()

logConf

protected void logConf()

processGzipEncoded

public byte[] processGzipEncoded(byte[] compressed,
                                 URL url)
                          throws IOException
Throws:
IOException
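
Example (illustrative only): this page does not describe how processGzipEncoded works internally. The sketch below shows one plausible way to decode a gzip-encoded body with the standard java.util.zip API while honoring a byte limit similar to maxContent. The class GzipSketch, the methods gunzip and gzip, the 8 KB buffer size, and the convention that a negative limit means "no limit" are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Nutch.
final class GzipSketch {

    // Assumed buffer size; the real BUFFER_SIZE constant is not shown on this page.
    private static final int BUFFER_SIZE = 8 * 1024;

    // Decompresses gzip data, keeping at most maxContent bytes of output.
    // A negative maxContent is treated as "no limit" (an assumption of this sketch).
    static byte[] gunzip(byte[] compressed, int maxContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[BUFFER_SIZE];
            int written = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                if (maxContent >= 0 && written + n > maxContent) {
                    // Copy only the bytes that still fit, then stop reading.
                    out.write(buf, 0, maxContent - written);
                    break;
                }
                out.write(buf, 0, n);
                written += n;
            }
        }
        return out.toByteArray();
    }

    // Test helper: compresses raw bytes into gzip format.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = gzip("hello nutch".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(packed, -1), StandardCharsets.UTF_8));
        System.out.println(gunzip(packed, 5).length);
    }
}
```

Truncating the decompressed output, rather than the compressed input, keeps memory use bounded even when a small response expands to a very large body.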

processDeflateEncoded

public byte[] processDeflateEncoded(byte[] compressed,
                                    URL url)
                             throws IOException
Throws:
IOException
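
Example (illustrative only): servers that announce "Content-Encoding: deflate" may send either zlib-wrapped data or a raw deflate stream. The sketch below tries the zlib form first and falls back to raw deflate. It is not taken from processDeflateEncoded, whose behavior is not documented on this page; the class DeflateSketch and its methods inflate and inflateEither are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical helper, not part of Nutch.
final class DeflateSketch {

    // Inflates data; nowrap=true expects a raw deflate stream without zlib header.
    static byte[] inflate(byte[] compressed, boolean nowrap) throws IOException {
        // The Inflater docs ask for one extra dummy input byte in nowrap mode;
        // a trailing zero byte after the end of the stream is ignored.
        byte[] input = nowrap
                ? Arrays.copyOf(compressed, compressed.length + 1)
                : compressed;
        Inflater inflater = new Inflater(nowrap);
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(input), inflater)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            // The stream does not release an Inflater it did not create itself.
            inflater.end();
        }
    }

    // Tries zlib-wrapped deflate first, then raw deflate if the header is invalid.
    static byte[] inflateEither(byte[] compressed) throws IOException {
        try {
            return inflate(compressed, false);
        } catch (ZipException e) {
            return inflate(compressed, true);
        }
    }
}
```

The fallback relies on InflaterInputStream reporting a malformed zlib header as a ZipException; a caller that needs a size limit could combine this with the same truncation loop used for gzip.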

main

protected static void main(HttpBase http,
                           String[] args)
                    throws Exception
Throws:
Exception

getResponse

protected abstract Response getResponse(URL url,
                                        WebPage page,
                                        boolean followRedirects)
                                 throws ProtocolException,
                                        IOException
Throws:
ProtocolException
IOException

getRobotRules

public RobotRules getRobotRules(String url,
                                WebPage page)
Description copied from interface: Protocol
Retrieve robot rules applicable for this url.

Specified by:
getRobotRules in interface Protocol
Parameters:
url - url to check
Returns:
robot rules (specific for this url or default), never null


Copyright © 2012 The Apache Software Foundation