org.apache.nutch.protocol.http.api
Class RobotRulesParser.RobotRuleSet

java.lang.Object
  extended by org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
All Implemented Interfaces:
RobotRules
Enclosing class:
RobotRulesParser

public static class RobotRulesParser.RobotRuleSet
extends Object
implements RobotRules

Holds the rules parsed from a robots.txt file and tests paths against those rules.


Constructor Summary
RobotRulesParser.RobotRuleSet()
           
 
Method Summary
 long getCrawlDelay()
          Returns the Crawl-Delay in milliseconds, or -1 if not set.
 long getExpireTime()
          Returns the time at which this ruleset goes stale.
 boolean isAllowed(String path)
          Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
 boolean isAllowed(URL url)
          Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
 void setCrawlDelay(long crawlDelay)
          Sets the Crawl-Delay, in milliseconds.
 void setExpireTime(long expireTime)
          Change when the ruleset goes stale.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

RobotRulesParser.RobotRuleSet

public RobotRulesParser.RobotRuleSet()

Method Detail

setExpireTime

public void setExpireTime(long expireTime)
Change when the ruleset goes stale.


getExpireTime

public long getExpireTime()
Returns the time at which this ruleset goes stale.

Specified by:
getExpireTime in interface RobotRules
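
A caller would typically compare the expire time against the current time to decide whether robots.txt should be fetched again. The sketch below illustrates that check. It assumes the expire time is expressed in epoch milliseconds (the unit is not stated on this page), and ExpiryCheck and isStale are hypothetical names that are not part of Nutch.

```java
// Hypothetical helper for illustration; not part of the Nutch API.
final class ExpiryCheck {

    /**
     * Returns true if a ruleset with the given expire time should be refreshed.
     * Assumption: expireTime and nowMillis are both epoch milliseconds.
     */
    static boolean isStale(long expireTime, long nowMillis) {
        return nowMillis >= expireTime;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long expireTime = now + 60_000L; // pretend the rules stay fresh for one more minute

        System.out.println(isStale(expireTime, now));            // prints "false"
        System.out.println(isStale(expireTime, now + 120_000L)); // prints "true"
    }
}
```

Passing the current time as a parameter, rather than reading the clock inside the method, keeps the check easy to test.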

getCrawlDelay

public long getCrawlDelay()
Returns the Crawl-Delay in milliseconds, or -1 if no Crawl-Delay has been set.

Specified by:
getCrawlDelay in interface RobotRules
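
Because -1 signals that no Crawl-Delay was set, callers need to treat it as a sentinel rather than as a delay. The following is a minimal sketch of one way to do that. CrawlDelayFallback, effectiveDelay and the default value are hypothetical and are not part of Nutch.

```java
// Hypothetical helper for illustration; not part of the Nutch API.
final class CrawlDelayFallback {

    /**
     * Picks the delay to apply between requests, in milliseconds.
     *
     * @param crawlDelay         value as returned by getCrawlDelay(); -1 means "not set"
     * @param defaultDelayMillis fallback chosen by the caller (an assumption of this sketch)
     */
    static long effectiveDelay(long crawlDelay, long defaultDelayMillis) {
        // Any negative value is treated as "not set" so the sentinel never reaches a sleep call.
        return crawlDelay < 0 ? defaultDelayMillis : crawlDelay;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelay(-1L, 5_000L));    // prints "5000"
        System.out.println(effectiveDelay(2_000L, 5_000L)); // prints "2000"
    }
}
```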

setCrawlDelay

public void setCrawlDelay(long crawlDelay)
Sets the Crawl-Delay, in milliseconds.


isAllowed

public boolean isAllowed(URL url)
Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.

Specified by:
isAllowed in interface RobotRules

isAllowed

public boolean isAllowed(String path)
Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
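
The contract shared by both isAllowed overloads (false only when a rule prohibits access, true otherwise) can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the Nutch implementation: it assumes rules are path prefixes checked in insertion order with the first match deciding, and that the URL overload delegates to the path overload using the URL's path component. PrefixRuleSketch and addRule are hypothetical names.

```java
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the Nutch class.
final class PrefixRuleSketch {
    // Parallel lists: prefixes.get(i) is allowed or prohibited according to allowed.get(i).
    private final List<String> prefixes = new ArrayList<>();
    private final List<Boolean> allowed = new ArrayList<>();

    void addRule(String prefix, boolean allow) {
        prefixes.add(prefix);
        allowed.add(allow);
    }

    /** Returns false if a rule prohibits the path, or true otherwise. */
    boolean isAllowed(String path) {
        for (int i = 0; i < prefixes.size(); i++) {
            if (path.startsWith(prefixes.get(i))) {
                return allowed.get(i); // first matching prefix decides (assumption)
            }
        }
        return true; // no rule matched, so nothing prohibits access
    }

    /** Tests the URL's path component; an empty path is treated as "/" (assumption). */
    boolean isAllowed(URL url) {
        String path = url.getPath();
        return isAllowed(path.isEmpty() ? "/" : path);
    }

    public static void main(String[] args) throws Exception {
        PrefixRuleSketch rules = new PrefixRuleSketch();
        rules.addRule("/private/", false);

        System.out.println(rules.isAllowed("/index.html")); // prints "true"
        URL url = new URI("http://example.com/private/a.html").toURL();
        System.out.println(rules.isAllowed(url));           // prints "false"
    }
}
```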


toString

public String toString()
Overrides:
toString in class Object


Copyright © 2012 The Apache Software Foundation