org.apache.nutch.protocol
Interface RobotRules

All Known Implementing Classes:
EmptyRobotRules, RobotRulesParser.RobotRuleSet

public interface RobotRules

This interface holds the rules parsed from a robots.txt file and can test URLs against those rules.


Method Summary
 long getCrawlDelay()
          Get Crawl-Delay, in milliseconds.
 long getExpireTime()
          Get the expire time of these rules.
 boolean isAllowed(URL url)
          Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
 

Method Detail

getExpireTime

long getExpireTime()
Get the expire time of these rules.


getCrawlDelay

long getCrawlDelay()
Get Crawl-Delay, in milliseconds. Returns -1 if no Crawl-Delay is set.
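
Because -1 means "not set", a caller will usually want to substitute its own default before sleeping between requests. The following is a minimal sketch of that check, not Nutch code: the nested RobotRules interface is a local stand-in that only mirrors the three methods documented on this page, and effectiveDelayMs, withDelay, and the 1000 ms default are hypothetical names and values chosen for illustration. Treating every negative value as "not set" is also an assumption; this page only documents -1.

```java
import java.net.URL;

final class CrawlDelayExample {

    /** Local stand-in mirroring the methods documented on this page (not the Nutch type). */
    interface RobotRules {
        long getExpireTime();
        long getCrawlDelay();
        boolean isAllowed(URL url);
    }

    /**
     * Hypothetical helper: returns the Crawl-Delay from the rules, or the
     * caller's default when the rules report a negative value (-1 = not set).
     */
    static long effectiveDelayMs(RobotRules rules, long defaultDelayMs) {
        long delay = rules.getCrawlDelay();
        return delay < 0 ? defaultDelayMs : delay;
    }

    /** Hypothetical fixture: rules that allow everything and report the given delay. */
    static RobotRules withDelay(final long delayMs) {
        return new RobotRules() {
            public long getExpireTime() { return 0L; }  // placeholder, unused here
            public long getCrawlDelay() { return delayMs; }
            public boolean isAllowed(URL url) { return true; }
        };
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMs(withDelay(-1L), 1000L));   // prints 1000
        System.out.println(effectiveDelayMs(withDelay(5000L), 1000L)); // prints 5000
    }
}
```

A delay of 0 is returned unchanged, since only negative values are treated as unset in this sketch.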


isAllowed

boolean isAllowed(URL url)
Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
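
A typical use is to filter candidate URLs before fetching them. The sketch below illustrates that pattern under stated assumptions: the nested RobotRules interface is a local stand-in mirroring the methods documented on this page, PrefixRules is a hypothetical toy implementation that disallows a single path prefix (it is not how Nutch parses robots.txt), and filterAllowed and url are illustrative helper names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class IsAllowedExample {

    /** Local stand-in mirroring the methods documented on this page (not the Nutch type). */
    interface RobotRules {
        long getExpireTime();
        long getCrawlDelay();
        boolean isAllowed(URL url);
    }

    /** Hypothetical toy rules: every path starting with the given prefix is prohibited. */
    static final class PrefixRules implements RobotRules {
        private final String disallowedPrefix;

        PrefixRules(String disallowedPrefix) {
            this.disallowedPrefix = disallowedPrefix;
        }

        public long getExpireTime() { return 0L; }   // placeholder, unused here
        public long getCrawlDelay() { return -1L; }  // -1 = not set
        public boolean isAllowed(URL url) {
            return !url.getPath().startsWith(disallowedPrefix);
        }
    }

    /** Keeps only the URLs for which isAllowed returns true, preserving order. */
    static List<URL> filterAllowed(RobotRules rules, List<URL> candidates) {
        List<URL> allowed = new ArrayList<>();
        for (URL candidate : candidates) {
            if (rules.isAllowed(candidate)) {
                allowed.add(candidate);
            }
        }
        return allowed;
    }

    /** Parses a URL string, rethrowing the checked exception as unchecked. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        RobotRules rules = new PrefixRules("/private/");
        List<URL> candidates = List.of(
                url("http://example.com/index.html"),
                url("http://example.com/private/report.html"));
        // Only the first URL survives the filter.
        System.out.println(filterAllowed(rules, candidates));
    }
}
```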



Copyright © 2012 The Apache Software Foundation