org.apache.nutch.urlfilter.regex
Class RegexURLFilter

java.lang.Object
  extended by org.apache.nutch.urlfilter.api.RegexURLFilterBase
      extended by org.apache.nutch.urlfilter.regex.RegexURLFilter
All Implemented Interfaces:
Configurable, URLFilter, Pluggable

public class RegexURLFilter
extends RegexURLFilterBase

Filters URLs based on a file of regular expressions, using Java's built-in regular expression implementation (java.util.regex).


Field Summary
static String URLFILTER_REGEX_FILE
           
static String URLFILTER_REGEX_RULES
           
 
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
 
Constructor Summary
RegexURLFilter()
           
RegexURLFilter(String filename)
           
 
Method Summary
protected  RegexRule createRule(boolean sign, String regex)
          Creates a new RegexRule.
protected  Reader getRulesReader(Configuration conf)
          Rules specified as a configuration property override rules specified in a configuration file.
static void main(String[] args)
           
 
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URLFILTER_REGEX_FILE

public static final String URLFILTER_REGEX_FILE
See Also:
Constant Field Values

URLFILTER_REGEX_RULES

public static final String URLFILTER_REGEX_RULES
See Also:
Constant Field Values
Constructor Detail

RegexURLFilter

public RegexURLFilter()

RegexURLFilter

public RegexURLFilter(String filename)
               throws IOException,
                      PatternSyntaxException
Throws:
IOException
PatternSyntaxException
Method Detail

getRulesReader

protected Reader getRulesReader(Configuration conf)
                         throws IOException
Rules specified as a configuration property override rules specified in a configuration file.

Specified by:
getRulesReader in class RegexURLFilterBase
Parameters:
conf - the current configuration.
Returns:
a Reader over the resource containing the rules to use.
Throws:
IOException
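
The precedence described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: java.util.Properties stands in for the Hadoop Configuration, the class and method names are hypothetical, and the two property keys are assumed values for URLFILTER_REGEX_RULES and URLFILTER_REGEX_FILE (see Constant Field Values for the real ones).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch; not part of Nutch.
class RulesReaderSketch {
    // Assumed property keys; the real values are listed under Constant Field Values.
    static final String RULES_KEY = "urlfilter.regex.rules";
    static final String FILE_KEY = "urlfilter.regex.file";

    // Inline rules (property) take precedence over a rules file.
    static Reader rulesReader(Properties conf) throws IOException {
        String inline = conf.getProperty(RULES_KEY);
        if (inline != null) {
            return new StringReader(inline);
        }
        String file = conf.getProperty(FILE_KEY);
        if (file == null) {
            throw new IOException("No rules property and no rules file configured");
        }
        return Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
    }

    // Convenience helper: drains the chosen reader into a String.
    static String rulesText(Properties conf) {
        try (Reader reader = rulesReader(conf)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RULES_KEY, "+^https?://");
        conf.setProperty(FILE_KEY, "ignored-when-rules-are-set.txt");
        // The file is never opened because the inline rules win.
        System.out.println(rulesText(conf));
    }
}
```

In this sketch the file path is only consulted when the rules property is absent, which is one straightforward way to realize "property overrides file".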

createRule

protected RegexRule createRule(boolean sign,
                               String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.

Specified by:
createRule in class RegexURLFilterBase
Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule is included; a false value means that any URL matching this rule is excluded.
regex - the regular expression associated with this rule.
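
As a rough illustration of how a signed rule behaves, the following self-contained sketch pairs a sign with a compiled java.util.regex.Pattern. The Rule class and the filter method are hypothetical stand-ins, not the Nutch RegexRule API. The sketch also assumes that rules are tried in order, that the first matching rule decides, that a URL matching no rule is rejected, and that matching uses find() rather than a full match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of Nutch.
class SignedRuleSketch {
    // A rule is a sign (include or exclude) plus a compiled pattern.
    static final class Rule {
        final boolean sign;
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex); // may throw PatternSyntaxException
        }

        boolean matches(String url) {
            // Assumption: a partial match anywhere in the URL is enough.
            return pattern.matcher(url).find();
        }
    }

    // Returns the URL if it is accepted, or null if it is rejected.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null; // first matching rule decides
            }
        }
        return null; // assumption: no matching rule means the URL is rejected
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(false, "\\.(gif|jpg|png)$")); // exclude images
        rules.add(new Rule(true, "^https?://"));         // include http(s) URLs
        System.out.println(filter(rules, "http://example.org/index.html"));
        System.out.println(filter(rules, "http://example.org/logo.png"));
    }
}
```

Because the exclusion rule comes first in this sketch, the image URL is rejected even though it also matches the inclusion rule; swapping the two rules would accept it.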

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Copyright © 2012 The Apache Software Foundation