org.apache.nutch.urlfilter.automaton
Class AutomatonURLFilter

java.lang.Object
  extended by org.apache.nutch.urlfilter.api.RegexURLFilterBase
      extended by org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
All Implemented Interfaces:
Configurable, URLFilter, Pluggable

public class AutomatonURLFilter
extends RegexURLFilterBase

RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.

Author:
Jérôme Charron
See Also:
dk.brics.automaton

Field Summary
static String URLFILTER_AUTOMATON_FILE
           
static String URLFILTER_AUTOMATON_RULES
           
 
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
 
Constructor Summary
AutomatonURLFilter()
           
AutomatonURLFilter(String filename)
           
 
Method Summary
protected  RegexRule createRule(boolean sign, String regex)
          Creates a new RegexRule.
protected  Reader getRulesReader(Configuration conf)
          Rules specified as a config property will override rules specified as a config file.
static void main(String[] args)
           
 
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URLFILTER_AUTOMATON_FILE

public static final String URLFILTER_AUTOMATON_FILE
See Also:
Constant Field Values

URLFILTER_AUTOMATON_RULES

public static final String URLFILTER_AUTOMATON_RULES
See Also:
Constant Field Values
Constructor Detail

AutomatonURLFilter

public AutomatonURLFilter()

AutomatonURLFilter

public AutomatonURLFilter(String filename)
                   throws IOException,
                          PatternSyntaxException
Throws:
IOException
PatternSyntaxException
Method Detail

getRulesReader

protected Reader getRulesReader(Configuration conf)
                         throws IOException
Rules specified as a config property (URLFILTER_AUTOMATON_RULES) will override rules specified as a config file (URLFILTER_AUTOMATON_FILE).

Specified by:
getRulesReader in class RegexURLFilterBase
Parameters:
conf - the current configuration.
Returns:
a Reader supplying the rules to use.
Throws:
IOException
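
The precedence described for getRulesReader can be illustrated with a minimal standalone sketch. It does not use Nutch or Hadoop classes: java.util.Properties stands in for Configuration, the two property keys are hypothetical placeholders (the real key strings are listed under Constant Field Values), and reading the file from the local file system is an assumption made only to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Standalone sketch of "property overrides file"; not the Nutch implementation.
class RulesReaderSketch {
    // Hypothetical keys used only in this sketch.
    static final String RULES_KEY = "sketch.automaton.rules";
    static final String FILE_KEY = "sketch.automaton.file";

    // Returns a Reader over inline rules if present, otherwise over the rules file.
    static Reader getRulesReader(Properties conf) throws IOException {
        String inlineRules = conf.getProperty(RULES_KEY);
        if (inlineRules != null) {
            // Inline rules win; the file property is never consulted.
            return new StringReader(inlineRules);
        }
        String fileName = conf.getProperty(FILE_KEY);
        if (fileName == null) {
            throw new IOException("No rules configured");
        }
        return Files.newBufferedReader(Paths.get(fileName));
    }

    // Convenience helper: reads every rule line into one newline-terminated string.
    static String loadRules(Properties conf) {
        try (BufferedReader in = new BufferedReader(getRulesReader(conf))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Both sources are set; the file does not need to exist because
        // the inline rules take precedence.
        conf.setProperty(FILE_KEY, "missing-rules-file.txt");
        conf.setProperty(RULES_KEY, "+^http://example\\.com/");
        System.out.print(loadRules(conf));
    }
}
```

Checking the inline property first means a deployment can override a packaged rules file without editing it, which appears to be the intent of the documented behavior.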

createRule

protected RegexRule createRule(boolean sign,
                               String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.

Specified by:
createRule in class RegexURLFilterBase
Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule is included, whereas a false value means that any URL matching this rule is excluded.
regex - the regular expression associated with this rule.
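
The meaning of the sign parameter can be sketched as follows. This is a hedged illustration, not the class's actual code: java.util.regex.Pattern stands in for dk.brics.automaton (whose syntax and matching semantics differ), the Rule type is hypothetical, and the "first matching rule decides, null means rejected" behavior of filter is an assumption about how the base class applies rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch of signed rules; not the Nutch implementation.
class SignedRuleSketch {
    // Hypothetical stand-in for RegexRule: a sign plus a compiled pattern.
    static final class Rule {
        final boolean sign;
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(String url) {
            return pattern.matcher(url).find();
        }
    }

    // Mirrors the documented signature: true = include, false = exclude.
    static Rule createRule(boolean sign, String regex) {
        return new Rule(sign, regex);
    }

    // Assumed evaluation order: the first matching rule decides.
    // Returns the URL if it is included, or null if it is excluded
    // or if no rule matches.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(createRule(false, "\\.gif$")); // exclude GIF images
        rules.add(createRule(true, "^http://")); // include other http URLs
        System.out.println(filter(rules, "http://example.com/a.gif"));
        System.out.println(filter(rules, "http://example.com/index.html"));
    }
}
```

Under these assumptions, rule order matters: placing the exclusion before the broad inclusion is what keeps the GIF URL out.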

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Copyright © 2012 The Apache Software Foundation