org.apache.nutch.urlfilter.automaton
Class AutomatonURLFilter
java.lang.Object
  org.apache.nutch.urlfilter.api.RegexURLFilterBase
    org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
- All Implemented Interfaces:
- Configurable, URLFilter, Pluggable
public class AutomatonURLFilter
- extends RegexURLFilterBase
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
- Author:
- Jérôme Charron
- See Also:
- dk.brics.automaton
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
URLFILTER_AUTOMATON_FILE
public static final String URLFILTER_AUTOMATON_FILE
- See Also:
- Constant Field Values
URLFILTER_AUTOMATON_RULES
public static final String URLFILTER_AUTOMATON_RULES
- See Also:
- Constant Field Values
AutomatonURLFilter
public AutomatonURLFilter()
AutomatonURLFilter
public AutomatonURLFilter(String filename)
throws IOException,
PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
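This constructor loads its rules from the named file. The sketch below shows one way such a file could be parsed. It assumes the usual Nutch regex-filter convention, which this page does not itself document: each rule line begins with `+` (include) or `-` (exclude), and blank lines and `#` comments are ignored. `RuleFileSketch` and `ParsedRule` are hypothetical names, not part of the Nutch API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the '+'/'-' line format is an assumption,
// not confirmed by this page.
final class RuleFileSketch {

    /** One parsed rule line: its sign and its raw regular expression. */
    static final class ParsedRule {
        final boolean sign;   // true = include, false = exclude
        final String regex;

        ParsedRule(boolean sign, String regex) {
            this.sign = sign;
            this.regex = regex;
        }
    }

    static List<ParsedRule> parse(Reader source) throws IOException {
        List<ParsedRule> rules = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                char first = line.charAt(0);
                if (first != '+' && first != '-') {
                    throw new IOException("Invalid first character: " + line);
                }
                // Everything after the sign character is the regex.
                rules.add(new ParsedRule(first == '+', line.substring(1)));
            }
        }
        return rules;
    }
}
```

For example, parsing the text `# skip images`, `-.*\.gif`, `+.*` yields two rules: an exclude rule for `.*\.gif` and an include rule for `.*`.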
getRulesReader
protected Reader getRulesReader(Configuration conf)
throws IOException
- Rules specified inline in a configuration property override rules specified in a configuration file.
- Specified by:
getRulesReader
in class RegexURLFilterBase
- Parameters:
conf
- is the current configuration.
- Returns:
- a Reader over the rules to use.
- Throws:
IOException
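The override behavior of getRulesReader can be sketched as follows. This is a minimal illustration, not the Nutch implementation. A plain `Map` stands in for Hadoop's `Configuration`, and the two key strings are assumed values of `URLFILTER_AUTOMATON_RULES` and `URLFILTER_AUTOMATON_FILE` (check Constant Field Values). For simplicity the rules file is opened from the local filesystem, whereas the real class may resolve it differently, for example as a classpath resource. `RulesReaderSketch` is a hypothetical name.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;

// Hypothetical sketch of "inline rules override the rules file".
final class RulesReaderSketch {
    static final String RULES_KEY = "urlfilter.automaton.rules"; // assumed value
    static final String FILE_KEY = "urlfilter.automaton.file";   // assumed value

    static Reader getRulesReader(Map<String, String> conf) throws IOException {
        String inlineRules = conf.get(RULES_KEY);
        if (inlineRules != null && !inlineRules.isEmpty()) {
            // Inline rules from the property take precedence.
            return new StringReader(inlineRules);
        }
        String fileName = conf.get(FILE_KEY);
        if (fileName == null) {
            throw new IOException("No automaton URL filter rules configured");
        }
        // Fall back to the rules file (local filesystem in this sketch).
        return new FileReader(fileName);
    }
}
```

If both keys are set, the file is never opened, so a missing rules file does not cause an error as long as inline rules are present.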
createRule
protected RegexRule createRule(boolean sign,
String regex)
- Description copied from class:
RegexURLFilterBase
- Creates a new RegexRule.
- Specified by:
createRule
in class RegexURLFilterBase
- Parameters:
sign
- the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex
- the regular expression associated with this rule.
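The `sign` parameter decides what happens to a URL that matches a rule. The sketch below assumes, as in Nutch's regex-based filters, that rules are tried in order, the first matching rule decides, and a URL matched by no rule is rejected. `java.util.regex` stands in for dk.brics.automaton here. `Matcher.matches()` is used because it requires the whole URL to match, which is assumed to mirror how an automaton accepts or rejects an entire input string. `SignedRuleSketch` and its members are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: java.util.regex replaces dk.brics.automaton.
final class SignedRuleSketch {

    static final class Rule {
        final boolean sign;      // true = include, false = exclude
        final Pattern pattern;

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        // Whole-string match, like running an automaton over the URL.
        boolean match(String url) {
            return pattern.matcher(url).matches();
        }
    }

    /** Returns the URL if accepted, or null if rejected (assumed semantics). */
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.match(url)) {
                return rule.sign ? url : null; // first match decides
            }
        }
        return null; // no rule matched
    }
}
```

With an exclude rule `.*\.(gif|jpg)` followed by an include rule `https?://.*`, `http://example.com/a.gif` is rejected by the first rule, `http://example.com/index.html` is accepted by the second, and `ftp://example.com/` matches neither and is rejected. dk.brics.automaton's regular-expression syntax is not identical to java.util.regex's, so patterns do not always carry over one-to-one.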
main
public static void main(String[] args)
throws IOException
- Throws:
IOException
Copyright © 2012 The Apache Software Foundation