java.lang.Object org.apache.nutch.urlfilter.domain.DomainURLFilter
public class DomainURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames. Only a URL that matches one of the suffixes, domains, or hosts present in the file is allowed.
URLs are checked in order of domain suffix, domain name, and hostname against entries in the domain file. The domain file would be set up as follows, with one entry per line:
com
apache.org
www.apache.org
The first line is an example of a filter that would allow all .com domains. The second line allows all URLs from apache.org and all of its subdomains, such as lucene.apache.org and hadoop.apache.org. The third line would allow only URLs from www.apache.org. Entries may appear in the file in any order. They range from more general to more specific, with a more general entry overriding a more specific one.
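The matching rules above can be illustrated with a minimal, standalone sketch. It is not the Nutch implementation: the class name `DomainFilterSketch` and the method `isAllowed` are hypothetical, and instead of resolving the domain suffix and domain name separately, it simply tests every dot-separated suffix of the host against the set of entries. That approximation is slightly more permissive than the description above, because a host below a listed hostname would also match.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of suffix/domain/host matching; not the Nutch class.
final class DomainFilterSketch {
    private final Set<String> entries = new HashSet<>();

    // Each element is one line of the domain file; blank lines are skipped.
    DomainFilterSketch(Iterable<String> lines) {
        for (String line : lines) {
            String entry = line.trim().toLowerCase(Locale.ROOT);
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
    }

    // True if the host, or any suffix of it taken at a dot boundary,
    // is one of the entries.
    boolean isAllowed(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: reject
        }
        if (host == null) {
            return false;
        }
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (entries.contains(candidate)) {
                return true;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return false; // no shorter suffix left to try
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        DomainFilterSketch filter = new DomainFilterSketch(
                Arrays.asList("com", "apache.org", "www.apache.org"));
        System.out.println(filter.isAllowed("http://hadoop.apache.org/")); // true
        System.out.println(filter.isAllowed("http://example.net/"));       // false
    }
}
```

Because `apache.org` is already listed in this example, the `www.apache.org` entry adds nothing: the more general entry takes precedence.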
The domain file defaults to domain-urlfilter.txt in the classpath but can be overridden using the:
Field Summary |
---|
Fields inherited from interface org.apache.nutch.net.URLFilter |
---|
X_POINT_ID |
Constructor Summary | |
---|---|
DomainURLFilter() | Default constructor. |
DomainURLFilter(String domainFile) | Constructor that specifies the domain file to use. |
Method Summary | |
---|---|
String | filter(String url) |
Configuration | getConf() |
void | setConf(Configuration conf): Sets the configuration. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DomainURLFilter()
public DomainURLFilter(String domainFile)
Parameters:
domainFile - The domain file; overrides the domain-urlfilter.txt default.
Throws:
IOException
Method Detail |
---|
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
public Configuration getConf()
Specified by: getConf in interface Configurable
public String filter(String url)
Specified by: filter in interface URLFilter