java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.collection.Subcollection
public class Subcollection extends Configured implements URLFilter
A Subcollection represents a subset of the index. You define URL patterns that indicate whether a particular page (URL) is part of the Subcollection.
| Field Summary | |
|---|---|
| static String | TAG_BLACKLIST |
| static String | TAG_COLLECTION |
| static String | TAG_COLLECTIONS |
| static String | TAG_ID |
| static String | TAG_NAME |
| static String | TAG_WHITELIST |
| Fields inherited from interface org.apache.nutch.net.URLFilter |
|---|
| X_POINT_ID |
| Constructor Summary | |
|---|---|
| Subcollection(Configuration conf) | |
| Subcollection(String id, String name, Configuration conf) | Public constructor |
| Method Summary | |
|---|---|
| String | filter(String urlString): Simple "indexOf" filter for matching patterns. |
| String | getBlackListString(): Returns the blacklist as a String. |
| String | getId() |
| String | getName() |
| ArrayList | getWhiteList(): Returns the whitelist. |
| String | getWhiteListString(): Returns the whitelist as a String. |
| void | initialize(Element collection): Initializes the Subcollection from a DOM element. |
| protected void | parseList(ArrayList list, String text): Creates a list of patterns from a chunk of text; patterns are separated by newlines. |
| void | setBlackList(String list): Sets the contents of the blacklist from a String. |
| void | setWhiteList(ArrayList whiteList) |
| void | setWhiteList(String list): Sets the contents of the whitelist from a String. |
| Methods inherited from class org.apache.hadoop.conf.Configured |
|---|
| getConf, setConf |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.hadoop.conf.Configurable |
|---|
| getConf, setConf |
| Field Detail |
|---|
public static final String TAG_COLLECTIONS
public static final String TAG_COLLECTION
public static final String TAG_WHITELIST
public static final String TAG_BLACKLIST
public static final String TAG_NAME
public static final String TAG_ID
| Constructor Detail |
|---|
public Subcollection(String id,
String name,
Configuration conf)
id - id of the Subcollection
name - name of the Subcollection
public Subcollection(Configuration conf)
| Method Detail |
|---|
public String getName()
public String getId()
public ArrayList getWhiteList()
public String getWhiteListString()
public String getBlackListString()
public void setWhiteList(ArrayList whiteList)
whiteList - the whitelist to set
public String filter(String urlString)
The rules for evaluation are applied in order:
1. If a pattern in the blacklist matches, the URL is rejected.
2. If a pattern in the whitelist matches, the URL is allowed.
3. Otherwise, the URL is rejected.
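
The evaluation order described for `filter` can be sketched in plain Java. This is a minimal illustration, not the Nutch source: the class name `SubcollectionFilterSketch` is hypothetical, matching is done with `String.indexOf` as the method summary suggests, and returning `null` for a rejected URL is an assumption about how rejection is signalled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the whitelist/blacklist evaluation; not the Nutch class.
final class SubcollectionFilterSketch {
    private final List<String> whiteList;
    private final List<String> blackList;

    SubcollectionFilterSketch(List<String> whiteList, List<String> blackList) {
        this.whiteList = new ArrayList<>(whiteList);
        this.blackList = new ArrayList<>(blackList);
    }

    // Returns the URL if it is accepted, or null if it is rejected (assumed convention).
    String filter(String urlString) {
        // Rule 1: any blacklist match rejects the URL.
        for (String pattern : blackList) {
            if (urlString.indexOf(pattern) >= 0) {
                return null;
            }
        }
        // Rule 2: any whitelist match accepts the URL.
        for (String pattern : whiteList) {
            if (urlString.indexOf(pattern) >= 0) {
                return urlString;
            }
        }
        // Rule 3: nothing matched, so the URL is rejected.
        return null;
    }
}
```

Because the blacklist is checked first, a URL that matches both lists is rejected, which lets a broad whitelist pattern be narrowed by more specific blacklist entries.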
Specified by: filter in interface URLFilter
See also: URLFilter.filter(java.lang.String)
public void initialize(Element collection)
collection - the DOM element to initialize this Subcollection from
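
As a rough illustration of the kind of DOM element `initialize` might receive, the sketch below parses a small XML fragment with the JDK's built-in parser and reads the child elements a Subcollection needs. The tag names (`subcollection`, `name`, `id`, `whitelist`, `blacklist`) are assumptions, since this page does not show the values of the `TAG_*` constants, and the class `SubcollectionXmlSketch` with its helpers is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; tag names passed to childText are assumed, not taken from this page.
final class SubcollectionXmlSketch {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the trimmed text of the first descendant with the given tag, or "" if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().trim();
    }
}
```

With an element obtained this way, the values read for the assumed `whitelist` and `blacklist` tags would correspond to what `setWhiteList(String)` and `setBlackList(String)` accept.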
protected void parseList(ArrayList list,
String text)
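list - the list that receives the parsed patterns
text - the chunk of text containing the patterns, separated by newlines
public void setBlackList(String list)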
list - the blacklist contents
public void setWhiteList(String list)
list - the whitelist contents
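
The String-based setters and `parseList` work on a chunk of text in which patterns are separated by newlines. A minimal sketch of that splitting step is shown below. The class `PatternListSketch` is hypothetical, and trimming whitespace and skipping blank lines are assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of turning newline-separated text into a pattern list.
final class PatternListSketch {

    static List<String> parse(String text) {
        List<String> patterns = new ArrayList<>();
        if (text == null) {
            return patterns;
        }
        // Split on Unix or Windows line endings.
        for (String line : text.split("\\r?\\n")) {
            String pattern = line.trim();
            // Assumed: blank lines carry no pattern and are ignored.
            if (!pattern.isEmpty()) {
                patterns.add(pattern);
            }
        }
        return patterns;
    }
}
```

For example, a whitelist text holding two URLs on separate lines would yield a two-element list, one pattern per line.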