java.lang.Object
  org.apache.nutch.util.URLUtil
public class URLUtil
Utility class for URL analysis
Constructor Summary | |
---|---|
URLUtil() | |
Method Summary | |
---|---|
static String | chooseRepr(String src, String dst, boolean temp): Given two urls, a src and a destination of a redirect, returns the representative url. |
static String | getDomainName(String url): Returns the domain name of the url. |
static String | getDomainName(URL url): Returns the domain name of the url. |
static DomainSuffix | getDomainSuffix(String url): Returns the DomainSuffix corresponding to the last public part of the hostname. |
static DomainSuffix | getDomainSuffix(URL url): Returns the DomainSuffix corresponding to the last public part of the hostname. |
static String | getHost(String url): Returns the lowercased hostname for the url, or null if the url is not well formed. |
static String[] | getHostSegments(String url): Splits the hostname of the url into segments on ".". |
static String[] | getHostSegments(URL url): Splits the hostname of the url into segments on ".". |
static String | getPage(String url): Returns the page for the url. |
static boolean | isSameDomainName(String url1, String url2): Returns whether the given urls have the same domain name. |
static boolean | isSameDomainName(URL url1, URL url2): Returns whether the given urls have the same domain name. |
static void | main(String[] args): For testing. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public URLUtil()
Method Detail |
---|
public static String getDomainName(URL url)
Returns the domain name of the url. For example,
getDomainName(new URL("http://lucene.apache.org/"))
returns apache.org.
public static String getDomainName(String url) throws MalformedURLException
Returns the domain name of the url. For example,
getDomainName("http://lucene.apache.org/")
returns apache.org.
Throws:
MalformedURLException
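To make the lucene.apache.org to apache.org example concrete, here is a minimal, self-contained sketch that approximates the domain name as the last two labels of the host. It is not the Nutch implementation: the names DomainNameSketch and naiveDomainName are hypothetical, java.net.URI is used in place of URL to avoid checked exceptions, and the two-label heuristic is an assumption that gives wrong answers for multi-part public suffixes such as co.uk and for IP-address hosts.

```java
import java.net.URI;
import java.util.Locale;

class DomainNameSketch {
    // Hypothetical helper (assumption): treats the last two host labels as the domain name.
    // A suffix-aware lookup would be needed for hosts such as example.co.uk.
    static String naiveDomainName(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // no host component, e.g. a relative reference
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        if (n <= 2) {
            return String.join(".", labels);
        }
        return labels[n - 2] + "." + labels[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(naiveDomainName("http://lucene.apache.org/")); // prints apache.org
    }
}
```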
public static boolean isSameDomainName(URL url1, URL url2)
Returns whether the given urls have the same domain name. For example,
isSameDomainName(new URL("http://lucene.apache.org"), new URL("http://people.apache.org/"))
will return true.
public static boolean isSameDomainName(String url1, String url2) throws MalformedURLException
Returns whether the given urls have the same domain name. For example,
isSameDomainName("http://lucene.apache.org", "http://people.apache.org/")
will return true.
Throws:
MalformedURLException
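The comparison described above can be sketched without Nutch on the classpath. The class SameDomainSketch and its methods are hypothetical; the sketch assumes the domain name is simply the last two host labels and that the comparison ignores case, neither of which is confirmed by this page.

```java
import java.net.URI;
import java.util.Locale;

class SameDomainSketch {
    // Hypothetical helper (assumption): last two host labels, lowercased.
    static String lastTwoLabels(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null;
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        return n <= 2 ? String.join(".", labels) : labels[n - 2] + "." + labels[n - 1];
    }

    // Two urls share a domain when both yield the same non-null approximation.
    static boolean sameDomain(String url1, String url2) {
        String d1 = lastTwoLabels(url1);
        return d1 != null && d1.equals(lastTwoLabels(url2));
    }

    public static void main(String[] args) {
        System.out.println(sameDomain("http://lucene.apache.org", "http://people.apache.org/")); // prints true
    }
}
```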
public static DomainSuffix getDomainSuffix(URL url)
Returns the DomainSuffix corresponding to the last public part of the hostname.
public static DomainSuffix getDomainSuffix(String url) throws MalformedURLException
Returns the DomainSuffix corresponding to the last public part of the hostname.
Throws:
MalformedURLException
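public static String[] getHostSegments(URL url)
Splits the hostname of the url into segments on ".".
public static String[] getHostSegments(String url) throws MalformedURLException
Splits the hostname of the url into segments on ".".
Throws:
MalformedURLException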
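As an illustration of splitting a hostname on ".", the following sketch extracts the host with java.net.URI and splits it into labels. HostSegmentsSketch and hostSegments are hypothetical names, and returning an empty array when there is no host is an assumption of this sketch, not documented behavior of getHostSegments.

```java
import java.net.URI;
import java.util.Arrays;

class HostSegmentsSketch {
    // Hypothetical helper: splits the host component on literal dots.
    static String[] hostSegments(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return new String[0]; // assumption: no host yields no segments
        }
        return host.split("\\.");
    }

    public static void main(String[] args) {
        // prints [lucene, apache, org]
        System.out.println(Arrays.toString(hostSegments("http://lucene.apache.org/")));
    }
}
```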
public static String chooseRepr(String src, String dst, boolean temp)
Given two urls, a src and a destination of a redirect, it returns the representative url.
This method implements an extended version of the algorithm used by the
Yahoo! Slurp crawler described here: "How does the Yahoo! webcrawler handle redirects?"
Parameters:
src - The source url.
dst - The destination url.
temp - Whether the redirect is a temporary redirect.
public static String getHost(String url)
Returns the lowercased hostname for the url, or null if the url is not well formed.
Parameters:
url - The url to check.
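The "lowercased hostname or null" contract can be sketched as follows. HostSketch and lowercasedHost are hypothetical names; the sketch parses with java.net.URI rather than java.net.URL, so its notion of "not well formed" is an assumption and may differ from what getHost accepts.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class HostSketch {
    // Hypothetical helper: returns the lowercased host, or null when parsing fails
    // or the input has no host component.
    static String lowercasedHost(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null; // input could not be parsed as a URI
        }
    }

    public static void main(String[] args) {
        System.out.println(lowercasedHost("http://Lucene.Apache.ORG/index.html")); // prints lucene.apache.org
        System.out.println(lowercasedHost("not a url")); // prints null
    }
}
```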
public static String getPage(String url)
Returns the page for the url.
Parameters:
url - The url to check.
public static void main(String[] args)
For testing.