org.apache.nutch.crawl
Class TextProfileSignature

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.crawl.Signature
          extended by org.apache.nutch.crawl.TextProfileSignature
All Implemented Interfaces:
Configurable

public class TextProfileSignature
extends Signature

An implementation of a page signature. It calculates an MD5 hash of a plain-text "profile" of a page. If the page has no text, it falls back to the hash calculated by MD5Signature.

The algorithm to calculate a page "profile" takes the plain text version of a page and performs the following steps:

This list is then submitted to an MD5 hash calculation.
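
The individual steps are not spelled out in this text, so the following is only a minimal sketch of how such a profile might be built, not the actual Nutch implementation. It assumes that tokens are lower-cased runs of letters and digits, that very short tokens are dropped, that token counts are rounded down to a multiple of a quantum derived from the highest count, and that the surviving tokens are listed by decreasing count. The class name TextProfileSketch, the constants MIN_TOKEN_LEN and QUANT_RATE with their values, the "token count" output format, and the alphabetical tie-break are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextProfileSketch {
    // Assumed parameters; neither the names nor the values come from the text above.
    static final int MIN_TOKEN_LEN = 2;
    static final float QUANT_RATE = 0.01f;

    /** Builds a hypothetical text profile: "token count" pairs by decreasing count. */
    static String profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder token = new StringBuilder();
        int maxFreq = 0;
        // Iterate one position past the end so the last token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                token.append(Character.toLowerCase(c));
            } else if (token.length() > 0) {
                // Assumption: tokens of MIN_TOKEN_LEN characters or fewer are dropped.
                if (token.length() > MIN_TOKEN_LEN) {
                    int n = counts.merge(token.toString(), 1, Integer::sum);
                    maxFreq = Math.max(maxFreq, n);
                }
                token.setLength(0);
            }
        }
        // Assumption: the quantum is a fraction of the highest count, at least 2
        // whenever any token repeats, so that one-off tokens are filtered out.
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) {
            quant = maxFreq > 1 ? 2 : 1;
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int rounded = (e.getValue() / quant) * quant; // round down to a multiple
            if (rounded >= quant) {
                kept.add(new AbstractMap.SimpleEntry<>(e.getKey(), rounded));
            }
        }
        // Decreasing count; ties broken alphabetically to keep the output stable.
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            if (out.length() > 0) out.append(' ');
            out.append(e.getKey()).append(' ').append(e.getValue());
        }
        return out.toString();
    }

    /** MD5 of the profile; with no text, MD5 of the raw bytes stands in for MD5Signature. */
    static byte[] signature(byte[] rawContent, String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            if (text == null || text.isEmpty()) {
                return md5.digest(rawContent);
            }
            return md5.digest(profile(text).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Under these assumptions, two pages whose text differs only in punctuation, letter case, or rarely occurring words produce the same profile and therefore the same 16-byte signature, which is what makes a profile-based signature useful for detecting near-duplicates.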

Author:
Andrzej Bialecki <ab@getopt.org>

Constructor Summary
TextProfileSignature()
           
 
Method Summary
 byte[] calculate(WebPage page)
           
 Collection<WebPage.Field> getFields()
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextProfileSignature

public TextProfileSignature()

Method Detail

calculate

public byte[] calculate(WebPage page)
Specified by:
calculate in class Signature

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in class Signature


Copyright © 2012 The Apache Software Foundation