org.apache.nutch.scoring.tld
Class TLDScoringFilter

java.lang.Object
  extended by org.apache.nutch.scoring.tld.TLDScoringFilter
All Implemented Interfaces:
Configurable, FieldPluggable, Pluggable, ScoringFilter

public class TLDScoringFilter
extends Object
implements ScoringFilter

Scoring filter that boosts pages by top-level domain (TLD).

Author:
Enis Soztutar <enis.soz.nutch@gmail.com>
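
As a rough illustration of what boosting by top-level domain could look like, the sketch below keeps a table of per-TLD multipliers and applies the matching one to an incoming score. The class name TldBoostSketch, the boost values, and the naive "text after the last dot" TLD extraction are all assumptions made for this example. They are not taken from the Nutch source, which may derive TLDs and boosts differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the Nutch implementation.
class TldBoostSketch {
    // Assumed boost table; real values would come from configuration.
    private static final Map<String, Float> BOOSTS = new HashMap<>();
    static {
        BOOSTS.put("edu", 2.0f);
        BOOSTS.put("org", 1.5f);
    }

    // Naive TLD extraction: the text after the last dot of the host name.
    static String tldOf(String host) {
        int dot = host.lastIndexOf('.');
        String tld = (dot >= 0) ? host.substring(dot + 1) : host;
        return tld.toLowerCase(Locale.ROOT);
    }

    // Multiplies the incoming score by the boost for the host's TLD (1.0 if unknown).
    static float boostedScore(String host, float initScore) {
        Float boost = BOOSTS.get(tldOf(host));
        return initScore * (boost == null ? 1.0f : boost);
    }

    public static void main(String[] args) {
        System.out.println(boostedScore("www.example.edu", 1.0f));
        System.out.println(boostedScore("www.example.com", 1.0f));
    }
}
```

Hosts whose TLD is absent from the table keep their incoming score unchanged, so the sketch never lowers a score by accident.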

Field Summary
 
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
 
Constructor Summary
TLDScoringFilter()
           
 
Method Summary
 void distributeScoreToOutlinks(String fromUrl, WebPage page, Collection<ScoreDatum> scoreData, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 float generatorSortValue(String url, WebPage page, float initSort)
          This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
 Configuration getConf()
           
 Collection<WebPage.Field> getFields()
           
 float indexerScore(String url, NutchDocument doc, WebPage page, float initScore)
          This method calculates a Lucene document boost.
 void initialScore(String url, WebPage page)
          Set an initial score for newly discovered pages.
 void injectedScore(String url, WebPage page)
          Set an initial score for newly injected pages.
 void setConf(Configuration conf)
           
 void updateScore(String url, WebPage page, List<ScoreDatum> inlinkedScoreData)
          This method calculates a new score during table update, based on the values contributed by inlinked pages.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TLDScoringFilter

public TLDScoringFilter()
Method Detail

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable

injectedScore

public void injectedScore(String url,
                          WebPage page)
                   throws ScoringFilterException
Description copied from interface: ScoringFilter
Set an initial score for newly injected pages. Note: newly injected pages may have no inlinks, so filter implementations may wish to set this score to a non-zero value, to give newly injected pages some initial credit.

Specified by:
injectedScore in interface ScoringFilter
Parameters:
url - url of the page
page - the new page. Filters modify it in place.
Throws:
ScoringFilterException
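
The note above can be sketched as follows: an injected page has no inlinks to contribute score, so a filter may assign it a non-zero starting value. PageStub, INJECTED_SCORE, and the value 1.0 are hypothetical stand-ins for this illustration, not Nutch's WebPage API or its defaults.

```java
// Hypothetical sketch: PageStub stands in for a page record; it is not Nutch's WebPage.
class InjectedScoreSketch {
    static class PageStub {
        float score; // defaults to 0.0f, i.e. "no credit yet"
    }

    // Assumed starting credit for injected pages.
    static final float INJECTED_SCORE = 1.0f;

    // Modifies the page in place, mirroring the void method in the interface.
    static void injectedScore(String url, PageStub page) {
        page.score = INJECTED_SCORE;
    }

    public static void main(String[] args) {
        PageStub page = new PageStub();
        injectedScore("http://example.org/", page);
        System.out.println(page.score);
    }
}
```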

initialScore

public void initialScore(String url,
                         WebPage page)
                  throws ScoringFilterException
Description copied from interface: ScoringFilter
Set an initial score for newly discovered pages. Note: newly discovered pages have at least one inlink with its score contribution, so filter implementations may choose to set the initial score to zero (unknown value) and let the inlink score contribution set the "real" value of the new page.

Specified by:
initialScore in interface ScoringFilter
Parameters:
url - url of the page
Throws:
ScoringFilterException

generatorSortValue

public float generatorSortValue(String url,
                                WebPage page,
                                float initSort)
                         throws ScoringFilterException
Description copied from interface: ScoringFilter
This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.

Specified by:
generatorSortValue in interface ScoringFilter
Parameters:
url - url of the page
initSort - initial sort value, or a value from previous filters in the chain
Throws:
ScoringFilterException
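
To illustrate how a sort value might be consumed, the sketch below orders candidate URLs by descending sort value and keeps the top N. The Candidate type and the topN helper are hypothetical. The actual generator job is not described on this page, so treat this only as a model of "sorting and selecting top N".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of top-N selection by sort value.
class GeneratorSortSketch {
    static class Candidate {
        final String url;
        final float sortValue;

        Candidate(String url, float sortValue) {
            this.url = url;
            this.sortValue = sortValue;
        }
    }

    // Returns the URLs of the n candidates with the highest sort values.
    static List<String> topN(List<Candidate> candidates, int n) {
        List<Candidate> sorted = new ArrayList<>(candidates); // leave the input untouched
        sorted.sort((a, b) -> Float.compare(b.sortValue, a.sortValue)); // descending
        List<String> urls = new ArrayList<>();
        for (Candidate c : sorted.subList(0, Math.min(n, sorted.size()))) {
            urls.add(c.url);
        }
        return urls;
    }
}
```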

distributeScoreToOutlinks

public void distributeScoreToOutlinks(String fromUrl,
                                      WebPage page,
                                      Collection<ScoreDatum> scoreData,
                                      int allCount)
                               throws ScoringFilterException
Description copied from interface: ScoringFilter
Distribute score value from the current page to all its outlinked pages.

Specified by:
distributeScoreToOutlinks in interface ScoringFilter
Parameters:
fromUrl - url of the source page
scoreData - A collection of ScoreDatum objects, one for every outlink. These ScoreDatum objects will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all collected outlinks from the source page
Throws:
ScoringFilterException
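
One simple way to distribute a page's score is to split it evenly across all collected outlinks. The sketch below assumes that strategy purely for illustration; this page does not say what TLDScoringFilter does here, and it may well leave the data untouched. ScoreDatumStub is a hypothetical stand-in for ScoreDatum.

```java
import java.util.Collection;

// Hypothetical sketch of an even split; not confirmed to match any Nutch filter.
class DistributeScoreSketch {
    static class ScoreDatumStub {
        final String url;
        float score;

        ScoreDatumStub(String url) {
            this.url = url;
        }
    }

    // Splits pageScore evenly over allCount outlinks and records the share on each datum.
    static void distribute(float pageScore, Collection<ScoreDatumStub> scoreData, int allCount) {
        if (allCount <= 0) {
            return; // nothing to distribute to
        }
        float share = pageScore / allCount;
        for (ScoreDatumStub datum : scoreData) {
            datum.score = share;
        }
    }
}
```

Dividing by allCount rather than by the size of scoreData reflects the parameter description above: allCount counts every collected outlink, which may exceed the number of entries actually passed in.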

updateScore

public void updateScore(String url,
                        WebPage page,
                        List<ScoreDatum> inlinkedScoreData)
                 throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates a new score during table update, based on the values contributed by inlinked pages.

Specified by:
updateScore in interface ScoringFilter
Parameters:
url - url of the page
Throws:
ScoringFilterException
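
A minimal sketch of such an update, assuming the new score is the page's current score plus the sum of the inlink contributions. The additive rule and the names below are assumptions for illustration, not the documented behavior of this filter. It also shows why a newly discovered page can start at zero: its inlink contributions alone determine the resulting value.

```java
import java.util.List;

// Hypothetical sketch of an additive score update.
class UpdateScoreSketch {
    // Adds every inlink contribution to the page's current score.
    static float updatedScore(float currentScore, List<Float> inlinkContributions) {
        float sum = currentScore;
        for (float contribution : inlinkContributions) {
            sum += contribution;
        }
        return sum;
    }
}
```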

indexerScore

public float indexerScore(String url,
                          NutchDocument doc,
                          WebPage page,
                          float initScore)
                   throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates a Lucene document boost.

Specified by:
indexerScore in interface ScoringFilter
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance, in order to store/remove some information.
initScore - initial boost value for the Lucene document.
Returns:
boost value for the Lucene document. This value is passed as an argument to the next scoring filter in the chain. NOTE: implementations may also express other scoring strategies by modifying the Lucene document directly.
Throws:
ScoringFilterException
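
The chaining described under Returns can be sketched as a fold: each filter receives the boost returned by the previous one, starting from initScore. BoostFilter and runChain are hypothetical names introduced for this example; they are not the Nutch API for running scoring filters.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of passing a boost value through a chain of filters.
class IndexerScoreChainSketch {
    // Simplified stand-in for the indexerScore method of a scoring filter.
    interface BoostFilter {
        float indexerScore(String url, float initScore);
    }

    // Feeds the result of each filter into the next one, in list order.
    static float runChain(List<BoostFilter> filters, String url, float initScore) {
        float boost = initScore;
        for (BoostFilter filter : filters) {
            boost = filter.indexerScore(url, boost);
        }
        return boost;
    }

    public static void main(String[] args) {
        BoostFilter doubler = (url, score) -> score * 2.0f;
        BoostFilter orgBonus = (url, score) -> url.endsWith(".org") ? score * 1.5f : score;
        System.out.println(runChain(Arrays.asList(doubler, orgBonus), "example.org", 1.0f));
    }
}
```

Because each filter sees only the value handed to it, the order of filters matters whenever they are not all purely multiplicative.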


Copyright © 2012 The Apache Software Foundation