Package org.apache.nutch.analysis.lang

Text document language identifier.


Class Summary
HTMLLanguageParser Adds metadata identifying the language of the document, if found. Statistical analysis could also be run here, but that would miss all other formats.
LanguageIndexingFilter An IndexingFilter that adds a lang (language) field to the document.
 

Package org.apache.nutch.analysis.lang Description

Text document language identifier.

Language profiles are based on material from http://www.homepages.inf.ed.ac.uk/pkoehn/publications/europarl.ps/.
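
The two classes in the class summary split the work into finding a language hint in an HTML document and exposing it as a lang field. The sketch below illustrates that idea in plain Java and does not use the Nutch API. The class LangHintSketch, its methods, the regular expressions, and the choice of sources (the lang attribute on the html element and a Content-Language meta tag) are all assumptions made for illustration, not the behavior of HTMLLanguageParser or LanguageIndexingFilter.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.apache.nutch.analysis.lang.
final class LangHintSketch {
    // A simplified language tag such as "en" or "en-US" (assumption: no full BCP 47 validation).
    private static final String TAG = "([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)";

    // <html lang="..."> (also matches xml:lang).
    private static final Pattern HTML_LANG = Pattern.compile(
        "<html\\b[^>]*?\\blang\\s*=\\s*[\"']?" + TAG,
        Pattern.CASE_INSENSITIVE);

    // <meta http-equiv="Content-Language" content="...">
    // Assumes http-equiv appears before content inside the tag.
    private static final Pattern META_LANG = Pattern.compile(
        "<meta\\b[^>]*?http-equiv\\s*=\\s*[\"']?content-language[\"']?"
            + "[^>]*?content\\s*=\\s*[\"']?" + TAG,
        Pattern.CASE_INSENSITIVE);

    private LangHintSketch() {}

    // Returns the declared language in lower case, or empty if the markup declares none.
    static Optional<String> declaredLanguage(String html) {
        for (Pattern p : new Pattern[] {HTML_LANG, META_LANG}) {
            Matcher m = p.matcher(html);
            if (m.find()) {
                return Optional.of(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return Optional.empty();
    }

    // Copies the declared language, if any, into a "lang" field of a document
    // modeled here as a plain field map (a stand-in for a real index document).
    static Map<String, String> withLangField(Map<String, String> fields, String html) {
        Map<String, String> out = new HashMap<>(fields);
        declaredLanguage(html).ifPresent(lang -> out.put("lang", lang));
        return out;
    }
}
```

Calling LangHintSketch.withLangField(fields, html) leaves the field map unchanged when the markup declares no language. A markup-only approach like this one applies to HTML alone, which is the trade-off the HTMLLanguageParser summary mentions when it contrasts metadata with statistical analysis.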



Copyright © 2012 The Apache Software Foundation