org.apache.nutch.parse.tika
Class TikaParser

java.lang.Object
  extended by org.apache.nutch.parse.tika.TikaParser
All Implemented Interfaces:
Configurable, Parser, FieldPluggable, Pluggable

public class TikaParser
extends Object
implements Parser

Wrapper for Tika parsers. Mimics the HTMLParser, but works from the XHTML representation that Tika returns as SAX events.
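The sketch below illustrates what consuming an XHTML document as SAX events looks like. It is not TikaParser's implementation and does not use Tika or Nutch. As an assumption made for illustration, it uses the JDK's built-in SAX parser as a stand-in event source, and the names XhtmlSaxSketch and TextAndLinkHandler are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: the JDK SAX parser stands in for Tika here.
public class XhtmlSaxSketch {

    /** Collects character data and anchor targets from a stream of SAX events. */
    static class TextAndLinkHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            // The default factory is not namespace-aware, so the element
            // name arrives in qName rather than localName.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data may arrive in several chunks; append each one.
            text.append(ch, start, length);
        }
    }

    /** Parses a well-formed XHTML string and returns the populated handler. */
    static TextAndLinkHandler parse(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextAndLinkHandler handler = new TextAndLinkHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello "
                + "<a href=\"http://example.org/\">world</a></p></body></html>";
        TextAndLinkHandler h = parse(xhtml);
        System.out.println(h.text.toString().trim());
        System.out.println(h.links);
    }
}
```

A handler of this kind receives the document as a sequence of callbacks rather than as a tree, so text and outlinks can be gathered in a single pass without building a DOM.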


Field Summary
static org.slf4j.Logger LOG
           
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
TikaParser()
           
 
Method Summary
 Configuration getConf()
           
 Collection<WebPage.Field> getFields()
           
 Parse getParse(String url, WebPage page)
           Parses the content of the given WebPage instance.
 TikaConfig getTikaConfig()
           
static void main(String[] args)
           
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

TikaParser

public TikaParser()
Method Detail

getParse

public Parse getParse(String url,
                      WebPage page)
Description copied from interface: Parser

Parses the content of the given WebPage instance.

Specified by:
getParse in interface Parser
Parameters:
url - the page's URL
page - the WebPage instance whose content is parsed

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getTikaConfig

public TikaConfig getTikaConfig()

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2012 The Apache Software Foundation