org.apache.nutch.parse.tika
Class TikaParser
java.lang.Object
org.apache.nutch.parse.tika.TikaParser
- All Implemented Interfaces:
- Configurable, Parser, FieldPluggable, Pluggable
public class TikaParser
- extends Object
- implements Parser
Wrapper for Tika parsers. Mimics the HTMLParser, but works on the XHTML
representation that Tika returns as SAX events.
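Tika reports a parsed document to its caller as a stream of SAX events describing XHTML, so consuming that output amounts to writing an ordinary SAX `ContentHandler`. The sketch below uses only the JDK's SAX parser on a hand-written XHTML string that stands in for Tika's output. It does not call Tika or Nutch, and `XhtmlSaxSketch` and `TitleAndTextHandler` are hypothetical names. It only illustrates how a title and body text can be collected from such events.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlSaxSketch {

    /** Hypothetical handler: collects the title element's text and the body text from XHTML SAX events. */
    static class TitleAndTextHandler extends DefaultHandler {
        final StringBuilder title = new StringBuilder();
        final StringBuilder text = new StringBuilder();
        private boolean inTitle = false;
        private boolean inBody = false;

        private static String name(String localName, String qName) {
            // localName is empty when the parser is not namespace-aware.
            return localName.isEmpty() ? qName : localName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String n = name(localName, qName);
            if (n.equals("title")) inTitle = true;
            if (n.equals("body")) inBody = true;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String n = name(localName, qName);
            if (n.equals("title")) inTitle = false;
            if (n.equals("body")) inBody = false;
            // Separate paragraphs with a space in the extracted text.
            if (n.equals("p") && inBody) text.append(' ');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data may arrive in several chunks, so always append.
            if (inTitle) title.append(ch, start, length);
            else if (inBody) text.append(ch, start, length);
        }
    }

    /** Feeds an XHTML string through the JDK SAX parser and returns the filled handler. */
    static TitleAndTextHandler parse(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            TitleAndTextHandler handler = new TitleAndTextHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Hello</title></head>"
                + "<body><p>First paragraph.</p><p>Second.</p></body></html>";
        TitleAndTextHandler h = parse(xhtml);
        System.out.println("title: " + h.title);                 // prints "title: Hello"
        System.out.println("text: " + h.text.toString().trim()); // prints "text: First paragraph. Second."
    }
}
```

Handling the events directly, rather than building a DOM first, keeps memory use proportional to the text kept, not to the whole document tree.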
Field Summary
Modifier and Type | Field
static org.slf4j.Logger | LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
TikaParser
public TikaParser()
getParse
public Parse getParse(String url,
WebPage page)
- Description copied from interface: Parser
This method parses the content of a WebPage instance.
- Specified by: getParse in interface Parser
- Parameters:
url - the page's URL
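Because `getParse` receives the page URL alongside the content, an HTML-style parser can use it as the base for resolving relative links. The following is a minimal sketch under that assumption, using only `java.net.URI`. `OutlinkResolverSketch` and `resolveOutlinks` are hypothetical names, and this page does not say how `TikaParser` itself resolves or filters outlinks.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class OutlinkResolverSketch {

    /**
     * Hypothetical helper (not part of the Nutch API): resolves each href
     * against the page URL and skips values that are not valid URIs.
     */
    static List<String> resolveOutlinks(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        List<String> out = new ArrayList<>();
        for (String href : hrefs) {
            try {
                // URI.resolve(String) throws IllegalArgumentException for malformed input.
                out.add(base.resolve(href.trim()).toString());
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it rather than failing the whole page.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> links = resolveOutlinks(
                "http://example.com/docs/index.html",
                List.of("intro.html", "../about.html", "https://apache.org/", "/search?q=tika"));
        links.forEach(System.out::println);
    }
}
```

`URI.resolve` applies RFC 2396 resolution, so `../` segments are normalized away and absolute hrefs pass through unchanged. Hrefs that are not valid URIs (for example, ones containing unescaped spaces) are dropped instead of aborting the page.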
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getTikaConfig
public TikaConfig getTikaConfig()
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
getFields
public Collection<WebPage.Field> getFields()
- Specified by: getFields in interface FieldPluggable
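`getFields` comes from `FieldPluggable`, through which a plugin declares the `WebPage` fields it needs, presumably so that a job can load only those fields from storage. The sketch below models that idea with hypothetical stand-ins. The `Field` enum, the `FieldPluggable` interface and `ParserLikePlugin` are not the real Nutch/Gora types, and the fields chosen are illustrative rather than what `TikaParser` actually returns.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class FieldSelectionSketch {

    /** Hypothetical stand-in for WebPage.Field; the real enum is generated code. */
    enum Field { BASE_URL, CONTENT, CONTENT_TYPE, HEADERS, OUTLINKS, TITLE, TEXT, FETCH_TIME }

    /** Hypothetical stand-in for the FieldPluggable contract. */
    interface FieldPluggable {
        Collection<Field> getFields();
    }

    /** A parser-like plugin that declares only the inputs it reads. */
    static class ParserLikePlugin implements FieldPluggable {
        // Precomputed and unmodifiable, so callers cannot alter the declaration.
        private static final Collection<Field> FIELDS = Collections.unmodifiableSet(
                EnumSet.of(Field.BASE_URL, Field.CONTENT, Field.CONTENT_TYPE));

        @Override
        public Collection<Field> getFields() {
            return FIELDS;
        }
    }

    /** A job could take the union of every plugin's fields to decide what to load. */
    static Set<Field> fieldsToLoad(List<FieldPluggable> plugins) {
        Set<Field> union = EnumSet.noneOf(Field.class);
        for (FieldPluggable p : plugins) {
            union.addAll(p.getFields());
        }
        return union;
    }

    public static void main(String[] args) {
        FieldPluggable other = () -> EnumSet.of(Field.HEADERS, Field.CONTENT);
        // EnumSet iterates in declaration order: [BASE_URL, CONTENT, CONTENT_TYPE, HEADERS]
        System.out.println(fieldsToLoad(List.of(new ParserLikePlugin(), other)));
    }
}
```

Using `EnumSet` for the union keeps the merged set compact and its iteration order stable across runs.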
main
public static void main(String[] args)
throws Exception
- Throws: Exception
Copyright © 2012 The Apache Software Foundation