org.apache.poi
Class POITextExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
Direct Known Subclasses:
HPSFPropertiesExtractor, POIOLE2TextExtractor, POIXMLTextExtractor

public abstract class POITextExtractor
extends java.lang.Object

Common parent for text extractors of POI documents. The text extractor for a given format is typically found under org.apache.poi.[format].extractor.

See Also:
ExcelExtractor, PowerPointExtractor, VisioTextExtractor, WordExtractor

Field Summary
protected  POIDocument document
          The POIDocument that's open
 
Constructor Summary
  POITextExtractor(POIDocument document)
          Creates a new text extractor for the given document.
protected POITextExtractor(POITextExtractor otherExtractor)
          Creates a new text extractor, using the same document as another text extractor.
 
Method Summary
abstract  POITextExtractor getMetadataTextExtractor()
          Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
abstract  java.lang.String getText()
          Retrieves all the text from the document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

document

protected POIDocument document
The POIDocument that's open

Constructor Detail

POITextExtractor

public POITextExtractor(POIDocument document)
Creates a new text extractor for the given document.


POITextExtractor

protected POITextExtractor(POITextExtractor otherExtractor)
Creates a new text extractor, using the same document as another text extractor. Normally only used by properties extractors.
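
The following is a minimal sketch of how the two constructors can relate, assuming a properties extractor that is built from an existing extractor so both read the same open document. SketchDocument, SketchExtractor, SketchBodyExtractor, SketchPropertiesExtractor and sharesDocumentWith are hypothetical stand-ins that only mirror the shape documented here; they are not the POI classes, and the output format is invented for illustration.

```java
// Hypothetical stand-ins mirroring the documented shape; NOT the real POI classes.
class SketchDocument {
    final String title;
    final String body;

    SketchDocument(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

abstract class SketchExtractor {
    /** The document that's open (mirrors the protected field). */
    protected SketchDocument document;

    /** Creates an extractor for the given document. */
    SketchExtractor(SketchDocument document) {
        this.document = document;
    }

    /** Creates an extractor that reuses the document held by another extractor. */
    protected SketchExtractor(SketchExtractor otherExtractor) {
        this.document = otherExtractor.document;
    }

    /** Helper for this sketch only: true if both extractors hold the same document instance. */
    boolean sharesDocumentWith(SketchExtractor other) {
        return this.document == other.document;
    }

    abstract String getText();

    abstract SketchExtractor getMetadataTextExtractor();
}

/** Properties-style extractor: built from another extractor, never opens a document itself. */
class SketchPropertiesExtractor extends SketchExtractor {
    SketchPropertiesExtractor(SketchExtractor mainExtractor) {
        super(mainExtractor);
    }

    @Override
    String getText() {
        return "Title = " + document.title + "\n";
    }

    @Override
    SketchExtractor getMetadataTextExtractor() {
        // Design choice of this sketch: a metadata extractor has no further metadata extractor.
        throw new IllegalStateException("already a metadata extractor");
    }
}

/** Body extractor: opened on a document, hands that document on to its metadata extractor. */
class SketchBodyExtractor extends SketchExtractor {
    SketchBodyExtractor(SketchDocument document) {
        super(document);
    }

    @Override
    String getText() {
        return document.body;
    }

    @Override
    SketchExtractor getMetadataTextExtractor() {
        return new SketchPropertiesExtractor(this);
    }
}
```

In this sketch the document is handed over by reference, so the metadata extractor sees exactly the instance the body extractor was opened on.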

Method Detail

getText

public abstract java.lang.String getText()
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadoc of the specific extractor for details.

Returns:
All the text from the document
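
To illustrate what "implementation-specific" separation can mean, here is a small sketch with two hypothetical implementations of the same getText() contract. TextSource, RowsText and ParagraphsText are invented for this example, and the separators they use (tabs between cells, a newline after each row or paragraph) are assumptions of the sketch, not a statement about what any POI extractor emits.

```java
// Hypothetical stand-ins; the separators below are illustrative assumptions only.
abstract class TextSource {
    /** Returns all the text; the separators are up to each implementation. */
    abstract String getText();
}

/** Spreadsheet-like source: cells joined by tabs, one line per row. */
class RowsText extends TextSource {
    private final String[][] rows;

    RowsText(String[][] rows) {
        this.rows = rows;
    }

    @Override
    String getText() {
        StringBuilder text = new StringBuilder();
        for (String[] row : rows) {
            text.append(String.join("\t", row)).append('\n');
        }
        return text.toString();
    }
}

/** Prose-like source: one line per paragraph. */
class ParagraphsText extends TextSource {
    private final String[] paragraphs;

    ParagraphsText(String[] paragraphs) {
        this.paragraphs = paragraphs;
    }

    @Override
    String getText() {
        StringBuilder text = new StringBuilder();
        for (String paragraph : paragraphs) {
            text.append(paragraph).append('\n');
        }
        return text.toString();
    }
}
```

Callers that need to split the returned text back into cells or paragraphs should therefore rely on the documented behaviour of the concrete extractor rather than on any one separator.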

getMetadataTextExtractor

public abstract POITextExtractor getMetadataTextExtractor()
Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
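
From the caller's side, a common use is to combine the body text with the metadata text, for example before indexing a document for search. The sketch below assumes that use case. DocText, FixedBody, FixedMetadata and SearchableText are hypothetical names that only mirror the two abstract methods documented here, and the "Author = ..." property format is invented.

```java
// Hypothetical stand-ins mirroring getText() and getMetadataTextExtractor(); NOT the POI classes.
abstract class DocText {
    abstract String getText();

    abstract DocText getMetadataTextExtractor();
}

/** Metadata side: returns the textual form of the document properties. */
class FixedMetadata extends DocText {
    private final String properties;

    FixedMetadata(String properties) {
        this.properties = properties;
    }

    @Override
    String getText() {
        return properties;
    }

    @Override
    DocText getMetadataTextExtractor() {
        // Design choice of this sketch: metadata has no nested metadata.
        throw new UnsupportedOperationException("already a metadata extractor");
    }
}

/** Body side: returns the document text and hands out a metadata extractor on request. */
class FixedBody extends DocText {
    private final String body;
    private final String properties;

    FixedBody(String body, String properties) {
        this.body = body;
        this.properties = properties;
    }

    @Override
    String getText() {
        return body;
    }

    @Override
    DocText getMetadataTextExtractor() {
        return new FixedMetadata(properties);
    }
}

/** Caller-side helper: body text followed by metadata text, separated by a newline. */
final class SearchableText {
    private SearchableText() {
    }

    static String of(DocText extractor) {
        String body = extractor.getText();
        String metadata = extractor.getMetadataTextExtractor().getText();
        return body + "\n" + metadata;
    }
}
```

Because the helper depends only on the two abstract methods, it would work unchanged with any extractor that follows the same contract.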



Copyright 2012 The Apache Software Foundation or its licensors, as applicable.