org.apache.poi.hslf.extractor
Class PowerPointExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIOLE2TextExtractor
          extended by org.apache.poi.hslf.extractor.PowerPointExtractor

public final class PowerPointExtractor
extends POIOLE2TextExtractor

This class can be used to extract text from a PowerPoint file. It can optionally also extract the notes.

Author:
Nick Burch

Field Summary
 
Fields inherited from class org.apache.poi.POITextExtractor
document
 
Constructor Summary
PowerPointExtractor(DirectoryNode dir)
          Creates a PowerPointExtractor from a specific place inside an open NPOIFSFileSystem
PowerPointExtractor(DirectoryNode dir, POIFSFileSystem fs)
          Deprecated. Use PowerPointExtractor(DirectoryNode) instead
PowerPointExtractor(HSLFSlideShow ss)
          Creates a PowerPointExtractor from an HSLFSlideShow
PowerPointExtractor(java.io.InputStream iStream)
          Creates a PowerPointExtractor from an InputStream
PowerPointExtractor(NPOIFSFileSystem fs)
          Creates a PowerPointExtractor from an open NPOIFSFileSystem
PowerPointExtractor(POIFSFileSystem fs)
          Creates a PowerPointExtractor from an open POIFSFileSystem
PowerPointExtractor(java.lang.String fileName)
          Creates a PowerPointExtractor from a file
 
Method Summary
 java.lang.String getNotes()
          Fetches all the notes text from the slideshow, but not the slide text
 java.util.List<OLEShape> getOLEShapes()
          Returns the OLE shapes (embedded objects) in the slideshow
 java.lang.String getText()
          Fetches all the slide text from the slideshow, but not the notes, comments or master text, unless you've called setSlidesByDefault(), setNotesByDefault(), setCommentsByDefault() or setMasterByDefault() to change this
 java.lang.String getText(boolean getSlideText, boolean getNoteText)
          Fetches text from the slideshow, be it slide text or note text.
 java.lang.String getText(boolean getSlideText, boolean getNoteText, boolean getCommentText, boolean getMasterText)
          Fetches the selected kinds of text from the slideshow: slide, note, comment and/or master text
static void main(java.lang.String[] args)
          Basic command-line extractor.
 void setCommentsByDefault(boolean commentsByDefault)
          Should a call to getText() return comment text? Default is no
 void setMasterByDefault(boolean masterByDefault)
          Should a call to getText() return text from the master slides? Default is no
 void setNotesByDefault(boolean notesByDefault)
          Should a call to getText() return notes text? Default is no
 void setSlidesByDefault(boolean slidesByDefault)
          Should a call to getText() return slide text? Default is yes
 
Methods inherited from class org.apache.poi.POIOLE2TextExtractor
getDocSummaryInformation, getFileSystem, getMetadataTextExtractor, getRoot, getSummaryInformation
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PowerPointExtractor

public PowerPointExtractor(java.lang.String fileName)
                    throws java.io.IOException
Creates a PowerPointExtractor from a file

Parameters:
fileName - The name of the file to extract from
Throws:
java.io.IOException

PowerPointExtractor

public PowerPointExtractor(java.io.InputStream iStream)
                    throws java.io.IOException
Creates a PowerPointExtractor from an InputStream

Parameters:
iStream - The input stream containing the PowerPoint document
Throws:
java.io.IOException
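A minimal sketch of the InputStream constructor, assuming Apache POI 3.x (with the package names shown on this page) is on the classpath and Java 7 or later. The class name `PptStreamExample` and helper `slideText` are hypothetical. The caller opens and closes the stream itself; the sketch does not rely on the extractor being `Closeable`, since that may vary between POI versions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hslf.extractor.PowerPointExtractor;

public final class PptStreamExample {

    // Hypothetical helper: extracts the slide text of the .ppt file at 'path'.
    static String slideText(String path) throws IOException {
        // try-with-resources closes the stream we opened, even on failure
        try (InputStream in = new FileInputStream(path)) {
            PowerPointExtractor extractor = new PowerPointExtractor(in);
            return extractor.getText(); // slide text only, unless defaults are changed
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptStreamExample <file.ppt>");
            return;
        }
        System.out.println(slideText(args[0]));
    }
}
```

A missing file surfaces as an `IOException` from `FileInputStream` before the extractor is ever constructed.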

PowerPointExtractor

public PowerPointExtractor(POIFSFileSystem fs)
                    throws java.io.IOException
Creates a PowerPointExtractor from an open POIFSFileSystem

Parameters:
fs - the POIFSFileSystem containing the PowerPoint document
Throws:
java.io.IOException

PowerPointExtractor

public PowerPointExtractor(NPOIFSFileSystem fs)
                    throws java.io.IOException
Creates a PowerPointExtractor from an open NPOIFSFileSystem

Parameters:
fs - the NPOIFSFileSystem containing the PowerPoint document
Throws:
java.io.IOException

PowerPointExtractor

public PowerPointExtractor(DirectoryNode dir)
                    throws java.io.IOException
Creates a PowerPointExtractor from a specific place inside an open NPOIFSFileSystem

Parameters:
dir - the POIFS Directory containing the PowerPoint document
Throws:
java.io.IOException
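A sketch of the DirectoryNode constructor, which is also the documented replacement for the deprecated PowerPointExtractor(DirectoryNode, POIFSFileSystem). It assumes POI 3.x, where `NPOIFSFileSystem` offers a `File` constructor, `getRoot()` and `close()`. Passing the root directory is the common case; a nested DirectoryNode could be passed instead for a presentation embedded elsewhere in the file. The class and helper names are hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hslf.extractor.PowerPointExtractor;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

public final class PptDirectoryExample {

    // Hypothetical helper: opens 'file' as an NPOIFSFileSystem and extracts
    // the slide text stored under its root directory.
    static String slideText(File file) throws IOException {
        NPOIFSFileSystem fs = new NPOIFSFileSystem(file);
        try {
            DirectoryNode root = fs.getRoot();
            // Instead of the deprecated new PowerPointExtractor(dir, fs)
            PowerPointExtractor extractor = new PowerPointExtractor(root);
            return extractor.getText(); // text is read before the file system is closed
        } finally {
            fs.close(); // release the underlying file
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptDirectoryExample <file.ppt>");
            return;
        }
        System.out.println(slideText(new File(args[0])));
    }
}
```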

PowerPointExtractor

@Deprecated
public PowerPointExtractor(DirectoryNode dir,
                                      POIFSFileSystem fs)
                    throws java.io.IOException
Deprecated. Use PowerPointExtractor(DirectoryNode) instead

Throws:
java.io.IOException

PowerPointExtractor

public PowerPointExtractor(HSLFSlideShow ss)
Creates a PowerPointExtractor from an HSLFSlideShow

Parameters:
ss - the HSLFSlideShow to extract text from
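A sketch of wrapping an existing low-level slideshow, useful when an HSLFSlideShow has already been loaded for other processing. It assumes POI 3.x, where the class is `org.apache.poi.hslf.HSLFSlideShow` and has a file-name constructor; later POI releases may name or place this class differently. The class and helper names are hypothetical.

```java
import java.io.IOException;

import org.apache.poi.hslf.HSLFSlideShow;
import org.apache.poi.hslf.extractor.PowerPointExtractor;

public final class PptSlideShowExample {

    // Hypothetical helper: loads the low-level slideshow, then extracts from it.
    static String slideText(String path) throws IOException {
        HSLFSlideShow slideShow = new HSLFSlideShow(path);
        PowerPointExtractor extractor = new PowerPointExtractor(slideShow);
        return extractor.getText();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptSlideShowExample <file.ppt>");
            return;
        }
        System.out.println(slideText(args[0]));
    }
}
```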

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Basic command-line extractor. Prints all the text, and optionally all the notes

Throws:
java.io.IOException

setSlidesByDefault

public void setSlidesByDefault(boolean slidesByDefault)
Should a call to getText() return slide text? Default is yes


setNotesByDefault

public void setNotesByDefault(boolean notesByDefault)
Should a call to getText() return notes text? Default is no


setCommentsByDefault

public void setCommentsByDefault(boolean commentsByDefault)
Should a call to getText() return comment text? Default is no


setMasterByDefault

public void setMasterByDefault(boolean masterByDefault)
Should a call to getText() return text from the master slides? Default is no


getText

public java.lang.String getText()
Fetches all the slide text from the slideshow, but not the notes, comments or master text, unless you've called setSlidesByDefault(), setNotesByDefault(), setCommentsByDefault() or setMasterByDefault() to change this

Specified by:
getText in class POITextExtractor
Returns:
The text from the document, as selected by the ...ByDefault settings (by default, the slide text only)
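A sketch of changing what the no-argument getText() returns, using the four setters documented above. It assumes POI 3.x on the classpath; the class and helper names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hslf.extractor.PowerPointExtractor;

public final class PptDefaultsExample {

    // Hypothetical helper: returns slide, notes, comment and master text in one call.
    static String everything(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            PowerPointExtractor extractor = new PowerPointExtractor(in);
            extractor.setSlidesByDefault(true);   // already the default
            extractor.setNotesByDefault(true);    // default is no
            extractor.setCommentsByDefault(true); // default is no
            extractor.setMasterByDefault(true);   // default is no
            return extractor.getText();           // now honours all four settings
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptDefaultsExample <file.ppt>");
            return;
        }
        System.out.println(everything(args[0]));
    }
}
```

Setting the defaults is convenient when the extractor is handed to code that only knows the generic getText() from POITextExtractor.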

getNotes

public java.lang.String getNotes()
Fetches all the notes text from the slideshow, but not the slide text
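A sketch that keeps slide text and notes apart, pairing getNotes() with getText(true, false). It assumes POI 3.x on the classpath; the class and helper names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hslf.extractor.PowerPointExtractor;

public final class PptNotesExample {

    // Hypothetical helper: returns {slide text, notes text} for the file at 'path'.
    static String[] slidesAndNotes(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            PowerPointExtractor extractor = new PowerPointExtractor(in);
            String slides = extractor.getText(true, false); // slide text only
            String notes = extractor.getNotes();            // notes text only
            return new String[] {slides, notes};
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptNotesExample <file.ppt>");
            return;
        }
        String[] parts = slidesAndNotes(args[0]);
        System.out.println("--- Slides ---");
        System.out.println(parts[0]);
        System.out.println("--- Notes ---");
        System.out.println(parts[1]);
    }
}
```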


getOLEShapes

public java.util.List<OLEShape> getOLEShapes()
Returns the OLE shapes (embedded objects) in the slideshow
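A sketch that lists the program IDs of embedded OLE objects. It assumes POI 3.x, where `OLEShape` lives in `org.apache.poi.hslf.model` and provides `getProgID()`; check your version's OLEShape API before relying on it. The class and helper names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.extractor.PowerPointExtractor;
import org.apache.poi.hslf.model.OLEShape;

public final class PptOleExample {

    // Hypothetical helper: collects the program ID of every OLE shape found.
    static List<String> oleProgIds(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            PowerPointExtractor extractor = new PowerPointExtractor(in);
            List<String> ids = new ArrayList<String>();
            for (OLEShape shape : extractor.getOLEShapes()) {
                ids.add(shape.getProgID()); // identifies the embedding application
            }
            return ids;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptOleExample <file.ppt>");
            return;
        }
        for (String id : oleProgIds(args[0])) {
            System.out.println(id);
        }
    }
}
```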

getText

public java.lang.String getText(boolean getSlideText,
                                boolean getNoteText)
Fetches text from the slideshow, be it slide text or note text. Because the final block of text in a TextRun normally has its last \n stripped, it is added back

Parameters:
getSlideText - fetch slide text
getNoteText - fetch note text
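The newline rule above can be modelled in plain Java. This is a hypothetical sketch of the described behaviour, not POI's own code: plain strings stand in for the text of each TextRun, and a `\n` is appended whenever a block lacks one, so consecutive blocks never run together.

```java
import java.util.Arrays;
import java.util.List;

public final class TrailingNewlineSketch {

    // Hypothetical helper modelling the documented rule: each block of text
    // is appended as-is, and a '\n' is added back if the block lacks one.
    static String joinBlocks(List<String> blocks) {
        StringBuilder out = new StringBuilder();
        for (String block : blocks) {
            out.append(block);
            if (!block.endsWith("\n")) {
                out.append('\n'); // restore the stripped final newline
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first block lost its trailing newline; the second kept it.
        List<String> blocks = Arrays.asList("Slide title", "First bullet\nSecond bullet\n");
        System.out.print(joinBlocks(blocks));
    }
}
```

Without the check, "Slide title" and "First bullet" would be fused into a single line of output.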

getText

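public java.lang.String getText(boolean getSlideText,
                                boolean getNoteText,
                                boolean getCommentText,
                                boolean getMasterText)
Fetches the selected kinds of text from the slideshow: slide, note, comment and/or master text

Parameters:
getSlideText - fetch slide text
getNoteText - fetch note text
getCommentText - fetch comment text
getMasterText - fetch text from the master slides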
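A sketch of the four-flag overload, here asking for master text only. It assumes POI 3.x on the classpath; the class and helper names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hslf.extractor.PowerPointExtractor;

public final class PptFlagsExample {

    // Hypothetical helper: returns only the text that comes from the masters.
    static String masterText(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            PowerPointExtractor extractor = new PowerPointExtractor(in);
            // slides = false, notes = false, comments = false, master = true
            return extractor.getText(false, false, false, true);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptFlagsExample <file.ppt>");
            return;
        }
        System.out.println(masterText(args[0]));
    }
}
```

Unlike getText(), this overload ignores the ...ByDefault settings, so the result depends only on the flags passed.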


Copyright 2012 The Apache Software Foundation or its licensors, as applicable.