org.apache.poi.hwpf.converter
Class WordToTextConverter

java.lang.Object
  extended by org.apache.poi.hwpf.converter.AbstractWordConverter
      extended by org.apache.poi.hwpf.converter.WordToTextConverter

@Beta
public class WordToTextConverter
extends AbstractWordConverter


Field Summary
 
Fields inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
UNICODECHAR_NO_BREAK_SPACE, UNICODECHAR_NONBREAKING_HYPHEN, UNICODECHAR_ZERO_WIDTH_SPACE
 
Constructor Summary
WordToTextConverter()
          Creates a new instance of WordToTextConverter.
WordToTextConverter(org.w3c.dom.Document document)
          Creates a new instance of WordToTextConverter.
WordToTextConverter(TextDocumentFacade textDocumentFacade)
           
 
Method Summary
protected  void afterProcess()
          Special actions that need to be performed after processing is complete, such as updating stylesheets or building the document notes list.
 org.w3c.dom.Document getDocument()
           
 java.lang.String getText()
           
static java.lang.String getText(DirectoryNode root)
           
static java.lang.String getText(java.io.File docFile)
           
static java.lang.String getText(HWPFDocumentCore wordDocument)
           
 boolean isOutputSummaryInformation()
           
static void main(java.lang.String[] args)
          Java main() interface for interacting with WordToTextConverter.
protected  void outputCharacters(org.w3c.dom.Element block, CharacterRun characterRun, java.lang.String text)
           
protected  void processBookmarks(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range range, int currentTableLevel, java.util.List<Bookmark> rangeBookmarks)
          Wraps the range in bookmark(s) and processes it.
protected  void processDocumentInformation(SummaryInformation summaryInformation)
           
 void processDocumentPart(HWPFDocumentCore wordDocument, Range range)
           
protected  void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, java.lang.String path, org.w3c.dom.Element block)
           
protected  void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range endnoteTextRange)
           
protected  void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range footnoteTextRange)
           
protected  void processHyperlink(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String hyperlink)
           
protected  void processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture)
           
protected  void processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture, java.lang.String url)
           
protected  void processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture)
           
protected  void processLineBreak(org.w3c.dom.Element block, CharacterRun characterRun)
           
protected  void processNote(HWPFDocument wordDocument, org.w3c.dom.Element block, Range noteTextRange)
           
protected  boolean processOle2(HWPFDocument wordDocument, org.w3c.dom.Element block, Entry entry)
           
protected  void processPageBreak(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow)
           
protected  void processPageref(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String pageref)
           
protected  void processParagraph(HWPFDocumentCore wordDocument, org.w3c.dom.Element parentElement, int currentTableLevel, Paragraph paragraph, java.lang.String bulletText)
           
protected  void processSection(HWPFDocumentCore wordDocument, Section section, int s)
           
protected  void processTable(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow, Table table)
           
 void setOutputSummaryInformation(boolean outputDocumentInformation)
           
 
Methods inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
getCharacterRunTriplet, getFontReplacer, getNumberColumnsSpanned, getNumberRowsSpanned, getPicturesManager, processCharacters, processDeadField, processDocument, processDrawnObject, processField, processNoteAnchor, processParagraphes, processSingleSection, setFontReplacer, setPicturesManager, tryDeadField
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordToTextConverter

public WordToTextConverter()
                    throws javax.xml.parsers.ParserConfigurationException
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.

Throws:
javax.xml.parsers.ParserConfigurationException - if an internal DocumentBuilder cannot be created

WordToTextConverter

public WordToTextConverter(org.w3c.dom.Document document)
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.

Parameters:
document - XML DOM Document used as storage for text pieces
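
The storage document expected by this constructor is a plain org.w3c.dom.Document, so it can be created with the JDK's own JAXP API. The following is a minimal sketch of that step only; the class name TextStorage and the method newStorageDocument are hypothetical names chosen for illustration and are not part of POI.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper (not part of POI): builds the DOM Document that
// could serve as storage for the converter's text pieces.
public class TextStorage {

    // Creates an empty DOM Document using only the JDK's JAXP API.
    static Document newStorageDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document storage = newStorageDocument();
        // A freshly created document has no root element yet.
        System.out.println("Root element present: "
                + (storage.getDocumentElement() != null));
    }
}
```

The resulting document could then be passed to new WordToTextConverter(storage). Presumably each HWPFDocument is then fed through the inherited processDocument method before the combined text is read with getText(), but that sequence is an assumption and is not stated on this page.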

WordToTextConverter

public WordToTextConverter(TextDocumentFacade textDocumentFacade)

Method Detail

getText

public static java.lang.String getText(DirectoryNode root)
                                throws java.lang.Exception
Throws:
java.lang.Exception

getText

public static java.lang.String getText(java.io.File docFile)
                                throws java.lang.Exception
Throws:
java.lang.Exception

getText

public static java.lang.String getText(HWPFDocumentCore wordDocument)
                                throws java.lang.Exception
Throws:
java.lang.Exception
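
All three static getText overloads declare the broad java.lang.Exception, which forces callers to catch or redeclare it. One possible way to contain that, sketched below, is to route the call through a small adapter that narrows the failure to IOException. The class TextExtraction and its extract method are hypothetical names introduced for this illustration; they are not part of POI.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical adapter (not part of POI): narrows the broad
// "throws Exception" of an extraction call to IOException.
public class TextExtraction {

    static String extract(Callable<String> extractor) throws IOException {
        try {
            return extractor.call();
        } catch (IOException | RuntimeException e) {
            // Already narrow enough: rethrow unchanged.
            throw e;
        } catch (Exception e) {
            // Any other checked exception is wrapped, keeping the cause.
            throw new IOException("Text extraction failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in extractor; a real caller would supply a call to getText.
        System.out.println(extract(() -> "sample text"));
    }
}
```

With POI on the classpath, a caller might write TextExtraction.extract(() -> WordToTextConverter.getText(docFile)), where docFile is a java.io.File pointing at the input document.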

main

public static void main(java.lang.String[] args)
Java main() interface for interacting with WordToTextConverter.

Usage: WordToTextConverter infile outfile

Where infile is an input .doc file (Word 95-2007), which will be rendered as plain text into outfile.
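
The usage line above can also be driven from another Java program by launching a separate JVM. The sketch below only assembles and runs that command line; the class ConverterCommand, its build method, and the classpath argument are hypothetical and depend on where the POI jars live on a given machine.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher (not part of POI): assembles the command line
// "java -cp <classpath> WordToTextConverter infile outfile".
public class ConverterCommand {

    static final String MAIN_CLASS =
            "org.apache.poi.hwpf.converter.WordToTextConverter";

    // Builds the argument list without starting any process.
    static List<String> build(String classpath, String infile, String outfile) {
        return Arrays.asList("java", "-cp", classpath, MAIN_CLASS, infile, outfile);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: ConverterCommand classpath infile outfile");
            return;
        }
        List<String> command = build(args[0], args[1], args[2]);
        // Runs the converter in a child JVM and forwards its console output.
        int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
        System.out.println("Converter exited with code " + exitCode);
    }
}
```

Keeping command assembly in its own method makes it possible to inspect or log the exact arguments before anything is executed.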


afterProcess

protected void afterProcess()
Description copied from class: AbstractWordConverter
Special actions that need to be performed after processing is complete, such as updating stylesheets or building the document notes list. Usually they are called once, but it is safe to call them several times.

Overrides:
afterProcess in class AbstractWordConverter

getDocument

public org.w3c.dom.Document getDocument()
Specified by:
getDocument in class AbstractWordConverter

getText

public java.lang.String getText()
                         throws java.lang.Exception
Throws:
java.lang.Exception

isOutputSummaryInformation

public boolean isOutputSummaryInformation()

outputCharacters

protected void outputCharacters(org.w3c.dom.Element block,
                                CharacterRun characterRun,
                                java.lang.String text)
Specified by:
outputCharacters in class AbstractWordConverter

processBookmarks

protected void processBookmarks(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element currentBlock,
                                Range range,
                                int currentTableLevel,
                                java.util.List<Bookmark> rangeBookmarks)
Description copied from class: AbstractWordConverter
Wraps the range in bookmark(s) and processes it. All bookmarks start at the range start and end at the range end. Usually there is only one bookmark.

Specified by:
processBookmarks in class AbstractWordConverter

processDocumentInformation

protected void processDocumentInformation(SummaryInformation summaryInformation)
Specified by:
processDocumentInformation in class AbstractWordConverter

processDocumentPart

public void processDocumentPart(HWPFDocumentCore wordDocument,
                                Range range)
Overrides:
processDocumentPart in class AbstractWordConverter

processDrawnObject

protected void processDrawnObject(HWPFDocument doc,
                                  CharacterRun characterRun,
                                  OfficeDrawing officeDrawing,
                                  java.lang.String path,
                                  org.w3c.dom.Element block)
Specified by:
processDrawnObject in class AbstractWordConverter

processEndnoteAutonumbered

protected void processEndnoteAutonumbered(HWPFDocument wordDocument,
                                          int noteIndex,
                                          org.w3c.dom.Element block,
                                          Range endnoteTextRange)
Specified by:
processEndnoteAutonumbered in class AbstractWordConverter

processFootnoteAutonumbered

protected void processFootnoteAutonumbered(HWPFDocument wordDocument,
                                           int noteIndex,
                                           org.w3c.dom.Element block,
                                           Range footnoteTextRange)
Specified by:
processFootnoteAutonumbered in class AbstractWordConverter

processHyperlink

protected void processHyperlink(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element currentBlock,
                                Range textRange,
                                int currentTableLevel,
                                java.lang.String hyperlink)
Specified by:
processHyperlink in class AbstractWordConverter

processImage

protected void processImage(org.w3c.dom.Element currentBlock,
                            boolean inlined,
                            Picture picture)
Overrides:
processImage in class AbstractWordConverter

processImage

protected void processImage(org.w3c.dom.Element currentBlock,
                            boolean inlined,
                            Picture picture,
                            java.lang.String url)
Specified by:
processImage in class AbstractWordConverter

processImageWithoutPicturesManager

protected void processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock,
                                                  boolean inlined,
                                                  Picture picture)
Specified by:
processImageWithoutPicturesManager in class AbstractWordConverter

processLineBreak

protected void processLineBreak(org.w3c.dom.Element block,
                                CharacterRun characterRun)
Specified by:
processLineBreak in class AbstractWordConverter

processNote

protected void processNote(HWPFDocument wordDocument,
                           org.w3c.dom.Element block,
                           Range noteTextRange)

processOle2

protected boolean processOle2(HWPFDocument wordDocument,
                              org.w3c.dom.Element block,
                              Entry entry)
                       throws java.lang.Exception
Overrides:
processOle2 in class AbstractWordConverter
Throws:
java.lang.Exception

processPageBreak

protected void processPageBreak(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element flow)
Specified by:
processPageBreak in class AbstractWordConverter

processPageref

protected void processPageref(HWPFDocumentCore wordDocument,
                              org.w3c.dom.Element currentBlock,
                              Range textRange,
                              int currentTableLevel,
                              java.lang.String pageref)
Specified by:
processPageref in class AbstractWordConverter

processParagraph

protected void processParagraph(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element parentElement,
                                int currentTableLevel,
                                Paragraph paragraph,
                                java.lang.String bulletText)
Specified by:
processParagraph in class AbstractWordConverter

processSection

protected void processSection(HWPFDocumentCore wordDocument,
                              Section section,
                              int s)
Specified by:
processSection in class AbstractWordConverter

processTable

protected void processTable(HWPFDocumentCore wordDocument,
                            org.w3c.dom.Element flow,
                            Table table)
Specified by:
processTable in class AbstractWordConverter

setOutputSummaryInformation

public void setOutputSummaryInformation(boolean outputDocumentInformation)


Copyright 2012 The Apache Software Foundation or its licensors, as applicable.