org.apache.poi.hwpf.converter
Class WordToHtmlConverter

java.lang.Object
  extended by org.apache.poi.hwpf.converter.AbstractWordConverter
      extended by org.apache.poi.hwpf.converter.WordToHtmlConverter

@Beta
public class WordToHtmlConverter
extends AbstractWordConverter

Converts Word files (95-2007) into HTML files.

This implementation doesn't create images or links to them. This can be changed by overriding the AbstractWordConverter.processImage(Element, boolean, Picture) method.

Author:
Sergey Vladimirov (vlsergey {at} gmail {dot} com)

Field Summary
 
Fields inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
UNICODECHAR_NO_BREAK_SPACE, UNICODECHAR_NONBREAKING_HYPHEN, UNICODECHAR_ZERO_WIDTH_SPACE
 
Constructor Summary
WordToHtmlConverter(org.w3c.dom.Document document)
          Creates a new instance of WordToHtmlConverter.
WordToHtmlConverter(HtmlDocumentFacade htmlDocumentFacade)
           
 
Method Summary
protected  void afterProcess()
          Special actions that need to be called after processing is complete, such as updating stylesheets or building the document notes list.
 org.w3c.dom.Document getDocument()
           
static void main(java.lang.String[] args)
          Java main() interface to interact with WordToHtmlConverter
protected  void outputCharacters(org.w3c.dom.Element pElement, CharacterRun characterRun, java.lang.String text)
           
protected  void processBookmarks(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range range, int currentTableLevel, java.util.List<Bookmark> rangeBookmarks)
          Wraps the range in bookmark(s) and processes it.
protected  void processDocumentInformation(SummaryInformation summaryInformation)
           
 void processDocumentPart(HWPFDocumentCore wordDocument, Range range)
           
protected  void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, java.lang.String path, org.w3c.dom.Element block)
           
protected  void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range endnoteTextRange)
           
protected  void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range footnoteTextRange)
           
protected  void processHyperlink(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String hyperlink)
           
protected  void processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture, java.lang.String imageSourcePath)
           
protected  void processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture)
           
protected  void processLineBreak(org.w3c.dom.Element block, CharacterRun characterRun)
           
protected  void processNoteAutonumbered(HWPFDocument doc, java.lang.String type, int noteIndex, org.w3c.dom.Element block, Range noteTextRange)
           
protected  void processPageBreak(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow)
           
protected  void processPageref(HWPFDocumentCore hwpfDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String pageref)
           
protected  void processParagraph(HWPFDocumentCore hwpfDocument, org.w3c.dom.Element parentElement, int currentTableLevel, Paragraph paragraph, java.lang.String bulletText)
           
protected  void processSection(HWPFDocumentCore wordDocument, Section section, int sectionCounter)
           
protected  void processSingleSection(HWPFDocumentCore wordDocument, Section section)
           
protected  void processTable(HWPFDocumentCore hwpfDocument, org.w3c.dom.Element flow, Table table)
           
 
Methods inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
getCharacterRunTriplet, getFontReplacer, getNumberColumnsSpanned, getNumberRowsSpanned, getPicturesManager, processCharacters, processDeadField, processDocument, processDrawnObject, processField, processImage, processNoteAnchor, processOle2, processParagraphes, setFontReplacer, setPicturesManager, tryDeadField
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordToHtmlConverter

public WordToHtmlConverter(org.w3c.dom.Document document)
Creates a new instance of WordToHtmlConverter. Can be used to output several HWPFDocuments into a single HTML document.

Parameters:
document - XML DOM Document used as HTML document
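
The DOM side of this constructor can be sketched with JDK classes alone. The example below is a minimal sketch, not part of POI: the class name HtmlDomSketch and its helper methods are hypothetical, and the converter calls appear only as comments so the block compiles without POI on the classpath. It shows how an empty org.w3c.dom.Document might be created for the constructor and how the resulting document might be serialized as HTML afterwards.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration; it is not part of POI.
class HtmlDomSketch {

    /** Creates the empty DOM document that would be passed to the constructor. */
    static Document newEmptyDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Builds stand-in content in place of real converter output. */
    static Document sampleDocument() throws Exception {
        Document document = newEmptyDocument();
        // With POI available, the converter would populate the document instead
        // (wordDocument is assumed to be an already loaded HWPFDocumentCore):
        //   WordToHtmlConverter converter = new WordToHtmlConverter(document);
        //   converter.processDocument(wordDocument);
        //   document = converter.getDocument();
        Element html = document.createElement("html");
        Element body = document.createElement("body");
        Element paragraph = document.createElement("p");
        paragraph.setTextContent("placeholder paragraph");
        body.appendChild(paragraph);
        html.appendChild(body);
        document.appendChild(html);
        return document;
    }

    /** Serializes a DOM document to a string using the "html" output method. */
    static String serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```

To combine several Word documents, the same converter instance would presumably receive one processDocument call per input before getDocument() is read; that pattern is an assumption drawn from the constructor description above.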

WordToHtmlConverter

public WordToHtmlConverter(HtmlDocumentFacade htmlDocumentFacade)

Method Detail

main

public static void main(java.lang.String[] args)
Java main() interface to interact with WordToHtmlConverter

Usage: WordToHtmlConverter infile outfile

Where infile is an input .doc file (Word 95-2007), which will be rendered as HTML into outfile.


afterProcess

protected void afterProcess()
Description copied from class: AbstractWordConverter
Special actions that need to be called after processing is complete, such as updating stylesheets or building the document notes list. Usually they are called once, but it's okay to call them several times.

Overrides:
afterProcess in class AbstractWordConverter

getDocument

public org.w3c.dom.Document getDocument()
Specified by:
getDocument in class AbstractWordConverter

outputCharacters

protected void outputCharacters(org.w3c.dom.Element pElement,
                                CharacterRun characterRun,
                                java.lang.String text)
Specified by:
outputCharacters in class AbstractWordConverter

processBookmarks

protected void processBookmarks(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element currentBlock,
                                Range range,
                                int currentTableLevel,
                                java.util.List<Bookmark> rangeBookmarks)
Description copied from class: AbstractWordConverter
Wraps the range in bookmark(s) and processes it. Every bookmark starts at the range start and ends at the range end. Usually there is only one bookmark.

Specified by:
processBookmarks in class AbstractWordConverter

processDocumentInformation

protected void processDocumentInformation(SummaryInformation summaryInformation)
Specified by:
processDocumentInformation in class AbstractWordConverter

processDocumentPart

public void processDocumentPart(HWPFDocumentCore wordDocument,
                                Range range)
Overrides:
processDocumentPart in class AbstractWordConverter

processDrawnObject

protected void processDrawnObject(HWPFDocument doc,
                                  CharacterRun characterRun,
                                  OfficeDrawing officeDrawing,
                                  java.lang.String path,
                                  org.w3c.dom.Element block)
Specified by:
processDrawnObject in class AbstractWordConverter

processEndnoteAutonumbered

protected void processEndnoteAutonumbered(HWPFDocument wordDocument,
                                          int noteIndex,
                                          org.w3c.dom.Element block,
                                          Range endnoteTextRange)
Specified by:
processEndnoteAutonumbered in class AbstractWordConverter

processFootnoteAutonumbered

protected void processFootnoteAutonumbered(HWPFDocument wordDocument,
                                           int noteIndex,
                                           org.w3c.dom.Element block,
                                           Range footnoteTextRange)
Specified by:
processFootnoteAutonumbered in class AbstractWordConverter

processHyperlink

protected void processHyperlink(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element currentBlock,
                                Range textRange,
                                int currentTableLevel,
                                java.lang.String hyperlink)
Specified by:
processHyperlink in class AbstractWordConverter

processImage

protected void processImage(org.w3c.dom.Element currentBlock,
                            boolean inlined,
                            Picture picture,
                            java.lang.String imageSourcePath)
Specified by:
processImage in class AbstractWordConverter

processImageWithoutPicturesManager

protected void processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock,
                                                  boolean inlined,
                                                  Picture picture)
Specified by:
processImageWithoutPicturesManager in class AbstractWordConverter

processLineBreak

protected void processLineBreak(org.w3c.dom.Element block,
                                CharacterRun characterRun)
Specified by:
processLineBreak in class AbstractWordConverter

processNoteAutonumbered

protected void processNoteAutonumbered(HWPFDocument doc,
                                       java.lang.String type,
                                       int noteIndex,
                                       org.w3c.dom.Element block,
                                       Range noteTextRange)

processPageBreak

protected void processPageBreak(HWPFDocumentCore wordDocument,
                                org.w3c.dom.Element flow)
Specified by:
processPageBreak in class AbstractWordConverter

processPageref

protected void processPageref(HWPFDocumentCore hwpfDocument,
                              org.w3c.dom.Element currentBlock,
                              Range textRange,
                              int currentTableLevel,
                              java.lang.String pageref)
Specified by:
processPageref in class AbstractWordConverter

processParagraph

protected void processParagraph(HWPFDocumentCore hwpfDocument,
                                org.w3c.dom.Element parentElement,
                                int currentTableLevel,
                                Paragraph paragraph,
                                java.lang.String bulletText)
Specified by:
processParagraph in class AbstractWordConverter

processSection

protected void processSection(HWPFDocumentCore wordDocument,
                              Section section,
                              int sectionCounter)
Specified by:
processSection in class AbstractWordConverter

processSingleSection

protected void processSingleSection(HWPFDocumentCore wordDocument,
                                    Section section)
Overrides:
processSingleSection in class AbstractWordConverter

processTable

protected void processTable(HWPFDocumentCore hwpfDocument,
                            org.w3c.dom.Element flow,
                            Table table)
Specified by:
processTable in class AbstractWordConverter


Copyright 2012 The Apache Software Foundation or its licensors, as applicable.