Package org.apache.nutch.parse.html

An HTML document parsing plugin.


Class Summary
DOMBuilder This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMContentUtils A collection of methods for extracting content from DOM trees.
DOMContentUtils.LinkParams  
HTMLMetaProcessor Class for parsing META Directives from DOM trees.
HtmlParser  
XMLCharacterRecognizer Class used to verify whether a given character (ch) conforms to the XML 1.0 definition of whitespace.
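
For reference, XML 1.0 defines whitespace (production S) as exactly four characters: space (#x20), tab (#x9), line feed (#xA), and carriage return (#xD). The following is a minimal sketch of such a check. The class name XmlWhitespaceSketch and its methods are hypothetical, chosen for illustration only; this is not the source of XMLCharacterRecognizer, and its actual method names and signatures may differ.

```java
// Hypothetical illustration; NOT the Nutch XMLCharacterRecognizer implementation.
final class XmlWhitespaceSketch {

    private XmlWhitespaceSketch() {
        // Static helpers only.
    }

    /** Returns true if ch is one of the four XML 1.0 whitespace characters. */
    static boolean isXmlWhitespace(char ch) {
        return ch == 0x20   // space
            || ch == 0x09   // horizontal tab
            || ch == 0x0A   // line feed
            || ch == 0x0D;  // carriage return
    }

    /**
     * Returns true if every character in text is XML 1.0 whitespace.
     * An empty sequence trivially satisfies this and also returns true.
     */
    static boolean isXmlWhitespace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isXmlWhitespace(' '));
        System.out.println(isXmlWhitespace((char) 0xA0)); // no-break space
        System.out.println(isXmlWhitespace(" \t\r\n"));
    }
}
```

Note that this definition is narrower than Java's Character.isWhitespace, which also accepts characters such as form feed; the no-break space (#xA0) is whitespace under neither.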
 

Package org.apache.nutch.parse.html Description

This package relies on NekoHTML.
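
As a rough illustration of what extracting content from a DOM tree can look like, the sketch below walks a W3C DOM and collects its text nodes. It is an assumption-laden example, not the package's code: the class DomTextSketch and its methods are hypothetical, and the JDK's built-in XML parser stands in for NekoHTML, so the input must be well-formed markup rather than arbitrary real-world HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; NOT the Nutch DOMContentUtils implementation.
final class DomTextSketch {

    private DomTextSketch() {
        // Static helpers only.
    }

    /** Collects all text under node in document order, with whitespace runs collapsed. */
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    // Depth-first walk: append each text node, then recurse into children.
    private static void appendText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            appendText(child, sb);
        }
    }

    /** Parses well-formed markup with the JDK parser (a stand-in for NekoHTML). */
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
                "<html><body><p>Hello</p><p>world</p></body></html>");
        System.out.println(collectText(doc));
    }
}
```

A production HTML plugin would additionally need to cope with malformed markup, skip script and style content, and honor META directives, none of which this sketch attempts.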



Copyright © 2012 The Apache Software Foundation