| Packages that use Parse | |
|---|---|
| org.apache.nutch.analysis.lang | Text document language identifier. |
| org.apache.nutch.indexer.feed | |
| org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
| org.apache.nutch.parse | |
| org.apache.nutch.parse.html | An HTML document parsing plugin. |
| org.apache.nutch.parse.js | |
| org.apache.nutch.parse.tika | |
| org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
| Uses of Parse in org.apache.nutch.analysis.lang |
|---|
| Methods in org.apache.nutch.analysis.lang that return Parse | |
|---|---|
Parse |
HTMLLanguageParser.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scans the HTML document for possible indications of the content language. |
| Methods in org.apache.nutch.analysis.lang with parameters of type Parse | |
|---|---|
Parse |
HTMLLanguageParser.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scans the HTML document for possible indications of the content language. |
| Uses of Parse in org.apache.nutch.indexer.feed |
|---|
| Methods in org.apache.nutch.indexer.feed with parameters of type Parse | |
|---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
| Uses of Parse in org.apache.nutch.microformats.reltag |
|---|
| Methods in org.apache.nutch.microformats.reltag that return Parse | |
|---|---|
Parse |
RelTagParser.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
| Methods in org.apache.nutch.microformats.reltag with parameters of type Parse | |
|---|---|
Parse |
RelTagParser.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
| Uses of Parse in org.apache.nutch.parse |
|---|
| Methods in org.apache.nutch.parse that return Parse | |
|---|---|
Parse |
ParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse, given the DOM tree of a page. |
Parse |
ParseFilters.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
static Parse |
ParseStatusUtils.getEmptyParse(Exception e,
Configuration conf)
|
static Parse |
ParseStatusUtils.getEmptyParse(int minorCode,
String message,
Configuration conf)
|
Parse |
Parser.getParse(String url,
WebPage page)
Parses the content of a WebPage instance. |
Parse |
ParseUtil.parse(String url,
WebPage page)
Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is
returned. |
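
The fallback behaviour described for `ParseUtil.parse` (try each preferred parser in turn until one succeeds) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: `SimpleParse`, `SimpleParser`, and the `parse` helper below are hypothetical stand-ins, and returning an empty `Optional` when every parser fails is an assumption, not documented Nutch behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parse result; NOT the Nutch Parse class. */
    static final class SimpleParse {
        final String text;
        final boolean success;

        SimpleParse(String text, boolean success) {
            this.text = text;
            this.success = success;
        }
    }

    /** Hypothetical stand-in for a parser plugin; NOT the Nutch Parser interface. */
    interface SimpleParser {
        SimpleParse getParse(String url, String content);
    }

    /**
     * Tries each parser in preference order and returns the first successful
     * result. Assumption: an empty Optional signals that every parser failed.
     */
    static Optional<SimpleParse> parse(List<SimpleParser> parsers, String url, String content) {
        for (SimpleParser parser : parsers) {
            SimpleParse result;
            try {
                result = parser.getParse(url, content);
            } catch (RuntimeException e) {
                continue; // treat an exception as a failed attempt and try the next parser
            }
            if (result != null && result.success) {
                return Optional.of(result);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SimpleParser failing = (url, content) -> new SimpleParse("", false);
        SimpleParser working = (url, content) -> new SimpleParse(content.trim(), true);

        List<SimpleParser> preferred = new ArrayList<>();
        preferred.add(failing); // tried first, reports failure
        preferred.add(working); // tried second, succeeds

        Optional<SimpleParse> parsed = parse(preferred, "http://example.com/", "  hello  ");
        System.out.println(parsed.map(p -> p.text).orElse("<no parse>"));
    }
}
```

The preference order is simply the list order here; how Nutch actually selects and orders parsers for a given page is outside the scope of this sketch.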
| Methods in org.apache.nutch.parse with parameters of type Parse | |
|---|---|
Parse |
ParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse, given the DOM tree of a page. |
Parse |
ParseFilters.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
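
The relationship between a single filter ("adds metadata or otherwise modifies a parse") and the aggregate that runs all defined filters can be sketched as a simple chain in which each filter receives the output of the previous one. The types below (`SimpleParse`, `SimpleFilter`, `runAll`) are hypothetical stand-ins rather than the Nutch API, the `WebPage`, `HTMLMetaTags`, and `DocumentFragment` arguments are omitted for brevity, and stopping the chain when a filter returns `null` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FilterChainSketch {

    /** Hypothetical stand-in for a parse result carrying metadata; NOT the Nutch Parse class. */
    static final class SimpleParse {
        final String text;
        final Map<String, String> metadata = new TreeMap<>();

        SimpleParse(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for one filter; returns the (possibly modified) parse. */
    interface SimpleFilter {
        SimpleParse filter(String url, SimpleParse parse);
    }

    /**
     * Runs every filter in order, feeding each one the result of the previous.
     * Assumption: a null result from any filter aborts the chain.
     */
    static SimpleParse runAll(List<SimpleFilter> filters, String url, SimpleParse parse) {
        SimpleParse current = parse;
        for (SimpleFilter filter : filters) {
            current = filter.filter(url, current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<SimpleFilter> filters = new ArrayList<>();
        // First filter tags the parse with an illustrative language value.
        filters.add((url, parse) -> {
            parse.metadata.put("lang", "en");
            return parse;
        });
        // Second filter records the length of the extracted text.
        filters.add((url, parse) -> {
            parse.metadata.put("length", String.valueOf(parse.text.length()));
            return parse;
        });

        SimpleParse result = runAll(filters, "http://example.com/", new SimpleParse("hello"));
        System.out.println(result.metadata);
    }
}
```

Because a filter both takes and returns a parse, the same method appears in the "return Parse" and "parameters of type Parse" tables for each filter class.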
| Uses of Parse in org.apache.nutch.parse.html |
|---|
| Methods in org.apache.nutch.parse.html that return Parse | |
|---|---|
Parse |
HtmlParser.getParse(String url,
WebPage page)
|
| Uses of Parse in org.apache.nutch.parse.js |
|---|
| Methods in org.apache.nutch.parse.js that return Parse | |
|---|---|
Parse |
JSParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Parse |
JSParseFilter.getParse(String url,
WebPage page)
|
| Methods in org.apache.nutch.parse.js with parameters of type Parse | |
|---|---|
Parse |
JSParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
| Uses of Parse in org.apache.nutch.parse.tika |
|---|
| Methods in org.apache.nutch.parse.tika that return Parse | |
|---|---|
Parse |
TikaParser.getParse(String url,
WebPage page)
|
| Uses of Parse in org.creativecommons.nutch |
|---|
| Methods in org.creativecommons.nutch that return Parse | |
|---|---|
Parse |
CCParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
| Methods in org.creativecommons.nutch with parameters of type Parse | |
|---|---|
Parse |
CCParseFilter.filter(String url,
WebPage page,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |