java.lang.Object
  org.jsoup.safety.Whitelist
public class Whitelist
Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.
Start with one of the defaults:
none()
simpleText()
basic()
basicWithImages()
relaxed()
If you need to allow more through (please be careful!), tweak a base whitelist with:
addTags(java.lang.String...)
addAttributes(java.lang.String, java.lang.String...)
addEnforcedAttribute(java.lang.String, java.lang.String, java.lang.String)
addProtocols(java.lang.String, java.lang.String, java.lang.String...)
The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user-supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.
If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to
XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See
http://ha.ckers.org/xss.html for some XSS attack examples.
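As a minimal sketch of the intended use (cleaning a body fragment before placing it in a templated page), the following passes untrusted markup through the basic() whitelist. It assumes jsoup's Jsoup.clean(String, Whitelist) entry point, which is not described on this page; the class name CleanFragmentSketch and the input string are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class CleanFragmentSketch {
    // Cleans a fragment of untrusted body HTML using the basic() whitelist.
    static String cleanFragment(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Whitelist.basic());
    }

    public static void main(String[] args) {
        String untrusted =
            "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
        // onclick is not a whitelisted attribute, so it should be dropped;
        // basic() also enforces rel="nofollow" on a elements.
        System.out.println(cleanFragment(untrusted));
    }
}
```

The cleaned string is a body fragment, so it still has to be placed inside your own page template.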
Constructor Summary
Constructor | Description |
---|---|
Whitelist() | Create a new, empty whitelist. |
Method Summary
Modifier and Type | Method and Description |
---|---|
Whitelist | addAttributes(String tag, String... keys): Add a list of allowed attributes to a tag. |
Whitelist | addEnforcedAttribute(String tag, String key, String value): Add an enforced attribute to a tag. |
Whitelist | addProtocols(String tag, String key, String... protocols): Add allowed URL protocols for an element's URL attribute. |
Whitelist | addTags(String... tags): Add a list of allowed elements to a whitelist. |
static Whitelist | basic(): This whitelist allows a fuller range of text nodes: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes. |
static Whitelist | basicWithImages(): This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https. |
static Whitelist | none(): This whitelist allows only text nodes: all HTML will be stripped. |
Whitelist | preserveRelativeLinks(boolean preserve): Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links. |
static Whitelist | relaxed(): This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul. Links do not have an enforced rel=nofollow attribute, but you can add that if desired. |
static Whitelist | simpleText(): This whitelist allows only simple text formatting: b, em, i, strong, u. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Whitelist()
Create a new, empty whitelist.
See Also: basic(), basicWithImages(), simpleText(), relaxed()
Method Detail |
---|
public static Whitelist none()
This whitelist allows only text nodes: all HTML will be stripped.
public static Whitelist simpleText()
This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed.
public static Whitelist basic()
This whitelist allows a fuller range of text nodes: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.
Links (a elements) can point to http, https, ftp, mailto, and have an enforced rel=nofollow attribute.
Does not allow images.
public static Whitelist basicWithImages()
This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.
public static Whitelist relaxed()
This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.
Links do not have an enforced rel=nofollow attribute, but you can add that if desired.
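To make the differences between the five default whitelists concrete, the sketch below runs one input through each of them. It assumes Jsoup.clean(String, Whitelist) from the same library; the class name, the cleanWith helper, and the sample markup are hypothetical. Exact whitespace in the output may vary between jsoup versions, so no output is shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class DefaultWhitelistsSketch {
    // Illustrative input: a heading, a paragraph with bold text, and an image.
    static final String INPUT =
        "<h1>Title</h1><p>Some <b>bold</b> text and an "
        + "<img src='http://example.com/a.png'>.</p>";

    // Hypothetical helper: cleans the fixed input with the given whitelist.
    static String cleanWith(Whitelist whitelist) {
        return Jsoup.clean(INPUT, whitelist);
    }

    public static void main(String[] args) {
        // Each factory call returns a fresh whitelist, from strictest to most permissive.
        System.out.println("none:            " + cleanWith(Whitelist.none()));
        System.out.println("simpleText:      " + cleanWith(Whitelist.simpleText()));
        System.out.println("basic:           " + cleanWith(Whitelist.basic()));
        System.out.println("basicWithImages: " + cleanWith(Whitelist.basicWithImages()));
        System.out.println("relaxed:         " + cleanWith(Whitelist.relaxed()));
    }
}
```

In this sketch, none() should leave only text, simpleText() should keep the b element, basic() should add the p element, basicWithImages() should add the img element, and relaxed() should also keep the h1 heading.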
public Whitelist addTags(String... tags)
Add a list of allowed elements to a whitelist.
Parameters:
tags - tag names to allow
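A small sketch of extending a base whitelist with addTags: here basic() is widened to accept a few heading elements. The class name, the basicWithHeadings helper, and the choice of tags are assumptions for illustration; Jsoup.clean(String, Whitelist) is assumed from the same library.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class AddTagsSketch {
    // Hypothetical helper: basic() plus h1, h2 and h3 elements.
    static Whitelist basicWithHeadings() {
        return Whitelist.basic().addTags("h1", "h2", "h3");
    }

    static String clean(String html) {
        return Jsoup.clean(html, basicWithHeadings());
    }

    public static void main(String[] args) {
        // h2 is now allowed; h4 was not added, so only its text should survive.
        System.out.println(clean("<h2>News</h2><p>Body</p><h4>Minor</h4>"));
    }
}
```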
public Whitelist addAttributes(String tag, String... keys)
Add a list of allowed attributes to a tag. E.g.: addAttributes("a", "href", "class") allows href and class attributes on a tags.
To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class").
Parameters:
tag - The tag the attributes are for. The tag will be added to the allowed tag list if necessary.
keys - List of valid attributes for the tag
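The :all pseudo tag described above can be sketched as follows: class is allowed on every whitelisted element, while other attributes are still removed. The class name and sample input are illustrative, and Jsoup.clean(String, Whitelist) is assumed from the same library.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class AddAttributesSketch {
    static String clean(String html) {
        // Allow the class attribute on any element that basic() already permits.
        Whitelist whitelist = Whitelist.basic().addAttributes(":all", "class");
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        // class should be kept; style and onclick are not whitelisted.
        System.out.println(
            clean("<p class=\"note\" style=\"color:red\" onclick=\"x()\">Hi</p>"));
    }
}
```

Allowing an attribute on all tags is broad, so it is worth limiting :all to attributes that cannot carry URLs or script.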
public Whitelist addEnforcedAttribute(String tag, String key, String value)
Add an enforced attribute to a tag. E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">
Parameters:
tag - The tag the enforced attribute is for. The tag will be added to the allowed tag list if necessary.
key - The attribute key
value - The enforced attribute value
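Since relaxed() does not enforce rel=nofollow on links, one way to add it is sketched below. The class name and input are illustrative, and Jsoup.clean(String, Whitelist) is assumed from the same library.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class EnforcedAttributeSketch {
    static String clean(String html) {
        // relaxed() plus an enforced rel="nofollow" on every a element.
        Whitelist whitelist = Whitelist.relaxed()
            .addEnforcedAttribute("a", "rel", "nofollow");
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        // The incoming rel="author" is not a whitelisted attribute;
        // the enforced value should appear in the output instead.
        System.out.println(
            clean("<a href=\"http://example.com/\" rel=\"author\">Example</a>"));
    }
}
```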
public Whitelist preserveRelativeLinks(boolean preserve)
Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links. By default, this is false: URLs will be made absolute (e.g. start with http://).
Note that when handling relative links, the input document must have an appropriate base URI set when parsing, so that the link's protocol can be confirmed. Regardless of the setting of the preserve relative links option, the link must be resolvable against the base URI to an allowed protocol; otherwise the attribute will be removed.
Parameters:
preserve - true to allow relative links, false (default) to deny
See Also: addProtocols(java.lang.String, java.lang.String, java.lang.String...)
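The effect of the preserve flag can be sketched by cleaning the same relative link twice against a base URI. This assumes the three-argument Jsoup.clean(String, String, Whitelist) overload, which supplies the base URI; the class name and the example.com base URI are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class RelativeLinksSketch {
    static final String BASE_URI = "http://example.com/"; // illustrative base URI

    // Cleans html against BASE_URI, optionally preserving relative links.
    static String clean(String html, boolean preserve) {
        Whitelist whitelist = Whitelist.basic().preserveRelativeLinks(preserve);
        return Jsoup.clean(html, BASE_URI, whitelist);
    }

    public static void main(String[] args) {
        String html = "<a href='/about'>About</a>";
        // preserve = true: href should stay as the relative "/about".
        System.out.println(clean(html, true));
        // preserve = false (the default): href should be made absolute.
        System.out.println(clean(html, false));
    }
}
```

In both cases the link resolves against the base URI to http, which basic() allows, so the href attribute is kept.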
public Whitelist addProtocols(String tag, String key, String... protocols)
Add allowed URL protocols for an element's URL attribute. E.g.: addProtocols("a", "href", "ftp", "http", "https")
Parameters:
tag - Tag the URL protocol is for
key - Attribute key
protocols - List of valid protocols
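A sketch of building a narrow whitelist from scratch with addProtocols: only a elements are allowed, and their href must use https. The class name, the httpsLinksOnly helper, and the inputs are hypothetical; Jsoup.clean(String, Whitelist) is assumed from the same library.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class AddProtocolsSketch {
    // Hypothetical helper: allows only a elements whose href uses https.
    static Whitelist httpsLinksOnly() {
        return new Whitelist()
            .addTags("a")
            .addAttributes("a", "href")
            .addProtocols("a", "href", "https");
    }

    static String clean(String html) {
        return Jsoup.clean(html, httpsLinksOnly());
    }

    public static void main(String[] args) {
        // https is allowed, so href should be kept.
        System.out.println(clean("<a href='https://example.com/'>secure</a>"));
        // http and javascript are not in the protocol list, so href should be removed.
        System.out.println(clean("<a href='http://example.com/'>insecure</a>"));
        System.out.println(clean("<a href='javascript:alert(1)'>click</a>"));
    }
}
```

Restricting protocols on every URL attribute you allow is the main defence against script URLs, which is why URL attributes deserve the extra care noted in the class description.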