public final class BytesToNameCanonicalizer extends Object
A caching symbol table implementation used for canonicalizing JSON field names (as Names which are constructed directly from a byte-based
input source).
Complications arise from trying to do efficient reuse and merging of
symbol tables, to be able to make use of usually shared vocabulary
of subsequent parsing runs.

Modifier and Type | Field and Description |
---|---|
protected int | _collCount: Total number of Names in collision buckets (included in _count along with primary entries) |
protected int | _collEnd: Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker) |
protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] | _collList: Array of heads of collision bucket chains; sized dynamically |
protected int | _count: Total number of Names in the symbol table; only used for child tables. |
protected boolean | _intern: Whether canonical symbol Strings are to be intern()ed before being added to the table or not |
protected int | _longestCollisionList: We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases. |
protected int[] | _mainHash: Array of 2^N size, which contains a combination of 24 bits of hash (0 to indicate 'empty' slot) and an 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index) |
protected int | _mainHashMask: Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N). |
protected Name[] | _mainNames: Array that contains Name instances matching entries in _mainHash. |
protected BytesToNameCanonicalizer | _parent: Reference to the root symbol table, for child tables, so that they can merge table information back as necessary. |
protected AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> | _tableInfo: Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. |
protected static int | DEFAULT_TABLE_SIZE |
protected static int | MAX_TABLE_SIZE: Let's not expand symbol tables past some maximum size; this should protect against OOMEs (OutOfMemoryErrors) caused by large documents with unique (~= random) names. |
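
The slot layout described for _mainHash and _mainHashMask can be sketched as below. This is only an illustration, not the class's actual code: the class and method names are hypothetical, and placing the 24 hash bits in the upper part of the int and the bucket marker in the low 8 bits is an assumption, since the description above does not say which bits hold which part.

```java
/** Hypothetical helper illustrating one possible _mainHash slot layout. */
final class MainHashSlotSketch {

    /** Slot index for a hash in a table of 2^N entries: hash & (size - 1). */
    static int slotIndex(int hash, int tableSize) {
        int mask = tableSize - 1; // works only because tableSize is a power of two
        return hash & mask;
    }

    /**
     * Packs the low 24 bits of the hash and an 8-bit bucket marker into one int.
     * Assumed layout: hash bits in bits 8..31, marker in bits 0..7.
     */
    static int pack(int hash, int bucketMarker) {
        return (hash << 8) | (bucketMarker & 0xFF);
    }

    /** The 24 hash bits stored in a slot. */
    static int hashBits(int slot) {
        return slot >>> 8;
    }

    /**
     * Index into the collision list, or -1 when the marker is 0
     * (no collision chain). A non-zero marker is stored as index + 1.
     */
    static int bucketIndex(int slot) {
        int marker = slot & 0xFF;
        return (marker == 0) ? -1 : marker - 1;
    }
}
```

Because the marker value 0 is reserved for "no chain", an 8-bit marker can address at most 255 collision buckets, which matches the limit stated for _collEnd.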

Modifier and Type | Method and Description |
---|---|
Name | addName(String symbolStr, int[] quads, int qlen) |
Name | addName(String symbolStr, int q1, int q2) |
int | bucketCount() |
int | calcHash(int firstQuad) |
int | calcHash(int[] quads, int qlen) |
int | calcHash(int firstQuad, int secondQuad) |
protected static int[] | calcQuads(byte[] wordBytes) |
int | collisionCount(): Method mostly needed by unit tests; calculates number of entries that are in collision list. |
static BytesToNameCanonicalizer | createRoot(): Factory method to call to create a symbol table instance with a randomized seed value. |
protected static BytesToNameCanonicalizer | createRoot(int hashSeed): Factory method that should only be called from unit tests, where seed value should remain the same. |
Name | findName(int firstQuad): Finds and returns name matching the specified symbol, if such name already exists in the table. |
Name | findName(int[] quads, int qlen): Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it. |
Name | findName(int firstQuad, int secondQuad): Finds and returns name matching the specified symbol, if such name already exists in the table. |
static Name | getEmptyName() |
int | hashSeed() |
BytesToNameCanonicalizer | makeChild(boolean canonicalize, boolean intern): Factory method used to create actual symbol table instance to use for parsing. |
int | maxCollisionLength(): Method mostly needed by unit tests; calculates length of the longest collision chain. |
boolean | maybeDirty(): Method called to quickly check whether a child symbol table may have gotten additional entries. |
void | release(): Method called by the using code to indicate it is done with this instance. |
protected void | reportTooManyCollisions(int maxLen) |
int | size() |
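
The root/child lifecycle implied by createRoot(), makeChild(), maybeDirty() and release() can be illustrated with a simplified, self-contained model. Everything below is a hypothetical stand-in rather than the library's code: it stores plain Strings in a Map instead of Names in hash arrays, and the merge policy (a child's table replaces the shared one only when it is larger) is an assumption made for the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified model of the root/child sharing pattern. */
final class SharedTableSketch {
    private final SharedTableSketch parent;                         // null for the root
    private final AtomicReference<Map<String, String>> sharedState; // non-null only for the root
    private Map<String, String> names;                              // a child's working view
    private boolean dirty;                                          // true once a child added entries

    private SharedTableSketch(SharedTableSketch parent, Map<String, String> initial) {
        this.parent = parent;
        this.sharedState = (parent == null)
                ? new AtomicReference<Map<String, String>>(initial) : null;
        this.names = initial;
    }

    static SharedTableSketch createRoot() {
        return new SharedTableSketch(null, Collections.<String, String>emptyMap());
    }

    /** Children start from the root's current immutable snapshot. */
    SharedTableSketch makeChild() {
        if (parent != null) throw new IllegalStateException("only the root creates children");
        return new SharedTableSketch(this, sharedState.get());
    }

    /** Returns the canonical instance for a name, adding it if missing (child use only). */
    String canonicalize(String name) {
        String existing = names.get(name);
        if (existing != null) return existing;
        if (!dirty) { // copy on first write, so the shared snapshot stays untouched
            names = new HashMap<String, String>(names);
            dirty = true;
        }
        names.put(name, name);
        return name;
    }

    boolean maybeDirty() { return dirty; }

    int size() {
        return (parent == null) ? sharedState.get().size() : names.size();
    }

    /** Hands the child's additions back to the root, if there are any. */
    void release() {
        if (parent == null || !dirty) return;
        Map<String, String> snapshot =
                Collections.unmodifiableMap(new HashMap<String, String>(names));
        AtomicReference<Map<String, String>> ref = parent.sharedState;
        while (true) {
            Map<String, String> current = ref.get();
            // Assumed policy: only a larger table replaces the shared one.
            if (snapshot.size() <= current.size() || ref.compareAndSet(current, snapshot)) break;
        }
        dirty = false;
    }
}
```

In this model a parser would call makeChild() once per parsing run, canonicalize names as it goes, and call release() when done, so that a later child starts with the vocabulary collected by earlier runs.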
protected static final int DEFAULT_TABLE_SIZE
protected static final int MAX_TABLE_SIZE
protected final BytesToNameCanonicalizer _parent
protected final AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo
protected final boolean _intern
protected int _count
protected int _longestCollisionList
protected int _mainHashMask
protected int[] _mainHash
protected Name[] _mainNames
Array that contains Name instances matching entries in _mainHash. Contains nulls for unused entries.
protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] _collList
protected int _collCount
Total number of Names in collision buckets (included in _count along with primary entries)
protected int _collEnd
public static BytesToNameCanonicalizer createRoot()
protected static BytesToNameCanonicalizer createRoot(int hashSeed)
public BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern)
intern - Whether canonical symbol Strings should be interned or not
public void release()
public int size()
public int bucketCount()
public boolean maybeDirty()
public int hashSeed()
public int collisionCount()
Value can be at most (size() - 1), but should usually be much lower, ideally 0.
public int maxCollisionLength()
May be up to size() - 1 in the pathological case.
public static Name getEmptyName()
public Name findName(int firstQuad)
Note: separate methods to optimize common case of short element/attribute names (4 or fewer ASCII characters)
firstQuad - int32 containing first 4 bytes of the name; if the whole name is less than 4 bytes, padded with zero bytes in front (zero MSBs, i.e. right aligned)
public Name findName(int firstQuad, int secondQuad)
Note: separate methods to optimize common case of relatively short element/attribute names (8 or fewer ASCII characters)
firstQuad - int32 containing first 4 bytes of the name.
secondQuad - int32 containing bytes 5 through 8 of the name; if less than 8 bytes, padded with up to 3 zero bytes in front (zero MSBs, i.e. right aligned)
public Name findName(int[] quads, int qlen)
Note: this is the general purpose method that can be called for names of any length. However, if the name is less than 9 bytes long, it is preferable to call the version optimized for short names.
quads - Array of int32s, each of which contains 4 bytes of the encoded name
qlen - Number of int32s, starting from index 0, in the quads parameter
public final int calcHash(int firstQuad)
public final int calcHash(int firstQuad, int secondQuad)
public final int calcHash(int[] quads, int qlen)
protected static int[] calcQuads(byte[] wordBytes)
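
The quad layout that the findName parameters describe (4 name bytes per int32, with a short final quad right aligned and zero padded in its most significant bytes) can be sketched as follows. The helper name packQuads is hypothetical and this is not presented as the body of calcQuads; the byte order within a full quad (first byte in the most significant position) is an assumption, since the text above only specifies the padding.

```java
/** Hypothetical helper showing one way to pack name bytes into int32 quads. */
final class QuadPackingSketch {

    /**
     * Packs bytes four at a time into ints. Each new byte is shifted in from
     * the right, so a final quad with fewer than 4 bytes ends up right aligned
     * with zero bytes in its most significant positions.
     */
    static int[] packQuads(byte[] nameBytes) {
        int len = nameBytes.length;
        int[] quads = new int[(len + 3) / 4]; // round up to whole quads
        for (int i = 0; i < len; i++) {
            int q = i >> 2; // index of the quad this byte belongs to
            quads[q] = (quads[q] << 8) | (nameBytes[i] & 0xFF);
        }
        return quads;
    }
}
```

Under these assumptions, the ASCII name "id" packs into the single quad 0x6964, and "abcde" packs into two quads, 0x61626364 and 0x65; names of up to 8 bytes therefore fit the one-quad and two-quad findName variants.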
protected void reportTooManyCollisions(int maxLen)