org.apache.lucene.util
Class BytesRefHash

java.lang.Object
  extended by org.apache.lucene.util.BytesRefHash

public final class BytesRefHash
extends Object

BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ordinals (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the ordinal is encapsulated inside BytesRefHash, and the ordinal is guaranteed to increase for each added BytesRef.

Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.
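
The length limit can be checked before calling add(BytesRef). The following is a minimal sketch rather than part of the Lucene API: the class LengthCheck and the helper fitsInBlock are hypothetical, and the block size is passed in by the caller. The value 1 << 15 in main is an assumption for illustration only; real code should read ByteBlockPool.BYTE_BLOCK_SIZE.

```java
final class LengthCheck {
    // Hypothetical helper: true if a value of `length` bytes stays within
    // the documented limit of blockSize - 2 bytes.
    static boolean fitsInBlock(int length, int blockSize) {
        return length >= 0 && length <= blockSize - 2;
    }

    public static void main(String[] args) {
        // Assumed block size for illustration; use ByteBlockPool.BYTE_BLOCK_SIZE in real code.
        int assumedBlockSize = 1 << 15;
        System.out.println(fitsInBlock(100, assumedBlockSize));
        System.out.println(fitsInBlock(assumedBlockSize, assumedBlockSize));
    }
}
```

A value that fails this check would be expected to trigger BytesRefHash.MaxBytesLengthExceededException when added.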

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Nested Class Summary
static class BytesRefHash.BytesStartArray
          Manages allocation of the per-term addresses.
static class BytesRefHash.DirectBytesStartArray
          A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private AtomicLong instance.
static class BytesRefHash.MaxBytesLengthExceededException
          Thrown if a BytesRef exceeds the BytesRefHash limit of ByteBlockPool.BYTE_BLOCK_SIZE-2.
static class BytesRefHash.TrackingDirectBytesStartArray
          A simple BytesRefHash.BytesStartArray that tracks all memory allocation using a shared AtomicLong instance.
 
Field Summary
static int DEFAULT_CAPACITY
           
 
Constructor Summary
BytesRefHash()
          Creates a new BytesRefHash with a ByteBlockPool using a ByteBlockPool.DirectAllocator.
BytesRefHash(ByteBlockPool pool)
          Creates a new BytesRefHash
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
          Creates a new BytesRefHash
 
Method Summary
 int add(BytesRef bytes)
          Adds a new BytesRef
 int add(BytesRef bytes, int code)
          Adds a new BytesRef with a pre-calculated hash code.
 int addByPoolOffset(int offset)
           
 int byteStart(int ord)
          Returns the bytesStart offset into the internally used ByteBlockPool for the given ord
 void clear()
           
 void clear(boolean resetPool)
          Clears all entries from this BytesRefHash and, if resetPool is true, also resets the underlying ByteBlockPool.
 void close()
          Closes the BytesRefHash and releases all internally used memory
 int[] compact()
          Returns the ords array in arbitrary order.
 BytesRef get(int ord, BytesRef ref)
          Populates and returns a BytesRef with the bytes for the given ord.
 void reinit()
          Reinitializes the BytesRefHash after a previous clear() call.
 int size()
          Returns the number of BytesRef values in this BytesRefHash.
 int[] sort(Comparator<BytesRef> comp)
          Returns the values array sorted by the referenced byte values.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_CAPACITY

public static final int DEFAULT_CAPACITY
See Also:
Constant Field Values
Constructor Detail

BytesRefHash

public BytesRefHash()
Creates a new BytesRefHash with a ByteBlockPool using a ByteBlockPool.DirectAllocator.


BytesRefHash

public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash


BytesRefHash

public BytesRefHash(ByteBlockPool pool,
                    int capacity,
                    BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash

Method Detail

size

public int size()
Returns the number of BytesRef values in this BytesRefHash.

Returns:
the number of BytesRef values in this BytesRefHash.

get

public BytesRef get(int ord,
                    BytesRef ref)
Populates and returns a BytesRef with the bytes for the given ord.

Note: the given ord must be a non-negative integer less than the current size (size()).

Parameters:
ord - the ord
ref - the BytesRef to populate
Returns:
the given BytesRef instance populated with the bytes for the given ord

compact

public int[] compact()
Returns the ords array in arbitrary order. Valid ords start at offset 0 and end at offset size() - 1.

Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
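
To illustrate what "valid ords start at offset 0" means, the sketch below compacts a sparse slot array so that all occupied slots are moved to the front. It is a stand-alone illustration, not the BytesRefHash implementation: the class CompactSketch, the method compactInPlace, and the use of -1 as the empty-slot marker are assumptions made for this example.

```java
import java.util.Arrays;

final class CompactSketch {
    // Moves every occupied slot (value != -1) to the front of the array,
    // keeping relative order, and returns the number of occupied slots.
    static int compactInPlace(int[] slots) {
        int upto = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != -1) {
                int ord = slots[i];
                slots[i] = -1;       // mark the old position as empty
                slots[upto++] = ord; // write the ord at the next free front position
            }
        }
        return upto;
    }

    public static void main(String[] args) {
        int[] slots = {-1, 2, -1, 0, 1, -1};
        int size = compactInPlace(slots);
        System.out.println(size + " " + Arrays.toString(slots));
    }
}
```

After the call, only the first `size` entries are meaningful, which mirrors the documented range of 0 to size() - 1.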


sort

public int[] sort(Comparator<BytesRef> comp)
Returns the values array sorted by the referenced byte values.

Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

Parameters:
comp - the Comparator used for sorting
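
The idea of ordering ords by the bytes they refer to can be sketched without Lucene. The example below is an illustration under stated assumptions, not the BytesRefHash algorithm: the class SortOrdsSketch, the method sortedOrds, and the UNSIGNED_BYTES comparator are hypothetical names, and plain byte arrays stand in for BytesRef values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

final class SortOrdsSketch {
    // Unsigned lexicographic comparison of two byte arrays.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Returns the ords 0..values.length-1 ordered by the bytes each ord refers to.
    static int[] sortedOrds(byte[][] values, Comparator<byte[]> comp) {
        Integer[] ords = new Integer[values.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        Arrays.sort(ords, (x, y) -> comp.compare(values[x], values[y]));
        int[] result = new int[ords.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = ords[i];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[][] values = {
            "pear".getBytes(StandardCharsets.UTF_8),  // ord 0
            "apple".getBytes(StandardCharsets.UTF_8), // ord 1
            "fig".getBytes(StandardCharsets.UTF_8)    // ord 2
        };
        System.out.println(Arrays.toString(sortedOrds(values, UNSIGNED_BYTES)));
    }
}
```

The result lists ords, not values, so a caller would still look up each ord (for example with get(int, BytesRef)) to read the bytes in sorted order.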

clear

public void clear(boolean resetPool)
Clears all entries from this BytesRefHash and, if resetPool is true, also resets the underlying ByteBlockPool.


clear

public void clear()

close

public void close()
Closes the BytesRefHash and releases all internally used memory


add

public int add(BytesRef bytes)
Adds a new BytesRef

Parameters:
bytes - the bytes to hash
Returns:
the ord assigned to the given bytes if there was no mapping for them, otherwise (-(ord)-1), where ord is the existing ord. This guarantees that the return value is always >= 0 if the given bytes have not been hashed before, and negative otherwise.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the given bytes are > ByteBlockPool.BYTE_BLOCK_SIZE - 2
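
The return convention can be illustrated with a small stand-in built on java.util.HashMap. This is a sketch of the documented contract only, not of how BytesRefHash stores data: the class OrdMapSketch and the helper toOrd are hypothetical names, and plain byte arrays stand in for BytesRef.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

final class OrdMapSketch {
    // ByteBuffer is used as the key because it compares by content.
    private final Map<ByteBuffer, Integer> ords = new HashMap<>();

    // Mimics the documented contract: a new ord (>= 0) for unseen bytes,
    // otherwise -(ord)-1 for bytes that already have a mapping.
    int add(byte[] bytes) {
        ByteBuffer key = ByteBuffer.wrap(bytes.clone());
        Integer existing = ords.get(key);
        if (existing != null) {
            return -existing - 1;
        }
        int ord = ords.size(); // ords increase with each newly added value
        ords.put(key, ord);
        return ord;
    }

    // Recovers the ord from either form of the return value.
    static int toOrd(int result) {
        return result >= 0 ? result : -result - 1;
    }

    public static void main(String[] args) {
        OrdMapSketch map = new OrdMapSketch();
        System.out.println(map.add(new byte[] {1, 2})); // first value, new
        System.out.println(map.add(new byte[] {3}));    // second value, new
        int again = map.add(new byte[] {1, 2});         // already present
        System.out.println(again + " -> ord " + toOrd(again));
    }
}
```

Callers that only need the ord, regardless of whether the value was new, can apply the same decoding as toOrd to the result of add.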

add

public int add(BytesRef bytes,
               int code)
Adds a new BytesRef with a pre-calculated hash code.

Parameters:
bytes - the bytes to hash
code - the bytes hash code

Hashcode is defined as:

 int hash = 0;
 for (int i = offset; i < offset + length; i++) {
   hash = 31 * hash + bytes[i];
 }
 
Returns:
the ord assigned to the given bytes if there was no mapping for them, otherwise (-(ord)-1), where ord is the existing ord. This guarantees that the return value is always >= 0 if the given bytes have not been hashed before, and negative otherwise.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the given bytes are > ByteBlockPool.BYTE_BLOCK_SIZE - 2
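
The hash code definition given for the code parameter can be wrapped in a small helper for callers that want to pre-compute it. This is a minimal sketch: the class HashCodeSketch and the method hash are hypothetical names, and the loop is copied from the definition above, operating on a plain byte array with an explicit offset and length.

```java
import java.nio.charset.StandardCharsets;

final class HashCodeSketch {
    // Computes the hash over bytes[offset .. offset + length - 1]
    // using the 31-multiplier definition quoted in the documentation.
    static int hash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        // For ASCII-only input this matches String.hashCode(): 96354 for "abc".
        System.out.println(hash(data, 0, data.length));
    }
}
```

Because Java bytes are signed, values above 0x7F contribute as negative numbers, so the match with String.hashCode() holds only for ASCII input.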

addByPoolOffset

public int addByPoolOffset(int offset)

reinit

public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.


byteStart

public int byteStart(int ord)
Returns the bytesStart offset into the internally used ByteBlockPool for the given ord

Parameters:
ord - the ord to look up
Returns:
the bytesStart offset into the internally used ByteBlockPool for the given ord