org.apache.lucene.facet.taxonomy
Class CategoryPath

java.lang.Object
  extended by org.apache.lucene.facet.taxonomy.CategoryPath
All Implemented Interfaces:
Serializable, Cloneable, Comparable<CategoryPath>

public class CategoryPath
extends Object
implements Serializable, Cloneable, Comparable<CategoryPath>

A CategoryPath holds a sequence of string components, specifying the hierarchical name of a category.

CategoryPath is designed to reduce the number of object allocations in two ways. First, it keeps the components internally in two arrays rather than as individual strings. Second, it allows reusing both the CategoryPath object itself (which can be clear()ed and then have new components add()ed) and add()'s parameter (which can be a reusable object, not just a String).
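To make the two-array layout concrete, here is a minimal sketch under a simplified model. Only the field names chars, ends and ncomponents come from this page. The class name PathSketch, its growth policy, and the assumption that ends[i] holds the offset just past component i are hypothetical, not Lucene's actual implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the two-array layout (not Lucene code).
 * chars holds all component characters back to back; ends[i] is assumed to be
 * the offset in chars just past the end of component i.
 */
final class PathSketch {
    private char[] chars = new char[0];
    private short[] ends = new short[0];
    private short ncomponents = 0;

    /** Appends one component by copying its characters; no reference to the argument is kept. */
    void add(CharSequence component) {
        int start = ncomponents == 0 ? 0 : ends[ncomponents - 1];
        int needed = start + component.length();
        if (needed > chars.length) {
            // Grow the character buffer (at least doubling) only when it is too small.
            chars = Arrays.copyOf(chars, Math.max(needed, chars.length * 2));
        }
        if (ncomponents == ends.length) {
            // Grow the component-offset array when it is full.
            ends = Arrays.copyOf(ends, Math.max(4, ends.length * 2));
        }
        for (int i = 0; i < component.length(); i++) {
            chars[start + i] = component.charAt(i);
        }
        ends[ncomponents++] = (short) needed;
    }

    /** Forgets all components but keeps both buffers, so the object can be reused. */
    void clear() {
        ncomponents = 0;
    }

    short length() {
        return ncomponents;
    }

    int capacityChars() {
        return chars.length;
    }

    /** Returns component i as a new String, or null if there is no such component. */
    String getComponent(int i) {
        if (i < 0 || i >= ncomponents) {
            return null;
        }
        int start = i == 0 ? 0 : ends[i - 1];
        return new String(chars, start, ends[i] - start);
    }
}
```

Because add() copies characters, the same mutable CharBuffer can be refilled and passed again, and the same PathSketch can be reused after clear() without reallocating its buffers.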

See Also:
Serialized Form
WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
protected  char[] chars
           
protected  short[] ends
           
protected  short ncomponents
           
 
Constructor Summary
CategoryPath()
          Create an empty CategoryPath object.
CategoryPath(CategoryPath existing)
          Construct a new CategoryPath object, copying the path given in an existing CategoryPath object.
CategoryPath(CategoryPath existing, int prefixLen)
          Construct a new CategoryPath object, copying a prefix with the given number of components of the path given in an existing CategoryPath object.
CategoryPath(CharSequence... components)
          Construct a new CategoryPath object, copying an existing path given as an array of strings.
CategoryPath(int capacityChars, int capacityComponents)
          Construct a new empty CategoryPath object.
CategoryPath(String pathString, char delimiter)
          Construct a new CategoryPath object, given a single string with components separated by a given delimiter character.
 
Method Summary
 void add(CharSequence component)
          Add the given component to the end of the path.
 void add(CharSequence pathString, char delimiter)
          Add the given components to the end of the path.
 void appendTo(Appendable out, char delimiter)
          Build a string representation of the path, with its components separated by the given delimiter character.
 void appendTo(Appendable out, char delimiter, int prefixLen)
          Like appendTo(Appendable, char), but takes only a prefix of the path, rather than the whole path.
 void appendTo(Appendable out, char delimiter, int start, int end)
          Like appendTo(Appendable, char), but takes only a part of the path, rather than the whole path.
 int capacityChars()
          Returns the current character capacity of the CategoryPath.
 int capacityComponents()
          Returns the current component capacity of the CategoryPath.
 int charsNeededForFullPath()
          Returns the number of characters required to represent this entire category path, if written using copyToCharArray(char[], int, int, char) or appendTo(Appendable, char).
 void clear()
          Empty the CategoryPath object, so that it has zero components.
 Object clone()
           
 int compareTo(CategoryPath other)
          Compares this CategoryPath with the other CategoryPath for lexicographic order.
 int copyToCharArray(char[] outputBuffer, int outputBufferStart, int numberOfComponentsToCopy, char separatorChar)
          Copies the specified number of components from this category path to the specified character array, with the components separated by a given delimiter character.
 void deserializeFromStreamReader(InputStreamReader isr)
          Deserializes the content of this CategoryPath from a stream written by serializeToStreamWriter(OutputStreamWriter), treating the ends as 16-bit characters.
 boolean equals(Object obj)
          Compare the given CategoryPath to another one.
 boolean equalsToSerialized(CharSequence buffer, int offset)
          Check whether the current path is identical to the one serialized (with serializeAppendTo(Appendable)) in the given buffer, at the given offset.
 boolean equalsToSerialized(int prefixLen, CharSequence buffer, int offset)
          Just like equalsToSerialized(CharSequence, int), but compare to a prefix of the CategoryPath, instead of the whole CategoryPath.
 String getComponent(int i)
          Return the i'th component of the path, in a new String object.
 int hashCode()
          Calculate a hashCode for this path, used when a CategoryPath serves as a hash-table key.
 int hashCode(int prefixLen)
          Like hashCode(), but find the hash function of a prefix with the given number of components, rather than of the entire path.
static int hashCodeOfSerialized(CharSequence buffer, int offset)
          Calculates the hash code of a path that has been written to a character buffer using serializeAppendTo(Appendable).
 boolean isDescendantOf(CategoryPath other)
          Test whether this object is a descendant of another CategoryPath.
 String lastComponent()
          Return the last component of the path, in a new String object.
 short length()
          Return the number of components in the facet path.
 long longHashCode()
          Calculate a 64-bit hash function for this path.
 long longHashCode(int prefixLen)
          Like longHashCode(), but find the hash function of a prefix with the given number of components, rather than of the entire path.
 void serializeAppendTo(Appendable out)
          Write out a serialized (as a character sequence) representation of the path to a given Appendable (e.g., a StringBuilder, CharBuffer, Writer, or something similar).
 void serializeAppendTo(int prefixLen, Appendable out)
          Just like serializeAppendTo(Appendable), but writes only a prefix of the CategoryPath.
 void serializeToStreamWriter(OutputStreamWriter osw)
          Serializes the content of this CategoryPath to a byte stream, using UTF-8 encoding to convert characters to bytes, and treating the ends as 16-bit characters.
 int setFromSerialized(CharSequence buffer, int offset)
          Set a CategoryPath from a character-sequence representation written by serializeAppendTo(Appendable).
 String toString()
          Overrides Object.toString() to allow simple printing of a CategoryPath for debugging purposes.
 String toString(char delimiter)
          Build a string representation of the path, with its components separated by the given delimiter character.
 String toString(char delimiter, int prefixLen)
          Like toString(char), but takes only a prefix with a given number of components, rather than the whole path.
 String toString(char delimiter, int start, int end)
          Like toString(char), but takes only a part of the path, rather than the whole path.
 void trim(int nTrim)
          Trim the last components from the path.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

chars

protected char[] chars

ends

protected short[] ends

ncomponents

protected short ncomponents
Constructor Detail

CategoryPath

public CategoryPath(int capacityChars,
                    int capacityComponents)
Construct a new empty CategoryPath object. CategoryPath objects are meant to be reused, by add()ing components, and later clear()ing, and add()ing components again. The CategoryPath object is created with a buffer pre-allocated for a given number of characters and components, but the buffer will grow as necessary (see capacityChars() and capacityComponents()).


CategoryPath

public CategoryPath()
Create an empty CategoryPath object. Equivalent to the constructor CategoryPath(int, int) with the two initial-capacity arguments set to zero.


CategoryPath

public CategoryPath(String pathString,
                    char delimiter)
Construct a new CategoryPath object, given a single string with components separated by a given delimiter character.

The initial capacity of the constructed object will be exactly what is needed to hold the given path. This fact is convenient when creating a temporary object that will not be reused later.


CategoryPath

public CategoryPath(CharSequence... components)
Construct a new CategoryPath object, copying an existing path given as an array of strings.

The new object occupies exactly the space it needs, without any spare capacity. This is the expected behavior in the typical use case, where this constructor is used to create a temporary object which is never reused.


CategoryPath

public CategoryPath(CategoryPath existing)
Construct a new CategoryPath object, copying the path given in an existing CategoryPath object.

This copy constructor is handy when you need to save a reference to a CategoryPath (e.g., when it serves as a key in a hash table) but cannot save a reference to the original object, because the user may change its contents later. Copying the contents into a new object solves this problem.

This constructor does not copy the capacity (spare buffer size) of the existing CategoryPath. Rather, the new object occupies exactly the space it needs, without any spare. This is the expected behavior in the typical use case outlined in the previous paragraph.


CategoryPath

public CategoryPath(CategoryPath existing,
                    int prefixLen)
Construct a new CategoryPath object, copying a prefix with the given number of components of the path given in an existing CategoryPath object.

If the given length is negative or bigger than the given path's actual length, the full path is taken.

This constructor is often convenient for creating a temporary object with a path's prefix, but this practice is wasteful, and therefore inadvisable. Rather, the application should be written in a way that allows considering only a prefix of a given path, without needing to make a copy of that path.

Method Detail

length

public short length()
Return the number of components in the facet path. Note that this is not the number of characters, but the number of components.


trim

public void trim(int nTrim)
Trim the last components from the path.

Parameters:
nTrim - Number of components to trim. If larger than the number of components this path has, the entire path will be cleared.
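The protected fields listed above (chars, ends, ncomponents) suggest that component boundaries are stored as end offsets, in which case trimming can plausibly be done by only lowering the component count while leaving the character buffer untouched. The sketch below assumes that layout; TrimSketch is hypothetical and not Lucene's code, and its handling of a non-positive nTrim is an assumption.

```java
/**
 * Hypothetical model (not Lucene code): ends[i] is assumed to be the offset
 * just past component i, so trimming only changes the component count.
 */
final class TrimSketch {
    /** Returns the component count left after trimming nTrim components. */
    static int trim(int ncomponents, int nTrim) {
        if (nTrim <= 0) {
            return ncomponents; // assumption: a non-positive nTrim trims nothing
        }
        // Trimming more components than exist clears the path entirely.
        return Math.max(0, ncomponents - nTrim);
    }

    /** Characters still in use: the end offset of the last remaining component. */
    static int usedChars(short[] ends, int ncomponents) {
        return ncomponents == 0 ? 0 : ends[ncomponents - 1];
    }
}
```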

capacityChars

public int capacityChars()
Returns the current character capacity of the CategoryPath. The character capacity is the size of the internal buffer used to hold the characters of all the path's components. When a component is added and the capacity is not big enough, the buffer is automatically grown, and capacityChars() increases.


capacityComponents

public int capacityComponents()
Returns the current component capacity of the CategoryPath. The component capacity is the maximum number of components that the internal buffer can currently hold. When a component is added beyond this capacity, the buffer is automatically grown, and capacityComponents() increases.


add

public void add(CharSequence component)
Add the given component to the end of the path.

Note that when a String object is passed to this method, no reference to it is kept (its content is copied), so a temporary String passed here becomes garbage. To reduce the number of garbage objects, you can pass a reusable, mutable CharBuffer instead of an immutable String to this method.


clear

public void clear()
Empty the CategoryPath object, so that it has zero components. The capacity of the object (see capacityChars() and capacityComponents()) is not reduced, so that the object can be reused without frequent reallocations.


appendTo

public void appendTo(Appendable out,
                     char delimiter)
              throws IOException
Build a string representation of the path, with its components separated by the given delimiter character. The resulting string is appended to a given Appendable, e.g., a StringBuilder, CharBuffer or Writer.

Note that the two cases of zero components and one component with zero length produce indistinguishable results (both of them append nothing). This is normally not a problem, because components should not normally have zero lengths.

An IOException can be thrown if the given Appendable's append() throws this exception.

Throws:
IOException
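The appendTo contract can be sketched with the path modeled as a plain List of components; AppendSketch is a hypothetical helper, not the Lucene method. The sketch also reproduces the ambiguity noted above: an empty path and a path with a single empty component append the same (empty) text.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of the appendTo contract (not Lucene code). */
final class AppendSketch {
    /** Appends the components separated by delimiter; IOExceptions from out propagate. */
    static void appendTo(List<? extends CharSequence> components, Appendable out, char delimiter)
            throws IOException {
        for (int i = 0; i < components.size(); i++) {
            if (i > 0) {
                out.append(delimiter); // separator only between adjacent components
            }
            out.append(components.get(i));
        }
    }
}
```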

appendTo

public void appendTo(Appendable out,
                     char delimiter,
                     int prefixLen)
              throws IOException
Like appendTo(Appendable, char), but takes only a prefix of the path, rather than the whole path.

If the given prefix length is negative or bigger than the path's actual length, the whole path is taken.

Throws:
IOException

appendTo

public void appendTo(Appendable out,
                     char delimiter,
                     int start,
                     int end)
              throws IOException
Like appendTo(Appendable, char), but takes only a part of the path, rather than the whole path.

start specifies the first component in the subpath, and end is one past the last component. If start is negative, 0 is assumed, and if end is negative or past the end of the path, the path is taken until the end. Otherwise, if end<=start, nothing is appended. Nothing is appended either if the path is empty.

Throws:
IOException
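The start/end rules above can be expressed as a small helper; the names RangeSketch, effectiveRange and subpath are hypothetical. One assumption goes beyond the documented rules: a start past the end of the path is clamped to the path length, so the result is simply empty.

```java
import java.util.List;

/** Hypothetical sketch of the documented subpath rules (not Lucene code). */
final class RangeSketch {
    /** Returns {from, to} for a path of the given length; from == to means an empty subpath. */
    static int[] effectiveRange(int length, int start, int end) {
        // Negative start means 0; clamping to length is an extra safety assumption.
        int from = Math.min(Math.max(0, start), length);
        // Negative end, or an end past the path, means "until the end".
        int to = (end < 0 || end > length) ? length : end;
        if (to <= from) {
            return new int[] {from, from}; // end <= start, or an empty path
        }
        return new int[] {from, to};
    }

    /** Joins the selected components with the delimiter, following the rules above. */
    static String subpath(List<String> components, char delimiter, int start, int end) {
        int[] r = effectiveRange(components.size(), start, end);
        return String.join(String.valueOf(delimiter), components.subList(r[0], r[1]));
    }
}
```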

toString

public String toString(char delimiter)
Build a string representation of the path, with its components separated by the given delimiter character. The resulting string is returned as a new String object. To avoid this temporary object creation, consider using appendTo(Appendable, char) instead.

Note that the two cases of zero components and one component with zero length produce indistinguishable results (both of them return an empty string). This is normally not a problem, because components should not normally have zero lengths.


toString

public String toString()
Overrides Object.toString() to allow simple printing of a CategoryPath for debugging purposes. When possible, avoid using it; if you want to output the path with its components separated by a delimiter character, specify the delimiter explicitly with toString(char).

Overrides:
toString in class Object

toString

public String toString(char delimiter,
                       int prefixLen)
Like toString(char), but takes only a prefix with a given number of components, rather than the whole path.

If the given length is negative or bigger than the path's actual length, the whole path is taken.


toString

public String toString(char delimiter,
                       int start,
                       int end)
Like toString(char), but takes only a part of the path, rather than the whole path.

start specifies the first component in the subpath, and end is one past the last component. If start is negative, 0 is assumed, and if end is negative or past the end of the path, the path is taken until the end. Otherwise, if end<=start, an empty string is returned. An empty string is also returned if the path is empty.


getComponent

public String getComponent(int i)
Return the i'th component of the path, in a new String object. If there is no i'th component, null is returned.


lastComponent

public String lastComponent()
Return the last component of the path, in a new String object. If the path is empty, null is returned.


copyToCharArray

public int copyToCharArray(char[] outputBuffer,
                           int outputBufferStart,
                           int numberOfComponentsToCopy,
                           char separatorChar)
Copies the specified number of components from this category path to the specified character array, with the components separated by a given delimiter character. The array must be large enough to hold the components and separators; the amount of space needed can be calculated with charsNeededForFullPath().

This method returns the number of characters written to the array.

Parameters:
outputBuffer - The destination character array.
outputBufferStart - The first location to write in the output array.
numberOfComponentsToCopy - The number of path components to write to the destination buffer.
separatorChar - The separator inserted between every pair of path components in the output buffer.
See Also:
charsNeededForFullPath()

charsNeededForFullPath

public int charsNeededForFullPath()
Returns the number of characters required to represent this entire category path, if written using copyToCharArray(char[], int, int, char) or appendTo(Appendable, char). This includes the number of characters in all the components, plus the number of separators between them (each one character in the aforementioned methods).
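The size computation described above amounts to the total length of the components plus one separator between each adjacent pair. The following sketch models the path as a String array (CopySketch is hypothetical, not Lucene code) and, as an assumption, stops early if more components are requested than exist.

```java
/** Hypothetical sketch of charsNeededForFullPath() and copyToCharArray() (not Lucene code). */
final class CopySketch {
    /** Sum of component lengths plus one separator between each adjacent pair. */
    static int charsNeeded(String[] components) {
        if (components.length == 0) {
            return 0;
        }
        int total = components.length - 1; // separators
        for (String c : components) {
            total += c.length();
        }
        return total;
    }

    /** Copies the first n components into out starting at start; returns chars written. */
    static int copyToCharArray(String[] components, char[] out, int start, int n, char sep) {
        int pos = start;
        for (int i = 0; i < n && i < components.length; i++) {
            if (i > 0) {
                out[pos++] = sep;
            }
            components[i].getChars(0, components[i].length(), out, pos);
            pos += components[i].length();
        }
        return pos - start;
    }
}
```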


add

public void add(CharSequence pathString,
                char delimiter)
Add the given components to the end of the path. The components are given in a single string, separated by a given delimiter character. If the given string is empty, it is assumed to refer to the root (empty) category, and nothing is added to the path (rather than adding a single empty component).

Note that when a String object is passed to this method, no reference to it is kept (its content is copied), so a temporary String passed here becomes garbage. To reduce the number of garbage objects, you can pass a reusable, mutable CharBuffer instead of an immutable String to this method.
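A minimal sketch of the splitting rule, assuming the path is modeled as a List of components (SplitSketch is hypothetical, not Lucene code). The treatment of repeated or trailing delimiters is not documented here; this sketch simply keeps any resulting empty components.

```java
import java.util.List;

/** Hypothetical sketch of add(CharSequence, char) (not Lucene code). */
final class SplitSketch {
    /** Appends the delimiter-separated components of pathString; "" adds nothing (root). */
    static void add(List<String> path, CharSequence pathString, char delimiter) {
        if (pathString.length() == 0) {
            return; // the empty string refers to the root category
        }
        int start = 0;
        for (int i = 0; i <= pathString.length(); i++) {
            if (i == pathString.length() || pathString.charAt(i) == delimiter) {
                path.add(pathString.subSequence(start, i).toString());
                start = i + 1;
            }
        }
    }
}
```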


clone

public Object clone()
Overrides:
clone in class Object

equals

public boolean equals(Object obj)
Compare the given CategoryPath to another one. For two category paths to be considered equal, only the paths they contain need to be identical; the unused capacity of the objects is not considered in the comparison.

Overrides:
equals in class Object

isDescendantOf

public boolean isDescendantOf(CategoryPath other)
Test whether this object is a descendant of another CategoryPath. This is true if the other CategoryPath is a prefix of this one.
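Assuming the prefix test is over whole components rather than raw characters, a path such as Author/Mark Twain is not a descendant of Author/Mark. The sketch below models paths as Lists of components; DescendantSketch is hypothetical, and under this reading a path also counts as a descendant of itself.

```java
import java.util.List;

/** Hypothetical sketch of a component-wise prefix test (not Lucene code). */
final class DescendantSketch {
    /** True if ancestor's components are a prefix of path's components. */
    static boolean isDescendantOf(List<String> path, List<String> ancestor) {
        if (ancestor.size() > path.size()) {
            return false;
        }
        // Compare whole components, so "Mark" does not match "Mark Twain".
        return path.subList(0, ancestor.size()).equals(ancestor);
    }
}
```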


hashCode

public int hashCode()
Calculate a hashCode for this path, used when a CategoryPath serves as a hash-table key. If two objects are equal, their hash codes must be equal, so, as in equals(Object), hashCode() does not consider unused portions of the internal buffers in its calculation.

The hash function used is modeled after Java's String.hashCode() - a simple multiplicative hash function with the multiplier 31. The same hash function also appeared in Kernighan & Ritchie's second edition of "The C Programming Language" (1988).

Overrides:
hashCode in class Object
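The multiplier-31 recurrence can be sketched as follows. Hash31Sketch is hypothetical: it hashes the characters of all components in order, which for a single-component path gives exactly String.hashCode(). How Lucene mixes in component boundaries, if at all, is not documented on this page; this sketch ignores them, so it does not distinguish ab/c from a/bc.

```java
/** Hypothetical illustration of a multiplier-31 hash over a path (not Lucene's exact formula). */
final class Hash31Sketch {
    static int hash(CharSequence... components) {
        int h = 0;
        for (CharSequence c : components) {
            for (int i = 0; i < c.length(); i++) {
                h = 31 * h + c.charAt(i); // same recurrence as String.hashCode()
            }
        }
        return h;
    }
}
```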

hashCode

public int hashCode(int prefixLen)
Like hashCode(), but find the hash function of a prefix with the given number of components, rather than of the entire path.


longHashCode

public long longHashCode()
Calculate a 64-bit hash function for this path. Unlike hashCode(), this method is not part of the standard java.lang.Object contract; it is only used when explicitly called by the user.

If two objects are equal, their hash codes must be equal, so, as in equals(Object), longHashCode() does not consider unused portions of the internal buffers in its calculation.

The hash function used is a simple multiplicative hash function, with the multiplier 65599. While Java's standard multiplier 31 (used in hashCode()) gives a good distribution for ASCII strings, it turns out that for foreign-language strings (with 16-bit characters) it gives too many collisions, and a bigger multiplier produces fewer collisions in this case.
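A 64-bit variant with the multiplier 65599 can be sketched the same way; Hash65599Sketch is hypothetical and ignores component boundaries. The check uses a well-known multiplier-31 collision ("Aa" and "BB" share the same String.hashCode()) to show the larger multiplier separating two inputs. It is an ASCII example, not the 16-bit-character case the text describes.

```java
/** Hypothetical illustration of a 64-bit multiplier-65599 hash (not Lucene's exact formula). */
final class Hash65599Sketch {
    static long longHash(CharSequence... components) {
        long h = 0;
        for (CharSequence c : components) {
            for (int i = 0; i < c.length(); i++) {
                h = 65599L * h + c.charAt(i); // larger multiplier, 64-bit accumulator
            }
        }
        return h;
    }
}
```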


longHashCode

public long longHashCode(int prefixLen)
Like longHashCode(), but find the hash function of a prefix with the given number of components, rather than of the entire path.


serializeAppendTo

public void serializeAppendTo(Appendable out)
                       throws IOException
Write out a serialized (as a character sequence) representation of the path to a given Appendable (e.g., a StringBuilder, CharBuffer, Writer, or something similar).

This method may throw an IOException if the given Appendable throws it while appending.

Throws:
IOException

serializeAppendTo

public void serializeAppendTo(int prefixLen,
                              Appendable out)
                       throws IOException
Just like serializeAppendTo(Appendable), but writes only a prefix of the CategoryPath.

Throws:
IOException

setFromSerialized

public int setFromSerialized(CharSequence buffer,
                             int offset)
Set a CategoryPath from a character-sequence representation written by serializeAppendTo(Appendable).

Reading starts at the given offset into the given character sequence, and the offset right after the end of this path is returned.


equalsToSerialized

public boolean equalsToSerialized(CharSequence buffer,
                                  int offset)
Check whether the current path is identical to the one serialized (with serializeAppendTo(Appendable)) in the given buffer, at the given offset.


equalsToSerialized

public boolean equalsToSerialized(int prefixLen,
                                  CharSequence buffer,
                                  int offset)
Just like equalsToSerialized(CharSequence, int), but compare to a prefix of the CategoryPath, instead of the whole CategoryPath.


hashCodeOfSerialized

public static int hashCodeOfSerialized(CharSequence buffer,
                                       int offset)
Calculates the hash code of a path that has been written to a character buffer using serializeAppendTo(Appendable). The returned value is guaranteed to be identical to what hashCode() would have returned for the original object before it was serialized.


serializeToStreamWriter

public void serializeToStreamWriter(OutputStreamWriter osw)
                             throws IOException
Serializes the content of this CategoryPath to a byte stream, using UTF-8 encoding to convert characters to bytes, and treating the ends as 16-bit characters.

Parameters:
osw - The output byte stream.
Throws:
IOException - If there are encoding errors.

deserializeFromStreamReader

public void deserializeFromStreamReader(InputStreamReader isr)
                                 throws IOException
Deserializes the content of this CategoryPath from a stream written by serializeToStreamWriter(OutputStreamWriter), treating the ends as 16-bit characters.

Parameters:
isr - The input stream.
Throws:
IOException - If there are encoding errors.

compareTo

public int compareTo(CategoryPath other)
Compares this CategoryPath with the other CategoryPath for lexicographic order. Returns a negative integer, zero, or a positive integer as this CategoryPath lexicographically precedes, is equal to, or lexicographically follows the other CategoryPath.

Specified by:
compareTo in interface Comparable<CategoryPath>