org.apache.lucene.util
Class CharacterUtils

java.lang.Object
  extended by org.apache.lucene.util.CharacterUtils

public abstract class CharacterUtils
extends Object

CharacterUtils provides a unified interface to character-related operations, so that backwards-compatible character handling can be selected based on a Version instance.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Nested Class Summary
static class CharacterUtils.CharacterBuffer
          A simple IO buffer to use with fill(CharacterBuffer, Reader).
 
Constructor Summary
CharacterUtils()
           
 
Method Summary
abstract  int codePointAt(char[] chars, int offset)
          Returns the code point at the given index of the char array.
abstract  int codePointAt(char[] chars, int offset, int limit)
          Returns the code point at the given index of the char array where only elements with index less than the limit are used.
abstract  int codePointAt(CharSequence seq, int offset)
          Returns the code point at the given index of the CharSequence.
abstract  boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader)
          Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader.
static CharacterUtils getInstance(Version matchVersion)
          Returns a CharacterUtils implementation according to the given Version instance.
static CharacterUtils.CharacterBuffer newCharacterBuffer(int bufferSize)
          Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharacterUtils

public CharacterUtils()
Method Detail

getInstance

public static CharacterUtils getInstance(Version matchVersion)
Returns a CharacterUtils implementation according to the given Version instance.

Parameters:
matchVersion - a version instance
Returns:
a CharacterUtils implementation according to the given Version instance.

codePointAt

public abstract int codePointAt(char[] chars,
                                int offset)
Returns the code point at the given index of the char array. Depending on the Version passed to getInstance(Version), this method mimics the behavior of Character.codePointAt(char[], int) either as it would have behaved on a Java 1.4 JVM or as it behaves on a later virtual machine version.

Parameters:
chars - a character array
offset - the offset to the char values in the chars array to be converted
Returns:
the Unicode code point at the given index
Throws:
NullPointerException - if the array is null.
IndexOutOfBoundsException - if the value offset is negative or not less than the length of the char array.
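
The difference between the two behaviors can be illustrated with plain JDK calls. The following is a minimal sketch that does not call CharacterUtils itself: the class name CodePointAtSketch and the helper legacyCodePointAt are hypothetical, and it assumes that the Java 1.4 style behavior amounts to returning the single char at the offset without pairing surrogates.

```java
final class CodePointAtSketch {

    // Hypothetical stand-in for the pre-Java-5 behavior (an assumption):
    // every char is treated as a complete character, so a surrogate pair
    // is never combined into one code point.
    static int legacyCodePointAt(char[] chars, int offset) {
        return chars[offset];
    }

    // Supplementary-aware behavior, delegating to the JDK.
    static int modernCodePointAt(char[] chars, int offset) {
        return Character.codePointAt(chars, offset);
    }

    public static void main(String[] args) {
        // U+1D11E is encoded in UTF-16 as the surrogate pair D834 DD1E.
        char[] chars = Character.toChars(0x1D11E);
        System.out.println(Integer.toHexString(modernCodePointAt(chars, 0))); // prints 1d11e
        System.out.println(Integer.toHexString(legacyCodePointAt(chars, 0))); // prints d834
    }
}
```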

codePointAt

public abstract int codePointAt(CharSequence seq,
                                int offset)
Returns the code point at the given index of the CharSequence. Depending on the Version passed to getInstance(Version), this method mimics the behavior of Character.codePointAt(CharSequence, int) either as it would have behaved on a Java 1.4 JVM or as it behaves on a later virtual machine version.

Parameters:
seq - a character sequence
offset - the offset to the char values in the character sequence to be converted
Returns:
the Unicode code point at the given index
Throws:
NullPointerException - if the sequence is null.
IndexOutOfBoundsException - if the value offset is negative or not less than the length of the character sequence.
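
A typical use of a code-point lookup on a CharSequence is to walk the sequence one code point at a time. The sketch below shows that loop with the JDK method Character.codePointAt(CharSequence, int) only. The class name CharSequenceCodePointsSketch and its codePoints helper are hypothetical, and a CharacterUtils instance obtained for a supplementary-aware Version is merely assumed to behave the same way.

```java
import java.util.Arrays;

final class CharSequenceCodePointsSketch {

    // Collects the code points of seq, advancing by one or two chars
    // depending on whether the code point needed a surrogate pair.
    static int[] codePoints(CharSequence seq) {
        int[] out = new int[seq.length()];
        int count = 0;
        for (int i = 0; i < seq.length(); ) {
            int codePoint = Character.codePointAt(seq, i);
            out[count++] = codePoint;
            i += Character.charCount(codePoint);
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        // "a", U+1D11E (two chars), "b": four chars but three code points.
        String text = new StringBuilder("a").appendCodePoint(0x1D11E).append('b').toString();
        System.out.println(text.length());           // prints 4
        System.out.println(codePoints(text).length); // prints 3
    }
}
```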

codePointAt

public abstract int codePointAt(char[] chars,
                                int offset,
                                int limit)
Returns the code point at the given index of the char array where only elements with index less than the limit are used. Depending on the Version passed to getInstance(Version), this method mimics the behavior of Character.codePointAt(char[], int, int) either as it would have behaved on a Java 1.4 JVM or as it behaves on a later virtual machine version.

Parameters:
chars - a character array
offset - the offset to the char values in the chars array to be converted
limit - the index after the last element that should be used to calculate the code point.
Returns:
the Unicode code point at the given index
Throws:
NullPointerException - if the array is null.
IndexOutOfBoundsException - if the value offset is negative or not less than the length of the char array.
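
The effect of the limit is easiest to see when it falls between the two halves of a surrogate pair. The sketch below uses the JDK method Character.codePointAt(char[], int, int) rather than CharacterUtils; the class name CodePointLimitSketch is hypothetical, and a supplementary-aware CharacterUtils instance is only assumed to match this behavior.

```java
final class CodePointLimitSketch {

    // Returns the code point at offset, using only chars at indexes
    // strictly below limit.
    static int codePointAt(char[] chars, int offset, int limit) {
        return Character.codePointAt(chars, offset, limit);
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which occupies two chars (D83D DE00).
        char[] chars = new StringBuilder("a").appendCodePoint(0x1F600).toString().toCharArray();

        // The limit covers the whole pair, so the full code point is returned.
        System.out.println(Integer.toHexString(codePointAt(chars, 1, 3))); // prints 1f600

        // The limit cuts between the surrogates, so only the high surrogate is seen.
        System.out.println(Integer.toHexString(codePointAt(chars, 1, 2))); // prints d83d
    }
}
```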

newCharacterBuffer

public static CharacterUtils.CharacterBuffer newCharacterBuffer(int bufferSize)
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.

Parameters:
bufferSize - the internal char buffer size, must be >= 2
Returns:
a new CharacterUtils.CharacterBuffer instance.

fill

public abstract boolean fill(CharacterUtils.CharacterBuffer buffer,
                             Reader reader)
                      throws IOException
Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader. This method tries to read as many characters into the CharacterUtils.CharacterBuffer as possible. Each call to fill starts filling the buffer at offset 0 and continues up to the length of the internal character array.

Depending on the Version passed to getInstance(Version), this method implements supplementary character awareness when filling the given buffer. For every Version later than 3.0, fill(CharacterBuffer, Reader) guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.

Parameters:
buffer - the buffer to fill.
reader - the reader to read characters from.
Returns:
true if and only if no more characters are available in the reader, otherwise false.
Throws:
IOException - if the reader throws an IOException.
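
One way to obtain the surrogate guarantee described for fill is to hold back a trailing high surrogate and place it at the start of the next fill. The following is a minimal sketch of that idea using only JDK classes. It is not the CharacterUtils implementation: the class SurrogateAwareFillSketch, its fields, and the chunks helper are hypothetical, and for simplicity its fill returns the number of valid chars rather than a boolean.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SurrogateAwareFillSketch {
    final char[] buffer;
    int length;           // number of valid chars after the last fill
    private char pending; // high surrogate held back by the previous fill, or 0

    SurrogateAwareFillSketch(int bufferSize) {
        // With fewer than 2 slots a surrogate pair could never fit.
        if (bufferSize < 2) {
            throw new IllegalArgumentException("bufferSize must be >= 2");
        }
        buffer = new char[bufferSize];
    }

    // Fills the buffer from offset 0 and returns the number of valid chars
    // (0 only when the reader is exhausted). This return convention is an
    // assumption of the sketch.
    int fill(Reader reader) throws IOException {
        int filled = 0;
        if (pending != 0) {
            buffer[filled++] = pending; // re-install the held-back surrogate
            pending = 0;
        }
        boolean endOfInput = false;
        while (filled < buffer.length) {
            int read = reader.read(buffer, filled, buffer.length - filled);
            if (read == -1) {
                endOfInput = true;
                break;
            }
            filled += read;
        }
        // Buffer is full and ends in a high surrogate: keep it for the next
        // call so that the pair is not split across two fills.
        if (!endOfInput && Character.isHighSurrogate(buffer[filled - 1])) {
            pending = buffer[--filled];
        }
        length = filled;
        return length;
    }

    // Convenience helper: returns the content of each successive fill.
    static List<String> chunks(String input, int bufferSize) {
        SurrogateAwareFillSketch sketch = new SurrogateAwareFillSketch(bufferSize);
        Reader reader = new StringReader(input);
        List<String> out = new ArrayList<>();
        try {
            while (sketch.fill(reader) > 0) {
                out.add(new String(sketch.buffer, 0, sketch.length));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "ab", U+1F600 (two chars), "c" read through a 3-char buffer.
        for (String chunk : chunks("ab\uD83D\uDE00c", 3)) {
            System.out.println(chunk.length());
        }
    }
}
```

With a 3-char buffer, the first fill yields only "ab" because the third char is a high surrogate; the second fill then starts with the complete pair. A lone high surrogate at the very end of the input is still delivered, which corresponds to the exception stated in the guarantee.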