java.lang.Object
  org.apache.lucene.util.UnicodeUtil
public final class UnicodeUtil
Utility class that encodes Java's UTF-16 char[] into a UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.
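The allocation-avoiding idea can be illustrated with a minimal, JDK-only sketch. It is not Lucene's implementation: the class `ReusableUtf8Encoder` and its `bytes`/`length` fields are hypothetical stand-ins for a reusable result buffer, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```java
// Hypothetical stand-in for a reusable output buffer; not a Lucene class.
final class ReusableUtf8Encoder {
    byte[] bytes = new byte[16]; // reused across calls, grown only when too small
    int length;                  // number of valid bytes after the last encode()

    /** Encodes source[offset .. offset+len) as UTF-8 into the reused buffer. */
    void encode(char[] source, int offset, int len) {
        int maxLen = len * 3; // worst case: 3 bytes per UTF-16 code unit
        if (bytes.length < maxLen) {
            bytes = new byte[maxLen];
        }
        int upto = 0;
        int end = offset + len;
        for (int i = offset; i < end; i++) {
            char c = source[i];
            if (c < 0x80) {
                bytes[upto++] = (byte) c;
            } else if (c < 0x800) {
                bytes[upto++] = (byte) (0xC0 | (c >> 6));
                bytes[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (!Character.isSurrogate(c)) {
                bytes[upto++] = (byte) (0xE0 | (c >> 12));
                bytes[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                bytes[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < end
                    && Character.isLowSurrogate(source[i + 1])) {
                // Valid surrogate pair: combine into one code point, emit 4 bytes.
                int cp = Character.toCodePoint(c, source[++i]);
                bytes[upto++] = (byte) (0xF0 | (cp >> 18));
                bytes[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                bytes[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                bytes[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // Assumption of this sketch: unpaired surrogate becomes U+FFFD (EF BF BD).
                bytes[upto++] = (byte) 0xEF;
                bytes[upto++] = (byte) 0xBF;
                bytes[upto++] = (byte) 0xBD;
            }
        }
        length = upto;
    }

    public static void main(String[] args) {
        ReusableUtf8Encoder enc = new ReusableUtf8Encoder();
        for (String s : new String[] {"hello", "h\u00E9llo", "ok"}) {
            char[] chars = s.toCharArray();
            enc.encode(chars, 0, chars.length);
            System.out.println(enc.length + " bytes, capacity " + enc.bytes.length);
        }
    }
}
```

Because the buffer is sized for the worst case up front, the inner loop needs no bounds checks of its own, and repeated calls with short inputs allocate nothing.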
Field Summary | |
---|---|
static BytesRef | BIG_TERM: A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms one would normally encounter, and definitely bigger than any UTF-8 terms. |
static int | UNI_REPLACEMENT_CHAR |
static int | UNI_SUR_HIGH_END |
static int | UNI_SUR_HIGH_START |
static int | UNI_SUR_LOW_END |
static int | UNI_SUR_LOW_START |
Method Summary | |
---|---|
static int | codePointCount(BytesRef utf8): Returns the number of code points in this utf8 sequence. |
static String | newString(int[] codePoints, int offset, int count): Cover JDK 1.5 API. |
static String | toHexString(String s) |
static void | UTF16toUTF8(char[] source, int offset, int length, BytesRef result): Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result): Encode characters from this String, starting at offset for length characters. |
static int | UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result): Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars): Interprets the given byte array as UTF-8 and converts to UTF-16. |
static void | UTF8toUTF16(BytesRef bytesRef, CharsRef chars): Utility method for UTF8toUTF16(byte[], int, int, CharsRef) |
static void | UTF8toUTF32(BytesRef utf8, IntsRef utf32) |
static boolean | validUTF16String(char[] s, int size) |
static boolean | validUTF16String(CharSequence s) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final BytesRef BIG_TERM
WARNING: This is not a valid UTF8 Term
public static final int UNI_SUR_HIGH_START
public static final int UNI_SUR_HIGH_END
public static final int UNI_SUR_LOW_START
public static final int UNI_SUR_LOW_END
public static final int UNI_REPLACEMENT_CHAR
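The surrogate constants above carry no descriptions, so the following sketch shows one way such boundaries are typically used to check that a UTF-16 sequence has no unpaired surrogates. The constant values are the standard Unicode ones (0xD800 to 0xDBFF for high surrogates, 0xDC00 to 0xDFFF for low surrogates, U+FFFD for the replacement character); that the fields above hold exactly these values is an assumption here, and `SurrogateCheck` is a hypothetical class, not part of Lucene.

```java
// Hypothetical illustration; constant values are the standard Unicode boundaries.
final class SurrogateCheck {
    static final int SUR_HIGH_START = 0xD800;
    static final int SUR_HIGH_END = 0xDBFF;
    static final int SUR_LOW_START = 0xDC00;
    static final int SUR_LOW_END = 0xDFFF;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    /** Returns true if every high surrogate is immediately followed by a low surrogate. */
    static boolean isValidUtf16(CharSequence s) {
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char ch = s.charAt(i);
            if (ch >= SUR_HIGH_START && ch <= SUR_HIGH_END) {
                if (i + 1 >= n) {
                    return false; // high surrogate at end of input
                }
                char next = s.charAt(++i);
                if (next < SUR_LOW_START || next > SUR_LOW_END) {
                    return false; // high surrogate not followed by a low one
                }
            } else if (ch >= SUR_LOW_START && ch <= SUR_LOW_END) {
                return false; // low surrogate without a preceding high one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf16("abc"));          // true
        System.out.println(isValidUtf16("\uD83D\uDE00")); // true: a valid pair
        System.out.println(isValidUtf16("\uD83Dx"));      // false: unpaired high surrogate
    }
}
```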
Method Detail |
---|
public static int UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)
public static boolean validUTF16String(CharSequence s)
public static boolean validUTF16String(char[] s, int size)
public static int codePointCount(BytesRef utf8)
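Counting code points in UTF-8 can be sketched without decoding: every code point contributes exactly one byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx). The sketch below assumes the input is valid UTF-8 and works on a plain byte[] slice; `Utf8CodePointCount` is a hypothetical name, and this is not necessarily how codePointCount(BytesRef) is implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration using a raw byte[] slice instead of a BytesRef.
final class Utf8CodePointCount {
    /** Counts code points in utf8[offset .. offset+length), assuming valid UTF-8. */
    static int count(byte[] utf8, int offset, int length) {
        int count = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            // Skip continuation bytes (10xxxxxx); count only lead and ASCII bytes.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] b = "na\u00EFve".getBytes(StandardCharsets.UTF_8);
        System.out.println(b.length);              // 6 bytes
        System.out.println(count(b, 0, b.length)); // 5 code points
    }
}
```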
public static void UTF8toUTF32(BytesRef utf8, IntsRef utf32)
public static String newString(int[] codePoints, int offset, int count)
Parameters:
codePoints - The code array
offset - The start of the text in the code point array
count - The number of code points
Throws:
IllegalArgumentException - If an invalid code point is encountered
IndexOutOfBoundsException - If the offset or count are out of bounds.
public static String toHexString(String s)
public static void UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)
Interprets the given byte array as UTF-8 and converts to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.
NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
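The sizing rule and the caveat in the note can be seen in a minimal decoder sketch. It is JDK-only and not Lucene's code: `Utf8DecodeSketch` and its `chars`/`length` fields are hypothetical stand-ins for a growable output buffer. Since one input byte never yields more than one UTF-16 code unit, a buffer as long as the input always suffices; and since no validation is done, a truncated multi-byte sequence makes the loop read past the end of the input.

```java
// Hypothetical stand-in for a growable char buffer; performs no UTF-8 validation.
final class Utf8DecodeSketch {
    char[] chars = new char[8];
    int length;

    /** Decodes utf8[offset .. offset+len) into the reused char buffer. */
    void decode(byte[] utf8, int offset, int len) {
        if (chars.length < len) {
            chars = new char[len]; // worst case: every byte is ASCII, one char each
        }
        int out = 0;
        int i = offset;
        int end = offset + len;
        while (i < end) {
            int b = utf8[i++] & 0xFF;
            if (b < 0xC0) {
                // ASCII; stray continuation bytes also land here, unchecked.
                chars[out++] = (char) b;
            } else if (b < 0xE0) {
                chars[out++] = (char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F));
            } else if (b < 0xF0) {
                chars[out++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[i] & 0x3F) << 6) | (utf8[i + 1] & 0x3F));
                i += 2;
            } else {
                // 4-byte sequence: a supplementary code point, written as a surrogate pair.
                int cp = ((b & 0x07) << 18) | ((utf8[i] & 0x3F) << 12)
                        | ((utf8[i + 1] & 0x3F) << 6) | (utf8[i + 2] & 0x3F);
                i += 3;
                chars[out++] = Character.highSurrogate(cp);
                chars[out++] = Character.lowSurrogate(cp);
            }
        }
        length = out;
    }

    public static void main(String[] args) {
        Utf8DecodeSketch dec = new Utf8DecodeSketch();
        byte[] truncated = {(byte) 0xE2}; // lead byte of a 3-byte sequence, nothing after it
        try {
            dec.decode(truncated, 0, truncated.length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("read past the end of invalid input");
        }
    }
}
```

Callers that cannot guarantee well-formed input would need to validate it first or be prepared to catch the exception.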
public static void UTF8toUTF16(BytesRef bytesRef, CharsRef chars)
Utility method for UTF8toUTF16(byte[], int, int, CharsRef)
See Also: UTF8toUTF16(byte[], int, int, CharsRef)