java.lang.Object org.apache.lucene.util.UnicodeUtil
public final class UnicodeUtil
Class to encode Java's UTF-16 char[] into a UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.
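To illustrate the allocation point made above, here is a minimal sketch of encoding UTF-16 chars into a caller-owned buffer that is reused across calls. It is not Lucene's implementation: the class `Utf8EncodeSketch`, its nested `Buffer` holder, and the field names are hypothetical, and substituting U+FFFD for an unpaired surrogate is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Lucene: encodes UTF-16 into a reusable byte[]. */
final class Utf8EncodeSketch {

    /** Caller-owned output holder; names are illustrative assumptions. */
    static final class Buffer {
        byte[] bytes = new byte[16];
        int length;
    }

    static void encode(char[] source, int offset, int length, Buffer out) {
        // Worst case is 3 bytes per UTF-16 code unit (a surrogate pair is 2 units, 4 bytes).
        int max = length * 3;
        if (out.bytes.length < max) {
            out.bytes = new byte[max]; // grow only when the existing array is too small
        }
        int upto = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            int code = source[i];
            if (code < 0x80) {
                out.bytes[upto++] = (byte) code;
            } else if (code < 0x800) {
                out.bytes[upto++] = (byte) (0xC0 | (code >> 6));
                out.bytes[upto++] = (byte) (0x80 | (code & 0x3F));
            } else if (Character.isHighSurrogate((char) code) && i + 1 < end
                    && Character.isLowSurrogate(source[i + 1])) {
                // Valid surrogate pair: one supplementary code point, 4 bytes.
                int cp = Character.toCodePoint((char) code, source[++i]);
                out.bytes[upto++] = (byte) (0xF0 | (cp >> 18));
                out.bytes[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out.bytes[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out.bytes[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                if (Character.isSurrogate((char) code)) {
                    code = 0xFFFD; // assumption: unpaired surrogate becomes U+FFFD
                }
                out.bytes[upto++] = (byte) (0xE0 | (code >> 12));
                out.bytes[upto++] = (byte) (0x80 | ((code >> 6) & 0x3F));
                out.bytes[upto++] = (byte) (0x80 | (code & 0x3F));
            }
        }
        out.length = upto;
    }

    public static void main(String[] args) {
        Buffer buf = new Buffer();
        String text = "h\u00E9llo";
        encode(text.toCharArray(), 0, text.length(), buf);
        // For well-formed input the byte count matches String.getBytes.
        System.out.println(buf.length == text.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Only the first `buf.length` bytes of `buf.bytes` are meaningful after a call; the array itself may be longer and is kept for the next call.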
Nested Class Summary | |
---|---|
static class | UnicodeUtil.UTF16Result - Holds decoded UTF16 code units. |
static class | UnicodeUtil.UTF8Result - Holds decoded UTF8 code units. |
Field Summary | |
---|---|
static int | UNI_REPLACEMENT_CHAR |
static int | UNI_SUR_HIGH_END |
static int | UNI_SUR_HIGH_START |
static int | UNI_SUR_LOW_END |
static int | UNI_SUR_LOW_START |
Method Summary | |
---|---|
static String | newString(int[] codePoints, int offset, int count) - Cover JDK 1.5 API. |
static void | UTF16toUTF8(char[] source, int offset, int length, BytesRef result) - Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF16toUTF8(char[] source, int offset, int length, UnicodeUtil.UTF8Result result) - Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF16toUTF8(char[] source, int offset, UnicodeUtil.UTF8Result result) - Encode characters from a char[] source, starting at offset and stopping when the character 0xffff is seen. |
static void | UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result) - Encode characters from this String, starting at offset for length characters. |
static void | UTF16toUTF8(String s, int offset, int length, UnicodeUtil.UTF8Result result) - Encode characters from this String, starting at offset for length characters. |
static int | UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result) - Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars) - Interprets the given byte array as UTF-8 and converts to UTF-16. |
static void | UTF8toUTF16(byte[] utf8, int offset, int length, UnicodeUtil.UTF16Result result) - Convert UTF8 bytes into UTF16 characters. |
static void | UTF8toUTF16(BytesRef bytesRef, CharsRef chars) - Utility method for UTF8toUTF16(byte[], int, int, CharsRef) |
static boolean | validUTF16String(CharSequence s) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int UNI_SUR_HIGH_START
public static final int UNI_SUR_HIGH_END
public static final int UNI_SUR_LOW_START
public static final int UNI_SUR_LOW_END
public static final int UNI_REPLACEMENT_CHAR
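This page does not list the values of these constants. As a hedged illustration, the sketch below uses the surrogate ranges and replacement character defined by the Unicode standard, which the field names suggest; the class `SurrogateSketch` and its members are hypothetical and the values are an assumption about what the constants hold.

```java
/** Hypothetical illustration of the surrogate ranges; not Lucene code. */
final class SurrogateSketch {
    // Values taken from the Unicode standard (assumed to match the fields above).
    static final int SUR_HIGH_START = 0xD800;
    static final int SUR_HIGH_END = 0xDBFF;
    static final int SUR_LOW_START = 0xDC00;
    static final int SUR_LOW_END = 0xDFFF;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    static boolean isHigh(int unit) {
        return unit >= SUR_HIGH_START && unit <= SUR_HIGH_END;
    }

    static boolean isLow(int unit) {
        return unit >= SUR_LOW_START && unit <= SUR_LOW_END;
    }

    /** Combines a high/low surrogate pair into one supplementary code point. */
    static int combine(char high, char low) {
        return 0x10000 + ((high - SUR_HIGH_START) << 10) + (low - SUR_LOW_START);
    }

    public static void main(String[] args) {
        // Prints the code point in hex for the pair D83D DE00.
        System.out.println(Integer.toHexString(combine('\uD83D', '\uDE00')));
    }
}
```

The JDK exposes the same boundaries as `Character.MIN_HIGH_SURROGATE`, `Character.MAX_HIGH_SURROGATE`, `Character.MIN_LOW_SURROGATE`, and `Character.MAX_LOW_SURROGATE`.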
Method Detail |
---|
public static int UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(char[] source, int offset, UnicodeUtil.UTF8Result result)
public static void UTF16toUTF8(char[] source, int offset, int length, UnicodeUtil.UTF8Result result)
public static void UTF16toUTF8(String s, int offset, int length, UnicodeUtil.UTF8Result result)
public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)
public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result)
public static void UTF8toUTF16(byte[] utf8, int offset, int length, UnicodeUtil.UTF16Result result)
public static String newString(int[] codePoints, int offset, int count)
Parameters:
codePoints - The code array
offset - The start of the text in the code point array
count - The number of code points
Throws:
IllegalArgumentException - If an invalid code point is encountered
IndexOutOfBoundsException - If the offset or count are out of bounds.
public static void UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)
The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 codepoint.
NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
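Because the method performs no validity checks, a caller holding bytes from an untrusted source may want to verify them first. The following is a minimal sketch of one way to do that with the JDK's `CharsetDecoder` in strict mode; the class `Utf8ValidateSketch` and the method `isWellFormed` are hypothetical names, and this is not something the Lucene API does for you.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-check, not part of Lucene: strict UTF-8 validation via the JDK. */
final class Utf8ValidateSketch {

    /** Returns true if bytes[offset, offset + length) is well-formed UTF-8. */
    static boolean isWellFormed(byte[] bytes, int offset, int length) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes, offset, length));
            return true;
        } catch (CharacterCodingException e) {
            // Covers malformed sequences, including one truncated at the end of the range.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] truncated = {(byte) 0xC3}; // lead byte of a 2-byte sequence with no continuation
        System.out.println(isWellFormed(truncated, 0, truncated.length));
    }
}
```

This check decodes the input once more than strictly necessary, so it trades speed for safety; it is most useful at trust boundaries rather than in hot loops.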
public static void UTF8toUTF16(BytesRef bytesRef, CharsRef chars)
Utility method for UTF8toUTF16(byte[], int, int, CharsRef)
public static boolean validUTF16String(CharSequence s)
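The page gives no description for this method. Going by the name alone, a check of this kind would presumably confirm that every surrogate in the sequence is correctly paired. The sketch below shows that idea under that assumption; the class `Utf16ValidateSketch` and the method `isValid` are hypothetical and may differ from what Lucene actually checks.

```java
/** Hypothetical illustration of a UTF-16 well-formedness check; not Lucene code. */
final class Utf16ValidateSketch {

    /** Returns true if every high surrogate is followed by a low surrogate and no low surrogate stands alone. */
    static boolean isValid(CharSequence s) {
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < n && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return false; // high surrogate without a following low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc"));
        System.out.println(isValid("\uD83D")); // lone high surrogate
    }
}
```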