org.apache.nutch.util
Class Bytes

java.lang.Object
  extended by org.apache.nutch.util.Bytes

public class Bytes
extends Object

Utility class that handles byte arrays: conversions to and from other types, comparisons, hash code generation, manufacturing keys for HashMaps or HashSets, etc. Taken from the HBase utilities to avoid a dependency on HBase.


Nested Class Summary
static class Bytes.ByteArrayComparator
          Byte array comparator class.
 
Field Summary
static Comparator<byte[]> BYTES_COMPARATOR
          Pass this to TreeMaps where byte [] are keys.
static RawComparator<byte[]> BYTES_RAWCOMPARATOR
          Use for comparing byte arrays, byte by byte.
static byte[] EMPTY_BYTE_ARRAY
          An empty instance.
static int ESTIMATED_HEAP_TAX
          Estimate of the size overhead, beyond the payload, that the JVM pays for an instance of byte [].
static int SIZEOF_BOOLEAN
          Size of boolean in bytes
static int SIZEOF_BYTE
          Size of byte in bytes
static int SIZEOF_CHAR
          Size of char in bytes
static int SIZEOF_DOUBLE
          Size of double in bytes
static int SIZEOF_FLOAT
          Size of float in bytes
static int SIZEOF_INT
          Size of int in bytes
static int SIZEOF_LONG
          Size of long in bytes
static int SIZEOF_SHORT
          Size of short in bytes
static String UTF8_ENCODING
          When we encode strings, we always specify UTF8 encoding
 
Constructor Summary
Bytes()
           
 
Method Summary
static byte[] add(byte[] a, byte[] b)
           
static byte[] add(byte[] a, byte[] b, byte[] c)
           
static int binarySearch(byte[][] arr, byte[] key, int offset, int length, RawComparator<byte[]> comparator)
          Binary search for keys in indexes.
static long bytesToVint(byte[] buffer)
           
static int compareTo(byte[] left, byte[] right)
           
static int compareTo(byte[] buffer1, int offset1, int length1, byte[] buffer2, int offset2, int length2)
          Lexicographically compare two arrays.
static boolean equals(byte[] left, byte[] right)
           
static int hashCode(byte[] b)
           
static int hashCode(byte[] b, int length)
           
static byte[] head(byte[] a, int length)
           
static byte[] incrementBytes(byte[] value, long amount)
          Bytewise binary increment/decrement of the long contained in a byte array by the given amount.
static Iterable<byte[]> iterateOnSplits(byte[] a, byte[] b, int num)
          Iterate over keys within the passed inclusive range.
static Integer mapKey(byte[] b)
           
static Integer mapKey(byte[] b, int length)
           
static byte[] padHead(byte[] a, int length)
           
static byte[] padTail(byte[] a, int length)
           
static int putByte(byte[] bytes, int offset, byte b)
          Write a single byte out to the specified byte array position.
static int putBytes(byte[] tgtBytes, int tgtOffset, byte[] srcBytes, int srcOffset, int srcLength)
          Put bytes at the specified byte array position.
static int putDouble(byte[] bytes, int offset, double d)
           
static int putFloat(byte[] bytes, int offset, float f)
           
static int putInt(byte[] bytes, int offset, int val)
          Put an int value out to the specified byte array position.
static int putLong(byte[] bytes, int offset, long val)
          Put a long value out to the specified byte array position.
static int putShort(byte[] bytes, int offset, short val)
          Put a short value out to the specified byte array position.
static byte[] readByteArray(DataInput in)
          Read byte-array written with a WritableUtils vint prefix.
static byte[] readByteArrayThrowsRuntime(DataInput in)
          Read byte-array written with a WritableUtils vint prefix.
static long readVLong(byte[] buffer, int offset)
          Reads a zero-compressed encoded long from the given byte array and returns it.
static byte[][] split(byte[] a, byte[] b, int num)
          Split passed range.
static boolean startsWith(byte[] bytes, byte[] prefix)
          Return true if the byte array on the right is a prefix of the byte array on the left.
static byte[] tail(byte[] a, int length)
           
static byte toBinaryFromHex(byte ch)
          Takes an ASCII hex digit in the range A-F or 0-9 and returns the corresponding integer/ordinal value.
static boolean toBoolean(byte[] b)
          Reverses toBytes(boolean)
static byte[][] toByteArrays(byte[] column)
           
static byte[][] toByteArrays(String column)
           
static byte[][] toByteArrays(String[] t)
           
static byte[] toBytes(boolean b)
          Convert a boolean to a byte array.
static byte[] toBytes(ByteBuffer bb)
          Returns a new byte array, copied from the passed ByteBuffer.
static byte[] toBytes(double d)
          Serialize a double as the IEEE 754 double format output.
static byte[] toBytes(float f)
           
static byte[] toBytes(int val)
          Convert an int value to a byte array
static byte[] toBytes(long val)
          Convert a long value to a byte array using big-endian.
static byte[] toBytes(short val)
          Convert a short value to a byte array of SIZEOF_SHORT bytes long.
static byte[] toBytes(String s)
          Converts a string to a UTF-8 byte array.
static byte[] toBytesBinary(String in)
           
static double toDouble(byte[] bytes)
           
static double toDouble(byte[] bytes, int offset)
           
static float toFloat(byte[] bytes)
          Presumes float encoded as IEEE 754 floating-point "single format"
static float toFloat(byte[] bytes, int offset)
          Presumes float encoded as IEEE 754 floating-point "single format"
static int toInt(byte[] bytes)
          Converts a byte array to an int value
static int toInt(byte[] bytes, int offset)
          Converts a byte array to an int value
static int toInt(byte[] bytes, int offset, int length)
          Converts a byte array to an int value
static long toLong(byte[] bytes)
          Converts a byte array to a long value.
static long toLong(byte[] bytes, int offset)
          Converts a byte array to a long value.
static long toLong(byte[] bytes, int offset, int length)
          Converts a byte array to a long value.
static short toShort(byte[] bytes)
          Converts a byte array to a short value
static short toShort(byte[] bytes, int offset)
          Converts a byte array to a short value
static short toShort(byte[] bytes, int offset, int length)
          Converts a byte array to a short value
static String toString(byte[] b)
           
static String toString(byte[] b, int off, int len)
          Converts UTF-8 encoded bytes into a string.
static String toString(byte[] b1, String sep, byte[] b2)
          Joins two byte arrays together using a separator.
static String toStringBinary(byte[] b)
          Write a printable representation of a byte array.
static String toStringBinary(byte[] b, int off, int len)
          Write a printable representation of a byte array.
static byte[] vintToBytes(long vint)
           
static int writeByteArray(byte[] tgt, int tgtOffset, byte[] src, int srcOffset, int srcLength)
          Write byte-array from src to tgt with a vint length prefix.
static void writeByteArray(DataOutput out, byte[] b)
          Write byte-array with a WritableUtils vint prefix.
static void writeByteArray(DataOutput out, byte[] b, int offset, int length)
          Write byte-array to out with a vint length prefix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UTF8_ENCODING

public static final String UTF8_ENCODING
When we encode strings, we always specify UTF8 encoding

See Also:
Constant Field Values

EMPTY_BYTE_ARRAY

public static final byte[] EMPTY_BYTE_ARRAY
An empty instance.


SIZEOF_BOOLEAN

public static final int SIZEOF_BOOLEAN
Size of boolean in bytes

See Also:
Constant Field Values

SIZEOF_BYTE

public static final int SIZEOF_BYTE
Size of byte in bytes

See Also:
Constant Field Values

SIZEOF_CHAR

public static final int SIZEOF_CHAR
Size of char in bytes

See Also:
Constant Field Values

SIZEOF_DOUBLE

public static final int SIZEOF_DOUBLE
Size of double in bytes

See Also:
Constant Field Values

SIZEOF_FLOAT

public static final int SIZEOF_FLOAT
Size of float in bytes

See Also:
Constant Field Values

SIZEOF_INT

public static final int SIZEOF_INT
Size of int in bytes

See Also:
Constant Field Values

SIZEOF_LONG

public static final int SIZEOF_LONG
Size of long in bytes

See Also:
Constant Field Values

SIZEOF_SHORT

public static final int SIZEOF_SHORT
Size of short in bytes

See Also:
Constant Field Values

ESTIMATED_HEAP_TAX

public static final int ESTIMATED_HEAP_TAX
Estimate of the size overhead, beyond the payload, that the JVM pays for an instance of byte []. The estimate is based on a study of jhat and JProfiler numbers.

See Also:
Constant Field Values

BYTES_COMPARATOR

public static Comparator<byte[]> BYTES_COMPARATOR
Pass this to TreeMaps where byte [] are keys.
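
A byte[] does not implement Comparable, so a TreeMap keyed on byte[] needs an explicit comparator. The following is a minimal sketch of that usage. The class name ByteKeyMapSketch and the hand-written comparator are illustrative assumptions standing in for BYTES_COMPARATOR; the unsigned, byte-by-byte ordering is assumed rather than taken from this class's source.

```java
import java.util.Comparator;
import java.util.TreeMap;

class ByteKeyMapSketch {
    // Hypothetical stand-in for Bytes.BYTES_COMPARATOR (assumed unsigned ordering).
    static final Comparator<byte[]> LEXICOGRAPHIC = (left, right) -> {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared positions match: the shorter array sorts first.
        return left.length - right.length;
    };

    // A TreeMap that looks keys up by array content rather than by identity.
    static TreeMap<byte[], String> newMap() {
        return new TreeMap<>(LEXICOGRAPHIC);
    }
}
```

With such a map, a lookup with a freshly allocated array that has the same content as a stored key finds the stored value.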


BYTES_RAWCOMPARATOR

public static RawComparator<byte[]> BYTES_RAWCOMPARATOR
Use for comparing byte arrays, byte by byte.

Constructor Detail

Bytes

public Bytes()
Method Detail

readByteArray

public static byte[] readByteArray(DataInput in)
                            throws IOException
Read byte-array written with a WritableUtils vint prefix.

Parameters:
in - Input to read from.
Returns:
byte array read off in
Throws:
IOException - e

readByteArrayThrowsRuntime

public static byte[] readByteArrayThrowsRuntime(DataInput in)
Read byte-array written with a WritableUtils vint prefix. IOException is converted to a RuntimeException.

Parameters:
in - Input to read from.
Returns:
byte array read off in

writeByteArray

public static void writeByteArray(DataOutput out,
                                  byte[] b)
                           throws IOException
Write byte-array with a WritableUtils vint prefix.

Parameters:
out - output stream to be written to
b - array to write
Throws:
IOException - e

writeByteArray

public static void writeByteArray(DataOutput out,
                                  byte[] b,
                                  int offset,
                                  int length)
                           throws IOException
Write byte-array to out with a vint length prefix.

Parameters:
out - output stream
b - array
offset - offset into array
length - length past offset
Throws:
IOException - e
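
The read and write methods above frame a byte array with a length prefix so the reader knows how many bytes follow. The sketch below shows that framing only. As a deliberate simplification it uses a fixed 4-byte int prefix (DataOutput.writeInt) in place of the variable-length vint encoding, so its output is not compatible with these methods. LengthPrefixSketch and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LengthPrefixSketch {
    // Writes the length, then the payload. Assumption: 4-byte length, not a vint.
    static void write(DataOutput out, byte[] b, int offset, int length) throws IOException {
        out.writeInt(length);
        out.write(b, offset, length);
    }

    // Reads the length, then exactly that many payload bytes.
    static byte[] read(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] result = new byte[length];
        in.readFully(result);
        return result;
    }

    // Round trip through an in-memory stream; IOException is rethrown unchecked.
    static byte[] roundTrip(byte[] b, int offset, int length) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer), b, offset, length);
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```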

writeByteArray

public static int writeByteArray(byte[] tgt,
                                 int tgtOffset,
                                 byte[] src,
                                 int srcOffset,
                                 int srcLength)
Write byte-array from src to tgt with a vint length prefix.

Parameters:
tgt - target array
tgtOffset - offset into target array
src - source array
srcOffset - source offset
srcLength - source length
Returns:
New offset in tgt array.

putBytes

public static int putBytes(byte[] tgtBytes,
                           int tgtOffset,
                           byte[] srcBytes,
                           int srcOffset,
                           int srcLength)
Put bytes at the specified byte array position.

Parameters:
tgtBytes - the byte array
tgtOffset - position in the array
srcBytes - array to write out
srcOffset - source offset
srcLength - source length
Returns:
incremented offset

putByte

public static int putByte(byte[] bytes,
                          int offset,
                          byte b)
Write a single byte out to the specified byte array position.

Parameters:
bytes - the byte array
offset - position in the array
b - byte to write out
Returns:
incremented offset

toBytes

public static byte[] toBytes(ByteBuffer bb)
Returns a new byte array, copied from the passed ByteBuffer.

Parameters:
bb - A ByteBuffer
Returns:
the byte array

toString

public static String toString(byte[] b)
Parameters:
b - Presumed UTF-8 encoded byte array.
Returns:
String made from b

toString

public static String toString(byte[] b1,
                              String sep,
                              byte[] b2)
Joins two byte arrays together using a separator.

Parameters:
b1 - The first byte array.
sep - The separator to use.
b2 - The second byte array.

toString

public static String toString(byte[] b,
                              int off,
                              int len)
Converts UTF-8 encoded bytes into a string. If an UnsupportedEncodingException occurs, this method swallows it and returns null instead.

Parameters:
b - Presumed UTF-8 encoded byte array.
off - offset into array
len - length of utf-8 sequence
Returns:
String made from b or null
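
To illustrate the offset and length parameters, here is a minimal sketch of UTF-8 encoding and slice decoding using only the JDK. Utf8Sketch is a hypothetical name. Unlike the method documented above, the sketch passes a Charset constant, so no UnsupportedEncodingException can occur and it never returns null.

```java
import java.nio.charset.StandardCharsets;

class Utf8Sketch {
    // String to UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes len bytes starting at off; len counts bytes, not characters.
    static String decode(byte[] b, int off, int len) {
        return new String(b, off, len, StandardCharsets.UTF_8);
    }
}
```

Because len is a byte count, a slice that cuts through a multi-byte sequence decodes to a replacement character rather than the intended text.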

toStringBinary

public static String toStringBinary(byte[] b)
Write a printable representation of a byte array.

Parameters:
b - byte array
Returns:
string
See Also:
toStringBinary(byte[], int, int)

toStringBinary

public static String toStringBinary(byte[] b,
                                    int off,
                                    int len)
Write a printable representation of a byte array. Non-printable characters are hex-escaped using the format \\x%02X, e.g. \x00, \x05.

Parameters:
b - array to write out
off - offset to start at
len - length to write
Returns:
string output
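
The escaping described above can be sketched as follows. Which bytes count as printable is an assumption here (ASCII 0x20 through 0x7E, excluding the backslash); the real method may use a different set. PrintableBytesSketch is a hypothetical name.

```java
class PrintableBytesSketch {
    static String toPrintable(byte[] b, int off, int len) {
        StringBuilder result = new StringBuilder();
        for (int i = off; i < off + len; i++) {
            int ch = b[i] & 0xff; // treat the byte as unsigned
            // Assumption: printable means ASCII 0x20..0x7E, excluding the backslash.
            if (ch >= 0x20 && ch <= 0x7e && ch != '\\') {
                result.append((char) ch);
            } else {
                // Everything else becomes \xNN with two uppercase hex digits.
                result.append(String.format("\\x%02X", ch));
            }
        }
        return result.toString();
    }
}
```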

toBinaryFromHex

public static byte toBinaryFromHex(byte ch)
Takes an ASCII hex digit in the range A-F or 0-9 and returns the corresponding integer/ordinal value.

Parameters:
ch - The hex digit.
Returns:
The converted hex value as a byte.

toBytesBinary

public static byte[] toBytesBinary(String in)

toBytes

public static byte[] toBytes(String s)
Converts a string to a UTF-8 byte array.

Parameters:
s - string
Returns:
the byte array

toBytes

public static byte[] toBytes(boolean b)
Convert a boolean to a byte array. True becomes -1 and false becomes 0.

Parameters:
b - value
Returns:
b encoded in a byte array.

toBoolean

public static boolean toBoolean(byte[] b)
Reverses toBytes(boolean)

Parameters:
b - array
Returns:
True or false.
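
A minimal sketch of the boolean encoding documented above (true becomes -1, false becomes 0) and its reverse. The decoding rule, that any non-zero byte is true and that only a one-byte array is accepted, is an assumption. BooleanBytesSketch is a hypothetical name.

```java
class BooleanBytesSketch {
    // true -> {-1} (0xFF), false -> {0}, as the documentation states.
    static byte[] toBytes(boolean b) {
        return new byte[] { b ? (byte) -1 : (byte) 0 };
    }

    // Assumption: any non-zero byte decodes to true; only one-byte arrays are valid.
    static boolean toBoolean(byte[] b) {
        if (b.length != 1) {
            throw new IllegalArgumentException("Expected 1 byte, got " + b.length);
        }
        return b[0] != 0;
    }
}
```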

toBytes

public static byte[] toBytes(long val)
Convert a long value to a byte array using big-endian.

Parameters:
val - value to convert
Returns:
the byte array

toLong

public static long toLong(byte[] bytes)
Converts a byte array to a long value. Reverses toBytes(long)

Parameters:
bytes - array
Returns:
the long value
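
Big-endian means the most significant byte is stored first. The sketch below shows one way to perform that conversion and its reverse with shifts; it is an illustration under that assumption, not this class's source. BigEndianLongSketch is a hypothetical name.

```java
class BigEndianLongSketch {
    // Fills the array from the last slot backwards, lowest 8 bits first.
    static byte[] toBytes(long val) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) val;
            val >>>= 8;
        }
        return b;
    }

    // Rebuilds the long, most significant byte first.
    static long toLong(byte[] bytes) {
        long l = 0;
        for (int i = 0; i < 8; i++) {
            l = (l << 8) | (bytes[i] & 0xff); // mask to avoid sign extension
        }
        return l;
    }
}
```

One consequence of this layout is that non-negative longs sort the same way as their byte arrays under an unsigned byte-by-byte comparison.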

toLong

public static long toLong(byte[] bytes,
                          int offset)
Converts a byte array to a long value. Assumes there will be SIZEOF_LONG bytes available.

Parameters:
bytes - bytes
offset - offset
Returns:
the long value

toLong

public static long toLong(byte[] bytes,
                          int offset,
                          int length)
Converts a byte array to a long value.

Parameters:
bytes - array of bytes
offset - offset into array
length - length of data (must be SIZEOF_LONG)
Returns:
the long value
Throws:
IllegalArgumentException - if length is not SIZEOF_LONG or if there's not enough room in the array at the offset indicated.

putLong

public static int putLong(byte[] bytes,
                          int offset,
                          long val)
Put a long value out to the specified byte array position.

Parameters:
bytes - the byte array
offset - position in the array
val - long to write out
Returns:
incremented offset
Throws:
IllegalArgumentException - if the byte array given doesn't have enough room at the offset specified.

toFloat

public static float toFloat(byte[] bytes)
Presumes float encoded as IEEE 754 floating-point "single format"

Parameters:
bytes - byte array
Returns:
Float made from passed byte array.

toFloat

public static float toFloat(byte[] bytes,
                            int offset)
Presumes float encoded as IEEE 754 floating-point "single format"

Parameters:
bytes - array to convert
offset - offset into array
Returns:
Float made from passed byte array.

putFloat

public static int putFloat(byte[] bytes,
                           int offset,
                           float f)
Parameters:
bytes - byte array
offset - offset to write to
f - float value
Returns:
New offset in bytes

toBytes

public static byte[] toBytes(float f)
Parameters:
f - float value
Returns:
the float represented as byte []

toDouble

public static double toDouble(byte[] bytes)
Parameters:
bytes - byte array
Returns:
Return double made from passed bytes.

toDouble

public static double toDouble(byte[] bytes,
                              int offset)
Parameters:
bytes - byte array
offset - offset where double is
Returns:
Return double made from passed bytes.

putDouble

public static int putDouble(byte[] bytes,
                            int offset,
                            double d)
Parameters:
bytes - byte array
offset - offset to write to
d - value
Returns:
New offset into array bytes

toBytes

public static byte[] toBytes(double d)
Serialize a double as the IEEE 754 double format output. The resultant array will be 8 bytes long.

Parameters:
d - value
Returns:
the double represented as byte []
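
For illustration, an 8-byte IEEE 754 serialization of a double can be sketched with java.nio.ByteBuffer. The big-endian byte order (ByteBuffer's default) is an assumption; the documentation above states it explicitly only for toBytes(long). DoubleBytesSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

class DoubleBytesSketch {
    // Writes the 64-bit IEEE 754 pattern, most significant byte first.
    static byte[] toBytes(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    // Reads the first 8 bytes back as a double.
    static double toDouble(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }
}
```

For example, 1.0 has the bit pattern 0x3FF0000000000000, so its first two bytes are 0x3F and 0xF0 and the rest are zero.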

toBytes

public static byte[] toBytes(int val)
Convert an int value to a byte array

Parameters:
val - value
Returns:
the byte array

toInt

public static int toInt(byte[] bytes)
Converts a byte array to an int value

Parameters:
bytes - byte array
Returns:
the int value

toInt

public static int toInt(byte[] bytes,
                        int offset)
Converts a byte array to an int value

Parameters:
bytes - byte array
offset - offset into array
Returns:
the int value

toInt

public static int toInt(byte[] bytes,
                        int offset,
                        int length)
Converts a byte array to an int value

Parameters:
bytes - byte array
offset - offset into array
length - length of int (has to be SIZEOF_INT)
Returns:
the int value
Throws:
IllegalArgumentException - if length is not SIZEOF_INT or if there's not enough room in the array at the offset indicated.

putInt

public static int putInt(byte[] bytes,
                         int offset,
                         int val)
Put an int value out to the specified byte array position.

Parameters:
bytes - the byte array
offset - position in the array
val - int to write out
Returns:
incremented offset
Throws:
IllegalArgumentException - if the byte array given doesn't have enough room at the offset specified.
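
Because each put method returns the incremented offset, calls can be chained to pack several values into one array. A minimal sketch of that pattern follows, assuming big-endian layout and a 4-byte int; PutIntSketch is a hypothetical name and the bounds check is an illustration of the documented exception, not the original code.

```java
class PutIntSketch {
    static int putInt(byte[] bytes, int offset, int val) {
        if (bytes.length - offset < 4) {
            throw new IllegalArgumentException("Not enough room to put an int at offset "
                + offset + " in a " + bytes.length + " byte array");
        }
        // Assumption: big-endian, so fill the four slots from last to first.
        for (int i = offset + 3; i >= offset; i--) {
            bytes[i] = (byte) val;
            val >>>= 8;
        }
        return offset + 4; // where the next value should be written
    }
}
```

Usage: `int next = PutIntSketch.putInt(buf, 0, 1); next = PutIntSketch.putInt(buf, next, 2);` writes two ints back to back without tracking sizes by hand.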

toBytes

public static byte[] toBytes(short val)
Convert a short value to a byte array of SIZEOF_SHORT bytes long.

Parameters:
val - value
Returns:
the byte array

toShort

public static short toShort(byte[] bytes)
Converts a byte array to a short value

Parameters:
bytes - byte array
Returns:
the short value

toShort

public static short toShort(byte[] bytes,
                            int offset)
Converts a byte array to a short value

Parameters:
bytes - byte array
offset - offset into array
Returns:
the short value

toShort

public static short toShort(byte[] bytes,
                            int offset,
                            int length)
Converts a byte array to a short value

Parameters:
bytes - byte array
offset - offset into array
length - length, has to be SIZEOF_SHORT
Returns:
the short value
Throws:
IllegalArgumentException - if length is not SIZEOF_SHORT or if there's not enough room in the array at the offset indicated.

putShort

public static int putShort(byte[] bytes,
                           int offset,
                           short val)
Put a short value out to the specified byte array position.

Parameters:
bytes - the byte array
offset - position in the array
val - short to write out
Returns:
incremented offset
Throws:
IllegalArgumentException - if the byte array given doesn't have enough room at the offset specified.

vintToBytes

public static byte[] vintToBytes(long vint)
Parameters:
vint - Integer to make a vint of.
Returns:
Vint as bytes array.

bytesToVint

public static long bytesToVint(byte[] buffer)
Parameters:
buffer - buffer to convert
Returns:
vint bytes as an integer.

readVLong

public static long readVLong(byte[] buffer,
                             int offset)
                      throws IOException
Reads a zero-compressed encoded long from the given byte array and returns it.

Parameters:
buffer - Binary array
offset - Offset into array at which vint begins.
Returns:
deserialized long from the buffer.
Throws:
IOException - e

compareTo

public static int compareTo(byte[] left,
                            byte[] right)
Parameters:
left - left operand
right - right operand
Returns:
0 if equal, < 0 if left is less than right, > 0 if left is greater than right.

compareTo

public static int compareTo(byte[] buffer1,
                            int offset1,
                            int length1,
                            byte[] buffer2,
                            int offset2,
                            int length2)
Lexicographically compare two arrays.

Parameters:
buffer1 - left operand
buffer2 - right operand
offset1 - Where to start comparing in the left buffer
offset2 - Where to start comparing in the right buffer
length1 - How much to compare from the left buffer
length2 - How much to compare from the right buffer
Returns:
0 if equal, < 0 if left is less than right, > 0 if left is greater than right.
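
A lexicographic comparison over two array ranges can be sketched as below. Treating each byte as unsigned is an assumption, as is the rule that a range which is a strict prefix of the other sorts first. RangeCompareSketch is a hypothetical name.

```java
class RangeCompareSketch {
    static int compareTo(byte[] buffer1, int offset1, int length1,
                         byte[] buffer2, int offset2, int length2) {
        int end1 = offset1 + length1;
        int end2 = offset2 + length2;
        // Walk both ranges in step until one ends or a byte differs.
        for (int i = offset1, j = offset2; i < end1 && j < end2; i++, j++) {
            int a = buffer1[i] & 0xff; // assumption: bytes compare as unsigned
            int b = buffer2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // Shared positions are equal: the shorter range is the smaller one.
        return length1 - length2;
    }
}
```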

equals

public static boolean equals(byte[] left,
                             byte[] right)
Parameters:
left - left operand
right - right operand
Returns:
True if equal

startsWith

public static boolean startsWith(byte[] bytes,
                                 byte[] prefix)
Return true if the byte array on the right is a prefix of the byte array on the left.


hashCode

public static int hashCode(byte[] b)
Parameters:
b - bytes to hash
Returns:
The result of running WritableComparator.hashBytes(byte[], int) on the passed-in array. This is the method that Text and ImmutableBytesWritable use to calculate their hash codes.

hashCode

public static int hashCode(byte[] b,
                           int length)
Parameters:
b - value
length - length of the value
Returns:
The result of running WritableComparator.hashBytes(byte[], int) on the passed-in array. This is the method that Text and ImmutableBytesWritable use to calculate their hash codes.

mapKey

public static Integer mapKey(byte[] b)
Parameters:
b - bytes to hash
Returns:
A hash of b as an Integer that can be used as key in Maps.

mapKey

public static Integer mapKey(byte[] b,
                             int length)
Parameters:
b - bytes to hash
length - length to hash
Returns:
A hash of b as an Integer that can be used as key in Maps.
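
A byte[] used directly as a HashMap key hashes by identity, so two arrays with the same content land under different keys. Hashing the content and using the resulting Integer as the key avoids that. The sketch below assumes a 31-multiplier rolling hash, the same computation as java.util.Arrays.hashCode(byte[]); whether WritableComparator.hashBytes matches it exactly is an assumption. ByteHashSketch is a hypothetical name.

```java
class ByteHashSketch {
    // Assumption: 31-multiplier rolling hash over the first length bytes.
    static int hashCode(byte[] b, int length) {
        int hash = 1;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + b[i];
        }
        return hash;
    }

    // Content-based key usable in a HashMap or HashSet.
    static Integer mapKey(byte[] b) {
        return Integer.valueOf(hashCode(b, b.length));
    }
}
```

Note that different contents can hash to the same Integer, so a key built this way identifies an array only up to hash collisions.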

add

public static byte[] add(byte[] a,
                         byte[] b)
Parameters:
a - lower half
b - upper half
Returns:
New array that has a in lower half and b in upper half.

add

public static byte[] add(byte[] a,
                         byte[] b,
                         byte[] c)
Parameters:
a - first third
b - second third
c - final third
Returns:
New array made from a, b and c
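
Concatenation of this kind is usually done with System.arraycopy. A minimal sketch, with ConcatSketch as a hypothetical name and the three-argument form built on the two-argument one for brevity:

```java
class ConcatSketch {
    // New array with a in the lower half and b in the upper half.
    static byte[] add(byte[] a, byte[] b) {
        byte[] result = new byte[a.length + b.length];
        System.arraycopy(a, 0, result, 0, a.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }

    // New array made from a, then b, then c.
    static byte[] add(byte[] a, byte[] b, byte[] c) {
        return add(add(a, b), c);
    }
}
```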

head

public static byte[] head(byte[] a,
                          int length)
Parameters:
a - array
length - number of bytes to grab
Returns:
First length bytes from a

tail

public static byte[] tail(byte[] a,
                          int length)
Parameters:
a - array
length - number of bytes to grab
Returns:
Last length bytes from a

padHead

public static byte[] padHead(byte[] a,
                             int length)
Parameters:
a - array
length - number of zero bytes to prepend
Returns:
a with length zero bytes prepended

padTail

public static byte[] padTail(byte[] a,
                             int length)
Parameters:
a - array
length - number of zero bytes to append
Returns:
a with length zero bytes appended

split

public static byte[][] split(byte[] a,
                             byte[] b,
                             int num)
Split the passed range. A relatively expensive operation that uses BigInteger math. Useful for splitting ranges for MapReduce jobs.

Parameters:
a - Beginning of range
b - End of range
num - Number of times to split range. Pass 1 if you want to split the range in two; i.e. one split.
Returns:
Array of dividing values
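
The idea of splitting a key range with BigInteger math can be sketched as follows: read both endpoints as unsigned big-endian integers, then compute evenly spaced points between them. This is an illustration of the technique, not this method's algorithm. It assumes a and b have the same length with a less than b, and it returns only the interior dividing points; whether the real method also returns the endpoints is not stated here. SplitSketch and toFixedWidth are hypothetical names.

```java
import java.math.BigInteger;

class SplitSketch {
    // Returns num evenly spaced keys strictly between a and b.
    static byte[][] split(byte[] a, byte[] b, int num) {
        BigInteger start = new BigInteger(1, a); // signum 1: read bytes as unsigned
        BigInteger range = new BigInteger(1, b).subtract(start);
        BigInteger parts = BigInteger.valueOf(num + 1L);
        byte[][] result = new byte[num][];
        for (int i = 1; i <= num; i++) {
            BigInteger point = start.add(range.multiply(BigInteger.valueOf(i)).divide(parts));
            result[i - 1] = toFixedWidth(point, a.length);
        }
        return result;
    }

    // Left-pads with zeros, or drops BigInteger's sign byte, to get exactly width bytes.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

Passing num = 1 yields a single midpoint, which matches the description of one split dividing the range in two.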

iterateOnSplits

public static Iterable<byte[]> iterateOnSplits(byte[] a,
                                               byte[] b,
                                               int num)
Iterate over keys within the passed inclusive range.


toByteArrays

public static byte[][] toByteArrays(String[] t)
Parameters:
t - operands
Returns:
Array of byte arrays made from the passed array of Strings

toByteArrays

public static byte[][] toByteArrays(String column)
Parameters:
column - operand
Returns:
An array of byte arrays whose first and only entry is column

toByteArrays

public static byte[][] toByteArrays(byte[] column)
Parameters:
column - operand
Returns:
An array of byte arrays whose first and only entry is column

binarySearch

public static int binarySearch(byte[][] arr,
                               byte[] key,
                               int offset,
                               int length,
                               RawComparator<byte[]> comparator)
Binary search for keys in indexes.

Parameters:
arr - array of byte arrays to search
key - the key you want to find
offset - the offset in the key you want to find
length - the length of the key
comparator - a comparator to compare.
Returns:
index of key

incrementBytes

public static byte[] incrementBytes(byte[] value,
                                    long amount)
                             throws IOException
Bytewise binary increment/decrement of the long contained in a byte array by the given amount.

Parameters:
value - array of bytes containing a long (length <= SIZEOF_LONG)
amount - amount to add to value (subtracted if negative)
Returns:
array of bytes containing the incremented long (length == SIZEOF_LONG)
Throws:
IOException - if value.length > SIZEOF_LONG
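
The behaviour described above can be approximated as: widen the input to a long, add the amount, and write the result back as 8 bytes. The sketch below makes several assumptions that are not confirmed by this page: the bytes are big-endian two's complement, shorter arrays are sign-extended, and overflow wraps. It also throws IllegalArgumentException for an invalid length instead of the documented IOException. IncrementSketch is a hypothetical name.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

class IncrementSketch {
    static byte[] incrementBytes(byte[] value, long amount) {
        if (value.length == 0 || value.length > 8) {
            throw new IllegalArgumentException("value must be 1 to 8 bytes, got " + value.length);
        }
        // BigInteger reads big-endian two's complement, so short arrays are sign-extended.
        long current = new BigInteger(value).longValue();
        // Always returns 8 bytes, big-endian.
        return ByteBuffer.allocate(8).putLong(current + amount).array();
    }
}
```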


Copyright © 2012 The Apache Software Foundation