org.apache.lucene.analysis.cn.smart
Class Utility

java.lang.Object
  extended by org.apache.lucene.analysis.cn.smart.Utility

public class Utility
extends Object

Utility constants and methods used by SmartChineseAnalyzer.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static char[] COMMON_DELIMITER
          Delimiters will be filtered to this character by SegTokenFilter
static char[] END_CHAR_ARRAY
           
static int MAX_FREQUENCE
          Maximum bigram frequency (used in the smoothing function).
static char[] NUMBER_CHAR_ARRAY
           
static String SPACES
          Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
static char[] START_CHAR_ARRAY
           
static char[] STRING_CHAR_ARRAY
           
 
Constructor Summary
Utility()
           
 
Method Summary
static int compareArray(char[] larray, int lstartIndex, char[] rarray, int rstartIndex)
          Compare two arrays starting at the specified offsets.
static int compareArrayByPrefix(char[] shortArray, int shortIndex, char[] longArray, int longIndex)
          Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix of longArray.
static int getCharType(char ch)
          Return the internal CharType constant of a given character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STRING_CHAR_ARRAY

public static final char[] STRING_CHAR_ARRAY

NUMBER_CHAR_ARRAY

public static final char[] NUMBER_CHAR_ARRAY

START_CHAR_ARRAY

public static final char[] START_CHAR_ARRAY

END_CHAR_ARRAY

public static final char[] END_CHAR_ARRAY

COMMON_DELIMITER

public static final char[] COMMON_DELIMITER
Delimiters will be filtered to this character by SegTokenFilter


SPACES

public static final String SPACES
Space-like characters that need to be skipped: such as space, tab, newline, carriage return.

See Also:
Constant Field Values

MAX_FREQUENCE

public static final int MAX_FREQUENCE
Maximum bigram frequency (used in the smoothing function).

See Also:
Constant Field Values

Constructor Detail

Utility

public Utility()

Method Detail

compareArray

public static int compareArray(char[] larray,
                               int lstartIndex,
                               char[] rarray,
                               int rstartIndex)
Compare two arrays starting at the specified offsets.

Parameters:
larray - left array
lstartIndex - start offset into larray
rarray - right array
rstartIndex - start offset into rarray
Returns:
0 if the arrays are equal, 1 if larray > rarray, -1 if larray < rarray
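
The return contract above can be illustrated with a small standalone sketch. This is not the library's source: the class name CompareArraySketch is hypothetical, null arrays are not handled, and the rule that an array which runs out first counts as the smaller one is an assumption that the documentation above does not spell out.

```java
// Hypothetical stand-in illustrating the documented contract of compareArray.
final class CompareArraySketch {

    static int compareArray(char[] larray, int lstartIndex, char[] rarray, int rstartIndex) {
        int li = lstartIndex;
        int ri = rstartIndex;
        // Advance past the run of characters that both arrays share.
        while (li < larray.length && ri < rarray.length && larray[li] == rarray[ri]) {
            li++;
            ri++;
        }
        boolean leftDone = li >= larray.length;
        boolean rightDone = ri >= rarray.length;
        if (leftDone && rightDone) {
            return 0;   // both ended together: equal from the given offsets
        }
        if (leftDone) {
            return -1;  // assumption: the array that ends first is the smaller one
        }
        if (rightDone) {
            return 1;
        }
        // First differing character decides the order.
        return larray[li] > rarray[ri] ? 1 : -1;
    }

    public static void main(String[] args) {
        char[] a = "abc".toCharArray();
        char[] b = "xabd".toCharArray();
        System.out.println(compareArray(a, 0, a, 0)); // prints 0
        System.out.println(compareArray(a, 0, b, 1)); // prints -1 ("abc" vs "abd")
        System.out.println(compareArray(b, 1, a, 0)); // prints 1  ("abd" vs "abc")
    }
}
```

The start offsets let a caller compare the tails of two arrays without copying them.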

compareArrayByPrefix

public static int compareArrayByPrefix(char[] shortArray,
                                       int shortIndex,
                                       char[] longArray,
                                       int longIndex)
Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix of longArray. As long as shortArray is a prefix of longArray, return 0. Otherwise, behave as compareArray(char[], int, char[], int).

Parameters:
shortArray - prefix array
shortIndex - offset into shortArray
longArray - long array (word)
longIndex - offset into longArray
Returns:
0 if shortArray is a prefix of longArray, otherwise the same result as compareArray(char[], int, char[], int)
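
The prefix rule can be sketched in the same standalone style. Again this is an illustration rather than the library's code: PrefixCompareSketch is a hypothetical name, null arrays are not handled, and returning 1 when longArray ends before shortArray is an assumption carried over from ordinary array comparison.

```java
// Hypothetical stand-in illustrating the documented contract of compareArrayByPrefix.
final class PrefixCompareSketch {

    static int compareArrayByPrefix(char[] shortArray, int shortIndex,
                                    char[] longArray, int longIndex) {
        int si = shortIndex;
        int li = longIndex;
        // Advance past the run of characters that both arrays share.
        while (si < shortArray.length && li < longArray.length
                && shortArray[si] == longArray[li]) {
            si++;
            li++;
        }
        if (si >= shortArray.length) {
            return 0;  // shortArray fully consumed: it is a prefix of longArray
        }
        if (li >= longArray.length) {
            return 1;  // assumption: longArray ended first, so shortArray is greater
        }
        // Otherwise the first differing character decides, as in compareArray.
        return shortArray[si] > longArray[li] ? 1 : -1;
    }

    public static void main(String[] args) {
        char[] word = "abc".toCharArray();
        System.out.println(compareArrayByPrefix("ab".toCharArray(), 0, word, 0));  // prints 0
        System.out.println(compareArrayByPrefix("abd".toCharArray(), 0, word, 0)); // prints 1
        System.out.println(compareArrayByPrefix("aa".toCharArray(), 0, word, 0));  // prints -1
    }
}
```

Because every word starting with the prefix compares as 0, a comparison of this shape could plausibly serve as the ordering test when searching a sorted word list for entries that begin with a given prefix.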

getCharType

public static int getCharType(char ch)
Return the internal CharType constant of a given character.

Parameters:
ch - input character
Returns:
constant from CharType describing the character type.
See Also:
CharType
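
To show the general shape of a character classifier like this, here is a simplified standalone sketch. Everything in it is an assumption made for illustration: the class CharTypeSketch, its constant names and numeric values, and the character ranges are not taken from CharType, which defines its own constants and may distinguish more categories.

```java
// Simplified, hypothetical classifier; constants and ranges are illustrative only.
final class CharTypeSketch {
    static final int DIGIT = 0;
    static final int LETTER = 1;
    static final int SPACE_LIKE = 2;
    static final int HANZI = 3;
    static final int OTHER = 4;

    static int getCharType(char ch) {
        if (ch >= '0' && ch <= '9') {
            return DIGIT;
        }
        if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) {
            return LETTER;
        }
        // Space, tab, newline and carriage return, as listed for SPACES.
        if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') {
            return SPACE_LIKE;
        }
        // Assumption: treat the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Hanzi.
        if (ch >= 0x4E00 && ch <= 0x9FFF) {
            return HANZI;
        }
        return OTHER;
    }

    public static void main(String[] args) {
        System.out.println(getCharType('7') == DIGIT);       // prints true
        System.out.println(getCharType('q') == LETTER);      // prints true
        System.out.println(getCharType('\t') == SPACE_LIKE); // prints true
        System.out.println(getCharType((char) 0x4E2D) == HANZI); // prints true (U+4E2D)
    }
}
```

Callers of the real method should compare the result against the constants declared in CharType rather than against literal numbers.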