org.apache.lucene.analysis.standard.std31
Class StandardTokenizerImpl31

java.lang.Object
  extended by org.apache.lucene.analysis.standard.std31.StandardTokenizerImpl31
All Implemented Interfaces:
StandardTokenizerInterface

Deprecated. This class exists only for exact backwards compatibility.

@Deprecated
public final class StandardTokenizerImpl31
extends Object
implements StandardTokenizerInterface

This class implements StandardTokenizer, except that it preserves a bug (https://issues.apache.org/jira/browse/LUCENE-3358) in which Han and Hiragana characters are split from the combining characters that follow them.
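
To make the affected input concrete, the following sketch uses only the JDK (not Lucene) to detect a Han or Hiragana code point that is directly followed by a combining mark. The class and method names are hypothetical, and the check only identifies text of the shape the bug report describes; it does not reproduce the tokenizer's behavior.

```java
// Illustrative only: plain JDK code, not part of Lucene.
final class CombiningSequenceExample {

    /** True if the code point is a Unicode combining mark (Mn, Mc, or Me). */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    /** True if the code point belongs to the Han or Hiragana script. */
    static boolean isHanOrHiragana(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA;
    }

    /** True if a Han or Hiragana code point is directly followed by a combining mark. */
    static boolean hasAffectedSequence(String text) {
        int i = 0;
        while (i < text.length()) {
            int current = text.codePointAt(i);
            i += Character.charCount(current);
            if (i < text.length()
                    && isHanOrHiragana(current)
                    && isCombiningMark(text.codePointAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+6F22 (Han) followed by U+0301 (combining acute accent)
        System.out.println(hasAffectedSequence("\u6F22\u0301"));
        // U+3042 (Hiragana) followed by U+3099 (combining voiced sound mark)
        System.out.println(hasAffectedSequence("\u3042\u3099"));
        // Two Han characters with no combining mark
        System.out.println(hasAffectedSequence("\u6F22\u5B57"));
    }
}
```

Text for which `hasAffectedSequence` returns true is the kind of input on which this class and the corrected tokenizer may produce different tokens.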


Field Summary
static int HANGUL_TYPE
          Deprecated.  
static int HIRAGANA_TYPE
          Deprecated.  
static int IDEOGRAPHIC_TYPE
          Deprecated.  
static int KATAKANA_TYPE
          Deprecated.  
static int NUMERIC_TYPE
          Deprecated. Numbers
static int SOUTH_EAST_ASIAN_TYPE
          Deprecated. Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.).
static int WORD_TYPE
          Deprecated. Alphanumeric sequences
static int YYEOF
          Deprecated. This character denotes the end of file
static int YYINITIAL
          Deprecated. The initial lexical state.
 
Constructor Summary
StandardTokenizerImpl31(InputStream in)
          Deprecated. Creates a new scanner.
StandardTokenizerImpl31(Reader in)
          Deprecated. Creates a new scanner. There is also a java.io.InputStream version of this constructor.
 
Method Summary
 int getNextToken()
          Deprecated. Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
 void getText(CharTermAttribute t)
          Deprecated. Fills CharTermAttribute with the current token text.
 void yybegin(int newState)
          Deprecated. Enters a new lexical state.
 int yychar()
          Deprecated. Returns the current position.
 char yycharat(int pos)
          Deprecated. Returns the character at position pos from the matched text.
 void yyclose()
          Deprecated. Closes the input stream.
 int yylength()
          Deprecated. Returns the length of the matched text region.
 void yypushback(int number)
          Deprecated. Pushes the specified amount of characters back into the input stream.
 void yyreset(Reader reader)
          Deprecated. Resets the scanner to read from a new input stream.
 int yystate()
          Deprecated. Returns the current lexical state.
 String yytext()
          Deprecated. Returns the text matched by the current regular expression.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YYEOF

public static final int YYEOF
Deprecated. 
This character denotes the end of file

See Also:
Constant Field Values

YYINITIAL

public static final int YYINITIAL
Deprecated. 
The initial lexical state.

See Also:
Constant Field Values

WORD_TYPE

public static final int WORD_TYPE
Deprecated. 
Alphanumeric sequences

See Also:
Constant Field Values

NUMERIC_TYPE

public static final int NUMERIC_TYPE
Deprecated. 
Numbers

See Also:
Constant Field Values

SOUTH_EAST_ASIAN_TYPE

public static final int SOUTH_EAST_ASIAN_TYPE
Deprecated. 
Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.

See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA

See Also:
Constant Field Values
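
As a rough illustration of what "kept together as a single token" applies to, the sketch below checks whether a string consists entirely of code points from the four scripts named above. This is an approximation made for the example: the JDK has no direct accessor for the Line_Break property, so script membership stands in for \p{Line_Break = Complex_Context}, and the two are not identical. The class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: approximates the character class by Unicode script.
final class SouthEastAsianRunExample {

    // Assumption: these four scripts stand in for Line_Break = Complex_Context.
    private static final Set<Character.UnicodeScript> SCRIPTS = EnumSet.of(
            Character.UnicodeScript.THAI,
            Character.UnicodeScript.LAO,
            Character.UnicodeScript.MYANMAR,
            Character.UnicodeScript.KHMER);

    /** True if the text is non-empty and every code point is in one of the scripts above. */
    static boolean isSouthEastAsianRun(String text) {
        if (text.isEmpty()) {
            return false;
        }
        return text.codePoints()
                .allMatch(cp -> SCRIPTS.contains(Character.UnicodeScript.of(cp)));
    }

    public static void main(String[] args) {
        // A Thai phrase written without spaces
        System.out.println(isSouthEastAsianRun("\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22"));
        // Latin text
        System.out.println(isSouthEastAsianRun("Thai"));
    }
}
```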

IDEOGRAPHIC_TYPE

public static final int IDEOGRAPHIC_TYPE
Deprecated. 
See Also:
Constant Field Values

HIRAGANA_TYPE

public static final int HIRAGANA_TYPE
Deprecated. 
See Also:
Constant Field Values

KATAKANA_TYPE

public static final int KATAKANA_TYPE
Deprecated. 
See Also:
Constant Field Values

HANGUL_TYPE

public static final int HANGUL_TYPE
Deprecated. 
See Also:
Constant Field Values
Constructor Detail

StandardTokenizerImpl31

public StandardTokenizerImpl31(Reader in)
Deprecated. 
Creates a new scanner. There is also a java.io.InputStream version of this constructor.

Parameters:
in - the java.io.Reader to read input from.

StandardTokenizerImpl31

public StandardTokenizerImpl31(InputStream in)
Deprecated. 
Creates a new scanner. There is also a java.io.Reader version of this constructor.

Parameters:
in - the java.io.InputStream to read input from.
Method Detail

yychar

public final int yychar()
Deprecated. 
Description copied from interface: StandardTokenizerInterface
Returns the current position.

Specified by:
yychar in interface StandardTokenizerInterface

getText

public final void getText(CharTermAttribute t)
Deprecated. 
Fills CharTermAttribute with the current token text.

Specified by:
getText in interface StandardTokenizerInterface

yyclose

public final void yyclose()
                   throws IOException
Deprecated. 
Closes the input stream.

Throws:
IOException

yyreset

public final void yyreset(Reader reader)
Deprecated. 
Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused because the internal buffer is discarded and lost. The lexical state is set to YYINITIAL. The internal scan buffer is resized down to its initial length if it has grown.

Specified by:
yyreset in interface StandardTokenizerInterface
Parameters:
reader - the new input stream

yystate

public final int yystate()
Deprecated. 
Returns the current lexical state.


yybegin

public final void yybegin(int newState)
Deprecated. 
Enters a new lexical state.

Parameters:
newState - the new lexical state

yytext

public final String yytext()
Deprecated. 
Returns the text matched by the current regular expression.


yycharat

public final char yycharat(int pos)
Deprecated. 
Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

Parameters:
pos - the position of the character to fetch. A value from 0 to yylength()-1.
Returns:
the character at position pos
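
The equivalence with yytext().charAt(pos) can be sketched with a small buffer-backed model. This is not the class's actual source: the field names and the constructor are hypothetical, and the model only assumes that the matched text is a region of a character buffer, so that yycharat can index the buffer directly while yytext must allocate a new String.

```java
// Illustrative model only; field names and constructor are hypothetical.
final class MatchedTextModel {
    private final char[] buffer; // assumed scan buffer
    private final int start;     // index of the first matched character
    private final int end;       // index just past the last matched character

    MatchedTextModel(String input, int start, int end) {
        this.buffer = input.toCharArray();
        this.start = start;
        this.end = end;
    }

    /** Copies the matched region into a new String. */
    String yytext() {
        return new String(buffer, start, end - start);
    }

    /** Reads one character from the matched region without allocating. */
    char yycharat(int pos) {
        return buffer[start + pos];
    }

    /** Length of the matched region. */
    int yylength() {
        return end - start;
    }

    public static void main(String[] args) {
        MatchedTextModel match = new MatchedTextModel("the quick fox", 4, 9);
        for (int pos = 0; pos < match.yylength(); pos++) {
            // Both expressions yield the same character at every valid position.
            System.out.println(match.yycharat(pos) == match.yytext().charAt(pos));
        }
    }
}
```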

yylength

public final int yylength()
Deprecated. 
Returns the length of the matched text region.

Specified by:
yylength in interface StandardTokenizerInterface

yypushback

public void yypushback(int number)
Deprecated. 
Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

Parameters:
number - the number of characters to be read again. This number must not be greater than yylength().
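
The effect of pushing characters back can be sketched with a toy model in which the current match is a region of a buffer and the next scan starts where the match ends. Everything here is an assumption made for illustration: the `scan` method, the field names, the exception type, and the rejection of negative values are not taken from this class.

```java
// Illustrative model only; not the scanner's actual implementation.
final class PushbackModel {
    private final char[] buffer;
    private int start;  // start of the current match
    private int marked; // end of the current match; the next scan begins here

    PushbackModel(String input) {
        this.buffer = input.toCharArray();
    }

    /** Toy scanning method: matches up to maxLength characters from the current mark. */
    String scan(int maxLength) {
        start = marked;
        marked = Math.min(buffer.length, start + maxLength);
        return yytext();
    }

    String yytext() {
        return new String(buffer, start, marked - start);
    }

    int yylength() {
        return marked - start;
    }

    /** Shrinks the current match so its last `number` characters are scanned again. */
    void yypushback(int number) {
        // The upper bound mirrors the documented constraint; the negative check
        // and the exception type are choices made by this sketch.
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("cannot push back " + number + " characters");
        }
        marked -= number;
    }

    public static void main(String[] args) {
        PushbackModel scanner = new PushbackModel("abcdef");
        System.out.println(scanner.scan(4));  // matches "abcd"
        scanner.yypushback(2);
        System.out.println(scanner.yytext()); // match is now "ab"
        System.out.println(scanner.scan(4));  // re-reads "cd", then "ef"
    }
}
```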

getNextToken

public int getNextToken()
                 throws IOException
Deprecated. 
Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.

Specified by:
getNextToken in interface StandardTokenizerInterface
Returns:
the next token
Throws:
IOException - if any I/O-Error occurs
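
A caller typically drives a scanner of this kind by calling getNextToken() in a loop until the end-of-input value is returned, reading the matched text after each call. The sketch below shows that loop against a hypothetical stand-in: `TokenSource`, `WhitespaceScanner`, and the constants `EOF` and `WORD` are invented for the example (playing the roles of YYEOF and WORD_TYPE) so that the code runs with the JDK alone. It splits on whitespace and does not implement the tokenization rules of this class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a stand-in scanner, not Lucene's implementation.
final class TokenLoopExample {

    /** Hypothetical interface exposing just the two calls the loop needs. */
    interface TokenSource {
        int EOF = -1; // assumed end-of-input value, in the role of YYEOF
        int WORD = 0; // assumed token type, in the role of WORD_TYPE

        int getNextToken() throws IOException;

        String yytext();
    }

    /** Toy scanner that treats whitespace-separated runs as tokens. */
    static final class WhitespaceScanner implements TokenSource {
        private final Reader in;
        private final StringBuilder text = new StringBuilder();

        WhitespaceScanner(Reader in) {
            this.in = in;
        }

        @Override
        public int getNextToken() throws IOException {
            text.setLength(0);
            int c;
            // Skip whitespace before the next token.
            while ((c = in.read()) != -1 && Character.isWhitespace(c)) { }
            if (c == -1) {
                return EOF;
            }
            // Accumulate characters until whitespace or end of input.
            do {
                text.append((char) c);
            } while ((c = in.read()) != -1 && !Character.isWhitespace(c));
            return WORD;
        }

        @Override
        public String yytext() {
            return text.toString();
        }
    }

    /** The consumption loop: scan until the end-of-input value is returned. */
    static List<String> tokenize(String input) {
        TokenSource scanner = new WhitespaceScanner(new StringReader(input));
        List<String> tokens = new ArrayList<>();
        try {
            while (scanner.getNextToken() != TokenSource.EOF) {
                tokens.add(scanner.yytext());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo  bar\nbaz"));
    }
}
```

With the real class, the loop would compare the return value against YYEOF and fill a CharTermAttribute through getText instead of calling yytext.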