org.apache.lucene.analysis.ja.dict
Class UserDictionary

java.lang.Object
  extended by org.apache.lucene.analysis.ja.dict.UserDictionary
All Implemented Interfaces:
Dictionary

public final class UserDictionary
extends Object
implements Dictionary

Class for building a User Dictionary. This class allows for custom segmentation of phrases.


Field Summary
static int LEFT_ID
           
static int RIGHT_ID
           
static int WORD_COST
           
 
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
 
Constructor Summary
UserDictionary(Reader reader)
           
 
Method Summary
 String getBaseForm(int wordId, char[] surface, int off, int len)
          Get base form of word
 TokenInfoFST getFST()
           
 String getInflectionForm(int wordId)
          Get inflection form of tokens
 String getInflectionType(int wordId)
          Get inflection type of tokens
 int getLeftId(int wordId)
          Get left id of specified word
 String getPartOfSpeech(int wordId)
          Get Part-Of-Speech of tokens
 String getPronunciation(int wordId, char[] surface, int off, int len)
          Get pronunciation of tokens
 String getReading(int wordId, char[] surface, int off, int len)
          Get reading of tokens
 int getRightId(int wordId)
          Get right id of specified word
 int getWordCost(int wordId)
          Get word cost of specified word
 int[][] lookup(char[] chars, int off, int len)
          Lookup words in text
 int[] lookupSegmentation(int phraseID)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WORD_COST

public static final int WORD_COST
See Also:
Constant Field Values

LEFT_ID

public static final int LEFT_ID
See Also:
Constant Field Values

RIGHT_ID

public static final int RIGHT_ID
See Also:
Constant Field Values

Constructor Detail

UserDictionary

public UserDictionary(Reader reader)
               throws IOException
Throws:
IOException
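
The constructor reads dictionary entries from a Reader, but this page does not describe the entry layout. The sketch below is a pre-flight check written under the assumption that each entry is a comma-separated line of four fields (surface form, space-separated segmentation, space-separated readings, part-of-speech tag), with blank lines and lines starting with # ignored. The class UserDictionaryEntryCheck and its check method are hypothetical helpers, not part of Lucene, and the sketch does not handle quoted fields.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class UserDictionaryEntryCheck {

    /**
     * Returns one message per suspicious line; an empty list means every
     * entry matched the ASSUMED four-field layout:
     * surface,segmentation,readings,part-of-speech
     */
    public static List<String> check(String dictionaryText) {
        List<String> problems = new ArrayList<>();
        String[] lines = dictionaryText.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            // Assumption: blank lines and '#' comment lines are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Limit -1 keeps trailing empty fields so they are counted.
            String[] fields = line.split(",", -1);
            if (fields.length != 4) {
                problems.add("line " + (i + 1) + ": expected 4 fields, found " + fields.length);
                continue;
            }
            // Each segment of the phrase should have exactly one reading.
            int segments = fields[1].trim().split(" +").length;
            int readings = fields[2].trim().split(" +").length;
            if (segments != readings) {
                problems.add("line " + (i + 1) + ": " + segments
                        + " segments but " + readings + " readings");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String text =
                "# surface,segmentation,readings,part-of-speech\n"
                + "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞\n";
        List<String> problems = check(text);
        if (problems.isEmpty()) {
            // This Reader is what would be passed to new UserDictionary(reader),
            // which is declared to throw IOException.
            Reader reader = new StringReader(text);
            System.out.println("entries look well-formed");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Checking the text before constructing the dictionary keeps format mistakes separate from the I/O failures that the constructor reports through IOException.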

Method Detail

lookup

public int[][] lookup(char[] chars,
                      int off,
                      int len)
               throws IOException
Lookup words in text

Parameters:
chars - text
off - offset into text
len - length of text
Returns:
array of {wordId, position, length}
Throws:
IOException
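
Because lookup returns raw int triples, a small helper can make the result easier to read. The sketch below turns an array of {wordId, position, length} entries back into the matched substrings. It assumes that position is counted from off rather than from the start of chars; the class LookupResultDemo, the method surfaces, and the word IDs in the sample data are made up for illustration and do not come from Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class LookupResultDemo {

    /**
     * Extracts the matched text for each {wordId, position, length} entry.
     * ASSUMPTION: position is relative to off, the offset passed to lookup.
     */
    public static List<String> surfaces(char[] chars, int off, int[][] matches) {
        List<String> out = new ArrayList<>();
        for (int[] match : matches) {
            int position = match[1];
            int length = match[2];
            out.add(new String(chars, off + position, length));
        }
        return out;
    }

    public static void main(String[] args) {
        char[] text = "関西国際空港に行った".toCharArray();
        // Hand-written stand-in for what lookup(text, 0, text.length) might
        // return for a three-segment phrase; the word IDs are invented.
        int[][] matches = { {100, 0, 2}, {101, 2, 2}, {102, 4, 2} };
        System.out.println(surfaces(text, 0, matches));
    }
}
```

If position turns out to be absolute in the version you use, drop the off term in the String constructor call.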

getFST

public TokenInfoFST getFST()

lookupSegmentation

public int[] lookupSegmentation(int phraseID)

getLeftId

public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word

Specified by:
getLeftId in interface Dictionary
Returns:
left id

getRightId

public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word

Specified by:
getRightId in interface Dictionary
Returns:
right id

getWordCost

public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word

Specified by:
getWordCost in interface Dictionary
Returns:
word cost

getReading

public String getReading(int wordId,
                         char[] surface,
                         int off,
                         int len)
Description copied from interface: Dictionary
Get reading of tokens

Specified by:
getReading in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Reading of the token

getPartOfSpeech

public String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens

Specified by:
getPartOfSpeech in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Part-Of-Speech of the token

getBaseForm

public String getBaseForm(int wordId,
                          char[] surface,
                          int off,
                          int len)
Description copied from interface: Dictionary
Get base form of word

Specified by:
getBaseForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Base form (only different for inflected words, otherwise null)

getPronunciation

public String getPronunciation(int wordId,
                               char[] surface,
                               int off,
                               int len)
Description copied from interface: Dictionary
Get pronunciation of tokens

Specified by:
getPronunciation in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Pronunciation of the token

getInflectionType

public String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens

Specified by:
getInflectionType in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection type, or null

getInflectionForm

public String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens

Specified by:
getInflectionForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection form, or null