org.apache.lucene.analysis.icu.segmentation
Class LaoBreakIterator

java.lang.Object
  extended by com.ibm.icu.text.BreakIterator
      extended by org.apache.lucene.analysis.icu.segmentation.LaoBreakIterator
All Implemented Interfaces:
Cloneable

public class LaoBreakIterator
extends com.ibm.icu.text.BreakIterator

Syllable iterator for Lao text.

This breaks Lao text into syllables according to the paper "Syllabification of Lao Script for Line Breaking" by Phonpasit Phissamay, Valaxay Dalolay, Chitaphone Chanhsililath, Oulaiphone Silimasak, Sarmad Hussain, and Nadir Durrani (Science Technology and Environment Agency, CRULP).

Most of the work is accomplished with RBBI (RuleBasedBreakIterator) rules; however, some additional special logic is needed that cannot be coded in a grammar, and it is implemented here.

For example, what appears to be a final consonant might instead be part of the next syllable. Because rules match greedily, that consonant is consumed by the earlier syllable, leaving behind an illegal sequence that matches no rule.

Take for instance the text ກວ່າດອກ. The first rule greedily matches ກວ່າດ, but then ອກ is encountered, which is illegal. In this situation LaoBreakIterator does the following, according to the paper:

  1. Backtrack and remove the ດ from the last syllable, placing it on the current syllable.
  2. Verify that the modified previous syllable (ກວ່າ) is still legal.
  3. Verify that the modified current syllable (ດອກ) is now legal.
  4. If step 2 or step 3 fails, restore the ດ to the last syllable and skip the current character.
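
The backtracking steps above can be sketched in isolation as follows. This is a minimal illustration, not the code of LaoBreakIterator itself: the class name LaoBacktrackSketch, the method rebalance, and the LEGAL set are hypothetical, and the set is only a toy stand-in for the RBBI grammar that decides legality in the real class. The sketch moves one char, which is enough here because every character in the example is a single UTF-16 code unit.

```java
import java.util.Set;
import java.util.function.Predicate;

final class LaoBacktrackSketch {
    // Toy stand-in for the grammar: the only syllables treated as legal in this demo.
    static final Set<String> LEGAL = Set.of(
        "\u0E81\u0EA7\u0EC8\u0EB2",        // ກວ່າ
        "\u0E81\u0EA7\u0EC8\u0EB2\u0E94",  // ກວ່າດ
        "\u0E94\u0EAD\u0E81");             // ດອກ

    /**
     * Tries to move the last char of prev onto the front of cur (step 1).
     * Returns {newPrev, newCur} if both are legal (steps 2 and 3),
     * otherwise returns the unchanged {prev, cur} (the restore part of step 4).
     * Skipping the current character after a failed attempt is left to the caller.
     */
    static String[] rebalance(String prev, String cur, Predicate<String> isLegal) {
        if (prev.isEmpty()) {
            return new String[] { prev, cur };
        }
        int cut = prev.length() - 1;
        String newPrev = prev.substring(0, cut);
        String newCur = prev.substring(cut) + cur;
        if (isLegal.test(newPrev) && isLegal.test(newCur)) {
            return new String[] { newPrev, newCur };
        }
        return new String[] { prev, cur };
    }

    public static void main(String[] args) {
        // Greedy match took ກວ່າດ and left the illegal remainder ອກ.
        String[] fixed = rebalance("\u0E81\u0EA7\u0EC8\u0EB2\u0E94", "\u0EAD\u0E81", LEGAL::contains);
        System.out.println(fixed[0] + " | " + fixed[1]);
    }
}
```

Passing the legality check in as a Predicate keeps the sketch independent of how syllables are actually validated; in the real class that role is played by the RBBI rules.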

Finally, LaoBreakIterator also takes care of the second concern mentioned in the paper: combining marks that appear in the wrong order (typos).

WARNING: This API is experimental and might change in incompatible ways in the next release.
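
As with any BreakIterator, boundaries are typically collected by calling first() and then next() until DONE is returned. The sketch below shows that loop using the JDK's java.text.BreakIterator as a stand-in so that it runs without extra libraries; the class name BoundaryWalkSketch and the method boundaries are hypothetical. The same first(), next(), DONE, and setText(String) members are listed for this class and its ICU superclass, so the loop should carry over, assuming a LaoBreakIterator has been constructed with suitable RuleBasedBreakIterator rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

final class BoundaryWalkSketch {
    /** Collects every boundary offset the iterator reports for the given text. */
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        // first() returns the first boundary; next() advances until DONE.
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        // JDK word iterator used purely as a stand-in for a syllable iterator.
        System.out.println(boundaries(BreakIterator.getWordInstance(), "hello world"));
    }
}
```

Adjacent pairs of the returned offsets delimit the segments, so substring(offsets.get(i), offsets.get(i + 1)) yields each piece in turn.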

Field Summary
 
Fields inherited from class com.ibm.icu.text.BreakIterator
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD
 
Constructor Summary
LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules)
           
 
Method Summary
 Object clone()
          Clone method.
 int current()
           
 int first()
           
 int following(int offset)
           
 CharacterIterator getText()
           
 int last()
           
 int next()
           
 int next(int n)
           
 int previous()
           
 void setText(CharacterIterator text)
           
 void setText(String newText)
           
 
Methods inherited from class com.ibm.icu.text.BreakIterator
getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, isBoundary, preceding, registerInstance, registerInstance, unregister
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LaoBreakIterator

public LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules)

Method Detail

current

public int current()
Specified by:
current in class com.ibm.icu.text.BreakIterator

first

public int first()
Specified by:
first in class com.ibm.icu.text.BreakIterator

following

public int following(int offset)
Specified by:
following in class com.ibm.icu.text.BreakIterator

getText

public CharacterIterator getText()
Specified by:
getText in class com.ibm.icu.text.BreakIterator

last

public int last()
Specified by:
last in class com.ibm.icu.text.BreakIterator

next

public int next()
Specified by:
next in class com.ibm.icu.text.BreakIterator

next

public int next(int n)
Specified by:
next in class com.ibm.icu.text.BreakIterator

previous

public int previous()
Specified by:
previous in class com.ibm.icu.text.BreakIterator

setText

public void setText(CharacterIterator text)
Specified by:
setText in class com.ibm.icu.text.BreakIterator

setText

public void setText(String newText)
Overrides:
setText in class com.ibm.icu.text.BreakIterator

clone

public Object clone()
Clone method. Creates another LaoBreakIterator with the same behavior and current state as this one.

Overrides:
clone in class com.ibm.icu.text.BreakIterator
Returns:
The clone.