org.apache.lucene.analysis.ar
Class ArabicNormalizer
java.lang.Object
org.apache.lucene.analysis.ar.ArabicNormalizer
public class ArabicNormalizer
- extends Object
Normalizer for Arabic.
Normalization is done in place for efficiency, operating directly on a term buffer.
Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
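The rules above can be sketched as a single pass over a character buffer. This is a minimal, self-contained illustration and not the library's actual source: the class name `ArabicNormalizationSketch` is hypothetical, the constants reuse the field names documented on this page with their standard Unicode code points, and treating alef madda as one of the "hamza with alef seat" forms is an assumption.

```java
final class ArabicNormalizationSketch {
    // Standard Unicode code points for the characters named in the rules.
    static final char ALEF = '\u0627';
    static final char ALEF_MADDA = '\u0622';
    static final char ALEF_HAMZA_ABOVE = '\u0623';
    static final char ALEF_HAMZA_BELOW = '\u0625';
    static final char YEH = '\u064A';
    static final char DOTLESS_YEH = '\u0649';
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B';
    static final char DAMMATAN = '\u064C';
    static final char KASRATAN = '\u064D';
    static final char FATHA = '\u064E';
    static final char DAMMA = '\u064F';
    static final char KASRA = '\u0650';
    static final char SHADDA = '\u0651';
    static final char SUKUN = '\u0652';

    /** Normalizes the first len chars of s in place and returns the new length. */
    static int normalize(char[] s, int len) {
        int out = 0; // next write position; never runs ahead of the read position
        for (int i = 0; i < len; i++) {
            char c = s[i];
            switch (c) {
                case ALEF_MADDA: // assumption: folded together with the hamza forms
                case ALEF_HAMZA_ABOVE:
                case ALEF_HAMZA_BELOW:
                    s[out++] = ALEF;
                    break;
                case DOTLESS_YEH:
                    s[out++] = YEH;
                    break;
                case TEH_MARBUTA:
                    s[out++] = HEH;
                    break;
                case TATWEEL:
                case FATHATAN:
                case DAMMATAN:
                case KASRATAN:
                case FATHA:
                case DAMMA:
                case KASRA:
                case SHADDA:
                case SUKUN:
                    break; // removed: nothing is written for this character
                default:
                    s[out++] = c;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // alef-hamza-above, fatha, hah, sukun, meem, fatha, dal
        char[] buf = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F".toCharArray();
        int n = normalize(buf, buf.length);
        System.out.println(new String(buf, 0, n) + " (length " + n + ")");
    }
}
```

In this sketch the seven-character input shrinks to four characters: the hamza form is folded to a bare alef and the three diacritics are dropped.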
Method Summary
int normalize(char[] s, int len)
- Normalize an input buffer of Arabic text
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ALEF
public static final char ALEF
- See Also:
- Constant Field Values
ALEF_MADDA
public static final char ALEF_MADDA
- See Also:
- Constant Field Values
ALEF_HAMZA_ABOVE
public static final char ALEF_HAMZA_ABOVE
- See Also:
- Constant Field Values
ALEF_HAMZA_BELOW
public static final char ALEF_HAMZA_BELOW
- See Also:
- Constant Field Values
YEH
public static final char YEH
- See Also:
- Constant Field Values
DOTLESS_YEH
public static final char DOTLESS_YEH
- See Also:
- Constant Field Values
TEH_MARBUTA
public static final char TEH_MARBUTA
- See Also:
- Constant Field Values
HEH
public static final char HEH
- See Also:
- Constant Field Values
TATWEEL
public static final char TATWEEL
- See Also:
- Constant Field Values
FATHATAN
public static final char FATHATAN
- See Also:
- Constant Field Values
DAMMATAN
public static final char DAMMATAN
- See Also:
- Constant Field Values
KASRATAN
public static final char KASRATAN
- See Also:
- Constant Field Values
FATHA
public static final char FATHA
- See Also:
- Constant Field Values
DAMMA
public static final char DAMMA
- See Also:
- Constant Field Values
KASRA
public static final char KASRA
- See Also:
- Constant Field Values
SHADDA
public static final char SHADDA
- See Also:
- Constant Field Values
SUKUN
public static final char SUKUN
- See Also:
- Constant Field Values
ArabicNormalizer
public ArabicNormalizer()
normalize
public int normalize(char[] s, int len)
- Normalize an input buffer of Arabic text
- Parameters:
s - input buffer
len - length of input buffer
- Returns:
- length of input buffer after normalization
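Because the buffer is modified in place and only the new length is returned, callers should read just the first returned-length characters afterwards; anything beyond that point may be stale. The sketch below illustrates that contract with a hypothetical stand-in, `stripTatweel`, which has the same `(char[] s, int len)` to `int` shape but removes only tatweel. It is not the library method, and the class name `InPlaceLengthDemo` is likewise an assumption for illustration.

```java
final class InPlaceLengthDemo {
    static final char TATWEEL = '\u0640';

    // Hypothetical stand-in: compacts the buffer in place, returns the new length.
    static int stripTatweel(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != TATWEEL) {
                s[out++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An oversized, reusable buffer, as a term buffer typically is.
        char[] buf = new char[16];
        // kaf, tatweel, tatweel, teh, alef, beh
        String term = "\u0643\u0640\u0640\u062A\u0627\u0628";
        term.getChars(0, term.length(), buf, 0);

        int len = term.length();
        int newLen = stripTatweel(buf, len);

        // Only buf[0..newLen) is meaningful; buf[newLen..len) still holds old data.
        String normalized = new String(buf, 0, newLen);
        System.out.println(len + " -> " + newLen + ": " + normalized);
    }
}
```

Passing the valid length separately from the array is what lets one buffer be reused across terms without reallocating.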