org.apache.lucene.analysis.ar
Class ArabicNormalizer
java.lang.Object
  
org.apache.lucene.analysis.ar.ArabicNormalizer
public class ArabicNormalizer
extends Object
 
Normalizer for Arabic.
  
  Normalization is done in place for efficiency, operating directly on a term buffer.
  
  Normalization is defined as:
  
  -  Normalization of hamza with alef seat to a bare alef.
  
  -  Normalization of teh marbuta to heh.
  
  -  Normalization of dotless yeh (alef maksura) to yeh.
  
  -  Removal of Arabic diacritics (the harakat).
  
  -  Removal of tatweel (stretching character).
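
The rules above can be illustrated with a minimal, self-contained sketch. This is not the Lucene source: the class name ArabicNormalizerSketch is hypothetical, the constant values are the standard Unicode code points for the characters named in the field list, and treating ALEF_MADDA like the hamza forms is an assumption based on its presence among the constants. The sketch rewrites the buffer in place and returns the new length, matching the shape of normalize(char[] s, int len).

```java
// Hypothetical stand-in for illustration only; not the Lucene implementation.
final class ArabicNormalizerSketch {
    // Standard Unicode code points for the characters named in this class's fields.
    static final char ALEF = '\u0627';
    static final char ALEF_MADDA = '\u0622';
    static final char ALEF_HAMZA_ABOVE = '\u0623';
    static final char ALEF_HAMZA_BELOW = '\u0625';
    static final char YEH = '\u064A';
    static final char DOTLESS_YEH = '\u0649';
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B';
    static final char DAMMATAN = '\u064C';
    static final char KASRATAN = '\u064D';
    static final char FATHA = '\u064E';
    static final char DAMMA = '\u064F';
    static final char KASRA = '\u0650';
    static final char SHADDA = '\u0651';
    static final char SUKUN = '\u0652';

    /** Normalizes s[0..len) in place and returns the length after normalization. */
    public static int normalize(char[] s, int len) {
        int out = 0; // next write position; never runs ahead of the read position
        for (int i = 0; i < len; i++) {
            char c = s[i];
            switch (c) {
                case ALEF_MADDA: // assumption: folded like the hamza forms
                case ALEF_HAMZA_ABOVE:
                case ALEF_HAMZA_BELOW:
                    s[out++] = ALEF;
                    break;
                case DOTLESS_YEH:
                    s[out++] = YEH;
                    break;
                case TEH_MARBUTA:
                    s[out++] = HEH;
                    break;
                case TATWEEL:
                case FATHATAN:
                case DAMMATAN:
                case KASRATAN:
                case FATHA:
                case DAMMA:
                case KASRA:
                case SHADDA:
                case SUKUN:
                    break; // removed: nothing is written for this character
                default:
                    s[out++] = c;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Alef with hamza above, fatha, hah, sukun, meem, fatha, dal
        char[] buf = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F".toCharArray();
        int newLen = normalize(buf, buf.length);
        // Only the first newLen chars are valid; the tail of the buffer is stale.
        System.out.println(new String(buf, 0, newLen));
    }
}
```

Because characters are only replaced one-for-one or dropped, the output never grows, which is why a single buffer and a returned length are sufficient. Callers should read only the first returned-length characters of the buffer.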
 
 
 
 
Method Summary
 int normalize(char[] s, int len)
          Normalize an input buffer of Arabic text.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
ALEF
public static final char ALEF
- See Also:
 - Constant Field Values
 
ALEF_MADDA
public static final char ALEF_MADDA
- See Also:
 - Constant Field Values
 
ALEF_HAMZA_ABOVE
public static final char ALEF_HAMZA_ABOVE
- See Also:
 - Constant Field Values
 
ALEF_HAMZA_BELOW
public static final char ALEF_HAMZA_BELOW
- See Also:
 - Constant Field Values
 
YEH
public static final char YEH
- See Also:
 - Constant Field Values
 
DOTLESS_YEH
public static final char DOTLESS_YEH
- See Also:
 - Constant Field Values
 
TEH_MARBUTA
public static final char TEH_MARBUTA
- See Also:
 - Constant Field Values
 
HEH
public static final char HEH
- See Also:
 - Constant Field Values
 
TATWEEL
public static final char TATWEEL
- See Also:
 - Constant Field Values
 
FATHATAN
public static final char FATHATAN
- See Also:
 - Constant Field Values
 
DAMMATAN
public static final char DAMMATAN
- See Also:
 - Constant Field Values
 
KASRATAN
public static final char KASRATAN
- See Also:
 - Constant Field Values
 
FATHA
public static final char FATHA
- See Also:
 - Constant Field Values
 
DAMMA
public static final char DAMMA
- See Also:
 - Constant Field Values
 
KASRA
public static final char KASRA
- See Also:
 - Constant Field Values
 
SHADDA
public static final char SHADDA
- See Also:
 - Constant Field Values
 
SUKUN
public static final char SUKUN
- See Also:
 - Constant Field Values
 
ArabicNormalizer
public ArabicNormalizer()
normalize
public int normalize(char[] s,
                     int len)
- Normalize an input buffer of Arabic text
- Parameters:
 s - input buffer
 len - length of input buffer
- Returns:
 - length of input buffer after normalization
 
 
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.