org.apache.lucene.analysis.hi
Class HindiNormalizer
java.lang.Object
  extended by org.apache.lucene.analysis.hi.HindiNormalizer
public class HindiNormalizer
extends Object
 
Normalizer for Hindi.
 
 Normalizes text so that some spelling variants of the same word are reduced to a single form.
 
 Implements the Hindi-language-specific algorithm described in:
 Word normalization in Indian languages,
 Prasad Pingali and Vasudeva Varma.
 http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf
 
 with the following additions from Hindi CLIR in Thirty Days,
 Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel.
 http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454
 
 - Internal zero-width joiners and zero-width non-joiners are removed.
 - NA+halant is normalized to anusvara, just as chandrabindu is.
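The rules listed above can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration: the class HindiRulesSketch and its helper delete are made-up names, not Lucene's source. It covers only the three mappings named in this description, not the rest of the Pingali and Varma algorithm, and for simplicity it removes every joiner in the buffer rather than only word-internal ones. The code points are the standard Unicode values: ZWNJ U+200C, ZWJ U+200D, chandrabindu U+0901, anusvara U+0902, NA U+0928, and halant (virama) U+094D.

```java
/**
 * Hypothetical sketch of the three rules listed above (assumption: not
 * Lucene's implementation, and not the full Pingali/Varma algorithm).
 */
public final class HindiRulesSketch {
    private static final char CHANDRABINDU = '\u0901';
    private static final char ANUSVARA = '\u0902';
    private static final char NA = '\u0928';
    private static final char HALANT = '\u094D'; // also called virama
    private static final char ZWNJ = '\u200C';
    private static final char ZWJ = '\u200D';

    /** Removes s[pos] by shifting the tail left; returns the new length. */
    static int delete(char[] s, int pos, int len) {
        System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
        return len - 1;
    }

    /** Normalizes s[0..len) in place and returns the new logical length. */
    public static int normalize(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            switch (s[i]) {
                case ZWNJ:
                case ZWJ:
                    // Simplification: every joiner is dropped, then the char
                    // that shifted into slot i is examined on the next pass.
                    len = delete(s, i, len);
                    i--;
                    break;
                case CHANDRABINDU:
                    s[i] = ANUSVARA;
                    break;
                case NA:
                    // NA followed by halant ("dead" n) becomes anusvara.
                    if (i + 1 < len && s[i + 1] == HALANT) {
                        s[i] = ANUSVARA;
                        len = delete(s, i + 1, len);
                    }
                    break;
                default:
                    break;
            }
        }
        return len;
    }

    public static void main(String[] args) {
        String withChandrabindu = "\u0939\u0901\u0938"; // ha + chandrabindu + sa
        String withAnusvara = "\u0939\u0902\u0938";     // ha + anusvara + sa
        char[] buf = withChandrabindu.toCharArray();
        int n = normalize(buf, buf.length);
        System.out.println(new String(buf, 0, n).equals(withAnusvara)); // true
    }
}
```

Deleting a character shifts the rest of the buffer left instead of allocating a new array, which is why the method returns a new logical length; any characters at or beyond that length are leftovers and should be ignored.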
 
 
 
Method Summary
 int   normalize(char[] s, int len)
       Normalize an input buffer of Hindi text.

Methods inherited from class java.lang.Object
 clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
HindiNormalizer
public HindiNormalizer()
normalize
public int normalize(char[] s,
                     int len)
Normalize an input buffer of Hindi text.
Parameters:
 s - input buffer; it is modified in place
 len - length of input buffer
Returns:
 length of input buffer after normalization; only s[0] through s[length - 1] hold the normalized text
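Because normalize edits the buffer in place and may shorten the text, a caller should keep only the first returned-length characters. The sketch below illustrates that calling convention with a hypothetical BufferNormalizer interface whose single method has the same shape as normalize(char[] s, int len), plus a stand-in normalizer that only drops ZWNJ and ZWJ; neither is part of Lucene. With Lucene on the classpath, a method reference such as new HindiNormalizer()::normalize should fit the interface, since its signature matches the one documented here, but that substitution is an untested assumption.

```java
/**
 * Hypothetical sketch of the calling convention for an in-place normalizer
 * shaped like normalize(char[] s, int len). Not part of Lucene.
 */
public final class BufferNormalizeDemo {

    /** Edits s[0..len) in place and returns the new logical length. */
    @FunctionalInterface
    public interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Stand-in normalizer (assumption): only removes ZWNJ (U+200C) and ZWJ (U+200D). */
    public static final BufferNormalizer DROP_JOINERS = (s, len) -> {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u200C' && s[i] != '\u200D') {
                s[out++] = s[i]; // compact the kept chars toward the front
            }
        }
        return out;
    };

    /** Copies text into a buffer, normalizes it, and keeps only the first newLen chars. */
    public static String normalizeString(BufferNormalizer normalizer, String text) {
        char[] buf = text.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Anything at index >= newLen is leftover from deletions and is ignored.
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        char[] buf = {'a', '\u200D', 'b'};
        int n = DROP_JOINERS.normalize(buf, buf.length);
        // buf is now {'a', 'b', 'b'}; only buf[0..n) is meaningful.
        System.out.println(n);                                                  // 2
        System.out.println(new String(buf, 0, n));                              // ab
        System.out.println(normalizeString(DROP_JOINERS, "x\u200Cy").length()); // 2
    }
}
```

Reading past the returned length would pick up stale characters (the trailing 'b' above), so wrapping the call in a helper like normalizeString keeps that detail in one place.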
 
 
 
          Copyright © 2000-2012 Apache Software Foundation.  All Rights Reserved.