org.apache.lucene.analysis.hi
Class HindiNormalizer

java.lang.Object
  extended by org.apache.lucene.analysis.hi.HindiNormalizer

public class HindiNormalizer
extends Object

Normalizer for Hindi.

Normalizes text to reduce differences between spelling variants of the same word.

Implements the Hindi-specific algorithm described in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma (http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf)

with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):


Constructor Summary
HindiNormalizer()
           
 
Method Summary
 int normalize(char[] s, int len)
          Normalizes an input buffer of Hindi text.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HindiNormalizer

public HindiNormalizer()
Method Detail

normalize

public int normalize(char[] s,
                     int len)
Normalizes an input buffer of Hindi text in place.

Parameters:
s - input buffer; its contents are modified in place
len - number of valid characters in the input buffer
Returns:
length of the valid text in the input buffer after normalization
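
The calling convention of normalize(char[] s, int len) can be sketched as follows. This is a minimal illustration that assumes no Lucene dependency: NormalizeContractSketch and normalizeToString are hypothetical names, and the single rule applied here (dropping the zero-width joiner U+200D) is a stand-in chosen to show how characters are deleted in place. It is not the Hindi normalization algorithm itself. With Lucene on the classpath, the call inside normalizeToString would instead be new HindiNormalizer().normalize(buffer, buffer.length).

```java
/**
 * Hypothetical stand-in that mirrors the shape of
 * HindiNormalizer.normalize(char[] s, int len): it edits the buffer
 * in place and returns the new length of the valid text.
 */
final class NormalizeContractSketch {

    /** Stand-in rule only: removes U+200D (zero-width joiner) from s[0..len). */
    static int normalize(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u200D') {
                s[out++] = s[i]; // compact kept characters toward the front
            }
        }
        return out; // characters at index >= out are no longer part of the text
    }

    /** Typical caller pattern: copy to a buffer, normalize, keep only the returned length. */
    static String normalizeToString(String text) {
        char[] buffer = text.toCharArray();
        int newLen = normalize(buffer, buffer.length);
        return new String(buffer, 0, newLen);
    }

    public static void main(String[] args) {
        System.out.println(normalizeToString("a\u200Db")); // prints "ab"
    }
}
```

Because the method returns only a length, the caller must treat everything past that index as stale and must not reuse the original len after the call.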