org.apache.lucene.search.spell
Class NGramDistance

java.lang.Object
  extended by org.apache.lucene.search.spell.NGramDistance
All Implemented Interfaces:
StringDistance

public class NGramDistance
extends Object
implements StringDistance

N-gram version of edit distance, based on the paper by Grzegorz Kondrak, "N-gram similarity and distance". Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf

This implementation uses the position-based optimization to compute partial matches of n-gram substrings. It also adds a prefix of n-1 null characters so that the first character is contained in the same number of n-grams as a middle character. Matches on null-character prefix positions are discounted, so strings with no matching characters return a value of 0.
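The following is a minimal Java sketch that shows what the null-character prefix does. It pads a string with n-1 '\0' characters and lists its n-grams. The class `PaddedNGrams` and its method `of` are hypothetical names used only for illustration and are not part of Lucene. The sketch also assumes the input itself contains no '\0' characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): lists the n-grams of a string
// after prepending n-1 null characters. Assumes the input contains no '\0'.
final class PaddedNGrams {
    private PaddedNGrams() {}

    static List<String> of(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Prefix of n-1 null characters, followed by the original string.
        StringBuilder padded = new StringBuilder(s.length() + n - 1);
        for (int k = 0; k < n - 1; k++) {
            padded.append('\0');
        }
        padded.append(s);

        // Each n-gram ends at one character of the original string,
        // so a string of length L yields exactly L n-grams.
        List<String> grams = new ArrayList<>();
        for (int end = n; end <= padded.length(); end++) {
            grams.add(padded.substring(end - n, end));
        }
        return grams;
    }
}
```

For "abc" with n = 2, the sketch yields "\0a", "ab" and "bc". With n = 3 it yields "\0\0a", "\0ab" and "abc", so the first character 'a' appears in all three n-grams. That is the same number of n-grams that contain a middle character of a longer string.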


Constructor Summary
NGramDistance()
          Creates an N-Gram distance measure using n-grams of size 2.
NGramDistance(int size)
          Creates an N-Gram distance measure using n-grams of the specified size.
 
Method Summary
 float getDistance(String source, String target)
          Returns a float between 0 and 1 based on how similar the specified strings are to one another.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NGramDistance

public NGramDistance(int size)
Creates an N-Gram distance measure using n-grams of the specified size.

Parameters:
size - The size of the n-gram to be used to compute the string distance.

NGramDistance

public NGramDistance()
Creates an N-Gram distance measure using n-grams of size 2.

Method Detail

getDistance

public float getDistance(String source,
                         String target)
Description copied from interface: StringDistance
Returns a float between 0 and 1 based on how similar the specified strings are to one another. A return value of 1 means the specified strings are identical, and 0 means the strings are maximally different.

Specified by:
getDistance in interface StringDistance
Parameters:
source - The first string.
target - The second string.
Returns:
a float between 0 and 1 based on how similar the specified strings are to one another.
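The following is a minimal sketch of how a getDistance-style method could combine the ideas in the class description. It is not Lucene's source. The method pads both strings with n-1 '\0' characters and compares the i-th source n-gram with the j-th target n-gram position by position. Matching prefix positions are dropped from the denominator of the partial substitution cost. The resulting edit distance is then normalized into a value where 1 means identical. Several parts are assumptions of the sketch: the class name `NGramSimilaritySketch`, the rejection of sizes below 1, the empty-string handling, and the requirement that inputs contain no '\0'.

```java
// Hypothetical sketch (not Lucene's source) of a position-based n-gram
// edit distance, normalized to [0, 1] where 1 means identical.
// Assumes the inputs contain no '\0' characters.
final class NGramSimilaritySketch {
    private final int n;

    // Mirrors the documented default of n-grams of size 2.
    NGramSimilaritySketch() {
        this(2);
    }

    NGramSimilaritySketch(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1"); // assumption
        }
        this.n = size;
    }

    float getDistance(String source, String target) {
        int sl = source.length();
        int tl = target.length();
        if (sl == 0 && tl == 0) {
            return 1.0f; // two empty strings are treated as identical
        }
        String s = pad(source);
        String t = pad(target);

        // prev holds row j-1 and cur holds row j: entry i is the distance
        // between the first i source n-grams and the first j target n-grams.
        float[] prev = new float[sl + 1];
        float[] cur = new float[sl + 1];
        for (int i = 0; i <= sl; i++) {
            prev[i] = i;
        }
        for (int j = 1; j <= tl; j++) {
            cur[0] = j;
            for (int i = 1; i <= sl; i++) {
                // Source n-gram i starts at index i-1 of the padded source;
                // target n-gram j starts at index j-1 of the padded target.
                int mismatches = 0;
                int counted = n;
                for (int k = 0; k < n; k++) {
                    char a = s.charAt(i - 1 + k);
                    char b = t.charAt(j - 1 + k);
                    if (a != b) {
                        mismatches++;
                    } else if (a == '\0') {
                        counted--; // discount matches on the null prefix
                    }
                }
                float subst = (float) mismatches / counted;
                cur[i] = Math.min(Math.min(cur[i - 1] + 1, prev[i] + 1), prev[i - 1] + subst);
            }
            float[] tmp = prev; // the finished row becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return 1.0f - prev[sl] / Math.max(sl, tl);
    }

    private String pad(String str) {
        StringBuilder sb = new StringBuilder(str.length() + n - 1);
        for (int k = 0; k < n - 1; k++) {
            sb.append('\0');
        }
        return sb.append(str).toString();
    }
}
```

Under this sketch, new NGramSimilaritySketch().getDistance("lucene", "lucine") evaluates to 1 - 1/6, about 0.83, because each of the two differing bigrams ("ce"/"ci" and "en"/"in") costs 1/2. Comparing "abc" with "xyz" gives 0, since every n-gram comparison then costs a full 1.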