java.lang.Object org.apache.lucene.search.similar.SimilarityQueries
public final class SimilarityQueries
Simple similarity measures.
See Also: MoreLikeThis
Method Summary | |
---|---|
static Query | formSimilarQuery(String body, Analyzer a, String field, Set<?> stop) Simple similarity query generators. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
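public static Query formSimilarQuery(String body, Analyzer a, String field, Set<?> stop) throws IOException
Simple similarity query generators. Takes every unique word in the body and forms a boolean query in which all words are optional. Use the returned query to search your IndexSearcher for similar docs.
The only caveat is that the first hit returned should be your source document, which you will then need to ignore.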
So, if you have a code fragment like this:
Query q = formSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(), "contents", null);
The query returned, in string form, will be '(i use lucene to search fast searchers are good)'.
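On the caveat that the source document should come back as the first hit: a minimal sketch of dropping it from a ranked result list, in plain Java with no Lucene dependency. The class `SimilarHits`, the method `withoutSource`, and the representation of hits as a list of integer document ids are all hypothetical names chosen for illustration; they are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: removes the source document
// from a ranked list of document ids while preserving the ranking order.
final class SimilarHits {
    static List<Integer> withoutSource(List<Integer> rankedDocIds, int sourceDocId) {
        List<Integer> result = new ArrayList<>(rankedDocIds.size());
        for (int id : rankedDocIds) {
            // Compare by id rather than dropping position 0 blindly.
            if (id != sourceDocId) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Filtering by id rather than discarding the first element is a deliberate choice here: the source document "should" be the first hit, but the sketch does not rely on that.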
The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.
This method is fail-safe in that if a long 'body' is passed in and BooleanQuery.add() (used internally) throws BooleanQuery.TooManyClauses, the query as built up to that point will be returned.
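The behaviour described above (unique words, optional stop words, and a fail-safe when the clause limit is hit) can be sketched in plain Java without any Lucene dependency. This is an illustration under stated assumptions, not the library's implementation: the class `SimilarQuerySketch` and its methods are hypothetical, the lowercase-and-split tokenization is a crude stand-in for a real Analyzer, and the cap of 1024 mirrors BooleanQuery's default maximum clause count.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not part of Lucene.
final class SimilarQuerySketch {
    // Assumption: mirrors BooleanQuery's default maximum clause count.
    static final int MAX_CLAUSES = 1024;

    /** Unique, non-stop-word terms of body in first-seen order, capped at maxClauses. */
    static List<String> uniqueTerms(String body, Set<String> stop, int maxClauses) {
        Set<String> seen = new LinkedHashSet<>();
        // Stand-in for an Analyzer: lowercase, then split on runs of non-letter, non-digit characters.
        for (String token : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;                        // leading separator yields ""
            if (stop != null && stop.contains(token)) continue;   // skip stop words
            if (seen.contains(token)) continue;                   // each word contributes one clause
            if (seen.size() >= maxClauses) break;                 // fail-safe: keep what was built so far
            seen.add(token);
        }
        return new ArrayList<>(seen);
    }

    /** Renders the terms in the parenthesized string form used in the example above. */
    static String toQueryString(List<String> terms) {
        return "(" + String.join(" ", terms) + ")";
    }

    public static void main(String[] args) {
        String body = "I use Lucene to search fast. Fast searchers are good";
        // prints (i use lucene to search fast searchers are good)
        System.out.println(toQueryString(uniqueTerms(body, null, MAX_CLAUSES)));
    }
}
```

Stopping at the cap instead of throwing is what makes a very long body safe to pass in: the caller still gets a usable query built from the first words seen.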
Parameters:
body - the body of the document you want to find similar documents to
a - the analyzer to use to parse the body
field - the field you want to search on, probably something like "contents" or "body"
stop - optional set of stop words to ignore
Throws:
IOException - this can't happen...