NumericRangeQuery (Lucene 3.6.0 API)


org.apache.lucene.search
Class NumericRangeQuery<T extends Number>

java.lang.Object
  org.apache.lucene.search.Query
      org.apache.lucene.search.MultiTermQuery
          org.apache.lucene.search.NumericRangeQuery<T>

All Implemented Interfaces: Serializable, Cloneable

public final class NumericRangeQuery<T extends Number>
extends MultiTermQuery

A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.

You create a new NumericRangeQuery with the static factory methods, e.g.:

 Query q = NumericRangeQuery.newFloatRange("weight", 0.03f, 0.10f, true, true);

This matches all documents whose float-valued "weight" field lies between 0.03 and 0.10, inclusive.

The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.

You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.

This query defaults to MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT for 32 bit (int/float) ranges with precisionStep ≤8 and 64 bit (long/double) ranges with precisionStep ≤6. Otherwise it uses MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE as the number of terms is likely to be high. With precision steps of ≤4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.
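The default rewrite rule above can be written down as a small decision function. This is a sketch only: the class and method names are hypothetical, and the rewrite modes are returned as plain strings named after the MultiTermQuery constants rather than as the real constants.

```java
// Sketch only: restates the default rewrite rule described above.
// DefaultRewriteSketch and defaultRewrite are hypothetical names.
final class DefaultRewriteSketch {

    /**
     * Returns the name of the rewrite mode that the text describes as the
     * default for the given value width (32 or 64 bits) and precisionStep.
     */
    static String defaultRewrite(int bitsPerValue, int precisionStep) {
        if (bitsPerValue != 32 && bitsPerValue != 64) {
            throw new IllegalArgumentException("bitsPerValue must be 32 or 64");
        }
        // 32 bit (int/float): auto rewrite up to step 8; 64 bit (long/double): up to step 6.
        int limit = (bitsPerValue == 32) ? 8 : 6;
        return (precisionStep <= limit)
                ? "CONSTANT_SCORE_AUTO_REWRITE_DEFAULT"
                : "CONSTANT_SCORE_FILTER_REWRITE";
    }

    public static void main(String[] args) {
        System.out.println(defaultRewrite(32, 8));
        System.out.println(defaultRewrite(64, 8));
    }
}
```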

How it works

See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):

Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023

A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like doubles, longs, floats, and ints are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see NumericUtils)). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
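The idea of a sortable representation can be illustrated with floats. The sketch below maps a float to an int whose integer ordering matches the float ordering; it shows the general bit trick only and is not taken from NumericUtils. The class and method names are hypothetical.

```java
// Sketch only: an order-preserving float-to-int mapping.
// SortableFloatSketch and toSortableInt are hypothetical names.
final class SortableFloatSketch {

    /** Maps a float to an int so that comparing the ints orders the floats. */
    static int toSortableInt(float value) {
        int bits = Float.floatToIntBits(value);
        // Non-negative floats already compare correctly as ints. Negative
        // floats have the sign bit set and compare in reverse order, so the
        // remaining 31 bits are flipped to restore ascending order.
        return (bits < 0) ? (bits ^ 0x7fffffff) : bits;
    }

    public static void main(String[] args) {
        float[] samples = {-1.5f, -0.5f, 0.0f, 0.03f, 0.10f};
        for (float f : samples) {
            System.out.println(f + " -> " + toSortableInt(f));
        }
    }
}
```

Once values are ordered integers, a lower precision can be obtained by dropping low-order bits, which is what makes the trie-style prefixes described in the quote possible.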

For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).
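The recursive splitting can be sketched for small non-negative values. The code below counts how many prefix terms are needed to cover a range when the edges are matched at full precision and the center at the lowest precision. It is a simplified illustration under stated assumptions (values limited to at most 32 bits, no negative numbers), not the library's implementation; all names are hypothetical.

```java
// Sketch only: counts the terms a trie-style range split would visit.
// TrieRangeSketch, countTerms and termsAt are hypothetical names.
final class TrieRangeSketch {

    /** Number of distinct prefix terms at one precision level covering [min, max]. */
    private static long termsAt(long min, long max, int shift) {
        return (max >>> shift) - (min >>> shift) + 1;
    }

    /** Counts the terms needed to cover the inclusive range [min, max]. */
    static long countTerms(long min, long max, int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || bitsPerValue > 32 || precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("sketch supports 1..32 bits and steps");
        }
        if (min < 0 || max >= (1L << bitsPerValue)) {
            throw new IllegalArgumentException("bounds outside the value space");
        }
        if (min > max) {
            return 0;
        }
        long terms = 0;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            // An edge needs its own terms when it is not aligned to the next level.
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= bitsPerValue || nextMin > nextMax) {
                // Lowest precision reached, or nothing is left for a coarser level.
                return terms + termsAt(min, max, shift);
            }
            if (hasLower) {
                terms += termsAt(min, min | mask, shift);
            }
            if (hasUpper) {
                terms += termsAt(max & ~mask, max, shift);
            }
            min = nextMin;
            max = nextMax;
        }
    }

    public static void main(String[] args) {
        // 16-bit values, 4-bit precision step, range covering almost everything.
        System.out.println(countTerms(1, 0xFFFE, 16, 4)); // 104
        System.out.println(0xFFFE - 1 + 1);               // 65534 distinct values in the range
    }
}
```

In this toy setting, a range spanning 65534 distinct values is covered by 104 terms, which mirrors the reduction the paragraph above describes for 8-byte values.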

Precision Step

You can choose any precisionStep when encoding values. Lower step values mean more precisions and therefore more terms in the index (so the index gets larger). On the other hand, the maximum number of terms to match decreases, which optimizes query speed. The formula to calculate the maximum term count is:

  n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )

(This formula is only correct when bitsPerValue/precisionStep is an integer; in other cases, the value must be rounded up and the last summand must use the modulo of the division as its precision step.) For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision step of 2, n = 31*3*2 + 3 = 189. However, the gain from matching fewer terms is partly offset by more seeking in the term enum of the index. Because of this, the ideal precisionStep value can only be found by testing. Important: You can index with a lower precision step value and test search speed using a multiple of the original step value.
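The formula is easy to evaluate in code. The sketch below implements it, including one reading of the rounding rule for step values that do not divide bitsPerValue evenly (the lowest precision is assumed to keep only the leftover bits). The class and method names are hypothetical.

```java
// Sketch only: evaluates the maximum-term-count formula from the text.
// MaxTermCountSketch and maxTerms are hypothetical names.
final class MaxTermCountSketch {

    /**
     * Upper bound on the number of terms a range can expand to.
     * bitsPerValue is 32 for int/float and 64 for long/double.
     */
    static long maxTerms(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || precisionStep < 1 || precisionStep > 31) {
            throw new IllegalArgumentException("sketch supports steps 1..31");
        }
        // Number of precision levels, rounded up when the division is not exact.
        int levels = (bitsPerValue + precisionStep - 1) / precisionStep;
        // Assumption: with an inexact division, the last summand uses the
        // remainder of the division as its precision step.
        int remainder = bitsPerValue % precisionStep;
        int lastStep = (remainder == 0) ? precisionStep : remainder;

        long perLevel = (1L << precisionStep) - 1; // 2^precisionStep - 1
        long lowest = (1L << lastStep) - 1;        // 2^lastStep - 1
        return (levels - 1) * perLevel * 2 + lowest;
    }

    public static void main(String[] args) {
        System.out.println(maxTerms(64, 8)); // 3825
        System.out.println(maxTerms(64, 4)); // 465
        System.out.println(maxTerms(64, 2)); // 189
    }
}
```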

Good values for precisionStep depend on usage and data type:

The default for all data types is 4, which is used when no precisionStep is given.

Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.

Ideal value in most cases for 32 bit data types (int, float) is 4.

For low cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE (see below).
Steps ≥64 for long/double and ≥32 for int/float produce one token per value in the index, and querying is as slow as a conventional TermRangeQuery. However, such steps can be used to produce fields that are solely used for sorting (in this case, simply use Integer.MAX_VALUE as precisionStep). Using NumericFields for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above precisionSteps.

Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important.

Since: 2.9
See Also: Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.MultiTermQuery
`MultiTermQuery.ConstantScoreAutoRewrite, MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite`

Field Summary

Fields inherited from class org.apache.lucene.search.MultiTermQuery
`CONSTANT_SCORE_AUTO_REWRITE_DEFAULT, CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, CONSTANT_SCORE_FILTER_REWRITE, rewriteMethod, SCORING_BOOLEAN_QUERY_REWRITE`

Method Summary
`boolean`	`equals(Object o)`
`protected FilteredTermEnum`	`getEnum(IndexReader reader)` Construct the enumeration to be used, expanding the pattern term.
`String`	`getField()` Returns the field name for this query
`T`	`getMax()` Returns the upper value of this range query
`T`	`getMin()` Returns the lower value of this range query
`int`	`getPrecisionStep()` Returns the precision step.
`int`	`hashCode()`
`boolean`	`includesMax()` Returns `true` if the upper endpoint is inclusive
`boolean`	`includesMin()` Returns `true` if the lower endpoint is inclusive
`static NumericRangeQuery<Double>`	`newDoubleRange(String field, Double min, Double max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `double` range using the default `precisionStep` `NumericUtils.PRECISION_STEP_DEFAULT` (4).
`static NumericRangeQuery<Double>`	`newDoubleRange(String field, int precisionStep, Double min, Double max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `double` range using the given `precisionStep`.
`static NumericRangeQuery<Float>`	`newFloatRange(String field, Float min, Float max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `float` range using the default `precisionStep` `NumericUtils.PRECISION_STEP_DEFAULT` (4).
`static NumericRangeQuery<Float>`	`newFloatRange(String field, int precisionStep, Float min, Float max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `float` range using the given `precisionStep`.
`static NumericRangeQuery<Integer>`	`newIntRange(String field, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries an `int` range using the default `precisionStep` `NumericUtils.PRECISION_STEP_DEFAULT` (4).
`static NumericRangeQuery<Integer>`	`newIntRange(String field, int precisionStep, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries an `int` range using the given `precisionStep`.
`static NumericRangeQuery<Long>`	`newLongRange(String field, int precisionStep, Long min, Long max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `long` range using the given `precisionStep`.
`static NumericRangeQuery<Long>`	`newLongRange(String field, Long min, Long max, boolean minInclusive, boolean maxInclusive)` Factory that creates a `NumericRangeQuery`, that queries a `long` range using the default `precisionStep` `NumericUtils.PRECISION_STEP_DEFAULT` (4).
`String`	`toString(String field)` Prints a query to a string, with `field` assumed to be the default field and omitted.

Methods inherited from class org.apache.lucene.search.MultiTermQuery
`clearTotalNumberOfTerms, getRewriteMethod, getTotalNumberOfTerms, incTotalNumberOfTerms, rewrite, setRewriteMethod`

Methods inherited from class org.apache.lucene.search.Query
`clone, combine, createWeight, extractTerms, getBoost, getSimilarity, mergeBooleanQueries, setBoost, toString, weight`

Methods inherited from class java.lang.Object
`finalize, getClass, notify, notifyAll, wait, wait, wait`

Method Detail

newLongRange

public static NumericRangeQuery<Long> newLongRange(String field,
                                                   int precisionStep,
                                                   Long min,
                                                   Long max,
                                                   boolean minInclusive,
                                                   boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a long range using the given precisionStep. You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.
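The interplay of null bounds and the inclusive flags can be sketched as a conversion to plain inclusive bounds. This is an illustration of the semantics described above for long values, with hypothetical names; it does not call into Lucene.

```java
// Sketch only: turns (min, max, minInclusive, maxInclusive) into inclusive bounds.
// LongBoundsSketch and toInclusiveBounds are hypothetical names.
final class LongBoundsSketch {

    /** Returns {lower, upper} as inclusive bounds, or null if nothing can match. */
    static long[] toInclusiveBounds(Long min, Long max,
                                    boolean minInclusive, boolean maxInclusive) {
        // A null bound means the range is open on that side.
        long lower = (min == null) ? Long.MIN_VALUE : min;
        long upper = (max == null) ? Long.MAX_VALUE : max;
        if (min != null && !minInclusive) {
            if (lower == Long.MAX_VALUE) {
                return null; // no value is greater than Long.MAX_VALUE
            }
            lower++;
        }
        if (max != null && !maxInclusive) {
            if (upper == Long.MIN_VALUE) {
                return null; // no value is smaller than Long.MIN_VALUE
            }
            upper--;
        }
        return (lower <= upper) ? new long[] {lower, upper} : null;
    }

    public static void main(String[] args) {
        long[] b = toInclusiveBounds(10L, 20L, false, false);
        System.out.println(b[0] + ".." + b[1]); // 11..19
    }
}
```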

newLongRange

public static NumericRangeQuery<Long> newLongRange(String field,
                                                   Long min,
                                                   Long max,
                                                   boolean minInclusive,
                                                   boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a long range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

newIntRange

public static NumericRangeQuery<Integer> newIntRange(String field,
                                                     int precisionStep,
                                                     Integer min,
                                                     Integer max,
                                                     boolean minInclusive,
                                                     boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries an int range using the given precisionStep. You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

newIntRange

public static NumericRangeQuery<Integer> newIntRange(String field,
                                                     Integer min,
                                                     Integer max,
                                                     boolean minInclusive,
                                                     boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries an int range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

newDoubleRange

public static NumericRangeQuery<Double> newDoubleRange(String field,
                                                       int precisionStep,
                                                       Double min,
                                                       Double max,
                                                       boolean minInclusive,
                                                       boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a double range using the given precisionStep. You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Double.NaN will never match a half-open range; to hit NaN, use a query with min == max == Double.NaN. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.
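One way to picture the NaN behaviour is through an order-preserving encoding of doubles. The sketch below assumes that values are compared via their sortable bit pattern and that an open end reaches only as far as infinity; under those assumptions NaN sorts above positive infinity and falls outside every half-open range. The names are hypothetical and the code is an illustration, not the library's implementation.

```java
// Sketch only: shows where NaN lands in an order-preserving double encoding.
// NaNOrderingSketch, toSortableLong and matchesAtLeast are hypothetical names.
final class NaNOrderingSketch {

    /** Maps a double to a long so that comparing the longs orders the doubles. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Flip the lower 63 bits of negative values to restore ascending order.
        return (bits < 0) ? (bits ^ 0x7fffffffffffffffL) : bits;
    }

    /** Half-open range [min, +infinity], assuming the open end stops at infinity. */
    static boolean matchesAtLeast(double value, double min) {
        long v = toSortableLong(value);
        return v >= toSortableLong(min)
                && v <= toSortableLong(Double.POSITIVE_INFINITY);
    }

    public static void main(String[] args) {
        System.out.println(matchesAtLeast(1e300, 0.0));      // true
        System.out.println(matchesAtLeast(Double.NaN, 0.0)); // false
    }
}
```

A range with min == max == NaN still matches under this encoding, because both bounds map to the same sortable value as the indexed NaN.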

newDoubleRange

public static NumericRangeQuery<Double> newDoubleRange(String field,
                                                       Double min,
                                                       Double max,
                                                       boolean minInclusive,
                                                       boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a double range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Double.NaN will never match a half-open range; to hit NaN, use a query with min == max == Double.NaN. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

newFloatRange

public static NumericRangeQuery<Float> newFloatRange(String field,
                                                     int precisionStep,
                                                     Float min,
                                                     Float max,
                                                     boolean minInclusive,
                                                     boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a float range using the given precisionStep. You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Float.NaN will never match a half-open range; to hit NaN, use a query with min == max == Float.NaN. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

newFloatRange

public static NumericRangeQuery<Float> newFloatRange(String field,
                                                     Float min,
                                                     Float max,
                                                     boolean minInclusive,
                                                     boolean maxInclusive)

Factory that creates a NumericRangeQuery that queries a float range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). You can have half-open ranges (which are in fact </≤ or >/≥ queries) by setting the min or max value to null. Float.NaN will never match a half-open range; to hit NaN, use a query with min == max == Float.NaN. Setting minInclusive or maxInclusive to false excludes the corresponding bound from the matches; setting it to true makes documents at that bound hits, too.

getEnum

protected FilteredTermEnum getEnum(IndexReader reader)
                            throws IOException

Description copied from class: MultiTermQuery

Construct the enumeration to be used, expanding the pattern term.

Specified by: getEnum in class MultiTermQuery

Throws: IOException

getField

public String getField()

Returns the field name for this query

includesMin

public boolean includesMin()

Returns true if the lower endpoint is inclusive

includesMax

public boolean includesMax()

Returns true if the upper endpoint is inclusive

getMin

public T getMin()

Returns the lower value of this range query

getMax

public T getMax()

Returns the upper value of this range query

getPrecisionStep

public int getPrecisionStep()

Returns the precision step.

toString

public String toString(String field)

Description copied from class: Query

Prints a query to a string, with field assumed to be the default field and omitted.

The representation used is one that is supposed to be readable by QueryParser. However, there are the following limitations:

If the query was created by the parser, the printed representation may not be exactly what was parsed. For example, characters that need to be escaped will be represented without the required backslash.
Some of the more complicated queries (e.g. span queries) don't have a representation that can be parsed by QueryParser.

Specified by: toString in class Query

equals

public final boolean equals(Object o)

Overrides: equals in class MultiTermQuery

hashCode

public final int hashCode()

Overrides: hashCode in class MultiTermQuery
