Class NumericUtils
To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
This class generates terms to achieve this: first the numerical integer values need to
be converted to bytes. To do so, integer values (32 bit or 64 bit) are made unsigned
and their bits are split into ASCII chars of 7 bits each. The resulting byte[] sorts in
the same order as the original integer values (even using UTF-8 sort order). Each value is
also prefixed (in the first char) by the shift value (number of bits removed) used
during encoding.
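The encoding just described can be illustrated with a minimal Java sketch. It is written from the description above, not copied from Lucene: it uses a plain byte[] instead of BytesRef, the class and method names are hypothetical, and SHIFT_START_LONG = 0x20 is an assumed value for the shift prefix.

```java
/**
 * Sketch of the 64-bit prefix coding described above (not Lucene's source).
 * SHIFT_START_LONG = 0x20 is an assumed value; all names are hypothetical.
 */
final class PrefixCodedLongSketch {
    static final int SHIFT_START_LONG = 0x20;

    /** Encodes val with its lowest `shift` bits removed; byte 0 stores the shift. */
    static byte[] encode(long val, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        int nChars = (63 - shift) / 7 + 1;               // ceil((64 - shift) / 7) payload bytes
        byte[] out = new byte[nChars + 1];               // +1 for the shift byte
        out[0] = (byte) (SHIFT_START_LONG + shift);
        long bits = (val ^ Long.MIN_VALUE) >>> shift;    // flip the sign bit: signed order becomes unsigned order
        for (int i = nChars; i > 0; i--, bits >>>= 7) {
            out[i] = (byte) (bits & 0x7f);               // 7 bits per byte keeps every byte ASCII
        }
        return out;
    }

    /** Reads the shift back from byte 0, rejecting values outside 0..63. */
    static int getShift(byte[] term) {
        int shift = term[0] - SHIFT_START_LONG;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("invalid shift " + shift);
        }
        return shift;
    }

    /** Reassembles the 7-bit groups; the removed low bits come back as zeros. */
    static long decode(byte[] term) {
        long bits = 0L;
        for (int i = 1; i < term.length; i++) {
            if (term[i] < 0) {
                throw new NumberFormatException("byte " + i + " is not a 7-bit value");
            }
            bits = (bits << 7) | term[i];
        }
        return (bits << getShift(term)) ^ Long.MIN_VALUE;
    }
}
```

Since every payload byte is below 0x80, comparing two terms with the same shift byte by byte (unsigned, or as UTF-8) gives the same order as comparing the original longs. Decoding a term written with a non-zero shift returns the value with its low shift bits cleared.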
To also index floating point numbers, this class supplies two methods to convert them
to integer values by changing their bit layout: doubleToSortableLong(double) and
floatToSortableInt(float). Converting floating point numbers to integers and back
loses no precision (the only drawback is that the integer form is not meaningful as a
number). Other data types like dates can easily be converted to longs or ints (e.g.
date to long: Date.getTime()).
For easy usage, the indexing side of the trie algorithm is implemented in
NumericTokenStream, which can index int, long,
float, and double values. For querying,
NumericRangeQuery and NumericRangeFilter implement the query part
for the same data types.
This class can also be used to generate lexicographically sortable (according to
BytesRef.getUTF8SortedAsUTF16Comparator()) representations of numeric data
types for other usages (e.g. sorting).
- Since:
2.9, API changed in a non-backwards-compatible way in 4.0
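Nested Class Summary
Modifier and Type | Class | Description
static class | NumericUtils.IntRangeBuilder | Callback used by splitIntRange.
static class | NumericUtils.LongRangeBuilder | Callback used by splitLongRange.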
Field Summary
Modifier and Type | Field | Description
static final int | BUF_SIZE_INT | The maximum term length (used for byte[] buffer size) for encoding int values.
static final int | BUF_SIZE_LONG | The maximum term length (used for byte[] buffer size) for encoding long values.
static final int | PRECISION_STEP_DEFAULT | The default precision step used by IntField, FloatField, LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
static final byte | SHIFT_START_INT | Integers are stored at lower precision by shifting off lower bits.
static final byte | SHIFT_START_LONG | Longs are stored at lower precision by shifting off lower bits.
Method Summary
Modifier and Type | Method | Description
static long | doubleToSortableLong(double val) | Converts a double value to a sortable signed long.
static TermsEnum | filterPrefixCodedInts(TermsEnum termsEnum) | Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
static TermsEnum | filterPrefixCodedLongs(TermsEnum termsEnum) | Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
static int | floatToSortableInt(float val) | Converts a float value to a sortable signed int.
static int | getPrefixCodedIntShift(BytesRef) | Returns the shift value from a prefix encoded int.
static int | getPrefixCodedLongShift(BytesRef) | Returns the shift value from a prefix encoded long.
static int | intToPrefixCoded(int val, int shift, BytesRef bytes) | Returns prefix coded bits after reducing the precision by shift bits.
static void | intToPrefixCodedBytes(int val, int shift, BytesRef bytes) | Returns prefix coded bits after reducing the precision by shift bits.
static int | longToPrefixCoded(long val, int shift, BytesRef bytes) | Returns prefix coded bits after reducing the precision by shift bits.
static void | longToPrefixCodedBytes(long val, int shift, BytesRef bytes) | Returns prefix coded bits after reducing the precision by shift bits.
static int | prefixCodedToInt(BytesRef val) | Returns an int from prefix coded bytes.
static long | prefixCodedToLong(BytesRef) | Returns a long from prefix coded bytes.
static float | sortableIntToFloat(int val) | Converts a sortable int back to a float.
static double | sortableLongToDouble(long val) | Converts a sortable long back to a double.
static void | splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound) | Splits an int range recursively.
static void | splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound) | Splits a long range recursively.
Field Details
PRECISION_STEP_DEFAULT
public static final int PRECISION_STEP_DEFAULT
The default precision step used by IntField, FloatField, LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
SHIFT_START_LONG
public static final byte SHIFT_START_LONG
Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_LONG+shift in the first byte.
BUF_SIZE_LONG
public static final int BUF_SIZE_LONG
The maximum term length (used for byte[] buffer size) for encoding long values.
SHIFT_START_INT
public static final byte SHIFT_START_INT
Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT+shift in the first byte.
BUF_SIZE_INT
public static final int BUF_SIZE_INT
The maximum term length (used for byte[] buffer size) for encoding int values.
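Assuming the 7-bits-per-byte layout from the class description, the longest term is produced at shift 0: one prefix byte plus enough 7-bit payload bytes to hold all 64 (or 32) bits. The arithmetic as a tiny Java sketch; the helper name is hypothetical, and the results are what this reasoning predicts for BUF_SIZE_LONG and BUF_SIZE_INT rather than values quoted from the constant field table.

```java
/** Hypothetical helper: maximum prefix coded term length for a value of `bits` bits. */
final class BufSizeSketch {
    static int maxTermLength(int bits) {
        // one shift byte + ceil(bits / 7) payload bytes (worst case is shift 0)
        return 1 + (bits + 6) / 7;
    }
}
```

For long values this gives 1 + ceil(64 / 7) = 1 + 10 = 11 bytes; for int values, 1 + ceil(32 / 7) = 1 + 5 = 6 bytes.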
Method Details
longToPrefixCoded
public static int longToPrefixCoded(long val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
- Returns:
the hash code for indexing (TermsHash)
intToPrefixCoded
public static int intToPrefixCoded(int val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
- Returns:
the hash code for indexing (TermsHash)
longToPrefixCodedBytes
public static void longToPrefixCodedBytes(long val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
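Each call encodes a single precision level. To index one value at every level of the trie, an encoder like this is called once per shift, stepping the shift by precisionStep from 0 while it stays below 64; this sketch assumes that is roughly what NumericTokenStream does for long values. The class, its encode helper (a stand-in for longToPrefixCodedBytes working on byte[]), and the 0x20 prefix value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the prefix coded terms one long value is indexed under. */
final class TrieTermsSketch {
    static final int SHIFT_START_LONG = 0x20; // assumed value of NumericUtils.SHIFT_START_LONG

    // Stand-in for longToPrefixCodedBytes: shift byte, then 7-bit groups of the sortable bits.
    static byte[] encode(long val, int shift) {
        int nChars = (63 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_LONG + shift);
        long bits = (val ^ Long.MIN_VALUE) >>> shift;
        for (int i = nChars; i > 0; i--, bits >>>= 7) {
            out[i] = (byte) (bits & 0x7f);
        }
        return out;
    }

    /** One term per precision level: shifts 0, precisionStep, 2 * precisionStep, ... below 64. */
    static List<byte[]> trieTerms(long val, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<byte[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(encode(val, shift));
        }
        return terms;
    }
}
```

Two nearby values such as 1000 and 1001 differ in their shift-0 terms but, with precisionStep 16, share every lower-precision term; this sharing is what lets a range query match a whole block of values with a single term.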
intToPrefixCodedBytes
public static void intToPrefixCodedBytes(int val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
getPrefixCodedLongShift
Returns the shift value from a prefix encoded long.
- Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
getPrefixCodedIntShift
Returns the shift value from a prefix encoded int.
- Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
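The int and long shift prefixes occupy separate byte ranges, which is what lets getPrefixCodedLongShift and getPrefixCodedIntShift reject a term of the other width. The sketch below assumes SHIFT_START_LONG = 0x20 and SHIFT_START_INT = 0x60 (check the constant field values): long shifts 0 to 63 then map to first bytes 0x20 to 0x5F, and int shifts 0 to 31 map to 0x60 to 0x7F. The class and method names are hypothetical.

```java
/** Hypothetical sketch of int prefix coding and width-checked shift parsing. */
final class PrefixShiftSketch {
    // Assumed values of NumericUtils.SHIFT_START_LONG and SHIFT_START_INT.
    static final int SHIFT_START_LONG = 0x20;
    static final int SHIFT_START_INT = 0x60;

    /** Int counterpart of the long encoder: flip the sign bit, drop `shift` bits, 7 bits per byte. */
    static byte[] encodeInt(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_INT + shift);
        int bits = (val ^ Integer.MIN_VALUE) >>> shift;
        for (int i = nChars; i > 0; i--, bits >>>= 7) {
            out[i] = (byte) (bits & 0x7f);
        }
        return out;
    }

    /** Accepts only first bytes 0x60..0x7F (int shifts 0..31). */
    static int intShift(byte[] term) {
        int shift = term[0] - SHIFT_START_INT;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException("not an int term, shift " + shift);
        }
        return shift;
    }

    /** Accepts only first bytes 0x20..0x5F (long shifts 0..63). */
    static int longShift(byte[] term) {
        int shift = term[0] - SHIFT_START_LONG;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("not a long term, shift " + shift);
        }
        return shift;
    }
}
```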
prefixCodedToLong
Returns a long from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
- Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
prefixCodedToInt
Returns an int from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
- Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
doubleToSortableLong
public static long doubleToSortableLong(double val)
Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits, so that the result can be compared as a long. This does not reduce precision, but the value can easily be used as a long. The sort order (including Double.NaN) is defined by Double.compareTo(java.lang.Double); NaN is greater than positive infinity.
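The conversion can be sketched as follows; it is written from the description above rather than copied from Lucene, and the class name is hypothetical. For negative values the raw IEEE 754 bit patterns compare in reverse order, so the sketch flips the 63 non-sign bits of negative patterns, which is one way to realize the "some bits are swapped" step. Double.doubleToLongBits collapses all NaNs to one canonical pattern, which then sorts above positive infinity.

```java
/** Hypothetical sketch of a sortable double <-> long mapping. */
final class SortableDoubleSketch {
    static long doubleToSortableLong(double val) {
        long bits = Double.doubleToLongBits(val);  // IEEE 754 "double format" bits, canonical NaN
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;           // negative: flip all but the sign bit to reverse their order
        }
        return bits;
    }

    static double sortableLongToDouble(long val) {
        if (val < 0) {
            val ^= 0x7fffffffffffffffL;            // the same flip undoes itself
        }
        return Double.longBitsToDouble(val);
    }
}
```

With this mapping -0.0 maps to -1 and 0.0 to 0, so the long order also matches Double.compareTo for signed zeros.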
sortableLongToDouble
public static double sortableLongToDouble(long val)
Converts a sortable long back to a double.
floatToSortableInt
public static int floatToSortableInt(float val)
Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits, so that the result can be compared as an int. This does not reduce precision, but the value can easily be used as an int. The sort order (including Float.NaN) is defined by Float.compareTo(java.lang.Float); NaN is greater than positive infinity.
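The float variant is the same construction on 32-bit patterns. A minimal sketch under the same caveats (written from the description above, hypothetical class name):

```java
/** Hypothetical sketch of a sortable float <-> int mapping. */
final class SortableFloatSketch {
    static int floatToSortableInt(float val) {
        int bits = Float.floatToIntBits(val);  // IEEE 754 "float format" bits, canonical NaN
        if (bits < 0) {
            bits ^= 0x7fffffff;                // reverse the order of negative values
        }
        return bits;
    }

    static float sortableIntToFloat(int val) {
        if (val < 0) {
            val ^= 0x7fffffff;                 // the same flip undoes itself
        }
        return Float.intBitsToFloat(val);
    }
}
```

Because the flip is its own inverse, the round trip returns exactly the original value for every non-NaN float, consistent with the statement that precision is not reduced.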
sortableIntToFloat
public static float sortableIntToFloat(int val)
Converts a sortable int back to a float.
splitLongRange
public static void splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound)
Splits a long range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.LongRangeBuilder.addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.
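The recursive splitting can be sketched as below. Assumptions: RangeSink is a hypothetical callback that receives numeric bounds plus a shift instead of prefix coded BytesRef bounds, and the loop is written from the description of the algorithm, not taken from Lucene's source. At each level the unaligned ends of the range are emitted at the current precision, and the aligned middle moves on to the next, coarser level; the wrap checks stop the loop if a bound would overflow.

```java
/** Hypothetical sketch of splitting [minBound, maxBound] into trie sub-ranges. */
final class SplitLongRangeSketch {
    /** Hypothetical callback: one sub-range and the shift (precision) to search it with. */
    interface RangeSink {
        void addRange(long min, long max, int shift);
    }

    static void splitLongRange(RangeSink sink, int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);          // block size at the next, coarser level
            long mask = ((1L << precisionStep) - 1L) << shift;  // bits that select a block at this level
            boolean hasLower = (minBound & mask) != 0L;         // min does not start a coarser block
            boolean hasUpper = (maxBound & mask) != mask;       // max does not end a coarser block
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound; // overflow at the ends of long
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                emit(sink, minBound, maxBound, shift);           // the rest fits at this precision
                return;
            }
            if (hasLower) {
                emit(sink, minBound, minBound | mask, shift);    // lower edge at this precision
            }
            if (hasUpper) {
                emit(sink, maxBound & ~mask, maxBound, shift);   // upper edge at this precision
            }
            minBound = nextMin;                                  // the middle goes one level coarser
            maxBound = nextMax;
        }
    }

    private static void emit(RangeSink sink, long min, long max, int shift) {
        sink.addRange(min, max | ((1L << shift) - 1L), shift);   // fill the low bits the shift discards
    }
}
```

For example, splitting [3, 1000] with precisionStep 4 yields five pieces: [3, 15] and [992, 1000] at shift 0, [16, 255] and [768, 991] at shift 4, and [256, 767] at shift 8, so only the ends are matched at full precision.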
splitIntRange
public static void splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound)
Splits an int range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.IntRangeBuilder.addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.
filterPrefixCodedLongs
public static TermsEnum filterPrefixCodedLongs(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
- Parameters:
termsEnum - the terms enum to filter
- Returns:
a filtered TermsEnum that only returns prefix coded 64 bit terms with a shift value of 0.
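Because the shift is stored in the first byte and shift 0 gives the smallest prefix, every full-precision term of a field sorts before its lower-precision terms, so a filter over sorted terms can stop at the first term whose prefix differs. The sketch shows this idea over a sorted List<byte[]> instead of a TermsEnum; the class, its encode helper, and the 0x20 prefix value are assumptions for illustration, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only shift-0 (full precision) long terms from a sorted term list. */
final class ShiftZeroFilterSketch {
    static final int SHIFT_START_LONG = 0x20; // assumed value of NumericUtils.SHIFT_START_LONG

    // Stand-in for longToPrefixCodedBytes (7 payload bits per byte, shift in byte 0).
    static byte[] encode(long val, int shift) {
        int nChars = (63 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_LONG + shift);
        long bits = (val ^ Long.MIN_VALUE) >>> shift;
        for (int i = nChars; i > 0; i--, bits >>>= 7) {
            out[i] = (byte) (bits & 0x7f);
        }
        return out;
    }

    /** `sortedTerms` must be in unsigned byte order, as in a terms dictionary. */
    static List<byte[]> fullPrecisionTerms(List<byte[]> sortedTerms) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] term : sortedTerms) {
            if (term[0] != SHIFT_START_LONG) {
                break; // sorted order puts all shift-0 terms first, so the rest have a non-zero shift
            }
            result.add(term);
        }
        return result;
    }
}
```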
filterPrefixCodedInts
public static TermsEnum filterPrefixCodedInts(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
- Parameters:
termsEnum - the terms enum to filter
- Returns:
a filtered TermsEnum that only returns prefix coded 32 bit terms with a shift value of 0.