Class EliasFanoEncoder
- java.lang.Object
-
- org.apache.lucene.util.packed.EliasFanoEncoder
-
public class EliasFanoEncoder extends Object
Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of
numValues > 0
natural numbersx[i]
0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound
where
upperBound > 0
is an upper bound on the last value.
The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.The lower
L
bits of eachx[i]
are stored explicitly and contiguously in the lower-bits array, withL
chosen as (log()
base 2):L = max(0, floor(log(upperBound/numValues)))
The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (
x[-1] = 0
):(x[i]/2**L) - (x[i-1]/2**L)
The unary code encodes a natural number
n
byn
0 bits followed by a 1 bit:0...01
.
In the upper bits the total the number of 1 bits isnumValues
and the total number of 0 bits is:floor(x[numValues-1]/2**L) <= upperBound/(2**max(0, floor(log(upperBound/numValues)))) <= 2*numValues
The Elias-Fano encoding uses at most
2 + ceil(log(upperBound/numValues))
bits per encoded number. With
upperBound
in these bounds (p
is an integer):2**p < x[numValues-1] <= upperBound <= 2**(p+1)
the number of bits per encoded number is minimized.
In this implementation the values in the sequence can be given as
long
,numValues = 0
andupperBound = 0
are allowed, and each of the upper and lower bit arrays should fit in along[]
.
An index of positions of zero's in the upper bits is also built.This implementation is based on this article:
Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .The articles originally describing the Elias-Fano representation are:
Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246–260, 1974.
Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.
-
-
Field Summary
Fields Modifier and Type Field Description static long
DEFAULT_INDEX_INTERVAL
The default index interval for zero upper bits.
-
Constructor Summary
Constructors Constructor Description EliasFanoEncoder(long numValues, long upperBound)
Construct an Elias-Fano encoder usingDEFAULT_INDEX_INTERVAL
.EliasFanoEncoder(long numValues, long upperBound, long indexInterval)
Construct an Elias-Fano encoder.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
encodeNext(long x)
Call at mostnumValues
times to encode a non decreasing sequence of non negative numbers.boolean
equals(Object other)
EliasFanoDecoder
getDecoder()
Returns anEliasFanoDecoder
to access the encoded values.long[]
getIndexBits()
Expert.long[]
getLowerBits()
Expert.long[]
getUpperBits()
Expert.int
hashCode()
static boolean
sufficientlySmallerThanBitSet(long numValues, long upperBound)
Provide an indication that it is better to use anEliasFanoEncoder
than aFixedBitSet
to encode document identifiers.String
toString()
-
-
-
Field Detail
-
DEFAULT_INDEX_INTERVAL
public static final long DEFAULT_INDEX_INTERVAL
The default index interval for zero upper bits.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
EliasFanoEncoder
public EliasFanoEncoder(long numValues, long upperBound, long indexInterval)
Construct an Elias-Fano encoder. After construction, callencodeNext(long)
numValues
times to encode a non decreasing sequence of non negative numbers.- Parameters:
numValues
- The number of values that is to be encoded.upperBound
- At least the highest value that will be encoded. For space efficiency this should not exceed the power of two that equals or is the first higher than the actual maximum.
WhennumValues >= (upperBound/3)
aFixedBitSet
will take less space.indexInterval
- The number of high zero bits for which a single index entry is built. The index will have at most2 * numValues / indexInterval
entries and each index entry will use at mostceil(log2(3 * numValues))
bits, seeEliasFanoEncoder
.- Throws:
IllegalArgumentException
- when:numValues
is negative, ornumValues
is non negative andupperBound
is negative, or- the low bits do not fit in a
long[]
:(L * numValues / 64) > Integer.MAX_VALUE
, or - the high bits do not fit in a
long[]
:(2 * numValues / 64) > Integer.MAX_VALUE
, or indexInterval < 2
,- the index bits do not fit in a
long[]
:(numValues / indexInterval * ceil(2log(3 * numValues)) / 64) > Integer.MAX_VALUE
.
-
EliasFanoEncoder
public EliasFanoEncoder(long numValues, long upperBound)
Construct an Elias-Fano encoder usingDEFAULT_INDEX_INTERVAL
.
-
-
Method Detail
-
encodeNext
public void encodeNext(long x)
Call at mostnumValues
times to encode a non decreasing sequence of non negative numbers.- Parameters:
x
- The next number to be encoded.- Throws:
IllegalStateException
- when called more thannumValues
times.IllegalArgumentException
- when:x
is smaller than an earlier encoded value, orx
is larger thanupperBound
.
-
sufficientlySmallerThanBitSet
public static boolean sufficientlySmallerThanBitSet(long numValues, long upperBound)
Provide an indication that it is better to use anEliasFanoEncoder
than aFixedBitSet
to encode document identifiers. This indication is not precise and may change in the future.
An EliasFanoEncoder is favoured when the size of the encoding by the EliasFanoEncoder (including some space for its index) is at most about 5/6 of the size of the FixedBitSet, this is the same as comparing estimates of the number of bits accessed by a pair of FixedBitSets and by a pair of non indexed EliasFanoDocIdSets when determining the intersections of the pairs.
A bit set is preferred whenupperbound <= 256
.
It is assumed thatDEFAULT_INDEX_INTERVAL
is used.- Parameters:
numValues
- The number of document identifiers that is to be encoded. Should be non negative.upperBound
- The maximum possible value for a document identifier. Should be at leastnumValues
.
-
getDecoder
public EliasFanoDecoder getDecoder()
Returns anEliasFanoDecoder
to access the encoded values. Perform all calls toencodeNext(long)
before callinggetDecoder()
.
-
getLowerBits
public long[] getLowerBits()
Expert. The low bits.
-
getUpperBits
public long[] getUpperBits()
Expert. The high bits.
-
getIndexBits
public long[] getIndexBits()
Expert. The index bits.
-
-