Class BloomFilter
- java.lang.Object
-
- org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.BloomFilter
-
public class BloomFilter extends Object
A Bloom filter implementation.
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
add(long hash)
Add an entry.static long
calculateBits(long n, double fpp)
Calculate the number of bits needed for a Bloom filter, given a number of entries and the k parameter.static double
calculateFpp(long n, long bits, int k)
Calculate the false positive probability.static int
calculateK(double bitsPerKey)
Calculate the best k parameter for a Bloom filter.static long
calculateN(long bits, double fpp)
Calculate the maximum number of entries in the set, given the the memory size in bits, and a target false positive probability.static BloomFilter
construct(long n, double fpp)
Construct a Bloom filter.long
getBitCount()
Get the number of bits needed for the array.long
getEstimatedEntryCount()
Get the estimated entry count (number of distinct items added).int
getK()
boolean
mayContain(long hash)
Whether the entry may be in the set.
-
-
-
Method Detail
-
construct
public static BloomFilter construct(long n, double fpp)
Construct a Bloom filter. With a fpp of 0.01, the memory usage is roughly 1 byte per entry.- Parameters:
bytes
- the size in number of bytes (eg. 64_000_000 for 64 MB memory usage)fpp
- the false-positive probability (eg. 0.01 for a 1% false-positive probability)- Returns:
- the Bloom filter
-
calculateK
public static int calculateK(double bitsPerKey)
Calculate the best k parameter for a Bloom filter.- Parameters:
bitsPerKey
- the number of bits per key (eg. 10)- Returns:
- the k parameter
-
calculateBits
public static long calculateBits(long n, double fpp)
Calculate the number of bits needed for a Bloom filter, given a number of entries and the k parameter.- Parameters:
n
- the number of entries (eg. 1_000_000)fpp
- the false positive probability (eg. 0.01)- Returns:
- the bits needed
-
calculateN
public static long calculateN(long bits, double fpp)
Calculate the maximum number of entries in the set, given the the memory size in bits, and a target false positive probability.- Parameters:
bits
- the number of bits (eg. 10_000_000)fpp
- the false positive probability (eg. 0.01)- Returns:
- the maximum number of entries to be added
-
calculateFpp
public static double calculateFpp(long n, long bits, int k)
Calculate the false positive probability.- Parameters:
bits
- the number of bits (eg. 10_000_000)fpp
- the false positive probability (eg. 0.01)- Returns:
- the maximum number of entries to be added
-
add
public void add(long hash)
Add an entry.- Parameters:
hash
- the hash value (need to be a high quality hash code, with all bits having high entropy)
-
mayContain
public boolean mayContain(long hash)
Whether the entry may be in the set.- Parameters:
hash
- the hash value (need to be a high quality hash code, with all bits having high entropy)- Returns:
- true if the entry was added, or, with a certain false positive probability, even if it was not added
-
getBitCount
public long getBitCount()
Get the number of bits needed for the array.- Returns:
- the number of bits
-
getK
public int getK()
-
getEstimatedEntryCount
public long getEstimatedEntryCount()
Get the estimated entry count (number of distinct items added). This operation is relatively slow, as it loops over all the entries.- Returns:
- the estimated entry count, or Long.MAX_VALUE if the number can not be estimated.
-
-