Class BloomFilter
- java.lang.Object
-
- org.apache.jackrabbit.oak.commons.collections.BloomFilter
-
public class BloomFilter extends Object
A Bloom filter implementation.
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description voidadd(long hash)Add an entry.voidadd(String value)Add an entry.static longcalculateBits(long n, double fpp)Calculate the number of bits needed for a Bloom filter, for a given false positive probability.static doublecalculateFpp(long n, long bits, int k)Calculate the false positive probability.static intcalculateK(double bitsPerKey)Calculate the best k parameter for a Bloom filter.static longcalculateN(long bits, double fpp)Calculate the maximum number of entries in the set, given the memory size in bits, and a target false positive probability.static BloomFilterconstruct(long n, double fpp)Construct a Bloom filter.longgetBitCount()Get the number of bits needed for the array.longgetEstimatedEntryCount()Get the estimated entry count (number of distinct items added).intgetK()Get the k parameter (the number of hash functions for an entry).booleanmayContain(long hash)Tests whether the entry might be in the set.booleanmayContain(String value)Tests whether the entry might be in the set.
-
-
-
Method Detail
-
construct
public static BloomFilter construct(long n, double fpp)
Construct a Bloom filter. With a fpp of 0.01, the memory usage is roughly 1 byte per entry.- Parameters:
n- the number of expected entries (eg. 1_000_000)fpp- the false-positive probability (eg. 0.01 for a 1% false-positive probability)- Returns:
- the Bloom filter
-
calculateK
public static int calculateK(double bitsPerKey)
Calculate the best k parameter for a Bloom filter. (k is the number of hash functions to use for one entry).- Parameters:
bitsPerKey- the number of bits per key (eg. 10)- Returns:
- the k parameter
-
calculateBits
public static long calculateBits(long n, double fpp)Calculate the number of bits needed for a Bloom filter, for a given false positive probability.- Parameters:
n- the number of entries (eg. 1_000_000)fpp- the false positive probability (eg. 0.01)- Returns:
- the bits needed
-
calculateN
public static long calculateN(long bits, double fpp)Calculate the maximum number of entries in the set, given the memory size in bits, and a target false positive probability.- Parameters:
bits- the number of bits (eg. 10_000_000)fpp- the false positive probability (eg. 0.01)- Returns:
- the maximum number of entries to be added
-
calculateFpp
public static double calculateFpp(long n, long bits, int k)Calculate the false positive probability.- Parameters:
n- the number of entries (eg. 1_000_000)bits- the number of bits (eg. 10_000_000)k- the number of hash functions- Returns:
- the false positive probability (eg. 0.01)
-
add
public void add(String value)
Add an entry.- Parameters:
value- the value to add
-
add
public void add(long hash)
Add an entry. Note that the false positive rate will increase if the quality of the hash value is low, eg. if only 32 bit hash values are used. If needed, use HashUtils.hash64(value) as a supplemental hash.- Parameters:
hash- the hash value (need to be a high quality hash code, with all bits having high entropy)
-
mayContain
public boolean mayContain(String value)
Tests whether the entry might be in the set.- Parameters:
value- the value to check- Returns:
- true if the entry was added, or, with a certain false positive probability, even if it was not added
-
mayContain
public boolean mayContain(long hash)
Tests whether the entry might be in the set.- Parameters:
hash- the hash value (need to be a high quality hash code, with all bits having high entropy)- Returns:
- true if the entry was added, or, with a certain false positive probability, even if it was not added
-
getBitCount
public long getBitCount()
Get the number of bits needed for the array.- Returns:
- the number of bits
-
getK
public int getK()
Get the k parameter (the number of hash functions for an entry).- Returns:
- the k parameter
-
getEstimatedEntryCount
public long getEstimatedEntryCount()
Get the estimated entry count (number of distinct items added). This operation is relatively slow, as it loops over all the entries.- Returns:
- the estimated entry count, or Long.MAX_VALUE if the number can not be estimated.
-
-