Class BloomFilter


  • public class BloomFilter
    extends Object
    A Bloom filter implementation.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void add​(long hash)
      Add an entry.
      void add​(String value)
      Add an entry.
      static long calculateBits​(long n, double fpp)
      Calculate the number of bits needed for a Bloom filter, for a given false positive probability.
      static double calculateFpp​(long n, long bits, int k)
      Calculate the false positive probability.
      static int calculateK​(double bitsPerKey)
      Calculate the best k parameter for a Bloom filter.
      static long calculateN​(long bits, double fpp)
      Calculate the maximum number of entries in the set, given the memory size in bits, and a target false positive probability.
      static BloomFilter construct​(long n, double fpp)
      Construct a Bloom filter.
      long getBitCount()
      Get the number of bits needed for the array.
      long getEstimatedEntryCount()
      Get the estimated entry count (number of distinct items added).
      int getK()
      Get the k parameter (the number of hash functions for an entry).
      boolean mayContain​(long hash)
      Tests whether the entry might be in the set.
      boolean mayContain​(String value)
      Tests whether the entry might be in the set.
    • Method Detail

      • construct

        public static BloomFilter construct​(long n,
                                            double fpp)
        Construct a Bloom filter. With a fpp of 0.01, the memory usage is roughly 1 byte per entry.
        Parameters:
        n - the number of expected entries (eg. 1_000_000)
        fpp - the false-positive probability (eg. 0.01 for a 1% false-positive probability)
        Returns:
        the Bloom filter
      • calculateK

        public static int calculateK​(double bitsPerKey)
        Calculate the best k parameter for a Bloom filter. (k is the number of hash functions to use for one entry).
        Parameters:
        bitsPerKey - the number of bits per key (eg. 10)
        Returns:
        the k parameter
      • calculateBits

        public static long calculateBits​(long n,
                                         double fpp)
        Calculate the number of bits needed for a Bloom filter, for a given false positive probability.
        Parameters:
        n - the number of entries (eg. 1_000_000)
        fpp - the false positive probability (eg. 0.01)
        Returns:
        the bits needed
      • calculateN

        public static long calculateN​(long bits,
                                      double fpp)
        Calculate the maximum number of entries in the set, given the memory size in bits, and a target false positive probability.
        Parameters:
        bits - the number of bits (eg. 10_000_000)
        fpp - the false positive probability (eg. 0.01)
        Returns:
        the maximum number of entries to be added
      • calculateFpp

        public static double calculateFpp​(long n,
                                          long bits,
                                          int k)
        Calculate the false positive probability.
        Parameters:
        n - the number of entries (eg. 1_000_000)
        bits - the number of bits (eg. 10_000_000)
        k - the number of hash functions
        Returns:
        the false positive probability (eg. 0.01)
      • add

        public void add​(String value)
        Add an entry.
        Parameters:
        value - the value to add
      • add

        public void add​(long hash)
        Add an entry. Note that the false positive rate will increase if the quality of the hash value is low, eg. if only 32 bit hash values are used. If needed, use HashUtils.hash64(value) as a supplemental hash.
        Parameters:
        hash - the hash value (need to be a high quality hash code, with all bits having high entropy)
      • mayContain

        public boolean mayContain​(String value)
        Tests whether the entry might be in the set.
        Parameters:
        value - the value to check
        Returns:
        true if the entry was added, or, with a certain false positive probability, even if it was not added
      • mayContain

        public boolean mayContain​(long hash)
        Tests whether the entry might be in the set.
        Parameters:
        hash - the hash value (need to be a high quality hash code, with all bits having high entropy)
        Returns:
        true if the entry was added, or, with a certain false positive probability, even if it was not added
      • getBitCount

        public long getBitCount()
        Get the number of bits needed for the array.
        Returns:
        the number of bits
      • getK

        public int getK()
        Get the k parameter (the number of hash functions for an entry).
        Returns:
        the k parameter
      • getEstimatedEntryCount

        public long getEstimatedEntryCount()
        Get the estimated entry count (number of distinct items added). This operation is relatively slow, as it loops over all the entries.
        Returns:
        the estimated entry count, or Long.MAX_VALUE if the number can not be estimated.