Class BloomFilter


  • public class BloomFilter
    extends java.lang.Object
    A Bloom filter implementation.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void add​(long hash)
      Add an entry.
      static long calculateBits​(long n, double fpp)
      Calculate the number of bits needed for a Bloom filter, given a number of entries and the k parameter.
      static double calculateFpp​(long n, long bits, int k)
      Calculate the false positive probability.
      static int calculateK​(double bitsPerKey)
      Calculate the best k parameter for a Bloom filter.
      static long calculateN​(long bits, double fpp)
      Calculate the maximum number of entries in the set, given the the memory size in bits, and a target false positive probability.
      static BloomFilter construct​(long n, double fpp)
      Construct a Bloom filter.
      long getBitCount()
      Get the number of bits needed for the array.
      long getEstimatedEntryCount()
      Get the estimated entry count (number of distinct items added).
      int getK()  
      boolean mayContain​(long hash)
      Whether the entry may be in the set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • construct

        public static BloomFilter construct​(long n,
                                            double fpp)
        Construct a Bloom filter. With a fpp of 0.01, the memory usage is roughly 1 byte per entry.
        Parameters:
        bytes - the size in number of bytes (eg. 64_000_000 for 64 MB memory usage)
        fpp - the false-positive probability (eg. 0.01 for a 1% false-positive probability)
        Returns:
        the Bloom filter
      • calculateK

        public static int calculateK​(double bitsPerKey)
        Calculate the best k parameter for a Bloom filter.
        Parameters:
        bitsPerKey - the number of bits per key (eg. 10)
        Returns:
        the k parameter
      • calculateBits

        public static long calculateBits​(long n,
                                         double fpp)
        Calculate the number of bits needed for a Bloom filter, given a number of entries and the k parameter.
        Parameters:
        n - the number of entries (eg. 1_000_000)
        fpp - the false positive probability (eg. 0.01)
        Returns:
        the bits needed
      • calculateN

        public static long calculateN​(long bits,
                                      double fpp)
        Calculate the maximum number of entries in the set, given the the memory size in bits, and a target false positive probability.
        Parameters:
        bits - the number of bits (eg. 10_000_000)
        fpp - the false positive probability (eg. 0.01)
        Returns:
        the maximum number of entries to be added
      • calculateFpp

        public static double calculateFpp​(long n,
                                          long bits,
                                          int k)
        Calculate the false positive probability.
        Parameters:
        bits - the number of bits (eg. 10_000_000)
        fpp - the false positive probability (eg. 0.01)
        Returns:
        the maximum number of entries to be added
      • add

        public void add​(long hash)
        Add an entry.
        Parameters:
        hash - the hash value (need to be a high quality hash code, with all bits having high entropy)
      • mayContain

        public boolean mayContain​(long hash)
        Whether the entry may be in the set.
        Parameters:
        hash - the hash value (need to be a high quality hash code, with all bits having high entropy)
        Returns:
        true if the entry was added, or, with a certain false positive probability, even if it was not added
      • getBitCount

        public long getBitCount()
        Get the number of bits needed for the array.
        Returns:
        the number of bits
      • getK

        public int getK()
      • getEstimatedEntryCount

        public long getEstimatedEntryCount()
        Get the estimated entry count (number of distinct items added). This operation is relatively slow, as it loops over all the entries.
        Returns:
        the estimated entry count, or Long.MAX_VALUE if the number can not be estimated.