How to use Bloom filter in the Guava library

Overview

Bloom filter is a highly space-efficient probabilistic data structure designed to check the set membership.

The Guava project is a collection of several of Google’s core libraries for string processing, caching, etc.

Here, we use the Bloom filter implementation provided in the Guava library.

Guava dependency

Guava can be included in the Java Maven or Gradle projects. Include the following in the pom.xml of the Maven project.

<dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>30.1.1-jre</version>
</dependency>

Refer to Maven Repository for different versions of Guava.

Funnel class in Guava

A Funnel describes how to decompose a particular object type into primitive field values. This is needed because Bloom filters operate on bytes.

You can create a funnel for custom objects as follows. Consider a Student class like the one below.

class Student{
        final String name;
        final int id;

        Student(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

We can define a funnel for the Student class as follows.

Funnel<Student> studentFunnel = new Funnel<Student>() {
            @Override
            public void funnel(Student from, PrimitiveSink into) {
                into.putString(from.name, Charsets.UTF_8)
                .putInt(from.id);
            }
        };

In the code above, we define a PrimitiveSink which can receive a stream of primitive values.

Create a Bloom filter in Guava

In order to define a Bloom filter in Guava, we need the following.

  1. A funnel object that can convert an object to bytes.
  2. Size of the Bloom filter.
  3. Expected false positive probability.

We use the static create() method of the BloomFilter class. The method signature is as follows.

public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)

Parameters

  1. Funnel funnel - funnel object.
  2. int expectedInsertions - the number of expected insertions to the constructed Bloom filter.
  3. double fpp – the desired false positive probability. The value must be positive and less than 1.0.

Insert elements to bloom filter

We use the .put() method to add the elements to the Bloom filter.

Check for the existence of an element

We use the .mightContain() method to check if an element is already inserted into the Bloom filter. The method returns true if the element might have been put in this Bloom filter, and it returns false if this is definitely not the case.

Example

In the following code, we define a Bloom filter with a string funnel, with 100 as the capacity of the Bloom filter and 0.01 as the expected false positive probability.

Next, certain elements are inserted into the filter.

Then, we check for the membership of the inserted elements as well as for new elements.

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BFilter {

    public static void main(String[] args) {
        BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_16), 100, 0.01);
        bloomFilter.put("hello");
        bloomFilter.put("hi");
        String testString = "india";
        boolean checkStatus = bloomFilter.mightContain(testString);
        System.out.println(testString + (checkStatus?" seen before":" not seen before"));
        testString = "hello";
        checkStatus = bloomFilter.mightContain(testString);
        System.out.println(testString + (checkStatus?" seen before":" not seen before"));
    }
}

Free Resources