Bloom filter is a highly space-efficient probabilistic data structure designed to check the set membership.
The Guava project is a collection of several of Google’s core libraries for string processing, caching, etc.
Here, we use the Bloom filter implementation provided in the Guava library.
Guava can be included in the Java Maven or Gradle projects. Include the following in the pom.xml
of the Maven project.
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>30.1.1-jre</version>
</dependency>
Refer to Maven Repository for different versions of
Guava
.
Funnel
class in GuavaA Funnel
describes how to decompose a particular object type into primitive field values. This is needed because Bloom filters operate on bytes.
You can create a funnel for custom objects as follows.
Consider a Student
class like the one below.
class Student{
final String name;
final int id;
Student(String name, int id) {
this.name = name;
this.id = id;
}
}
We can define a funnel for the Student
class as follows.
Funnel<Student> studentFunnel = new Funnel<Student>() {
@Override
public void funnel(Student from, PrimitiveSink into) {
into.putString(from.name, Charsets.UTF_8)
.putInt(from.id);
}
};
In the code above, we define a PrimitiveSink
which can receive a stream of primitive values.
In order to define a Bloom filter in Guava, we need the following.
We use the static create()
method of the BloomFilter
class.
The method signature is as follows.
public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
Funnel funnel
- funnel object.int expectedInsertions
- the number of expected insertions to the constructed Bloom filter.double fpp
– the desired false positive probability. The value must be positive and less than 1.0
.We use the .put()
method to add the elements to the Bloom filter.
We use the .mightContain()
method to check if an element is already inserted into the Bloom filter. The method returns true
if the element might have been put in this Bloom filter, and it returns false
if this is definitely not the case.
In the following code, we define a Bloom filter with a string funnel, with 100
as the capacity of the Bloom filter and 0.01
as the expected false positive probability.
Next, certain elements are inserted into the filter.
Then, we check for the membership of the inserted elements as well as for new elements.
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BFilter {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_16), 100, 0.01);
bloomFilter.put("hello");
bloomFilter.put("hi");
String testString = "india";
boolean checkStatus = bloomFilter.mightContain(testString);
System.out.println(testString + (checkStatus?" seen before":" not seen before"));
testString = "hello";
checkStatus = bloomFilter.mightContain(testString);
System.out.println(testString + (checkStatus?" seen before":" not seen before"));
}
}