Bloom Filters Explained
Introduction Data structures such as HashSet can be used in a small data set to test if an item is a member of the data set. However, the operation to test if an item is a member of a large dataset can be expensive. The time complexity and space complexity can be linear in the worst case. Probabilistic data structures offer constant time complexity and constant space complexity at the expense of providing an answer that is non-deterministic. Requirements Design a data structure with the following characteristics: constant time complexity to test membership a small amount of memory to test membership insert and query operations should be parallelizable test membership can yield approximate results What is a bloom filter? A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an item is a member of a set. The bloom filter will always say yes if an item is a set member. However, the bloom filter might still say yes although an item is not ...