Bloom Filters
A space-efficient probabilistic set that answers membership with no false negatives — but the occasional false positive.
A Bloom filter is a compact, probabilistic way to test set membership. It tells you an element is possibly in the set or definitely not in the set, using a tiny fraction of the memory a real set would need. The catch: it can report a false positive, but it will never report a false negative.
Add a few words below, then query for ones you added and ones you didn’t. Watch a
query fail the moment it finds a 0 bit.
Bits and hashes
A Bloom filter is just two things:
- A bit array of bits, all starting at
0. - A set of independent hash functions, each mapping an element to a bit position in .
It stores no keys — only bits — which is why it is so small.
Add and query
add(x) runs all hash functions on x and sets those bits to 1.
Bits may already be set from earlier elements; that’s fine, they just stay 1.
query(x) hashes x the same way and checks those bits. If all are
1, the element is possibly present. If any is 0, the element is
definitely absent.
add(x):
for i in 1..k:
bits[ h_i(x) ] = 1
contains(x):
for i in 1..k:
if bits[ h_i(x) ] == 0:
return DEFINITELY_NOT
return PROBABLY_YES
Why no false negatives, but some false positives
If you added x, every one of its bits was set, and bits are never cleared —
so a later query(x) must see all 1s. That guarantees no false negatives.
A false positive happens when an element you never added just happens to have
all of its bits already set by other elements. The more elements you add,
the more bits flip to 1, and the likelier this collision becomes. For
inserted elements, the false-positive rate is approximately
You tune it by choosing (more bits) and the optimal .
Where they shine
Bloom filters guard expensive lookups: a database checks the filter before touching disk, a cache checks it before a network call, a crawler checks it before re-fetching a URL. A “definitely not” answer skips the costly work entirely; a “probably yes” falls through to the real, authoritative check.
You cannot delete from a standard Bloom filter (clearing a bit could break another element’s membership). A counting Bloom filter replaces each bit with a small counter to support removal.
Further reading
Takeaways
- A Bloom filter is a bit array plus hash functions — tiny, and it stores no keys.
- Querying never yields a false negative, but can yield a false positive whose rate grows as the filter fills.
- Use it as a fast pre-filter in front of an expensive, authoritative lookup.