Data Structures

Bloom Filter

A space-efficient probabilistic data structure that tests whether an element is a member of a set with possible false positives but no false negatives.

**How it works:** - A bit array of m bits, initialized to 0 - k hash functions, each mapping to a position in the array - Insert: hash element k times, set those k bits to 1 - Lookup: hash element k times, check if ALL k bits are 1 - All 1s → "probably in set" (might be false positive) - Any 0 → "definitely NOT in set" (no false negatives) **Parameters:** - m = number of bits, n = expected elements, k = hash functions - Optimal k = (m/n) × ln(2) - For 1% false positive rate: ~10 bits per element - For 0.1% false positive rate: ~15 bits per element **Counting Bloom Filter:** Uses counters instead of bits, supports deletion. **Cuckoo Filter:** Better for delete-heavy workloads, similar space efficiency. **Implementation:** Redis has built-in Bloom filter support (RedisBloom module).

Common Use Cases

URL deduplication in web crawlers
Avoiding unnecessary disk reads in databases (LSM-tree)
Spam email detection
Checking if a username is already taken

Advantages

+Extremely space-efficient compared to hash sets
+O(k) lookup time (constant, very fast)
+No false negatives - 'definitely not in set' is always correct
+Works well at massive scale (billions of elements)

Disadvantages

-False positives are possible (trade-off with space)
-Cannot delete elements from standard Bloom filter
-Cannot enumerate stored elements
-Requires careful tuning of parameters (m, k)

Related Concepts

Caching Consistent Hashing Database Sharding

← PreviousDistributed Transactions Next →Event-Driven Architecture