← All Concepts
Data Structures
Bloom Filter
A space-efficient probabilistic data structure that tests whether an element is a member of a set with possible false positives but no false negatives.
**How it works:**
- A bit array of m bits, initialized to 0
- k hash functions, each mapping to a position in the array
- Insert: hash element k times, set those k bits to 1
- Lookup: hash element k times, check if ALL k bits are 1
- All 1s → "probably in set" (might be false positive)
- Any 0 → "definitely NOT in set" (no false negatives)
**Parameters:**
- m = number of bits, n = expected elements, k = hash functions
- Optimal k = (m/n) × ln(2)
- For 1% false positive rate: ~10 bits per element
- For 0.1% false positive rate: ~15 bits per element
**Counting Bloom Filter:** Uses counters instead of bits, supports deletion.
**Cuckoo Filter:** Better for delete-heavy workloads, similar space efficiency.
**Implementation:** Redis has built-in Bloom filter support (RedisBloom module).
Common Use Cases
- URL deduplication in web crawlers
- Avoiding unnecessary disk reads in databases (LSM-tree)
- Spam email detection
- Checking if a username is already taken
Advantages
- +Extremely space-efficient compared to hash sets
- +O(k) lookup time (constant, very fast)
- +No false negatives - 'definitely not in set' is always correct
- +Works well at massive scale (billions of elements)
Disadvantages
- -False positives are possible (trade-off with space)
- -Cannot delete elements from standard Bloom filter
- -Cannot enumerate stored elements
- -Requires careful tuning of parameters (m, k)