← All Concepts
Data Structures

Bloom Filter

A space-efficient probabilistic data structure that tests whether an element is a member of a set with possible false positives but no false negatives.

**How it works:** - A bit array of m bits, initialized to 0 - k hash functions, each mapping to a position in the array - Insert: hash element k times, set those k bits to 1 - Lookup: hash element k times, check if ALL k bits are 1 - All 1s → "probably in set" (might be false positive) - Any 0 → "definitely NOT in set" (no false negatives) **Parameters:** - m = number of bits, n = expected elements, k = hash functions - Optimal k = (m/n) × ln(2) - For 1% false positive rate: ~10 bits per element - For 0.1% false positive rate: ~15 bits per element **Counting Bloom Filter:** Uses counters instead of bits, supports deletion. **Cuckoo Filter:** Better for delete-heavy workloads, similar space efficiency. **Implementation:** Redis has built-in Bloom filter support (RedisBloom module).

Common Use Cases

  • URL deduplication in web crawlers
  • Avoiding unnecessary disk reads in databases (LSM-tree)
  • Spam email detection
  • Checking if a username is already taken

Advantages

  • +Extremely space-efficient compared to hash sets
  • +O(k) lookup time (constant, very fast)
  • +No false negatives - 'definitely not in set' is always correct
  • +Works well at massive scale (billions of elements)

Disadvantages

  • -False positives are possible (trade-off with space)
  • -Cannot delete elements from standard Bloom filter
  • -Cannot enumerate stored elements
  • -Requires careful tuning of parameters (m, k)