← All Concepts
Data Structures

Merkle Trees

A tree structure where every leaf is a hash of data and every non-leaf is a hash of its children. Enables efficient data integrity verification and synchronization.

**Merkle trees** (hash trees) enable efficient verification that large datasets are consistent between distributed nodes. **Structure:** ``` Root Hash / \ H(AB) H(CD) / \ / \ H(A) H(B) H(C) H(D) | | | | Data Data Data Data ``` **How synchronization works:** 1. Two nodes compare root hashes 2. If roots match → data is identical (done!) 3. If roots differ → compare children recursively 4. Only descend into branches that differ 5. Result: find exactly which data blocks differ in O(log N) **Key insight:** Instead of comparing all N data blocks (O(N)), Merkle trees let you find differences in O(log N) comparisons. For 1 billion blocks, that's ~30 comparisons vs 1 billion. **Used by**: Git (commit trees), Bitcoin (transaction verification), Cassandra (anti-entropy repair), DynamoDB, IPFS.

Common Use Cases

  • Anti-entropy repair in distributed databases (Cassandra, DynamoDB)
  • Git version control (commit hashing)
  • Blockchain transaction verification
  • File synchronization (detect which blocks changed)

Advantages

  • +O(log N) verification of data consistency
  • +Efficient incremental synchronization
  • +Tamper detection — any change propagates to root
  • +Reduces bandwidth for data sync

Disadvantages

  • -Tree must be rebuilt when data changes
  • -Memory overhead for storing intermediate hashes
  • -Tree structure choice affects performance
  • -Not suitable for streaming data