Data Structures

Merkle Trees

A tree structure where every leaf is a hash of data and every non-leaf is a hash of its children. Enables efficient data integrity verification and synchronization.

**Merkle trees** (hash trees) enable efficient verification that large datasets are consistent between distributed nodes. **Structure:** ``` Root Hash / \ H(AB) H(CD) / \ / \ H(A) H(B) H(C) H(D) | | | | Data Data Data Data ``` **How synchronization works:** 1. Two nodes compare root hashes 2. If roots match → data is identical (done!) 3. If roots differ → compare children recursively 4. Only descend into branches that differ 5. Result: find exactly which data blocks differ in O(log N) **Key insight:** Instead of comparing all N data blocks (O(N)), Merkle trees let you find differences in O(log N) comparisons. For 1 billion blocks, that's ~30 comparisons vs 1 billion. **Used by**: Git (commit trees), Bitcoin (transaction verification), Cassandra (anti-entropy repair), DynamoDB, IPFS.

Common Use Cases

Anti-entropy repair in distributed databases (Cassandra, DynamoDB)
Git version control (commit hashing)
Blockchain transaction verification
File synchronization (detect which blocks changed)

Advantages

+O(log N) verification of data consistency
+Efficient incremental synchronization
+Tamper detection — any change propagates to root
+Reduces bandwidth for data sync

Disadvantages

-Tree must be rebuilt when data changes
-Memory overhead for storing intermediate hashes
-Tree structure choice affects performance
-Not suitable for streaming data

Related Concepts

Consistent Hashing Database Replication Bloom Filter

← PreviousRaft Consensus Algorithm Next →Two-Phase Commit (2PC)