← All Concepts
Data Structures
Merkle Trees
A tree structure where every leaf is a hash of data and every non-leaf is a hash of its children. Enables efficient data integrity verification and synchronization.
**Merkle trees** (hash trees) enable efficient verification that large datasets are consistent between distributed nodes.
**Structure:**
```
Root Hash
/ \
H(AB) H(CD)
/ \ / \
H(A) H(B) H(C) H(D)
| | | |
Data Data Data Data
```
**How synchronization works:**
1. Two nodes compare root hashes
2. If roots match → data is identical (done!)
3. If roots differ → compare children recursively
4. Only descend into branches that differ
5. Result: find exactly which data blocks differ in O(log N)
**Key insight:** Instead of comparing all N data blocks (O(N)), Merkle trees let you find differences in O(log N) comparisons. For 1 billion blocks, that's ~30 comparisons vs 1 billion.
**Used by**: Git (commit trees), Bitcoin (transaction verification), Cassandra (anti-entropy repair), DynamoDB, IPFS.
Common Use Cases
- Anti-entropy repair in distributed databases (Cassandra, DynamoDB)
- Git version control (commit hashing)
- Blockchain transaction verification
- File synchronization (detect which blocks changed)
Advantages
- +O(log N) verification of data consistency
- +Efficient incremental synchronization
- +Tamper detection — any change propagates to root
- +Reduces bandwidth for data sync
Disadvantages
- -Tree must be rebuilt when data changes
- -Memory overhead for storing intermediate hashes
- -Tree structure choice affects performance
- -Not suitable for streaming data