Distributed Systems

Vector Clocks

A mechanism for tracking causality in distributed systems. Each node maintains a vector of logical timestamps to determine the ordering of events.

**Vector clocks** solve the problem of ordering events in distributed systems where there's no global clock. **How it works:** - Each node maintains a vector [N1: t1, N2: t2, N3: t3] with one counter per node - When a node does work, it increments its own counter - When sending a message, attach the full vector - When receiving, take element-wise max and increment own counter **Comparing events:** - Event A **happened before** B if all of A's counters ≤ B's counters - Events are **concurrent** if neither happened before the other - Concurrent events = potential conflict → application must resolve **Example (3 nodes: A, B, C):** ``` A writes x=1 → [A:1, B:0, C:0] B writes x=2 → [A:0, B:1, C:0] (concurrent with A!) A sends to C → C: [A:1, B:0, C:1] ``` **Limitations:** Vector size grows with number of nodes. Use dotted version vectors or interval tree clocks for large clusters.

Common Use Cases

Conflict detection in multi-master replication (DynamoDB, Riak)
Determining causal ordering of events
Optimistic concurrency control in distributed systems
Version tracking in collaborative editing

Advantages

+Accurately detects causality and concurrency
+No need for synchronized clocks
+Enables conflict detection without data loss
+Works across network partitions

Disadvantages

-Vector size grows linearly with number of nodes
-Storage overhead per data item
-Application must handle conflict resolution
-Complex to implement correctly

Related Concepts

CAP Theorem Distributed Transactions Database Replication