Distributed Systems

Leader Election

A process by which distributed nodes choose one node to act as coordinator. The goal is that at most one node acts as leader at any time for tasks like write coordination or job scheduling.

**Leader election** selects a single node to coordinate actions in a distributed system.

**Why we need it:**

- Single writer for consistency (database primary)
- Job scheduling (only one node runs a cron job)
- Coordination of distributed operations

**Common approaches:**

1. **Consensus-based (Raft/Paxos)**: Strongest guarantees, because a leader is elected only with the agreement of a majority of nodes. Used by etcd (Raft) and ZooKeeper (Zab, a related atomic broadcast protocol).
2. **Lease-based**: The leader holds a time-limited lock (lease) and must renew it before expiry. If the leader dies, the lease expires and a new election occurs. Typical building blocks:
   - Redis `SET` with `NX` and a TTL (or the older `SETNX` followed by `EXPIRE`)
   - DynamoDB conditional writes
   - etcd lease API
3. **Bully algorithm**: The live node with the highest ID wins. Simple but chatty.

**Fencing tokens:** Each election gives the new leader a monotonically increasing token, which the leader attaches to every request it makes. Downstream resources reject any request whose token is lower than the highest one they have already seen. This neutralizes "zombie leaders": a slow or paused leader that still thinks it is leading after a new leader was elected.

**Split-brain prevention:**

- Require a majority quorum for election
- Use fencing tokens to invalidate stale leaders
- Use short lease times with health checks
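The lease-based approach can be sketched with an in-memory stand-in for the coordination store. This is a minimal illustration, not how Redis, DynamoDB, or etcd implement leases: `LeaseStore`, `Lease`, and `try_acquire` are hypothetical names, the lock stands in for the store's atomic conditional write, and the caller passes the current time explicitly so the behavior is easy to follow.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # ID of the node that currently leads
    expires_at: float  # time after which the lease is no longer valid
    token: int         # fencing token issued when the lease was won


class LeaseStore:
    """Hypothetical in-memory stand-in for a store with atomic conditional writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lease: Optional[Lease] = None
        self._next_token = 1

    def try_acquire(self, node_id: str, ttl: float, now: float) -> Optional[int]:
        """Acquire or renew the lease. Returns the fencing token, or None if
        another node holds a lease that has not expired yet."""
        with self._lock:
            lease = self._lease
            if lease is not None and now < lease.expires_at:
                if lease.holder != node_id:
                    return None  # another node holds a live lease
                lease.expires_at = now + ttl  # renewal keeps the same token
                return lease.token
            # No lease yet, or it expired: this node wins a fresh election
            # and receives a new, strictly larger fencing token.
            token = self._next_token
            self._next_token += 1
            self._lease = Lease(node_id, now + ttl, token)
            return token


store = LeaseStore()
print(store.try_acquire("node-a", ttl=10.0, now=0.0))   # 1: node-a becomes leader
print(store.try_acquire("node-b", ttl=10.0, now=5.0))   # None: lease still held
print(store.try_acquire("node-a", ttl=10.0, now=8.0))   # 1: renewed until t=18
print(store.try_acquire("node-b", ttl=10.0, now=18.0))  # 2: lease expired, node-b leads
```

In a real deployment the nodes' clocks are not shared, so a leader should treat its lease as expired somewhat earlier than the store does, and should still rely on fencing tokens for safety.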

Common Use Cases

Database primary selection (PostgreSQL, MySQL)
Distributed job scheduling (run cron on exactly one node)
Partition assignment in Kafka consumer groups
Coordination in distributed processing frameworks

Advantages

+ Ensures a single coordinator for consistency
+ Handles leader failure with automatic re-election
+ Well-understood patterns and tools
+ Fencing tokens keep a stale leader's writes from being accepted during split-brain

Disadvantages

- Leader is a potential bottleneck
- Election process causes brief unavailability
- Network partitions can cause split-brain without proper fencing
- Complexity of implementing correctly
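The split-brain risk is usually contained by having the protected resource check fencing tokens itself, rather than trusting each node's belief that it is the leader. The sketch below is illustrative only: `FencedStorage` and `StaleLeaderError` are hypothetical names, and the tokens are assumed to come from whatever election mechanism is in use.

```python
from typing import Dict, Optional


class StaleLeaderError(Exception):
    """Raised when a write carries a token older than one already seen."""


class FencedStorage:
    """Hypothetical resource that only accepts writes from the newest leader."""

    def __init__(self) -> None:
        self._highest_token = 0
        self._data: Dict[str, str] = {}

    def write(self, key: str, value: str, token: int) -> None:
        # Reject any write whose token is lower than the highest seen so far.
        if token < self._highest_token:
            raise StaleLeaderError(
                f"token {token} is older than {self._highest_token}"
            )
        self._highest_token = token
        self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)


storage = FencedStorage()
storage.write("job", "started by old leader", token=1)
storage.write("job", "taken over by new leader", token=2)

try:
    # The old leader wakes up after a long pause and retries with its old token.
    storage.write("job", "overwritten by zombie", token=1)
except StaleLeaderError as err:
    print("rejected:", err)

print(storage.read("job"))
```

The check lives in the resource because the zombie leader cannot be relied on to notice that its lease has expired.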

Related Concepts

Raft Consensus Algorithm
Distributed Transactions
CAP Theorem
