Distributed Systems
Leader Election
A process by which the nodes of a distributed system choose one node to act as coordinator. The goal is at most one acting leader at a time for tasks like write coordination or job scheduling.
**Leader election** selects a single node to coordinate actions in a distributed system.
**Why we need it:**
- Single writer for consistency (database primary)
- Job scheduling (only one node runs a cron job)
- Coordination of distributed operations
**Common approaches:**
1. **Consensus-based (Raft/Paxos)**: Strongest guarantees. Used by etcd (Raft) and ZooKeeper (Zab, a Paxos-like atomic broadcast protocol).
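2. **Lease-based**: The leader holds a time-limited lock (lease) and must renew it before expiry. If the leader dies, the lease expires and a new election occurs. Common building blocks:
   - Redis `SET key value NX PX <ms>` (sets the key and its TTL atomically; plain `SETNX` takes no TTL)
   - DynamoDB conditional writes
   - etcd lease API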
3. **Bully algorithm**: The live node with the highest ID wins. Simple but chatty: the worst case needs O(n²) election messages.
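The lease-based approach above can be sketched in a few lines. This is a minimal in-memory illustration, not a production implementation: `LeaseStore`, `try_acquire`, and `leader` are hypothetical names, the clock is passed in as `now` so the behavior is deterministic, and a real deployment would rely on an external store with an atomic compare-and-set instead of a local object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # node that currently owns the lease
    expires_at: float  # time after which the lease is no longer valid


class LeaseStore:
    """Hypothetical in-memory stand-in for a store with atomic compare-and-set."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, node_id: str, ttl: float, now: float) -> bool:
        """Take the lease if it is free or expired, or renew it if we already hold it."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == node_id:
            self._lease = Lease(holder=node_id, expires_at=now + ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current leader, or None if no unexpired lease exists."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = LeaseStore()
assert store.try_acquire("node-a", ttl=10.0, now=0.0)      # node-a becomes leader
assert not store.try_acquire("node-b", ttl=10.0, now=5.0)  # lease is still held
assert store.try_acquire("node-a", ttl=10.0, now=8.0)      # renewal extends expiry to t=18
assert store.try_acquire("node-b", ttl=10.0, now=20.0)     # node-a stopped renewing, node-b takes over
```

Note that the old holder may not notice its lease has expired (for example after a long pause), which is why leases are usually combined with fencing tokens.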
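**Fencing tokens:** Each election issues the new leader a monotonically increasing token, which the leader attaches to every request it makes. The protected resource rejects any request whose token is lower than the highest it has already seen. This stops "zombie leaders": a slow or paused leader that thinks it's still leading after a new leader was elected.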
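As a rough sketch of how a resource might enforce fencing tokens: the storage side remembers the highest token it has seen and refuses anything older. `FencedStorage` and `StaleTokenError` are hypothetical names used only for illustration, and the tokens are assumed to be handed out by whatever service runs the election.

```python
class StaleTokenError(Exception):
    """Raised when a request carries a token older than one already seen."""


class FencedStorage:
    """Hypothetical resource that checks a fencing token on every write."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> None:
        # Reject writes from any leader elected before the newest one we know of.
        if token < self.highest_token:
            raise StaleTokenError(
                f"token {token} is older than {self.highest_token}"
            )
        self.highest_token = token
        self.data[key] = value


storage = FencedStorage()
storage.write(token=1, key="owner", value="node-a")  # first leader writes
storage.write(token=2, key="owner", value="node-b")  # newly elected leader writes

try:
    # node-a wakes up from a long pause and still believes it is leader
    storage.write(token=1, key="owner", value="node-a")
    zombie_rejected = False
except StaleTokenError:
    zombie_rejected = True
```

The key design point is that the check lives in the resource, not in the leader: a zombie leader cannot be trusted to know that it has been replaced.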
**Split-brain prevention:**
- Require majority quorum for election
- Use fencing tokens to invalidate stale leaders
- Keep lease times short and pair them with health checks, to shrink the window in which a stale leader can act
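The majority-quorum rule is simple arithmetic, sketched below with a hypothetical helper `has_quorum`. Because any two majorities of the same cluster must overlap in at least one node, two sides of a partition can never both win an election.

```python
def has_quorum(votes: int, cluster_size: int) -> bool:
    """Return True if `votes` is a strict majority of `cluster_size` nodes."""
    return votes >= cluster_size // 2 + 1


# A 5-node cluster split 3/2: only the larger side can elect a leader.
assert has_quorum(3, 5)
assert not has_quorum(2, 5)

# A 4-node cluster split 2/2: neither side can elect a leader.
assert not has_quorum(2, 4)
```

The even split case is one reason clusters are commonly sized with an odd number of nodes.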
Common Use Cases
- Database primary selection (PostgreSQL, MySQL)
- Distributed job scheduling (run cron on exactly one node)
- Partition assignment in Kafka consumer groups
- Coordination in distributed processing frameworks
Advantages
- Ensures a single coordinator for consistency
- Handles leader failure with automatic re-election
- Well-understood patterns and tools
- Fencing tokens limit the damage a stale leader can do during split-brain
Disadvantages
- Leader is a potential bottleneck
- Election process causes brief unavailability
- Network partitions can cause split-brain without proper fencing
- Complexity of implementing correctly