Distributed Systems

Leader Election

A process by which distributed nodes choose one node to act as coordinator. The goal is to have at most one active leader at any time for tasks like write coordination or job scheduling.

**Leader election** selects a single node to coordinate actions in a distributed system.

**Why we need it:**

- Single writer for consistency (database primary)
- Job scheduling (only one node runs a cron job)
- Coordination of distributed operations

**Common approaches:**

1. **Consensus-based (Raft/Paxos)**: Strongest guarantees. etcd uses Raft; ZooKeeper uses ZAB, a closely related atomic broadcast protocol.
2. **Lease-based**: The leader holds a time-limited lock (lease) and must renew it before expiry. If the leader dies, the lease expires and a new election occurs. Typical building blocks:
   - Redis `SET` with the `NX` and `PX` options (set-if-absent with a TTL)
   - DynamoDB conditional writes
   - etcd lease API
3. **Bully algorithm**: The live node with the highest ID wins. Simple but chatty.

**Fencing tokens:** Each election issues the new leader a token larger than any issued before. The leader attaches the token to its requests, and the protected resource rejects any request carrying an older token. This guards against "zombie leaders": a slow leader that thinks it is still leading after a new leader was elected.

**Split-brain prevention:**

- Require a majority quorum for election
- Use fencing tokens to invalidate stale leaders
- Use short lease times with health checks
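The lease-based approach can be sketched with a small in-memory model. This is a minimal illustration, not a production implementation: `LeaseStore`, `try_acquire`, and `renew` are hypothetical names standing in for a real store's atomic set-if-absent-with-TTL operation, and time is passed in explicitly so the behavior is easy to follow.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float
    token: int  # fencing token issued when the lease was acquired


class LeaseStore:
    """Hypothetical in-memory stand-in for a store with atomic set-if-absent plus TTL."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None
        self._tokens = itertools.count(1)  # monotonically increasing tokens

    def try_acquire(self, node: str, ttl: float, now: float) -> Optional[int]:
        """Grant the lease if it is free or expired; return a fencing token."""
        if self._lease is not None and self._lease.expires_at > now:
            return None  # another node still holds a live lease
        self._lease = Lease(node, now + ttl, next(self._tokens))
        return self._lease.token

    def renew(self, node: str, ttl: float, now: float) -> bool:
        """Extend the lease, but only for the current, unexpired holder."""
        lease = self._lease
        if lease is None or lease.holder != node or lease.expires_at <= now:
            return False
        lease.expires_at = now + ttl
        return True


store = LeaseStore()
token_a = store.try_acquire("node-a", ttl=10.0, now=0.0)  # node-a becomes leader
token_b = store.try_acquire("node-b", ttl=10.0, now=5.0)  # refused: lease is live
renewed = store.renew("node-a", ttl=10.0, now=8.0)        # lease now runs to t=18
late = store.try_acquire("node-b", ttl=10.0, now=20.0)    # node-a stopped renewing
```

In this run `node-b` only wins once `node-a` has let the lease lapse, and it receives a higher token than `node-a` did. A real deployment would also have to account for clock drift between the leader and the store, which this sketch ignores.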

Common Use Cases

  • Database primary selection (PostgreSQL, MySQL)
  • Distributed job scheduling (run cron on exactly one node)
  • Partition assignment in Kafka consumer groups
  • Coordination in distributed processing frameworks

Advantages

  • Ensures a single coordinator for consistency
  • Handles leader failure with automatic re-election
  • Well-understood patterns and tools
  • Fencing tokens keep a stale leader from corrupting shared state
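Fencing only works if the protected resource checks the token. The sketch below assumes a hypothetical `FencedStorage` class that remembers the highest token it has seen and refuses anything older. The token values are arbitrary examples.

```python
from typing import Optional


class FencedStorage:
    """Hypothetical resource that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value: Optional[str] = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer leader has already written; reject
        self.highest_token = token
        self.value = value
        return True


storage = FencedStorage()
ok_old = storage.write(token=33, value="from old leader")
ok_new = storage.write(token=34, value="from new leader")
ok_zombie = storage.write(token=33, value="late write from old leader")
```

The third write is refused because token 33 is older than 34, so the value written by the new leader survives even though the old leader still believes it is in charge.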

Disadvantages

  • Leader is a potential bottleneck
  • Election process causes brief unavailability
  • Network partitions can cause split-brain without a quorum requirement and fencing
  • Hard to implement correctly
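The partition risk is usually handled with a majority rule: a side of the partition may elect a leader only if it holds more than half of the full cluster. A minimal sketch of that check follows, with `has_quorum` and the example cluster sizes as illustrative assumptions.

```python
def has_quorum(votes: int, cluster_size: int) -> bool:
    """True when votes form a strict majority of the full cluster."""
    return votes > cluster_size // 2


# A 5-node cluster split 3/2 by a network partition.
cluster_size = 5
partition_sides = {"side-1": 3, "side-2": 2}
can_elect = {
    side: has_quorum(nodes, cluster_size)
    for side, nodes in partition_sides.items()
}

# An even split of a 4-node cluster leaves neither side with a majority.
even_split_can_elect = has_quorum(2, 4)
```

Only the three-node side can elect a leader, so two leaders cannot be chosen at once. The even-split case shows the cost: with no majority on either side, no election succeeds until the partition heals.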