Scalability Playbook

10 essential scaling strategies with diagrams, trade-offs, and interview talking points. Know when and how to apply each.

Quick Decision Guide

QPS > 10K → Horizontal scaling + Load balancer
R:W > 10:1 → Caching (Redis) + Read replicas
Data > 1TB → Database sharding
Latency > 200ms → Async processing + Message queues
Global users → CDN + Multi-region deployment
Variable load → Auto-scaling + Connection pooling
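The guide above can be sketched as a rule function. The thresholds are the illustrative ones from the table, not universal constants, and the parameter names are invented for this sketch:

```python
def recommend(qps, read_write_ratio, data_tb, p99_latency_ms,
              global_users=False, variable_load=False):
    """Map rough workload metrics to candidate scaling strategies."""
    recs = []
    if qps > 10_000:
        recs.append("horizontal scaling + load balancer")
    if read_write_ratio > 10:
        recs.append("caching (Redis) + read replicas")
    if data_tb > 1:
        recs.append("database sharding")
    if p99_latency_ms > 200:
        recs.append("async processing + message queues")
    if global_users:
        recs.append("CDN + multi-region deployment")
    if variable_load:
        recs.append("auto-scaling + connection pooling")
    return recs
```

In practice several rules usually fire at once; the strategies compose rather than compete.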
  Client
    │
    ▼
┌──────────┐
│    LB    │
└──┬──┬──┬─┘
   │  │  │
   ▼  ▼  ▼
  S1  S2  S3   ← Stateless servers
   │  │  │
   └──┴──┘
      │
      ▼
   Database

How It Works

  1. Load balancer distributes requests across N identical servers
  2. Servers must be stateless (no local session data)
  3. Auto-scaling adjusts N based on CPU/memory/QPS metrics
  4. New servers register with the load balancer automatically
  5. Health checks remove unhealthy servers from the pool
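Steps 1, 4, and 5 can be sketched with a minimal balancer that routes only across the healthy subset of a registered pool (a toy model, not a production LB):

```python
class LoadBalancer:
    """Minimal load balancer: registration, health checks, request spreading."""

    def __init__(self, servers):
        self.servers = list(servers)       # step 4: servers registered with the LB
        self.healthy = set(self.servers)

    def mark_unhealthy(self, server):
        self.healthy.discard(server)       # step 5: failed health check -> out of pool

    def mark_healthy(self, server):
        if server in self.servers:
            self.healthy.add(server)       # recovered server rejoins the pool

    def route(self, request_id):
        # Step 1: spread requests across identical servers. Because the
        # servers are stateless (step 2), any healthy one can take any request.
        pool = [s for s in self.servers if s in self.healthy]
        if not pool:
            raise RuntimeError("no healthy servers")
        return pool[request_id % len(pool)]
```

Real balancers use smarter policies (least connections, latency-aware), but the health-checked pool is the core idea.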

When to Use

  • Request rate exceeds what a single server can handle
  • You need high availability (no single point of failure)
  • Traffic is unpredictable and needs elastic scaling
  • During capacity planning when vertical scaling is maxed out

Pitfalls

  • ! Session data can't live on individual servers
  • ! Need distributed caching for shared state
  • ! Database becomes the bottleneck if it isn't scaled as well
  • ! More servers = more complexity in deployment and monitoring
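The first two pitfalls are usually solved together by externalizing session state. A minimal sketch, with a plain dict standing in for a shared store such as Redis:

```python
# A dict stands in for a shared store like Redis; in production this is a
# network call, so every app server sees the same session data.
shared_sessions = {}

def handle_login(server_id, user, token):
    # Wrong: keeping the session in server-local memory breaks as soon as
    # the load balancer sends the user's next request to a different server.
    # Right: write to the shared store so any server can resume the session.
    shared_sessions[token] = {"user": user, "logged_in_via": server_id}

def handle_request(server_id, token):
    # Any server, including one that never saw the login, can serve this.
    session = shared_sessions.get(token)
    return session["user"] if session else None
```

This is what "stateless servers" means in practice: the servers still handle stateful workflows, but the state lives elsewhere.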

Real World

  • Netflix: thousands of EC2 instances behind ELBs
  • Uber: auto-scales based on real-time demand
  • This is the default pattern for cloud-native applications generally

Say This in Interview

"We'll add stateless app servers behind a load balancer and auto-scale based on CPU utilization, targeting 60-70%."
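The utilization target translates directly into replica counts via the proportional formula Kubernetes' HPA uses: desired = ceil(current × currentUtilization / target). A sketch, with 65% as the midpoint of the 60-70% target:

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=65):
    """Proportional auto-scaling: size the pool so average CPU lands near target."""
    return max(1, math.ceil(current_replicas * current_cpu_pct / target_cpu_pct))
```

For example, 4 servers running at 90% CPU against a 65% target scale out to ceil(4 × 90 / 65) = 6 replicas; the same formula scales back in when utilization drops.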