Rate Limiting
Controlling the number of requests a client can make to an API within a given time window to prevent abuse and protect resources.
**Rate limiting** protects your system from being overwhelmed by too many requests.
**Algorithms:**
- **Token Bucket**: Tokens refill at fixed rate. Request consumes token. Allows bursts up to bucket size. Most popular.
- **Leaky Bucket**: Requests are queued and processed at a fixed rate. Smooth output, strict rate enforcement.
- **Fixed Window**: Count requests in fixed time windows (e.g., per minute). Simple, but allows bursts at window boundaries (up to 2× the limit across a boundary).
- **Sliding Window Log**: Track timestamp of each request. Accurate but memory-heavy.
- **Sliding Window Counter**: Weighted average of current and previous window. Good balance of accuracy and memory.
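The token bucket above can be sketched in a few lines. This is a minimal single-process illustration, not a distributed limiter; the class name, parameters, and the injectable clock are assumptions for demonstration.

```python
import time

class TokenBucket:
    """Token bucket sketch: tokens refill at `rate` per second, up to
    `capacity`. Each request consumes one token, so bursts up to
    `capacity` are allowed while the long-run rate stays at `rate`."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity  # max tokens (burst size)
        self.rate = rate          # tokens refilled per second
        self.tokens = capacity    # start with a full bucket
        self.clock = clock        # injectable clock, handy for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3, rate=1.0`, three back-to-back requests pass, the fourth is rejected, and one more slot opens up each second.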
**Implementation:**
- Use Redis for distributed rate limiting: run INCR + EXPIRE inside a Lua script so the check is atomic.
- Key format: `rate:user_id:window` or `rate:ip:window`
- Return HTTP 429 Too Many Requests with Retry-After header.
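The Redis pattern above can be illustrated with a fixed-window sketch. Here an in-memory dict stands in for Redis (in production the INCR + EXPIRE pair runs atomically in a Lua script); the class name and `check` signature are assumptions, and the key format follows the `rate:user_id:window` convention from the text.

```python
import time

class FixedWindowLimiter:
    """Fixed-window sketch of the Redis INCR + EXPIRE pattern.
    A dict stands in for the Redis store; each window gets its own key,
    and incrementing past the limit rejects the request."""

    def __init__(self, limit: int, window_secs: int, clock=time.time):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock
        self.counters = {}  # key -> request count (Redis INCR equivalent)

    def check(self, client_id: str):
        window = int(self.clock()) // self.window_secs
        key = f"rate:{client_id}:{window}"       # e.g. rate:user42:28901234
        count = self.counters.get(key, 0) + 1    # INCR
        self.counters[key] = count
        if count > self.limit:
            # Caller should respond 429 Too Many Requests with this
            # value in the Retry-After header.
            retry_after = self.window_secs - int(self.clock()) % self.window_secs
            return False, retry_after
        return True, 0
```

A rejected request returns the seconds remaining in the current window, which maps directly onto the `Retry-After` header. Note the boundary-burst caveat from the algorithm list: a client can spend its full limit at the end of one window and again at the start of the next.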
**Where to apply:** API Gateway, per-service, per-user, per-IP, per-endpoint.
Common Use Cases
- Protecting APIs from abuse and DDoS
- Enforcing API usage quotas per client/plan
- Preventing brute-force login attempts
- Controlling resource usage in multi-tenant systems
Advantages
- +Protects backend services from overload
- +Fair resource distribution across clients
- +Prevents abuse and scraping
- +Can implement tiered pricing (different limits per plan)
Disadvantages
- -Adds complexity and slight latency
- -Distributed rate limiting requires shared state (Redis)
- -Can affect legitimate users during traffic spikes
- -Fixed windows have burst issues at boundaries