
Database Sharding

Horizontally partitions data across multiple database instances to handle large datasets and high throughput.

**Sharding** splits a database into smaller pieces (shards), each stored on a separate server.

**Sharding strategies:**

  • **Range-based**: Shard by value ranges (A-M on shard 1, N-Z on shard 2). Range queries stay on one shard, but popular ranges become hotspots.
  • **Hash-based**: Hash the shard key and take it modulo the number of shards. Even distribution, but range queries must hit every shard.
  • **Directory-based**: A lookup table maps keys to shards. Flexible, but adds lookup overhead on every request.
  • **Geographic**: Shard by user region. Good for locality, but shard sizes can be uneven.

**Choosing a shard key:**

  • Should distribute data evenly
  • Should minimize cross-shard queries
  • Should align with access patterns (most queries hit only one shard)

**Challenges:**

  • Cross-shard joins (expensive or impossible)
  • Rebalancing data when adding or removing shards
  • Distributed transactions
  • Auto-increment IDs need a distributed ID generator
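The hash-based strategy above can be sketched in a few lines (the shard count and key names are hypothetical). Note that the hash must be stable across servers; Python's built-in `hash()` is salted per process and would route the same key differently on different machines.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Hash-based routing: hash the shard key, mod by the shard count.

    MD5 is used here only as a cheap, stable hash -- not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Keys spread roughly evenly regardless of their lexical order ...
counts = [0] * NUM_SHARDS
for i in range(1000):
    counts[shard_for(f"user_{i}")] += 1

# ... but a range query ("all users A-M") has no single home shard:
# it must be scattered to every shard and the results merged.
```

Routing is deterministic, so any application server computes the same shard for the same key with no coordination.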

Common Use Cases

  • User data partitioned by user_id
  • Multi-tenant SaaS (shard by tenant)
  • Time-series data (shard by time range)
  • Geographic data (shard by region)
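The time-series use case above is a natural fit for range-based sharding: each shard owns a contiguous time range, so a window query only touches the shards whose ranges overlap it. A minimal sketch, assuming hypothetical monthly shard boundaries for 2024:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical boundaries: shard i holds rows with
# timestamp >= boundaries[i] and timestamp < boundaries[i + 1].
boundaries = [date(2024, m, 1) for m in range(1, 13)]


def shard_for_timestamp(ts: date) -> int:
    """Range-based routing: find the shard whose range contains ts."""
    return bisect_right(boundaries, ts) - 1


def shards_for_window(start: date, end: date) -> list[int]:
    """A time-window query only needs the shards overlapping [start, end]."""
    return list(range(shard_for_timestamp(start), shard_for_timestamp(end) + 1))
```

Unlike hash-based routing, recent data concentrates on the newest shard, so write hotspots are the price paid for cheap range scans.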

Advantages

  • Handles datasets larger than a single server's capacity
  • Increases read/write throughput roughly linearly with shard count
  • Smaller per-shard indexes (faster queries)
  • Isolates failures to individual shards

Disadvantages

  • Cross-shard queries are expensive
  • Application complexity increases significantly
  • Rebalancing shards is operationally difficult
  • Distributed transactions add latency and complexity
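The cross-shard cost above can be illustrated with a scatter-gather sketch (in-memory dicts stand in for separate database servers; all names are hypothetical):

```python
# Hypothetical "shards": in practice, separate DB servers over the network.
shards = [
    {"alice": 120, "carol": 75},
    {"bob": 200},
    {"dave": 50, "erin": 310},
]


def top_spenders(limit: int) -> list[tuple[str, int]]:
    """Scatter-gather: no single shard can answer a global top-N query.

    The application queries every shard (paying one round-trip per shard
    and waiting on the slowest), then merges and re-sorts the partial
    results itself -- work an unsharded database would do internally.
    """
    partials = []
    for shard in shards:  # scatter
        local_top = sorted(shard.items(), key=lambda kv: kv[1], reverse=True)[:limit]
        partials.extend(local_top)
    return sorted(partials, key=lambda kv: kv[1], reverse=True)[:limit]  # gather
```

This is why a good shard key matters: queries that include the key are routed to one shard, while queries that omit it degrade into fan-outs like this one.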
