Database Sharding
Horizontally partitions data across multiple database instances to handle large datasets and high throughput.
**Sharding** splits a database into smaller pieces (shards), each stored on a separate server.
**Sharding strategies:**
- **Range-based**: Shard by value ranges (A-M on shard 1, N-Z on shard 2). Can lead to hotspots.
- **Hash-based**: Hash the shard key, mod by number of shards. Even distribution but range queries are hard.
- **Directory-based**: Lookup table maps keys to shards. Flexible but adds lookup overhead.
- **Geographic**: Shard by user region. Good for locality, but shard sizes can be uneven.
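The first two strategies above can be sketched in a few lines. This is a minimal illustration, not a production router; the shard count and the A-M/N-Z split are the hypothetical values from the examples above.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def hash_shard(key: str) -> int:
    """Hash-based: stable hash of the shard key, mod number of shards."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(key: str) -> int:
    """Range-based: first letter A-M -> shard 0, N-Z -> shard 1."""
    return 0 if key[0].upper() <= "M" else 1
```

Note that `hash_shard` spreads keys evenly but scatters adjacent keys across shards, which is exactly why range queries become hard; `range_shard` keeps ranges together but concentrates load if many keys start with the same letters.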
**Choosing a shard key:**
- Should distribute data evenly
- Should minimize cross-shard queries
- Should align with access patterns (most queries hit one shard)
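The third point is easiest to see in code: a query that includes the shard key is routed to exactly one shard, while a query on any other field must scatter to every shard and gather the results. The in-memory dicts below are stand-ins for real shard databases.

```python
NUM_SHARDS = 4
shards = {i: {} for i in range(NUM_SHARDS)}  # stand-ins for per-shard databases

def shard_for(user_id: int) -> int:
    return user_id % NUM_SHARDS

def insert_user(user_id: int, name: str) -> None:
    shards[shard_for(user_id)][user_id] = name

def get_by_id(user_id: int):
    # Shard key present: touches exactly one shard.
    return shards[shard_for(user_id)].get(user_id)

def get_by_name(name: str):
    # Shard key absent: scatter-gather across every shard.
    return [uid for s in shards.values() for uid, n in s.items() if n == name]
```

If most traffic looks like `get_by_name`, `user_id` is the wrong shard key for this workload.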
**Challenges:**
- Cross-shard joins (expensive or impossible)
- Rebalancing when adding shards
- Distributed transactions
- Auto-increment IDs collide across shards, so a distributed ID generator is needed
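One common answer to the last challenge is a Snowflake-style generator: pack a millisecond timestamp, a machine ID, and a per-millisecond sequence into one 64-bit integer, so each node mints unique, roughly time-ordered IDs without coordination. The bit widths below follow the well-known 41/10/12 layout, but this is a sketch, not any particular library's implementation.

```python
import threading
import time

class SnowflakeLikeIds:
    """Sketch: 41-bit timestamp | 10-bit machine id | 12-bit sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
                if self.seq == 0:
                    # Sequence exhausted this millisecond: spin to the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs sort by creation time, which also makes them a workable range-based shard key for time-series data.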
Common Use Cases
- User data partitioned by user_id
- Multi-tenant SaaS (shard by tenant)
- Time-series data (shard by time range)
- Geographic data (shard by region)
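For the multi-tenant case, directory-based sharding is a natural fit: a lookup table maps each tenant to its shard, so a noisy tenant can be moved or pinned without rehashing everyone else. The tenant names and shard labels here are purely illustrative.

```python
# Hypothetical directory: tenant -> shard label. In practice this lives
# in a small, strongly consistent metadata store, not a module constant.
tenant_directory = {
    "acme": "shard-1",
    "globex": "shard-2",
    "initech": "shard-1",
}

def shard_for_tenant(tenant: str) -> str:
    try:
        return tenant_directory[tenant]
    except KeyError:
        raise KeyError(f"unknown tenant {tenant!r}; assign it a shard first")
```

The flexibility comes at the cost the strategies list notes: every query pays a directory lookup, and the directory itself must be highly available.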
Advantages
- Handles datasets larger than a single server's capacity
- Increases read/write throughput near-linearly with shard count
- Reduces index size per shard, speeding up queries
- Isolates failures to individual shards
Disadvantages
- Cross-shard queries are expensive
- Application complexity increases significantly
- Rebalancing shards is operationally difficult
- Distributed transactions add latency and complexity