When One Database Isn't Enough
When your database exceeds the capacity of a single server (e.g., 10TB data, 50k writes/sec), you have two choices:
- Vertical Scaling: Buy a bigger machine (more RAM, CPU). Simple but hits a hardware ceiling.
- Horizontal Scaling (Sharding): Split data across smaller machines. Complex but theoretically infinite.
What is Sharding?
Sharding is horizontal partitioning for databases. You split a large table (e.g., Users with 100 million rows) into smaller chunks called Shards. Each shard lives on a separate database server and holds a subset of the data.
- Shard 1: Users ID 0-10,000,000
- Shard 2: Users ID 10,000,001-20,000,000
- Shard 3: Users ID 20,000,001-30,000,000
Each shard is a fully functional database with its own indexes, query engine, and storage.
Sharding Strategies
1. Key-Based (Hash) Sharding
Formula: ShardID = hash(UserID) % NumShards.
- Pros: Even distribution of data. No hotspots if the hash function is good.
- Cons: Adding servers requires re-sharding (moving data). This is solved by Consistent Hashing.
import hashlib
def get_shard(user_id: int, num_shards: int) -> int:
"""Determine which shard a user belongs to."""
key = str(user_id).encode()
hash_value = int(hashlib.sha256(key).hexdigest(), 16)
return hash_value % num_shards
# User 12345 -> Shard 2 (out of 4 shards)
print(f"Shard: {get_shard(12345, 4)}")
2. Range-Based Sharding
Formula: if ID < 10,000,000 -> Shard 1. if ID < 20,000,000 -> Shard 2.
- Pros: Excellent for range queries (
SELECT * WHERE ID BETWEEN 100 AND 200hits only one shard). - Cons: Hotspots. If you shard by
creation_date, all new users go to the latest shard, causing it to become a bottleneck while older shards sit idle.
3. Directory-Based Sharding
A lookup table (service) tells you where data is.
UserID 123->Shard AUserID 999->Shard B- Pros: Extremely flexible. Can move individual users between shards for rebalancing.
- Cons: The lookup table becomes a Single Point of Failure (SPOF) and a performance bottleneck. Must be cached and replicated.
4. Geo-Based Sharding
Data is partitioned by geographic region.
US users->us-east-1 shardEU users->eu-west-1 shard- Pros: Low latency for users (data is close to them). Helps with GDPR data residency requirements.
- Cons: Uneven distribution if one region has more users. Cross-region queries are expensive.
Challenges of Sharding
1. Cross-Shard Joins
You can't easily join tables across different servers. If User #123 is on Shard A and their Orders are on Shard B, a JOIN users, orders query can't run locally.
Solutions:
- Denormalization: Duplicate data so each shard has everything it needs. Store a copy of user info alongside each order.
- Application-level joins: Fetch User from Shard A, then fetch Orders from Shard B, then merge in application code.
- Shared dimension tables: Keep small, frequently-joined tables (countries, categories) replicated on every shard.
2. Distributed Transactions (ACID)
Committing a transaction across Shard A and Shard B is hard. Two-Phase Commit (2PC) is slow and reduces availability.
Solutions:
- Avoid cross-shard transactions: Design your data model so that related data co-locates on the same shard. For example, store a User and all their Orders on the same shard by sharding on
user_id. - Saga Pattern: Replace a distributed transaction with a sequence of local transactions, each with a compensating action for rollback.
- Use a distributed database: CockroachDB and Google Spanner support distributed ACID transactions natively.
3. Unique IDs
Auto-increment IDs don't work across multiple servers ā two shards could both assign ID 1001.
Solutions:
- Snowflake IDs:
Timestamp + MachineID + SequenceNumberproduces globally unique, sortable IDs. - UUID v4: Random 128-bit IDs. No coordination needed, but not sortable.
- Custom ID ranges: Shard 1 uses IDs starting at 1,000,000, Shard 2 at 2,000,000.
- See Unique ID Generation for deep dive.
4. Rebalancing
When one shard grows disproportionately (hotspot) or you need to add capacity, you must move data between shards.
Approaches:
- Pre-split: Create more shards than servers initially. Assign multiple shards per server. To scale, move entire shards to new servers without splitting.
- Consistent Hashing: Minimizes data movement when adding/removing nodes.
- Online Schema Migration: Tools like Vitess handle shard splitting in the background without downtime.
Real-World Examples
Instagram shards its PostgreSQL databases by user_id. Each logical shard maps to a physical database. They use a custom ID scheme where the shard ID is embedded in the photo ID, enabling routing without a lookup table.
Vitess (YouTube)
Vitess is an open-source sharding middleware built by YouTube. It sits between your application and MySQL, transparently routing queries to the correct shard. It handles shard splitting, schema migrations, and query routing.
CockroachDB
CockroachDB automatically shards data across nodes using range-based partitioning. It handles shard splitting, rebalancing, and distributed transactions transparently ā no manual sharding logic needed.
Interview Tips š”
- Delay sharding: "I'd start with read replicas and caching before sharding. Sharding is a last resort due to its operational complexity."
- Choose the right shard key: "The shard key should have high cardinality, even distribution, and align with query patterns. Sharding by
user_idis common because most queries are user-scoped." - Address joins proactively: "I'd co-locate related data on the same shard. For cross-shard queries, I'd use denormalization or materialized views."
- Mention re-sharding: "I'd start with more logical shards than physical servers, so scaling means moving shards ā not splitting them."
Comparison Summary
| Strategy | Distribution | Range Queries | Hotspots | Flexibility |
|---|---|---|---|---|
| Hash | Excellent | Poor | Low | Low (resharding needed) |
| Range | Variable | Excellent | High | Medium |
| Directory | Configurable | Depends | Low | High (per-key routing) |
| Geo | Regional | Regional only | Variable | High (data locality) |
Related Concepts
- Consistent Hashing ā Minimizes data movement when adding nodes
- CAP Theorem ā Trade-offs introduced by distributing data
- Database Indexing ā Indexes within each shard
- Database Replication ā Each shard should be replicated for durability
- Unique ID Generation ā Generating unique IDs across shards
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
Consistent Hashing
How to add/remove servers without moving every single key. The Ring, Virtual Nodes, and real-world usage in Cassandra, DynamoDB, and Discord.
Database Indexing
Deep dive into database indexing internals. How B-Trees work, Clustered vs Non-Clustered indexes, Composite Index best practices, and covering indexes.
Distributed Transactions
Managing data consistency across multiple services where a single operation must either fully succeed or fully fail. Deep dive into SAGA, 2PC, and modern patterns.