What is Scaling?
Scaling is the ability of a system to handle increased load by adding resources. It's one of the most fundamental challenges in system design.
The question: Your app has 1,000 users today. How do you handle 1,000,000 users tomorrow?
The Growth Problem
Day 1: 1,000 users → Single server handles it fine Day 30: 10,000 users → Server at 80% CPU Day 90: 100,000 users → Server crashes Day 180: 1M users → ???
Two fundamental approaches: Scale Up or Scale Out
Vertical Scaling (Scale Up)
Add more resources to existing server: Bigger CPU, more RAM, faster disk.
Before: 4 CPU cores, 16GB RAM After: 32 CPU cores, 256GB RAM Same server, more powerful
Example
AWS EC2 instance upgrade: t3.medium → $0.0416/hour (2 vCPU, 4GB RAM) c5.24xlarge → $4.08/hour (96 vCPU, 192GB RAM) 100x more powerful, 100x more expensive
Pros & Cons
✅ Pros:
- Simple (no code changes)
- No distributed system complexity
- Consistent performance
- Easy to manage (one machine)
❌ Cons:
- Physical limits (can't buy infinite CPU)
- Single point of failure
- Downtime during upgrades
- Expensive at high end
- Diminishing returns (2x cost ≠ 2x performance)
When to Use
Good for: - Databases (PostgreSQL, MySQL) - Legacy monoliths - Early stage (< 100k users) - CPU-intensive tasks (video encoding) Bad for: - Web-scale applications - High availability requirements - Cost-sensitive projects
Horizontal Scaling (Scale Out)
Add more servers instead of making one server bigger.
Before: 1 server After: 100 servers Each server handles 1% of traffic
Architecture
graph TB
Users[Users] --> LB[Load Balancer]
LB --> S1[Server 1]
LB --> S2[Server 2]
LB --> S3[Server 3]
LB --> SN[Server N...]
S1 --> Cache[(Redis Cache)]
S2 --> Cache
S3 --> Cache
SN --> Cache
Cache --> DB[(Database)]
Implementation
// Stateless application server (can scale horizontally)
const express = require('express');
const redis = require('redis');
const app = express();
const cache = redis.createClient({ host: 'redis.cache.com' });
app.get('/user/:id', async (req, res) => {
const userId = req.params.id;
// Check cache first
const cached = await cache.get(`user:${userId}`);
if (cached) {
return res.json(JSON.parse(cached));
}
// Fetch from database
const user = await db.users.findById(userId);
// Store in cache
await cache.set(`user:${userId}`, JSON.stringify(user), 'EX', 3600);
res.json(user);
});
app.listen(process.env.PORT || 3000);
Key: Servers are stateless - no session data stored locally.
Pros & Cons
✅ Pros:
- No limits (add infinite servers)
- High availability (one server dies, others continue)
- Cost-effective (use commodity hardware)
- Gradual scaling (add servers as needed)
❌ Cons:
- Complexity (distributed systems)
- Stateless requirement (sessions in Redis/DB)
- Data consistency challenges
- Network overhead
When to Use
Good for: - Web applications - API servers - Microservices - High availability needs - Web-scale (> 1M users) Bad for: - Stateful applications - Single-threaded tasks - Tightly coupled systems
The Scaling Roadmap (1 to 100M Users)
Stage 1: Single Server (0-1,000 users)
┌─────────────────┐ │ Web + DB │ ← Everything on one box │ One Server │ └─────────────────┘
Cost: $50/month
Complexity: Very Low
Stage 2: Separate Database (1k-10k users)
┌─────────────┐
│ Web Server │
└──────┬──────┘
│
┌──────▼──────┐
│ Database │
└─────────────┘
Why: Database and web server compete for resources
Cost: $200/month
Stage 3: Load Balancer + Multiple Servers (10k-100k)
┌──────────────┐
Users ──▶ │Load Balancer │
└──────┬───────┘
│
┌────────────┼────────────┐
│ │ │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│Server 1│ │Server 2│ │Server 3│
└────────┘ └────────┘ └────────┘
│ │ │
└────────────┼────────────┘
│
┌──────▼──────┐
│ Database │
└─────────────┘
Why: Single server can't handle traffic
Cost: $1,000/month
Stage 4: Database Replication (100k-1M)
Web Servers
│
├──────────────────┐
│ (writes) │ (reads)
▼ ▼
┌─────────┐ ┌─────────┐
│ Master │─────▶│ Replica │
│ DB │ │ DB │
└─────────┘ └─────────┘
│
┌────▼────┐
│Replica 2│
└─────────┘
Why: Database becomes bottleneck
Pattern: Write to master, read from replicas
Cost: $5,000/month
Stage 5: Caching Layer (1M-10M)
Web Servers
│
▼
┌─────────┐
│ Redis │ ← Cache hot data
│ Cache │
└─────────┘
│
▼
Databases
Why: 80% of requests hit same 20% of data
Impact: 10x reduction in database load
Cost: $10,000/month
Stage 6: CDN + Asset Optimization (10M+)
Users │ ▼ ┌─────────┐ │ CDN │ ← Static assets (images, CSS, JS) └─────────┘ │ ▼ Load Balancer → Web Servers
Why: Reduce latency globally
Impact: 95% of bandwidth served from CDN
Cost: $20,000/month
Stage 7: Database Sharding (10M-100M)
User ID % 4: 0 → Shard 0 1 → Shard 1 2 → Shard 2 3 → Shard 3
Why: Single database can't handle writes
Complexity: High
Cost: $50,000/month
Database Scaling Strategies
1. Vertical Scaling (Database)
-- Upgrade database instance RDS db.t3.small → db.r5.24xlarge Limits: - Max 768GB RAM on AWS RDS - Max 128 vCPUs - Single point of failure
2. Read Replicas
Application
│
├─── Writes ──→ Master DB
│
└─── Reads ──→ Replica 1
Replica 2
Replica 3
Implementation:
const { Pool } = require('pg');
const masterPool = new Pool({ host: 'master.db.com' });
const replicaPool = new Pool({ host: 'replica.db.com' });
// Writes go to master
async function createUser(name, email) {
return masterPool.query(
'INSERT INTO users (name, email) VALUES ($1, $2)',
[name, email]
);
}
// Reads go to replica
async function getUser(id) {
return replicaPool.query(
'SELECT * FROM users WHERE id = $1',
[id]
);
}
Pros:
- Easy to implement
- Scales read traffic infinitely
- No application changes (mostly)
Cons:
- Replication lag (eventual consistency)
- Writes still bottlenecked
- Doesn't help with storage
3. Sharding (Horizontal Partitioning)
Split data across multiple databases by key.
// Hash-based sharding
function getUserShard(userId) {
const shardCount = 4;
return userId % shardCount;
// User 1 → Shard 1
// User 5 → Shard 1
// User 7 → Shard 3
}
async function getUser(userId) {
const shard = getUserShard(userId);
const db = databases[shard];
return db.query('SELECT * FROM users WHERE id = $1', [userId]);
}
Sharding strategies:
1. Hash-based: shard = user_id % num_shards 2. Range-based: users 1-1M → Shard 0 users 1M-2M → Shard 1 users 2M-3M → Shard 2 3. Geographic: US users → US Shard EU users → EU Shard Asia users → Asia Shard
Challenges:
-- Cross-shard queries are expensive -- User 1 (Shard 0) follows User 5 (Shard 1) SELECT posts.* FROM posts WHERE user_id IN ( SELECT followed_id FROM follows WHERE follower_id = 1 ) -- Requires querying multiple shards!
Pros:
- Scales writes and storage
- Each shard is independent
- Can use different hardware per shard
Cons:
- Complex application logic
- Difficult to rebalance
- Cross-shard queries expensive
- No transactions across shards
Caching Strategies
Cache Aside (Lazy Loading)
async function getUser(id) {
// 1. Try cache
const cached = await cache.get(`user:${id}`);
if (cached) return JSON.parse(cached);
// 2. Cache miss - fetch from DB
const user = await db.users.findById(id);
// 3. Store in cache
await cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
return user;
}
Write-Through Cache
async function updateUser(id, data) {
// 1. Update database
const user = await db.users.update(id, data);
// 2. Update cache immediately
await cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
return user;
}
Cache Eviction Policies
LRU (Least Recently Used): - Evict least recently accessed items - Redis default with maxmemory-policy LFU (Least Frequently Used): - Evict least frequently accessed items - Better for hot data TTL (Time To Live): - Expire after fixed time - Good for time-sensitive data
Real-World Examples
Instagram's Scaling Journey
2010: Single server 2011: 3 web servers + 1 DB 2012: 25 servers + sharded Cassandra 2013: 100+ servers + Facebook infrastructure 2023: 10,000+ servers serving 2B users
Key decisions:
- PostgreSQL → Cassandra (sharding)
- Django → async Python
- AWS → Facebook datacenters
Twitter's Scaling
2007: Rails monolith 2008: Message queue (Kestrel) 2010: Moved timeline to Scala/JVM 2012: Manhattan (distributed database) 2023: 300K+ tweets/second
Architecture:
- Fanout on write (pre-compute timelines)
- Redis for timeline cache
- Manhattan for distributed storage
Netflix
Users: 230M+ globally Traffic: 15% of global internet bandwidth Architecture: 1,000+ microservices Scaling strategy: - AWS auto-scaling groups - Cassandra (multi-region) - EVCache (memcached) - Zuul (API gateway)
Interview Tips 💡
When discussing scaling in system design interviews:
- Start simple: "Initially, single server with database is fine for < 10k users..."
- Identify bottleneck: "As traffic grows, database becomes bottleneck. Add read replicas..."
- Add layers: "Cache layer with Redis reduces database load by 90%..."
- Estimate numbers: "1M users × 10 requests/day = 115 requests/second. One server handles ~1000 RPS, so need load balancer + 2-3 servers..."
- Discuss trade-offs: "Sharding increases complexity but necessary for > 10M users..."
- Real examples: "Instagram uses similar pattern: PostgreSQL → Cassandra sharding..."
Related Concepts
- Load Balancing — Distribute traffic across servers
- Caching — Reduce database load
- Database Sharding — Horizontal database scaling
- CDN — Global content delivery
- Microservices — Service-based scaling
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
Vertical Scaling (Scaling Up)
A deep dive into upgrading server hardware to handle increased load, including its limits and best use cases.
API Gateway Pattern
The single entry point for microservices. Implementing rate limiting, authentication, and protocol translation/aggregation.
Caching Strategies
A breakdown of where to place your cache and how to keep it in sync with your database.