Back to All Concepts
ScalingInfrastructureArchitectureBeginner

Scaling: Overview

Comprehensive guide to scaling systems from 100 to 100 million users, covering vertical scaling (scale up), horizontal scaling (scale out), database sharding, caching strategies, and real-world architecture patterns from Netflix, Instagram, and Twitter.

What is Scaling?

Scaling is the ability of a system to handle increased load by adding resources. It's one of the most fundamental challenges in system design.

The question: Your app has 1,000 users today. How do you handle 1,000,000 users tomorrow?

The Growth Problem

Day 1:  1,000 users    → Single server handles it fine
Day 30: 10,000 users   → Server at 80% CPU
Day 90: 100,000 users  → Server crashes
Day 180: 1M users      → ???
Click to expand code...

Two fundamental approaches: Scale Up or Scale Out


Vertical Scaling (Scale Up)

Add more resources to existing server: Bigger CPU, more RAM, faster disk.

Before:  4 CPU cores, 16GB RAM
After:   32 CPU cores, 256GB RAM

Same server, more powerful
Click to expand code...

Example

AWS EC2 instance upgrade:
t3.medium  → $0.0416/hour (2 vCPU, 4GB RAM)
c5.24xlarge → $4.08/hour (96 vCPU, 192GB RAM)

100x more powerful, 100x more expensive
Click to expand code...

Pros & Cons

Pros:

  • Simple (no code changes)
  • No distributed system complexity
  • Consistent performance
  • Easy to manage (one machine)

Cons:

  • Physical limits (can't buy infinite CPU)
  • Single point of failure
  • Downtime during upgrades
  • Expensive at high end
  • Diminishing returns (2x cost ≠ 2x performance)

When to Use

Good for:
- Databases (PostgreSQL, MySQL)
- Legacy monoliths
- Early stage (< 100k users)
- CPU-intensive tasks (video encoding)

Bad for:
- Web-scale applications
- High availability requirements
- Cost-sensitive projects
Click to expand code...

Horizontal Scaling (Scale Out)

Add more servers instead of making one server bigger.

Before:  1 server
After:   100 servers

Each server handles 1% of traffic
Click to expand code...

Architecture

mermaid
graph TB
    Users[Users] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]
    LB --> SN[Server N...]
    
    S1 --> Cache[(Redis Cache)]
    S2 --> Cache
    S3 --> Cache
    SN --> Cache
    
    Cache --> DB[(Database)]
Click to expand code...

Implementation

javascript
// Stateless application server (can scale horizontally)
const express = require('express');
const redis = require('redis');

const app = express();
const cache = redis.createClient({ host: 'redis.cache.com' });

app.get('/user/:id', async (req, res) => {
  const userId = req.params.id;
  
  // Check cache first
  const cached = await cache.get(`user:${userId}`);
  if (cached) {
    return res.json(JSON.parse(cached));
  }
  
  // Fetch from database
  const user = await db.users.findById(userId);
  
  // Store in cache
  await cache.set(`user:${userId}`, JSON.stringify(user), 'EX', 3600);
  
  res.json(user);
});

app.listen(process.env.PORT || 3000);
Click to expand code...

Key: Servers are stateless - no session data stored locally.

Pros & Cons

Pros:

  • No limits (add infinite servers)
  • High availability (one server dies, others continue)
  • Cost-effective (use commodity hardware)
  • Gradual scaling (add servers as needed)

Cons:

  • Complexity (distributed systems)
  • Stateless requirement (sessions in Redis/DB)
  • Data consistency challenges
  • Network overhead

When to Use

Good for:
- Web applications
- API servers
- Microservices
- High availability needs
- Web-scale (> 1M users)

Bad for:
- Stateful applications
- Single-threaded tasks
- Tightly coupled systems
Click to expand code...

The Scaling Roadmap (1 to 100M Users)

Stage 1: Single Server (0-1,000 users)

┌─────────────────┐
│   Web + DB      │ ← Everything on one box
│   One Server    │
└─────────────────┘
Click to expand code...

Cost: $50/month
Complexity: Very Low

Stage 2: Separate Database (1k-10k users)

┌─────────────┐
│ Web Server  │
└──────┬──────┘
       │
┌──────▼──────┐
│  Database   │
└─────────────┘
Click to expand code...

Why: Database and web server compete for resources
Cost: $200/month

Stage 3: Load Balancer + Multiple Servers (10k-100k)

           ┌──────────────┐
Users ──▶  │Load Balancer │
           └──────┬───────┘
                  │
     ┌────────────┼────────────┐
     │            │            │
┌────▼───┐   ┌───▼────┐  ┌───▼────┐
│Server 1│   │Server 2│  │Server 3│
└────────┘   └────────┘  └────────┘
     │            │            │
     └────────────┼────────────┘
                  │
           ┌──────▼──────┐
           │  Database   │
           └─────────────┘
Click to expand code...

Why: Single server can't handle traffic
Cost: $1,000/month

Stage 4: Database Replication (100k-1M)

Web Servers
     │
     ├──────────────────┐
     │ (writes)         │ (reads)
     ▼                  ▼
┌─────────┐      ┌─────────┐
│ Master  │─────▶│ Replica │
│   DB    │      │   DB    │
└─────────┘      └─────────┘
                      │
                 ┌────▼────┐
                 │Replica 2│
                 └─────────┘
Click to expand code...

Why: Database becomes bottleneck
Pattern: Write to master, read from replicas
Cost: $5,000/month

Stage 5: Caching Layer (1M-10M)

Web Servers
     │
     ▼
┌─────────┐
│ Redis   │ ← Cache hot data
│ Cache   │
└─────────┘
     │
     ▼
Databases
Click to expand code...

Why: 80% of requests hit same 20% of data
Impact: 10x reduction in database load
Cost: $10,000/month

Stage 6: CDN + Asset Optimization (10M+)

Users
  │
  ▼
┌─────────┐
│   CDN   │ ← Static assets (images, CSS, JS)
└─────────┘
  │
  ▼
Load Balancer → Web Servers
Click to expand code...

Why: Reduce latency globally
Impact: 95% of bandwidth served from CDN
Cost: $20,000/month

Stage 7: Database Sharding (10M-100M)

User ID % 4:
  0 → Shard 0
  1 → Shard 1
  2 → Shard 2
  3 → Shard 3
Click to expand code...

Why: Single database can't handle writes
Complexity: High
Cost: $50,000/month


Database Scaling Strategies

1. Vertical Scaling (Database)

sql
-- Upgrade database instance
RDS db.t3.small  → db.r5.24xlarge

Limits:
- Max 768GB RAM on AWS RDS
- Max 128 vCPUs
- Single point of failure
Click to expand code...

2. Read Replicas

Application
  │
  ├─── Writes ──→ Master DB
  │
  └─── Reads ──→ Replica 1
                 Replica 2
                 Replica 3
Click to expand code...

Implementation:

javascript
const { Pool } = require('pg');

const masterPool = new Pool({ host: 'master.db.com' });
const replicaPool = new Pool({ host: 'replica.db.com' });

// Writes go to master
async function createUser(name, email) {
  return masterPool.query(
    'INSERT INTO users (name, email) VALUES ($1, $2)',
    [name, email]
  );
}

// Reads go to replica
async function getUser(id) {
  return replicaPool.query(
    'SELECT * FROM users WHERE id = $1',
    [id]
  );
}
Click to expand code...

Pros:

  • Easy to implement
  • Scales read traffic infinitely
  • No application changes (mostly)

Cons:

  • Replication lag (eventual consistency)
  • Writes still bottlenecked
  • Doesn't help with storage

3. Sharding (Horizontal Partitioning)

Split data across multiple databases by key.

javascript
// Hash-based sharding
function getUserShard(userId) {
  const shardCount = 4;
  return userId % shardCount;
  // User 1 → Shard 1
  // User 5 → Shard 1
  // User 7 → Shard 3
}

async function getUser(userId) {
  const shard = getUserShard(userId);
  const db = databases[shard];
  return db.query('SELECT * FROM users WHERE id = $1', [userId]);
}
Click to expand code...

Sharding strategies:

1. Hash-based:
   shard = user_id % num_shards
   
2. Range-based:
   users 1-1M    → Shard 0
   users 1M-2M   → Shard 1
   users 2M-3M   → Shard 2
   
3. Geographic:
   US users      → US Shard
   EU users      → EU Shard
   Asia users    → Asia Shard
Click to expand code...

Challenges:

sql
-- Cross-shard queries are expensive
-- User 1 (Shard 0) follows User 5 (Shard 1)
SELECT posts.* FROM posts
WHERE user_id IN (
  SELECT followed_id FROM follows WHERE follower_id = 1
)
-- Requires querying multiple shards!
Click to expand code...

Pros:

  • Scales writes and storage
  • Each shard is independent
  • Can use different hardware per shard

Cons:

  • Complex application logic
  • Difficult to rebalance
  • Cross-shard queries expensive
  • No transactions across shards

Caching Strategies

Cache Aside (Lazy Loading)

javascript
async function getUser(id) {
  // 1. Try cache
  const cached = await cache.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  
  // 2. Cache miss - fetch from DB
  const user = await db.users.findById(id);
  
  // 3. Store in cache
  await cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
  
  return user;
}
Click to expand code...

Write-Through Cache

javascript
async function updateUser(id, data) {
  // 1. Update database
  const user = await db.users.update(id, data);
  
  // 2. Update cache immediately
  await cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
  
  return user;
}
Click to expand code...

Cache Eviction Policies

LRU (Least Recently Used):
- Evict least recently accessed items
- Redis default with maxmemory-policy

LFU (Least Frequently Used):
- Evict least frequently accessed items
- Better for hot data

TTL (Time To Live):
- Expire after fixed time
- Good for time-sensitive data
Click to expand code...

Real-World Examples

Instagram's Scaling Journey

2010: Single server
2011: 3 web servers + 1 DB
2012: 25 servers + sharded Cassandra
2013: 100+ servers + Facebook infrastructure
2023: 10,000+ servers serving 2B users
Click to expand code...

Key decisions:

  • PostgreSQL → Cassandra (sharding)
  • Django → async Python
  • AWS → Facebook datacenters

Twitter's Scaling

2007: Rails monolith
2008: Message queue (Kestrel)
2010: Moved timeline to Scala/JVM
2012: Manhattan (distributed database)
2023: 300K+ tweets/second
Click to expand code...

Architecture:

  • Fanout on write (pre-compute timelines)
  • Redis for timeline cache
  • Manhattan for distributed storage

Netflix

Users: 230M+ globally
Traffic: 15% of global internet bandwidth
Architecture: 1,000+ microservices

Scaling strategy:
- AWS auto-scaling groups
- Cassandra (multi-region)
- EVCache (memcached)
- Zuul (API gateway)
Click to expand code...

Interview Tips 💡

When discussing scaling in system design interviews:

  1. Start simple: "Initially, single server with database is fine for < 10k users..."
  2. Identify bottleneck: "As traffic grows, database becomes bottleneck. Add read replicas..."
  3. Add layers: "Cache layer with Redis reduces database load by 90%..."
  4. Estimate numbers: "1M users × 10 requests/day = 115 requests/second. One server handles ~1000 RPS, so need load balancer + 2-3 servers..."
  5. Discuss trade-offs: "Sharding increases complexity but necessary for > 10M users..."
  6. Real examples: "Instagram uses similar pattern: PostgreSQL → Cassandra sharding..."

Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles