Why Load Balance?
A single server can handle only so many requests (e.g., 10k RPS). To handle millions of requests per second, we need Horizontal Scaling: adding more servers to a fleet. A Load Balancer (LB) sits in front of these servers and distributes incoming traffic so that no single server is overwhelmed while others sit idle.
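Without a load balancer, if your only server goes down, your entire application is offline. With a fleet of servers behind a load balancer, the LB stops routing traffic to a failed server, so individual server failures are largely invisible to users (requests already in flight on the failed server may still error out).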
Load Balancing Algorithms
How does the LB decide which server gets the request?
1. Round Robin
- Logic: Send requests in rotation. Server 1 -> Server 2 -> Server 3 -> Server 1.
- Pros: Simple, stateless, no coordination needed. Fair distribution when servers are identical and requests are uniform.
- Cons: Doesn't account for server load. One request might take 1ms, another 10 seconds. A server handling expensive requests gets overwhelmed while others are idle.
- Variant — Weighted Round Robin: Each server has a weight. A server with weight 3 gets triple the traffic of a server with weight 1. Useful when servers have different hardware.
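A minimal sketch of both variants, assuming servers are identified by plain strings. For the weighted case it uses "smooth" weighted round robin, the interleaving approach Nginx uses for weighted upstreams: instead of sending a weight-3 server three requests in a row, it spreads them out. The class and method names are illustrative, not any library's API.

```python
from itertools import cycle


def round_robin(servers):
    """Plain Round Robin: S1 -> S2 -> S3 -> S1 -> ..."""
    return cycle(servers)


class SmoothWeightedRoundRobin:
    """Weighted Round Robin that interleaves picks instead of sending bursts."""

    def __init__(self, weights):
        # weights: dict mapping server name -> positive integer weight
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Every server earns its weight; the highest running score wins
        # and then "pays back" the total, which spreads heavy servers out.
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best
```

With weights A=3, B=1, C=1, one cycle of five picks comes out A, B, A, C, A, after which every running score is back to zero and the pattern repeats.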
2. Least Connections
- Logic: Send to the server with the fewest active connections right now.
- Pros: Adapts to slow/fast requests. A server struggling with heavy queries naturally receives fewer new requests.
- Cons: Needs state — the LB must track active connections per server. Slightly more overhead than Round Robin.
- Variant — Weighted Least Connections: Combines connection count with server weights.
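The state the LB must keep is just a per-server counter that goes up when a connection is assigned and down when it closes. A minimal sketch, assuming the LB calls hypothetical `acquire()`/`release()` hooks around each connection; with all weights set to 1 it is plain Least Connections, otherwise it compares active connections divided by weight.

```python
class LeastConnections:
    """(Weighted) Least Connections: lowest active/weight ratio wins."""

    def __init__(self, weights):
        # weights: server -> weight (use 1 everywhere for plain Least Connections)
        self.weights = dict(weights)
        self.active = {name: 0 for name in self.weights}

    def acquire(self):
        # Ties go to the server listed first.
        best = min(self.active, key=lambda s: self.active[s] / self.weights[s])
        self.active[best] += 1
        return best

    def release(self, server):
        # Called when the connection/request finishes.
        self.active[server] -= 1
```

A server stuck on slow requests keeps a high `active` count and is skipped until its connections drain, which is exactly the adaptivity Round Robin lacks.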
3. IP Hash (Sticky Sessions)
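- Logic: hash(ClientIP) % NumServers. The same client always hits the same server.
- Pros: Provides session affinity as long as the fleet doesn't change. Useful when application state is stored in server memory (e.g., shopping carts, WebSocket connections).
- Cons: If a server dies, all its clients are disrupted. Adding or removing a server changes NumServers, which remaps most clients to a different server. Distribution is uneven if some IPs generate far more traffic than others (e.g., many users behind one corporate NAT).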
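A minimal sketch of the modulo scheme, plus a helper that measures how many clients move when the fleet size changes. It uses SHA-256 from `hashlib` as the stable hash (an assumption; any stable hash works, but Python's built-in `hash()` of a string is randomized per process and would break stickiness across restarts).

```python
import hashlib


def ip_hash(client_ip: str, num_servers: int) -> int:
    """Map a client IP to a server index: hash(ClientIP) % NumServers."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def remapped_fraction(ips, old_n: int, new_n: int) -> float:
    """Fraction of clients whose server changes when the fleet is resized."""
    moved = sum(ip_hash(ip, old_n) != ip_hash(ip, new_n) for ip in ips)
    return moved / len(ips)


ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
fraction = remapped_fraction(ips, 4, 5)
```

Growing from 4 to 5 servers moves roughly 80% of clients: a client keeps its server only when h mod 4 equals h mod 5, which holds for 4 of every 20 hash values. This is the problem Consistent Hashing addresses.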
4. Least Response Time
- Logic: Send to the server with the lowest average response time AND fewest active connections.
- Pros: Optimizes for user-perceived latency.
- Cons: Requires continuous monitoring of response times. More complex to implement.
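One way to combine the two signals is to score each server as average latency × (active connections + 1) and pick the lowest score. This is a minimal sketch under that assumption; the scoring formula, the exponentially weighted moving average (EWMA), and the `alpha`/`initial_ms` defaults are illustrative choices, not a specific product's algorithm.

```python
class LeastResponseTime:
    """Pick the server with the lowest avg_latency * (active + 1) score."""

    def __init__(self, servers, alpha=0.3, initial_ms=100.0):
        self.alpha = alpha  # weight given to the newest latency sample
        self.avg_ms = {s: initial_ms for s in servers}
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Ties go to the server listed first.
        best = min(self.avg_ms, key=lambda s: self.avg_ms[s] * (self.active[s] + 1))
        self.active[best] += 1
        return best

    def release(self, server, latency_ms):
        # Update the EWMA with the observed response time.
        self.active[server] -= 1
        self.avg_ms[server] += self.alpha * (latency_ms - self.avg_ms[server])
```

The "continuous monitoring" cost shows up in `release()`: every response must be timed and fed back before the next decision can use it.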
5. Random
- Logic: Pick a server at random.
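- Pros: Extremely simple and stateless, so multiple LB instances can use it without coordinating. Surprisingly effective at high request volumes, where the per-server counts even out.
- Cons: No guarantee of even distribution over small numbers of requests.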
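A minimal simulation of random selection, assuming four identical servers. It is seeded only so the run is repeatable; a real LB would use an unseeded generator.

```python
import random
from collections import Counter


def random_pick(servers, rng=random):
    """Random algorithm: every server is equally likely on every request."""
    return rng.choice(servers)


def simulate(servers, num_requests, seed=0):
    rng = random.Random(seed)  # seeded for a repeatable demo
    return Counter(random_pick(servers, rng) for _ in range(num_requests))


servers = ["s1", "s2", "s3", "s4"]
small = simulate(servers, 8)        # can easily be lopsided
large = simulate(servers, 100_000)  # each server ends up close to 25,000
```

With 8 requests it is common for one server to get three or four while another gets none; with 100,000 requests every server lands within about 1% of its fair share.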
L4 vs L7 Load Balancing
Layer 4 (Transport Layer)
- Data: IP Address + Port (TCP/UDP).
- Behavior: "I see a packet for Port 80, I forward it to Server A." The LB never inspects the packet contents.
- Speed: Very fast, since it never parses application data. Some implementations run inside the kernel (Linux IPVS) or use eBPF/XDP.
- Encryption: Typically doesn't decrypt SSL/TLS; the encrypted stream passes through to the backend (TCP pass-through).
- Examples: AWS NLB (Network Load Balancer), HAProxy in TCP mode, Linux IPVS.
Layer 7 (Application Layer)
- Data: HTTP Headers, URL path, Cookies, Request body.
- Behavior: "URL is /checkout → send to Payment Service. URL is /static/logo.png → send to CDN. Header has Content-Type: application/grpc → send to gRPC backend."
- Speed: Slower, because it must terminate TLS and buffer enough of the request to parse the HTTP headers before it can make a routing decision.
- Smart Features: Content-based routing, A/B testing, authentication, Web Application Firewall (WAF), rate limiting, header injection.
- Examples: AWS ALB (Application Load Balancer), Nginx, Envoy, Traefik, Cloudflare.
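The routing rules quoted above boil down to matching on request attributes before picking a pool. A minimal sketch, assuming the LB has already parsed the path and headers; the pool names (`payment-service`, `cdn`, `grpc-backend`, `default-web`) are hypothetical.

```python
def route(path: str, headers: dict) -> str:
    """Content-based (L7) routing: choose a backend pool from request attributes."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lower = {k.lower(): v for k, v in headers.items()}
    if lower.get("content-type", "").startswith("application/grpc"):
        return "grpc-backend"
    if path == "/checkout" or path.startswith("/checkout/"):
        return "payment-service"
    if path.startswith("/static/"):
        return "cdn"
    return "default-web"
```

An L4 balancer cannot make any of these decisions: the path and headers live inside the (often encrypted) payload it never reads.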
Comparison
| Feature | Layer 4 | Layer 7 |
|---|---|---|
| Speed | Very fast (no payload parsing) | Slower (must parse HTTP) |
| Intelligence | None (IP + Port only) | High (URL, headers, cookies) |
| SSL/TLS Termination | Usually no (pass-through) | Yes |
| WebSocket Support | Transparent | Supported, but may need explicit config (e.g., Upgrade headers in Nginx) |
| Cost | Lower | Higher |
| Use Case | Raw TCP/UDP traffic, gaming | Web apps, APIs, microservices |
Health Checks
LBs must detect when a server is unhealthy and stop sending it traffic.
Active Health Checks
The LB proactively sends a request to each server every N seconds:
GET /health HTTP/1.1
Response 200 OK → Server is healthy
Response 503 → Server is unhealthy, remove from rotation
Timeout (>3s) → Server is unhealthy
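To avoid flapping, most LBs change a server's state only after several consecutive failed (or successful) probes, not after a single result. Typical intervals are 10-30 seconds for healthy servers; some LBs (e.g., HAProxy's downinter setting) probe unhealthy servers more often, every 2-5 seconds, to detect recovery quickly.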
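A minimal sketch of one server's active health-check state, assuming thresholds of 3 consecutive failures to mark it down and 2 consecutive successes to bring it back (illustrative values, in the spirit of HAProxy's fall/rise). `probe()` uses the standard library's `urllib`; the `/health` URL you pass it is whatever your servers expose.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


def probe(url: str, timeout_s: float = 3.0) -> Optional[int]:
    """Send GET /health; return the status code, or None on timeout/connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses such as 503 arrive as HTTPError
    except OSError:
        return None      # timeouts, refused connections, DNS failures


@dataclass
class ActiveHealthState:
    healthy: bool = True
    fall: int = 3                     # consecutive failures before marking unhealthy
    rise: int = 2                     # consecutive successes before marking healthy
    healthy_interval_s: float = 10.0
    unhealthy_interval_s: float = 2.0
    streak: int = 0                   # consecutive results disagreeing with current state

    def record(self, status: Optional[int]) -> None:
        ok = status == 200
        if ok == self.healthy:
            self.streak = 0           # result agrees with current state
            return
        self.streak += 1
        needed = self.fall if self.healthy else self.rise
        if self.streak >= needed:
            self.healthy = ok
            self.streak = 0

    def next_interval(self) -> float:
        # Probe unhealthy servers more often so recovery is noticed quickly.
        return self.healthy_interval_s if self.healthy else self.unhealthy_interval_s
```

A checker loop would call `state.record(probe(url))` and then sleep for `state.next_interval()` seconds before probing again.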
Passive Health Checks
The LB monitors actual traffic:
- If Server A returns 3 consecutive 5xx errors, it's temporarily removed.
- If Server A's response time exceeds 5 seconds for the past 10 requests, mark as degraded.
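A minimal sketch of the two passive rules above, assuming the LB reports every proxied response's status code and latency to a per-server tracker. "Exceeds 5 seconds for the past 10 requests" is read here as all of the last 10 requests being slower than 5 seconds; the re-admission cool-down after ejection is left out.

```python
from collections import deque


class PassiveHealth:
    """Judge a server from real traffic instead of synthetic probes."""

    def __init__(self, max_consecutive_5xx=3, slow_s=5.0, window=10):
        self.max_consecutive_5xx = max_consecutive_5xx
        self.slow_s = slow_s
        self.recent = deque(maxlen=window)  # latencies of the last `window` requests
        self.consecutive_5xx = 0
        self.ejected = False

    def observe(self, status_code: int, latency_s: float) -> None:
        self.recent.append(latency_s)
        if 500 <= status_code <= 599:
            self.consecutive_5xx += 1
        else:
            self.consecutive_5xx = 0  # any non-5xx breaks the streak
        if self.consecutive_5xx >= self.max_consecutive_5xx:
            self.ejected = True       # temporarily removed (cool-down not shown)

    @property
    def degraded(self) -> bool:
        # Degraded once the window is full and every request in it was slow.
        return len(self.recent) == self.recent.maxlen and min(self.recent) > self.slow_s
```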
Best practice: Use both active and passive health checks together.
Code Example: Nginx L7 Load Balancer
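events {}  # required top-level block in a standalone nginx.conf
http {
    upstream backend_servers {
        least_conn;  # Algorithm: (weighted) Least Connections
        # max_fails/fail_timeout are passive health checks: after 3 failed
        # attempts within 30s, the server is considered unavailable for 30s
        server 10.0.0.1:8080 weight=3 max_fails=3 fail_timeout=30s;  # More powerful server
        server 10.0.0.2:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.0.0.3:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.0.0.4:8080 backup;  # Only used if the others are unavailable
    }
    server {
        listen 80;
        listen 443 ssl;
        ssl_certificate     /etc/ssl/cert.pem;
        ssl_certificate_key /etc/ssl/key.pem;
        location / {
            proxy_pass http://backend_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;  # tell backends whether the client used HTTPS
        }
        # Static files served directly (no backend needed)
        location /static/ {
            root /var/www;  # /static/logo.png -> /var/www/static/logo.png
            expires 30d;
        }
    }
}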
Global Load Balancing (DNS-Based)
For multi-region deployments, DNS-based load balancing routes users to the nearest data center:
- User in Tokyo queries api.example.com.
- DNS returns the IP of the Tokyo data center.
- User in London queries the same domain.
- DNS returns the IP of the London data center.
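Managed DNS services (Cloudflare, Route 53, NS1) can route by geography (the location of the user's resolver, or the client's subnet when the resolver sends EDNS Client Subnet) or by measured latency to each region. Because resolvers cache answers for the record's TTL, DNS-based failover takes effect only as those cached answers expire.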
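A minimal sketch of latency-based DNS steering with health awareness. The latency table, region names, and data center names are hypothetical, and the IPs come from the RFC 5737 documentation ranges; a real service would refresh these measurements continuously.

```python
# Hypothetical latency (ms) from a resolver's region to each data center.
LATENCY_MS = {
    "asia-east": {"tokyo": 5, "london": 230, "virginia": 150},
    "europe-west": {"tokyo": 230, "london": 8, "virginia": 75},
}
# Documentation-range IPs (RFC 5737) standing in for each data center's VIP.
DC_IPS = {"tokyo": "203.0.113.10", "london": "198.51.100.20", "virginia": "192.0.2.30"}


def resolve(resolver_region, healthy):
    """Answer with the IP of the lowest-latency healthy data center (or None)."""
    candidates = {dc: ms for dc, ms in LATENCY_MS[resolver_region].items() if dc in healthy}
    if not candidates:
        return None
    best = min(candidates, key=candidates.get)
    return DC_IPS[best]


ALL_DCS = {"tokyo", "london", "virginia"}
```

If Tokyo is marked unhealthy, Asian resolvers are steered to Virginia (150 ms) rather than London (230 ms), but only once their cached answer for api.example.com expires.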
Load Balancer High Availability
What if the load balancer itself fails? It becomes a Single Point of Failure (SPOF).
Solution: Deploy load balancers in Active-Passive or Active-Active pairs:
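- Active-Passive: Two LB instances share a Virtual IP (VIP). If the active one fails, the passive takes over the VIP using VRRP (Virtual Router Redundancy Protocol), commonly via keepalived.
- Active-Active: Both LBs handle traffic simultaneously, for example with DNS returning both IPs. If one fails, the other absorbs all traffic, so each must have headroom for the full load. With DNS, clients may keep trying the failed IP until health-checked DNS removes it and cached answers expire.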
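A simplified model of VRRP-style failover, assuming each LB node multicasts an advertisement every `advert_interval_s` seconds. The VIP belongs to the highest-priority node still heard from; a node counts as dead after about three missed advertisements. Real VRRP also adds a small priority-based skew to that timeout, which this sketch omits, and `LBNode`/`vip_owner` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LBNode:
    name: str
    priority: int          # higher priority wins the VIP, as in VRRP
    last_advert_s: float   # when this node's last advertisement was heard


def vip_owner(nodes, now_s, advert_interval_s=1.0):
    """Return the name of the node that should hold the VIP, or None."""
    dead_after = 3 * advert_interval_s  # ~3 missed advertisements (skew omitted)
    alive = [n for n in nodes if now_s - n.last_advert_s < dead_after]
    if not alive:
        return None
    return max(alive, key=lambda n: n.priority).name


a = LBNode("lb-a", priority=200, last_advert_s=10.0)  # normally active
b = LBNode("lb-b", priority=100, last_advert_s=10.0)  # normally passive
```

If lb-a stops advertising at t=10s while lb-b keeps going, lb-b takes over the VIP at roughly t=13s; clients keep using the same IP and never notice the switch, apart from connections that were open on lb-a.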
Interview Tips 💡
- Start with why: "A load balancer distributes traffic across servers for scalability and fault tolerance."
- Choose the right algorithm: "Round Robin for stateless APIs, Least Connections for variable-latency backends, IP Hash for sticky sessions."
- L4 vs L7: "Use L4 for raw performance (gaming, TCP), L7 when you need content-based routing (web apps, microservices)."
- Don't forget health checks: "Active checks ping /health every 10 seconds. Passive checks monitor real traffic for errors."
- Address the SPOF: "LBs must be deployed in HA pairs to avoid becoming a single point of failure."
Related Concepts
- Consistent Hashing — Advanced balancing for stateful routing
- DNS Architecture — Global load balancing via DNS
- Proxies — Forward vs Reverse Proxy
- Horizontal Scaling — Why you need a load balancer
- Rate Limiting — Often implemented at the LB layer
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.