Why Load Balance?

A single server can handle only so many requests (e.g., 10k RPS). To handle millions, we need Horizontal Scaling — adding more servers to a fleet. A Load Balancer (LB) sits in front of these servers and distributes incoming traffic, ensuring no single server is overwhelmed while others sit idle.

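Without a load balancer, if your one server goes down, your entire application is offline. With a fleet of servers behind a load balancer, an individual server failure is largely invisible to users: requests in flight on the failed server may error, but new requests are routed to the remaining healthy servers.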

Algorithms

How does the LB decide which server gets the request?

1. Round Robin

Logic: Send requests in rotation. Server 1 -> Server 2 -> Server 3 -> Server 1.
Pros: Simple, needs almost no state (only a rotating counter), and requires no coordination with the servers. Fair distribution when servers are identical and requests are uniform.
Cons: Doesn't account for server load. One request might take 1ms, another 10 seconds. A server handling expensive requests gets overwhelmed while others are idle.
Variant — Weighted Round Robin: Each server has a weight. A server with weight 3 gets triple the traffic of a server with weight 1. Useful when servers have different hardware.
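
The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements it; the server names and the helper names round_robin and weighted_round_robin are made up for the example.

```python
from itertools import cycle

def round_robin(servers):
    """Return an endless iterator that yields servers in a fixed rotation."""
    return cycle(servers)

def weighted_round_robin(weights):
    """Return an endless iterator where each server appears `weight` times per cycle.

    `weights` maps a (hypothetical) server name to an integer weight.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)

rr = round_robin(["server-1", "server-2", "server-3"])
first_four = [next(rr) for _ in range(4)]  # wraps back to server-1 on the 4th pick

wrr = weighted_round_robin({"big": 3, "small": 1})
one_cycle = [next(wrr) for _ in range(4)]  # "big" three times, "small" once
```

Expanding the list this way sends the heavy server its requests in a burst. Production balancers often interleave the picks more smoothly, but the long-run ratio is the same.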

2. Least Connections

Logic: Send to the server with the fewest active connections right now.
Pros: Adapts to slow/fast requests. A server struggling with heavy queries naturally receives fewer new requests.
Cons: Needs state — the LB must track active connections per server. Slightly more overhead than Round Robin.
Variant — Weighted Least Connections: Combines connection count with server weights.
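
A minimal sketch of the bookkeeping this requires, assuming the LB is told when each request starts and finishes. The class name LeastConnectionsBalancer and its methods are hypothetical, and ties are broken by server order.

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least loaded one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties
        # fall back to the order in which servers were registered.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a request completes and its connection is freed.
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["a", "b", "c"])
first = lb.acquire()
second = lb.acquire()
third = lb.acquire()
lb.release(second)     # the request on the second server finishes early
fourth = lb.acquire()  # so that server is the least loaded again
```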

3. IP Hash (Sticky Sessions)

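Logic: hash(ClientIP) % NumServers. The same client IP maps to the same server for as long as the server list does not change.
Pros: Provides session affinity without cookies or shared state. Useful when application state is stored in server memory (e.g., shopping carts, WebSocket connections).
Cons: If a server dies, its clients lose their in-memory sessions, and because NumServers changes, plain modulo hashing also remaps most of the other clients (consistent hashing limits this). Distribution is uneven if some IPs generate more traffic, for example many users behind one NAT or corporate proxy.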
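
The sketch below illustrates both the affinity and the remapping cost of modulo hashing. It assumes SHA-256 as the hash (Python's built-in hash() is randomized per process for strings, so it would not be stable across restarts); the function name pick_server and the server and client names are invented for the example.

```python
import hashlib

def pick_server(client_ip, servers):
    """Map a client IP to a server with a stable hash modulo the fleet size."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["s0", "s1", "s2", "s3"]
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

before = {ip: pick_server(ip, servers) for ip in clients}
after = {ip: pick_server(ip, servers[:-1]) for ip in clients}  # s3 has died

# Clients that were on s3 must move. With plain modulo hashing, many
# clients on the surviving servers are remapped as well.
orphaned = sum(1 for ip in clients if before[ip] == "s3")
moved = sum(1 for ip in clients if before[ip] != after[ip])
```

Comparing moved with orphaned shows why fleets that resize often tend to prefer consistent hashing over a bare modulo.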

4. Least Response Time

Logic: Send to the server with the lowest average response time AND fewest active connections.
Pros: Optimizes for user-perceived latency.
Cons: Requires continuous monitoring of response times. More complex to implement.
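
How response time and connection count are combined differs between products. The sketch below is one possible scoring rule, stated as an assumption: an exponentially weighted moving average of latency multiplied by (active connections + 1). The class name, the alpha smoothing factor, and the server names are all hypothetical.

```python
class LeastResponseTimeBalancer:
    """Picks the server with the lowest latency-times-load score."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha                           # weight of the newest sample
        self.avg_ms = {s: 0.0 for s in servers}      # 0.0 means "no samples yet"
        self.active = {s: 0 for s in servers}

    def score(self, server):
        return self.avg_ms[server] * (self.active[server] + 1)

    def pick(self):
        # Servers without samples score 0 and are therefore tried first.
        server = min(self.avg_ms, key=self.score)
        self.active[server] += 1
        return server

    def record(self, server, elapsed_ms):
        # Called when a response arrives: free the connection, update the average.
        self.active[server] -= 1
        prev = self.avg_ms[server]
        if prev == 0.0:
            self.avg_ms[server] = elapsed_ms
        else:
            self.avg_ms[server] = (1 - self.alpha) * prev + self.alpha * elapsed_ms

lb = LeastResponseTimeBalancer(["fast", "slow"])
s1 = lb.pick()
lb.record(s1, 10.0)    # first server answered in 10 ms
s2 = lb.pick()
lb.record(s2, 200.0)   # second server answered in 200 ms
choice = lb.pick()     # the faster server wins from here on
```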

5. Random

Logic: Pick a server at random.
Pros: Extremely simple. Surprisingly effective with large server fleets.
Cons: No guarantees about even distribution for small numbers of requests.
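
A small simulation can make the trade-off concrete. This is only a sketch: the server names, the seed, and the imbalance measure (gap between the busiest and idlest server relative to the ideal share) are choices made for this example.

```python
import random
from collections import Counter

def simulate(servers, num_requests, seed=0):
    """Assign each request to a uniformly random server and count the results."""
    rng = random.Random(seed)
    return Counter(rng.choice(servers) for _ in range(num_requests))

def imbalance(counts, servers):
    """Gap between the busiest and idlest server, relative to the ideal share."""
    loads = [counts.get(server, 0) for server in servers]
    ideal = sum(loads) / len(servers)
    return (max(loads) - min(loads)) / ideal

servers = ["a", "b", "c", "d"]
small = simulate(servers, 20)        # a handful of requests can be lopsided
large = simulate(servers, 100_000)   # a large sample evens out
```

Printing imbalance(small, servers) next to imbalance(large, servers) shows how the relative gap shrinks as the number of requests grows.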

L4 vs L7 Load Balancing

Layer 4 (Transport Layer)

Data: IP Address + Port (TCP/UDP).
Behavior: "I see a packet for Port 80, I forward it to Server A." The LB never inspects the packet contents.
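Speed: Extremely fast. Often implemented in the kernel or network data path using technologies like IPVS or eBPF.
Encryption: Typically doesn't decrypt SSL/TLS and passes the TCP stream through unchanged, although some L4 products (e.g., AWS NLB with TLS listeners) can terminate TLS.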
Examples: AWS NLB (Network Load Balancer), HAProxy in TCP mode, Linux IPVS.

Layer 7 (Application Layer)

Data: HTTP Headers, URL path, Cookies, Request body.
Behavior: "URL is /checkout → send to Payment Service. URL is /static/logo.png → send to CDN. Header has Content-Type: application/grpc → send to gRPC backend."
Speed: Slower. The LB must terminate the client's TCP connection, decrypt TLS, and buffer enough of the request to parse the HTTP headers before it can choose a backend.
Smart Features: Content-based routing, A/B testing, authentication, Web Application Firewall (WAF), rate limiting, header injection.
Examples: AWS ALB (Application Load Balancer), Nginx, Envoy, Traefik, Cloudflare.
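
Content-based routing boils down to an ordered list of rules evaluated against the parsed request. The sketch below mirrors the examples in the Behavior line; the backend names and the route function are hypothetical, header names are assumed to be lowercased already, and gRPC requests are identified here by their Content-Type header.

```python
def route(path, headers):
    """Return the name of the backend pool that should handle the request.

    Rules are checked in order; the first match wins.
    """
    content_type = headers.get("content-type", "")
    if content_type.startswith("application/grpc"):
        return "grpc-backend"
    if path.startswith("/checkout"):
        return "payment-service"
    if path.startswith("/static/"):
        return "cdn"
    return "default-backend"

checkout = route("/checkout", {})
logo = route("/static/logo.png", {})
rpc = route("/orders.OrderService/Get", {"content-type": "application/grpc"})
other = route("/profile", {"content-type": "application/json"})
```

An L4 balancer cannot apply any of these rules, because it never sees the path or the headers.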

Comparison

Feature	Layer 4	Layer 7
Speed	Very Fast (kernel-level)	Slower (must parse HTTP)
Intelligence	None (IP + Port only)	High (URL, headers, cookies)
SSL Termination	Usually no (pass-through)	Yes
WebSocket Support	Transparent	Depends on the product (e.g., Nginx needs the Upgrade headers forwarded)
Cost	Lower	Higher
Use Case	Raw TCP/UDP traffic, gaming	Web apps, APIs, microservices

Health Checks

LBs must detect when a server is unhealthy and stop sending it traffic.

Active Health Checks

The LB proactively sends a request to each server every N seconds:

GET /health HTTP/1.1

Response 200 OK → Server is healthy
Response 503    → Server is unhealthy, remove from rotation
Timeout (>3s)   → Server is unhealthy

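Healthy servers are typically checked every 10-30 seconds. Unhealthy servers can be checked more frequently (every 2-5 seconds) to detect recovery quickly, and many LBs require several consecutive successful checks before returning a server to rotation.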
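
The decision logic of an active check can be sketched without any networking. The constants below take the 3-second timeout from the example above and pick one value from each interval range mentioned in the text; ProbeResult, is_healthy, and next_interval are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_S = 3.0            # probes slower than this count as failed
HEALTHY_INTERVAL_S = 10    # assumed value from the 10-30 second range
UNHEALTHY_INTERVAL_S = 2   # assumed value from the 2-5 second range

@dataclass
class ProbeResult:
    status: Optional[int]  # HTTP status code, or None if no response arrived
    elapsed_s: float       # time the probe took

def is_healthy(result):
    """Only a timely 200 counts as healthy; 503, timeouts, and silence do not."""
    if result.status is None or result.elapsed_s > TIMEOUT_S:
        return False
    return result.status == 200

def next_interval(healthy):
    """Probe failed servers more often so recovery is noticed quickly."""
    return HEALTHY_INTERVAL_S if healthy else UNHEALTHY_INTERVAL_S

ok = is_healthy(ProbeResult(status=200, elapsed_s=0.05))
overloaded = is_healthy(ProbeResult(status=503, elapsed_s=0.05))
timed_out = is_healthy(ProbeResult(status=200, elapsed_s=3.5))
```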

Passive Health Checks

The LB monitors actual traffic:

If Server A returns 3 consecutive 5xx errors, it's temporarily removed from rotation.
If Server A's response time exceeds 5 seconds for each of the past 10 requests, it's marked as degraded.
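
Both rules can be tracked per server with a counter and a small sliding window. In this sketch the thresholds come from the two rules above, the second rule is read as "all of the last 10 requests were slow", and the PassiveMonitor class and its state names are hypothetical.

```python
from collections import deque

class PassiveMonitor:
    """Watches real responses from one server and derives its state."""

    def __init__(self, error_threshold=3, window=10, slow_s=5.0):
        self.error_threshold = error_threshold
        self.slow_s = slow_s
        self.consecutive_errors = 0
        self.latencies = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, status, elapsed_s):
        # Any non-5xx response resets the error streak.
        if 500 <= status <= 599:
            self.consecutive_errors += 1
        else:
            self.consecutive_errors = 0
        self.latencies.append(elapsed_s)

    def state(self):
        if self.consecutive_errors >= self.error_threshold:
            return "removed"
        window_full = len(self.latencies) == self.latencies.maxlen
        if window_full and all(t > self.slow_s for t in self.latencies):
            return "degraded"
        return "healthy"

monitor = PassiveMonitor()
for status in (500, 502, 503):
    monitor.observe(status, 0.1)
after_errors = monitor.state()    # three 5xx responses in a row
monitor.observe(200, 0.1)
after_success = monitor.state()   # one good response clears the streak
```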

Best practice: Use both active and passive health checks together.

Code Example: Nginx L7 Load Balancer

nginx

http {
    upstream backend_servers {
        least_conn;  # Algorithm: Least Connections

        server 10.0.0.1:8080 weight=3;  # More powerful server
        server 10.0.0.2:8080 weight=1;
        server 10.0.0.3:8080 weight=1;
        server 10.0.0.4:8080 backup;    # Only used if others fail
    }

    server {
        listen 80;
        listen 443 ssl;

        ssl_certificate     /etc/ssl/cert.pem;
        ssl_certificate_key /etc/ssl/key.pem;

        location / {
            proxy_pass http://backend_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        # Static files served directly (no backend needed)
        location /static/ {
            root /var/www;
            expires 30d;
        }
    }
}

Global Load Balancing (DNS-Based)

For multi-region deployments, DNS-based load balancing routes users to the nearest data center:

User in Tokyo queries api.example.com.
DNS returns IP of the Tokyo data center.
User in London queries the same domain.
DNS returns IP of the London data center.

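GeoDNS services (Cloudflare, Route 53, NS1) map the IP address of the user's resolver (or the EDNS Client Subnet, when the resolver sends it) to a location and return the nearest data center. Some also offer latency-based routing, which returns the region with the lowest measured latency rather than the geographically closest one.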
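
At its core, geographic routing is a lookup from the client's inferred location to a data center, with a fallback for unknown locations. The sketch below assumes the country has already been derived from the resolver's IP; the country-to-region table, the fallback choice, and the IP addresses (taken from ranges reserved for documentation) are invented for the example.

```python
DATA_CENTERS = {
    "tokyo": "192.0.2.10",      # documentation-only address
    "london": "198.51.100.10",  # documentation-only address
}

# Hypothetical mapping from ISO country code to the nearest data center.
NEAREST = {"JP": "tokyo", "KR": "tokyo", "GB": "london", "FR": "london"}

DEFAULT_DC = "london"  # assumed fallback when the country is unknown

def resolve(client_country):
    """Return the IP a GeoDNS server would answer with for this country."""
    data_center = NEAREST.get(client_country, DEFAULT_DC)
    return DATA_CENTERS[data_center]

tokyo_user = resolve("JP")
london_user = resolve("GB")
unknown_user = resolve("BR")  # not in the table, so the fallback applies
```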

Load Balancer High Availability

What if the load balancer itself fails? It becomes a Single Point of Failure (SPOF).

Solution: Deploy load balancers in Active-Passive or Active-Active pairs:

Active-Passive: Two LB instances share a Virtual IP (VIP). If the active one fails, the passive takes over using VRRP (Virtual Router Redundancy Protocol).
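Active-Active: Both LBs handle traffic simultaneously, for example by publishing both IPs in DNS. If one fails, the other absorbs all traffic, so each must be sized for the full load, and the failed IP should be withdrawn by health-checked DNS because clients may otherwise keep trying it.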
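
The Active-Passive takeover can be sketched as a timer on the standby: if advertisements from the active LB stop arriving, the standby claims the VIP. This is a heavily simplified model, not the VRRP state machine (there are no priorities or elections here); the StandbyLB class, its parameters, and the timing values are assumptions for illustration.

```python
class StandbyLB:
    """Passive LB that claims the VIP when the active LB goes silent."""

    def __init__(self, advert_interval_s=1.0, missed_allowed=3):
        # Declare the active LB dead after this much silence.
        self.down_after_s = advert_interval_s * missed_allowed
        self.last_advert = 0.0
        self.owns_vip = False

    def on_advertisement(self, now):
        # The active LB is alive, so hand the VIP back if we had taken it.
        self.last_advert = now
        self.owns_vip = False

    def tick(self, now):
        # Called periodically; returns True while this node holds the VIP.
        if now - self.last_advert > self.down_after_s:
            self.owns_vip = True
        return self.owns_vip

standby = StandbyLB()
standby.on_advertisement(0.0)
still_passive = standby.tick(2.0)   # silence is within tolerance
took_over = standby.tick(3.5)       # more than 3 seconds without a heartbeat
standby.on_advertisement(4.0)       # the active LB comes back
handed_back = standby.tick(4.5)
```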

Interview Tips 💡

Start with why: "A load balancer distributes traffic across servers for scalability and fault tolerance."
Choose the right algorithm: "Round Robin for stateless APIs, Least Connections for variable-latency backends, IP Hash for sticky sessions."
L4 vs L7: "Use L4 for raw performance (gaming, TCP), L7 when you need content-based routing (web apps, microservices)."
Don't forget health checks: "Active checks ping /health every 10 seconds. Passive checks monitor real traffic for errors."
Address the SPOF: "LBs must be deployed in HA pairs to avoid becoming a single point of failure."

Related Concepts

Consistent Hashing — Advanced balancing for stateful routing
DNS Architecture — Global load balancing via DNS
Proxies — Forward vs Reverse Proxy
Horizontal Scaling — Why you need a load balancer
Rate Limiting — Often implemented at the LB layer
