Token Bucket vs Leaky Bucket

Token Bucket and Leaky Bucket are two fundamental algorithms for rate limiting and traffic shaping. Despite their similar names, they serve different purposes and behave very differently.

Quick comparison:

Token Bucket: Allows bursts, limits average rate
Leaky Bucket: Smooths traffic, enforces constant rate

Token Bucket Algorithm

How It Works

Imagine a bucket that holds tokens (permissions to make requests):

mermaid

graph LR
    Refill["+R tokens/sec"] --> Bucket[Bucket: C capacity]
    Request[Request arrives] -->|"consume 1 token"| Bucket
    Bucket -->|"has tokens"| Allow[✓ Process immediately]
    Bucket -->|"empty"| Reject[✗ Reject 429]

Click to expand code...

Mechanics:

Bucket has capacity C tokens (initially full)
Tokens added at rate R per second
Each request consumes 1 token
If bucket empty → reject request
If tokens available → process immediately

Visual Example

Capacity: 10 tokens
Refill: 1 token/second

Time | Tokens | Event              | Result
-----|--------|--------------------|-----------
0s   | 10     | User idle (10 sec) | Bucket full
0s   | 10     | 5 requests arrive  | ✓ All accepted (5 left)
1s   | 6      | +1 refill          | 6 tokens
1s   | 6      | 8 requests arrive  | ✓ First 6, ✗ Last 2 rejected
2s   | 1      | +1 refill          | 1 token
2s   | 1      | 1 request arrives  | ✓ Accepted

Click to expand code...

Key insight: Bursts allowed up to capacity, then limited to refill rate.

Implementation (Python)

python

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_refill = time.time()
    
    def consume(self, tokens=1):
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
    
    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now

# Usage
bucket = TokenBucket(capacity=10, refill_rate=1)
if bucket.consume():
    process_request()
else:
    return_429_error()

Click to expand code...

Use Cases

✅ API Rate Limiting

Free tier: 100 req/min average, burst of 20
bucket = TokenBucket(capacity=20, refill_rate=100/60)

Click to expand code...

✅ User-Facing Services

Allows natural user behavior (refresh button = 3 quick requests)
Good UX: occasional bursts don't get penalized

✅ AWS API Gateway Throttling

yaml

throttling:
  burstLimit: 5000    # Token bucket capacity
  rateLimit: 10000    # Refill rate (requests/sec)

Click to expand code...

Leaky Bucket Algorithm

How It Works

Imagine a bucket with a hole in the bottom:

mermaid

graph TD
    Input[Requests arrive] -->|"add to queue"| Queue[Queue: Max size Q]
    Queue -->|"leak at constant rate R"| Process[Process Request]
    Queue -->|"queue full"| Drop[✗ Drop request]

Click to expand code...

Mechanics:

Requests enter a queue (bucket)
Requests processed (leak) at constant rate R
If queue full → drop incoming requests
Output rate is always R, regardless of input

Visual Example

Queue Capacity: 5 requests
Leak Rate: 1 request/second

Time | Queue | Event              | Result
-----|-------|--------------------|-------------
0s   | []    | 3 requests arrive  | Queue: [R1, R2, R3]
1s   | [R2,R3] | Process R1       | R1 processed
1s   | [R2,R3,R4,R5,R6] | 3 more arrive | R6 dropped (queue full)
2s   | [R3,R4,R5] | Process R2    | R2 processed
3s   | [R4,R5] | Process R3       | R3 processed

Click to expand code...

Key insight: Output is smooth and constant, bursts are queued or dropped.

Implementation (Python)

python

import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.queue = deque()
        self.leak_rate = leak_rate
        self.last_leak = time.time()
    
    def add_request(self, request):
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # Queue full, drop request
    
    def _leak(self):
        """Process requests at constant rate"""
        now = time.time()
        elapsed = now - self.last_leak
        
        # Process requests that should have leaked
        leaks = int(elapsed * self.leak_rate)
        for _ in range(min(leaks, len(self.queue))):
            request = self.queue.popleft()
            process_request(request)
        
        self.last_leak = now

# Usage
bucket = LeakyBucket(capacity=100, leak_rate=10)
if bucket.add_request(incoming_request):
    print("Request queued")
else:
    print("Request dropped (429)")

Click to expand code...

Use Cases

✅ Network Traffic Shaping

ISPs use leaky bucket to smooth bursty traffic
Prevents network congestion

✅ Video Streaming

Output frames at constant 30 FPS
Buffer handles temporary encoding delays

✅ Background Job Processing

Process tasks at sustainable rate
Prevents worker overload

Side-by-Side Comparison

Aspect	Token Bucket	Leaky Bucket
Burst Handling	✅ Allows controlled bursts	❌ Smooths bursts, no acceleration
Output Rate	Variable (instant if tokens available)	Constant (always leak rate)
Queueing	No queue, immediate decision	Queue requests for processing
Memory	O(1) — just track tokens	O(n) — store queued requests
Latency	Low (immediate response)	Higher (queuing delay)
User Experience	Better (responsive to bursts)	Worse (artificial delays)
Server Protection	Moderate	Excellent (guaranteed max rate)
Implementation	Simple	Moderate (need queue management)

Traffic Pattern Comparison

Input:  ▂▂▂███████▂▂▂▂▂▂▂ (burst in middle)

Token Bucket Output:
        ▂▂▂██████▂▂▂▂▂▂▂▂  (burst passed through, some rejected)
        
Leaky Bucket Output:
        ▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▂  (smoothed to constant rate)

Click to expand code...

When to Use Each

Choose Token Bucket When:

✅ User-facing APIs

Users expect quick responses to bursts (page refresh)
Good UX is priority

✅ Variable workload patterns

Traffic naturally bursty
Average rate matters more than instantaneous rate

✅ Multi-tier rate limiting

python

free_user = TokenBucket(capacity=10, refill_rate=1)      # 1 req/sec, burst 10
paid_user = TokenBucket(capacity=1000, refill_rate=100)  # 100 req/sec, burst 1000

Click to expand code...

Choose Leaky Bucket When:

✅ Protecting downstream services

Database can only handle 100 QPS consistently
Bursts would cause crashes

✅ Regulatory compliance

Must guarantee max rate (e.g., rate-limited APIs)

✅ Fair scheduling

Process items in order received
No priority to bursty users

Hybrid Approaches

Leaky Bucket with Token Bucket

Combine both for flexibility:

python

class HybridRateLimiter:
    def __init__(self):
        # Token bucket for burst allowance
        self.token_bucket = TokenBucket(capacity=100, refill_rate=10)
        # Leaky bucket for smoothing output
        self.leaky_bucket = LeakyBucket(capacity=50, leak_rate=10)
    
    def handle_request(self, request):
        # First check: Can accept burst?
        if not self.token_bucket.consume():
            return False, "Rate limit exceeded (burst)"
        
        # Second check: Queue for processing
        if not self.leaky_bucket.add_request(request):
            return False, "Server busy (queue full)"
        
        return True, "Request queued"

Click to expand code...

Benefits:

Token bucket prevents abuse
Leaky bucket protects server

Real-World Examples

Nginx Rate Limiting (Leaky Bucket)

nginx

http {
    # Define rate limiting zone (leaky bucket)
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    
    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            # Leak rate: 10/sec
            # Burst capacity: 20 requests
        }
    }
}

Click to expand code...

AWS API Gateway (Token Bucket)

python

# CloudFormation configuration
UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Throttle:
      BurstLimit: 5000   # Token bucket capacity
      RateLimit: 10000   # Refill rate (req/sec)

Click to expand code...

Google Cloud Armor (Hybrid)

Combines token bucket (per-user limits) with leaky bucket (server protection):

yaml

rateLimitOptions:
  conformAction: allow
  exceedAction: deny
  rateLimitThreshold:
    count: 100        # Token bucket capacity
    intervalSec: 60   # Refill period
  banThreshold:
    count: 10000      # Leaky bucket queue size
    intervalSec: 600

Click to expand code...

Distributed Implementation Challenges

Problem: Multiple Servers

Server A: User uses 50 tokens
Server B: User uses 50 tokens
Total: 100 tokens consumed (2x the limit!)

Click to expand code...

Solution: Centralized Redis

lua

-- token_bucket.lua (Redis Lua script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + (elapsed * refill_rate))

-- Consume token
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)

return {allowed, tokens}

Click to expand code...

Interview Tips 💡

When discussing rate limiting in system design interviews:

Clarify requirements: "Do we need to allow bursts for better UX, or enforce strict constant rates?"
Compare algorithms: "Token bucket allows occasional bursts, while leaky bucket smooths traffic..."
Justify choice: "For this user-facing API, token bucket is better because users might refresh pages..."
Discuss distribution: "We'd use Redis with Lua scripts to share rate limit state across servers..."
Mention alternatives: "We could also use fixed window counters for simpler cases..."
Real examples: "AWS API Gateway uses token bucket with separate burst and rate limits..."

Related Concepts

Rate Limiting — Overview of rate limiting strategies
Circuit Breaker — Protecting services from cascading failures
API Gateway — Where rate limiting is often implemented
Backpressure — Related flow control in streaming systems
Redis Internals — Distributed rate limiting with Redis

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Intermediate

Geohashing (Location Encoding)

A geocoding system that encodes latitude/longitude coordinates into short alphanumeric strings for efficient proximity searches and spatial indexing.

System DesignGeospatialAlgorithms

Pro

Raft Consensus Algorithm

A comprehensive guide to Raft, the consensus algorithm powering Etcd, Consul, and Kubernetes. Leader election, log replication, safety guarantees, and production deployment patterns.

Distributed SystemsAlgorithmsConsensus

Pro

Top-K Heavy Hitters (Streaming)

How to find 'Trending Topics' in massive real-time streams. MapReduce, Count-Min Sketches, and Sliding Windows.

AlgorithmsBig DataStreaming

Token Bucket vs Leaky Bucket

Token Bucket Algorithm

How It Works

Visual Example

Implementation (Python)

Use Cases

Leaky Bucket Algorithm

How It Works

Visual Example

Implementation (Python)

Use Cases

Side-by-Side Comparison

Traffic Pattern Comparison

When to Use Each

Choose Token Bucket When:

Choose Leaky Bucket When:

Hybrid Approaches

Leaky Bucket with Token Bucket

Real-World Examples

Nginx Rate Limiting (Leaky Bucket)

AWS API Gateway (Token Bucket)

Google Cloud Armor (Hybrid)

Distributed Implementation Challenges

Problem: Multiple Servers

Solution: Centralized Redis

Interview Tips 💡

Related Concepts

About ScaleWiki

Related Articles

Geohashing (Location Encoding)

Raft Consensus Algorithm

Top-K Heavy Hitters (Streaming)