Back to All Concepts
System DesignAlgorithmsRate LimitingIntermediate

Token Bucket vs Leaky Bucket

A comprehensive comparison of two fundamental rate limiting algorithms: Token Bucket allows controlled bursts while Leaky Bucket enforces constant output rates.

Token Bucket vs Leaky Bucket

Token Bucket and Leaky Bucket are two fundamental algorithms for rate limiting and traffic shaping. Despite their similar names, they serve different purposes and behave very differently.

Quick comparison:

  • Token Bucket: Allows bursts, limits average rate
  • Leaky Bucket: Smooths traffic, enforces constant rate

Token Bucket Algorithm

How It Works

Imagine a bucket that holds tokens (permissions to make requests):

mermaid
graph LR
    Refill["+R tokens/sec"] --> Bucket[Bucket: C capacity]
    Request[Request arrives] -->|"consume 1 token"| Bucket
    Bucket -->|"has tokens"| Allow[✓ Process immediately]
    Bucket -->|"empty"| Reject[✗ Reject 429]
Click to expand code...

Mechanics:

  1. Bucket has capacity C tokens (initially full)
  2. Tokens added at rate R per second
  3. Each request consumes 1 token
  4. If bucket empty → reject request
  5. If tokens available → process immediately

Visual Example

Capacity: 10 tokens
Refill: 1 token/second

Time | Tokens | Event              | Result
-----|--------|--------------------|-----------
0s   | 10     | User idle (10 sec) | Bucket full
0s   | 10     | 5 requests arrive  | ✓ All accepted (5 left)
1s   | 6      | +1 refill          | 6 tokens
1s   | 6      | 8 requests arrive  | ✓ First 6, ✗ Last 2 rejected
2s   | 1      | +1 refill          | 1 token
2s   | 1      | 1 request arrives  | ✓ Accepted
Click to expand code...

Key insight: Bursts allowed up to capacity, then limited to refill rate.

Implementation (Python)

python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_refill = time.time()
    
    def consume(self, tokens=1):
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
    
    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now

# Usage
bucket = TokenBucket(capacity=10, refill_rate=1)
if bucket.consume():
    process_request()
else:
    return_429_error()
Click to expand code...

Use Cases

API Rate Limiting

Free tier: 100 req/min average, burst of 20
bucket = TokenBucket(capacity=20, refill_rate=100/60)
Click to expand code...

User-Facing Services

  • Allows natural user behavior (refresh button = 3 quick requests)
  • Good UX: occasional bursts don't get penalized

AWS API Gateway Throttling

yaml
throttling:
  burstLimit: 5000    # Token bucket capacity
  rateLimit: 10000    # Refill rate (requests/sec)
Click to expand code...

Leaky Bucket Algorithm

How It Works

Imagine a bucket with a hole in the bottom:

mermaid
graph TD
    Input[Requests arrive] -->|"add to queue"| Queue[Queue: Max size Q]
    Queue -->|"leak at constant rate R"| Process[Process Request]
    Queue -->|"queue full"| Drop[✗ Drop request]
Click to expand code...

Mechanics:

  1. Requests enter a queue (bucket)
  2. Requests processed (leak) at constant rate R
  3. If queue full → drop incoming requests
  4. Output rate is always R, regardless of input

Visual Example

Queue Capacity: 5 requests
Leak Rate: 1 request/second

Time | Queue | Event              | Result
-----|-------|--------------------|-------------
0s   | []    | 3 requests arrive  | Queue: [R1, R2, R3]
1s   | [R2,R3] | Process R1       | R1 processed
1s   | [R2,R3,R4,R5,R6] | 3 more arrive | R6 dropped (queue full)
2s   | [R3,R4,R5] | Process R2    | R2 processed
3s   | [R4,R5] | Process R3       | R3 processed
Click to expand code...

Key insight: Output is smooth and constant, bursts are queued or dropped.

Implementation (Python)

python
import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.queue = deque()
        self.leak_rate = leak_rate
        self.last_leak = time.time()
    
    def add_request(self, request):
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # Queue full, drop request
    
    def _leak(self):
        """Process requests at constant rate"""
        now = time.time()
        elapsed = now - self.last_leak
        
        # Process requests that should have leaked
        leaks = int(elapsed * self.leak_rate)
        for _ in range(min(leaks, len(self.queue))):
            request = self.queue.popleft()
            process_request(request)
        
        self.last_leak = now

# Usage
bucket = LeakyBucket(capacity=100, leak_rate=10)
if bucket.add_request(incoming_request):
    print("Request queued")
else:
    print("Request dropped (429)")
Click to expand code...

Use Cases

Network Traffic Shaping

  • ISPs use leaky bucket to smooth bursty traffic
  • Prevents network congestion

Video Streaming

  • Output frames at constant 30 FPS
  • Buffer handles temporary encoding delays

Background Job Processing

  • Process tasks at sustainable rate
  • Prevents worker overload

Side-by-Side Comparison

AspectToken BucketLeaky Bucket
Burst Handling✅ Allows controlled bursts❌ Smooths bursts, no acceleration
Output RateVariable (instant if tokens available)Constant (always leak rate)
QueueingNo queue, immediate decisionQueue requests for processing
MemoryO(1) — just track tokensO(n) — store queued requests
LatencyLow (immediate response)Higher (queuing delay)
User ExperienceBetter (responsive to bursts)Worse (artificial delays)
Server ProtectionModerateExcellent (guaranteed max rate)
ImplementationSimpleModerate (need queue management)

Traffic Pattern Comparison

Input:  ▂▂▂███████▂▂▂▂▂▂▂ (burst in middle)

Token Bucket Output:
        ▂▂▂██████▂▂▂▂▂▂▂▂  (burst passed through, some rejected)
        
Leaky Bucket Output:
        ▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▂  (smoothed to constant rate)
Click to expand code...

When to Use Each

Choose Token Bucket When:

User-facing APIs

  • Users expect quick responses to bursts (page refresh)
  • Good UX is priority

Variable workload patterns

  • Traffic naturally bursty
  • Average rate matters more than instantaneous rate

Multi-tier rate limiting

python
free_user = TokenBucket(capacity=10, refill_rate=1)      # 1 req/sec, burst 10
paid_user = TokenBucket(capacity=1000, refill_rate=100)  # 100 req/sec, burst 1000
Click to expand code...

Choose Leaky Bucket When:

Protecting downstream services

  • Database can only handle 100 QPS consistently
  • Bursts would cause crashes

Regulatory compliance

  • Must guarantee max rate (e.g., rate-limited APIs)

Fair scheduling

  • Process items in order received
  • No priority to bursty users

Hybrid Approaches

Leaky Bucket with Token Bucket

Combine both for flexibility:

python
class HybridRateLimiter:
    def __init__(self):
        # Token bucket for burst allowance
        self.token_bucket = TokenBucket(capacity=100, refill_rate=10)
        # Leaky bucket for smoothing output
        self.leaky_bucket = LeakyBucket(capacity=50, leak_rate=10)
    
    def handle_request(self, request):
        # First check: Can accept burst?
        if not self.token_bucket.consume():
            return False, "Rate limit exceeded (burst)"
        
        # Second check: Queue for processing
        if not self.leaky_bucket.add_request(request):
            return False, "Server busy (queue full)"
        
        return True, "Request queued"
Click to expand code...

Benefits:

  • Token bucket prevents abuse
  • Leaky bucket protects server

Real-World Examples

Nginx Rate Limiting (Leaky Bucket)

nginx
http {
    # Define rate limiting zone (leaky bucket)
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    
    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            # Leak rate: 10/sec
            # Burst capacity: 20 requests
        }
    }
}
Click to expand code...

AWS API Gateway (Token Bucket)

python
# CloudFormation configuration
UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Throttle:
      BurstLimit: 5000   # Token bucket capacity
      RateLimit: 10000   # Refill rate (req/sec)
Click to expand code...

Google Cloud Armor (Hybrid)

Combines token bucket (per-user limits) with leaky bucket (server protection):

yaml
rateLimitOptions:
  conformAction: allow
  exceedAction: deny
  rateLimitThreshold:
    count: 100        # Token bucket capacity
    intervalSec: 60   # Refill period
  banThreshold:
    count: 10000      # Leaky bucket queue size
    intervalSec: 600
Click to expand code...

Distributed Implementation Challenges

Problem: Multiple Servers

Server A: User uses 50 tokens
Server B: User uses 50 tokens
Total: 100 tokens consumed (2x the limit!)
Click to expand code...

Solution: Centralized Redis

lua
-- token_bucket.lua (Redis Lua script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + (elapsed * refill_rate))

-- Consume token
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)

return {allowed, tokens}
Click to expand code...

Interview Tips 💡

When discussing rate limiting in system design interviews:

  1. Clarify requirements: "Do we need to allow bursts for better UX, or enforce strict constant rates?"
  2. Compare algorithms: "Token bucket allows occasional bursts, while leaky bucket smooths traffic..."
  3. Justify choice: "For this user-facing API, token bucket is better because users might refresh pages..."
  4. Discuss distribution: "We'd use Redis with Lua scripts to share rate limit state across servers..."
  5. Mention alternatives: "We could also use fixed window counters for simpler cases..."
  6. Real examples: "AWS API Gateway uses token bucket with separate burst and rate limits..."

Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles