What is Backpressure?
Backpressure is a flow control mechanism in streaming systems that prevents a fast producer from overwhelming a slow consumer. When data is generated faster than it can be processed, backpressure signals the producer to slow down, pause, or allows the system to drop data gracefully.
Think of it like a highway: when traffic backs up (consumers can't keep pace), you need mechanisms to prevent a pileup crash.
The Problem: Producer-Consumer Mismatch
Producer: ████████████ (10,000 events/sec) Consumer: ▂▂▂▂▂▂▂▂▂▂▂▂ (100 events/sec) Result: 💥 Out of Memory!
Real-World Scenario
Twitter Firehose API → Your Application 10,000 tweets/sec → Can process 100 tweets/sec Without backpressure: - Queue grows: 100 → 1,000 → 10,000 → 100,000 tweets - Memory exhausted - Application crashes (OOM)
Backpressure Strategies
1. Buffering (Queuing)
Store excess data in a temporary buffer while the consumer catches up.
graph LR
Producer[Producer: Fast] -->|10k/sec| Queue[Buffer Queue]
Queue -->|100/sec| Consumer[Consumer: Slow]
Queue -->|"queue size"| Monitor{Monitor}
Monitor -->|"full"| Alert[⚠️ Backpressure!]
Implementation:
from collections import deque
import time
class BufferedProcessor:
def __init__(self, max_buffer_size=1000):
self.buffer = deque(maxlen=max_buffer_size)
self.dropped_count = 0
def produce(self, item):
if len(self.buffer) >= self.buffer.maxlen:
# Buffer full - apply backpressure
self.dropped_count += 1
return False # Signal producer to slow down
self.buffer.append(item)
return True
def consume(self):
if self.buffer:
item = self.buffer.popleft()
self.process(item)
def process(self, item):
# Slow processing (100ms per item)
time.sleep(0.1)
print(f"Processed: {item}")
# Usage
processor = BufferedProcessor(max_buffer_size=100)
for i in range(1000):
if not processor.produce(i):
print(f"Backpressure applied! Dropped {processor.dropped_count} items")
time.sleep(0.5) # Producer backs off
Pros:
- Smooths temporary spikes
- Simple to implement
Cons:
- Fixed memory overhead
- If producer consistently faster → buffer fills → crash
Best for: Temporary bursts, not sustained overload
2. Load Shedding (Dropping Data)
Intentionally drop data when overwhelmed.
class LoadSheddingProcessor:
def __init__(self, capacity=100):
self.queue = []
self.capacity = capacity
self.dropped = 0
def produce(self, item, priority=0):
if len(self.queue) >= self.capacity:
# Drop lowest priority items
if priority > min(self.queue, key=lambda x: x[1])[1]:
self.queue.sort(key=lambda x: x[1])
dropped_item = self.queue.pop(0)
self.dropped += 1
self.queue.append((item, priority))
else:
self.dropped += 1
return False
else:
self.queue.append((item, priority))
return True
# Example: Drop old tweets when overwhelmed
processor = LoadSheddingProcessor(capacity=100)
for tweet in stream:
processor.produce(tweet, priority=tweet.timestamp)
Strategies:
- Drop oldest: FIFO queue overflow
- Drop newest: Reject incoming (circuit breaker pattern)
- Drop randomly: Probabilistic dropping
- Drop lowest priority: Preserve important data
Best for:
- Real-time systems where old data loses value
- Monitoring systems (can tolerate data loss)
- Video streaming (drop frames, not buffer)
3. Pull-Based Flow Control (Reactive Streams)
Consumer explicitly requests data from producer.
graph LR
Consumer[Consumer] -->|"request(10)"| Producer[Producer]
Producer -->|"send 10 items"| Consumer
Consumer -->|"process..."| Consumer
Consumer -->|"request(10) again"| Producer
Implementation (Reactive Streams Pattern):
class ReactiveProducer:
def __init__(self):
self.subscribers = []
def subscribe(self, subscriber):
self.subscribers.append(subscriber)
subscriber.on_subscribe(self)
def request(self, subscriber, n):
"""Subscriber requests n items"""
for _ in range(n):
if self.has_data():
data = self.get_next()
subscriber.on_next(data)
else:
subscriber.on_complete()
break
class ReactiveConsumer:
def __init__(self, batch_size=10):
self.batch_size = batch_size
self.subscription = None
def on_subscribe(self, producer):
self.subscription = producer
# Request initial batch
self.subscription.request(self, self.batch_size)
def on_next(self, item):
# Process item
self.process(item)
# Request more when done processing batch
if self.should_request_more():
self.subscription.request(self, self.batch_size)
def process(self, item):
# Slow processing
pass
# Usage
producer = ReactiveProducer()
consumer = ReactiveConsumer(batch_size=10)
producer.subscribe(consumer)
Pros:
- Perfect flow control (never overwhelmed)
- Consumer controls rate
- No unbounded buffers
Cons:
- More complex to implement
- Requires both sides to cooperate
Best for:
- RxJava, Project Reactor, Akka Streams
- Systems with highly variable processing times
4. Throttling
Limit producer rate explicitly.
import time
class ThrottledProducer:
def __init__(self, max_rate=100): # items per second
self.max_rate = max_rate
self.interval = 1.0 / max_rate
self.last_sent = 0
def produce(self, item):
now = time.time()
elapsed = now - self.last_sent
if elapsed < self.interval:
# Too fast, wait
time.sleep(self.interval - elapsed)
self.send(item)
self.last_sent = time.time()
def send(self, item):
# Actually send item
pass
# Limit to 100 items/sec
producer = ThrottledProducer(max_rate=100)
for item in data_source:
producer.produce(item)
Best for:
- Rate-limited APIs (prevent 429 errors)
- Controlled system load
5. Pushback (TCP-Style)
Consumer signals producer to slow down/pause.
class PushbackChannel:
def __init__(self):
self.paused = False
self.buffer = []
def send(self, item, producer):
if self.paused:
# Producer should block or buffer
producer.on_backpressure()
return False
self.buffer.append(item)
if len(self.buffer) > 100:
self.pause()
return True
def pause(self):
self.paused = True
# Signal producer to stop sending
def resume(self):
self.paused = False
# Signal producer can send again
def consume(self):
if self.buffer:
item = self.buffer.pop(0)
self.process(item)
if len(self.buffer) < 50:
self.resume()
Real example: TCP Flow Control
TCP Receiver: - Advertises window size: "I can accept 64KB" - Buffer fills up → window size = 0 - Sender stops transmitting - Receiver processes data → advertises larger window - Sender resumes
Real-World Examples
Node.js Streams
const fs = require('fs');
// ReadStream (producer) can overwhelm WriteStream (consumer)
const readStream = fs.createReadStream('large-file.txt');
const writeStream = fs.createWriteStream('output.txt');
// Without backpressure handling:
readStream.pipe(writeStream); // Can cause memory issues
// With backpressure:
readStream.on('data', (chunk) => {
const canContinue = writeStream.write(chunk);
if (!canContinue) {
// Backpressure! Pause reading
readStream.pause();
// Resume when write stream drains
writeStream.once('drain', () => {
readStream.resume();
});
}
});
Apache Kafka
// Producer config
props.put("buffer.memory", 33554432); // 32MB buffer
props.put("linger.ms", 10); // Wait 10ms to batch
// If buffer fills (backpressure):
producer.send(record, (metadata, exception) -> {
if (exception instanceof BufferExhaustedException) {
// Backpressure applied! Producer blocked.
logger.warn("Buffer full, slowing down");
Thread.sleep(100);
}
});
Kafka's backpressure mechanisms:
- Producer buffer: Blocks when full
- Consumer lag: Consumers signal they're behind
- Quotas: Throttle producers/consumers at broker level
RxJava (Reactive Streams)
Observable.range(1, 1_000_000)
.onBackpressureBuffer(100) // Buffer strategy
// or .onBackpressureDrop() // Drop strategy
// or .onBackpressureLatest() // Keep only latest
.observeOn(Schedulers.io())
.subscribe(
item -> {
// Slow consumer
Thread.sleep(10);
System.out.println(item);
}
);
Akka Streams
Source(1 to 1000000) .throttle(100, 1.second) // Max 100 elements/sec .buffer(100, OverflowStrategy.dropHead) .to(Sink.foreach(println)) .run()
Choosing a Strategy
| Strategy | Data Loss | Latency | Complexity | Use Case |
|---|---|---|---|---|
| Buffering | ❌ No | Low-Medium | Low | Temporary spikes |
| Load Shedding | ✅ Yes | Low | Low | Real-time metrics |
| Pull-based | ❌ No | Medium | High | Reactive systems |
| Throttling | ❌ No | High | Low | Rate-limited APIs |
| Pushback | ❌ No | Low | Medium | TCP, streaming |
Decision Tree
Can you tolerate data loss?
├─ Yes → Load Shedding (drop old/new/random)
└─ No
├─ Is producer controllable?
│ ├─ Yes → Throttling or Pull-based
│ └─ No → Buffering + Alerts
└─ Is latency critical?
├─ Yes → Small buffer + load shedding
└─ No → Large buffer + monitoring
Monitoring Backpressure
Key metrics to track:
class BackpressureMonitor:
def __init__(self):
self.buffer_size = 0
self.max_buffer = 1000
self.dropped_count = 0
self.total_processed = 0
def metrics(self):
return {
"buffer_utilization": self.buffer_size / self.max_buffer,
"drop_rate": self.dropped_count / max(self.total_processed, 1),
"is_backpressure_active": self.buffer_size > 0.8 * self.max_buffer
}
# Alert when utilization > 80%
if metrics()["buffer_utilization"] > 0.8:
alert("Backpressure warning: buffer 80% full")
Interview Tips 💡
When discussing backpressure in system design interviews:
- Identify the problem: "The tweet stream produces 10k/sec but we can only process 100/sec..."
- Discuss strategies: "We could buffer temporarily, but if this is sustained we'd need to drop old data or throttle..."
- Choose appropriate mechanism: "For real-time analytics, we'd drop oldest data. For financial transactions, we'd block the producer..."
- Mention monitoring: "We'd track buffer size and drop rate to detect backpressure early..."
- Real-world examples: "Kafka handles this with producer buffering and consumer lag tracking..."
- Trade-offs: "Buffering adds latency but preserves data. Dropping is fast but lossy..."
Related Concepts
- Message Queues — Queues provide backpressure mechanisms
- Rate Limiting — Throttling producers
- Circuit Breaker — Fail fast under overload
- Load Balancing — Distribute load across consumers
- Kafka Architecture — Production-ready backpressure system
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
Caching Overview
High-speed data storage to reduce latency. The single most effective way to scale read-heavy systems.
Cache Eviction Policies
When the cache is full, something has to go. A comprehensive guide to LRU, LFU, ARC, and other replacement algorithms with implementation details.
System Design: Netflix (Video Streaming)
Delivering high-quality video to millions of users globally using CDNs, Adaptive Bitrate Streaming, and Microservices.