Why Cache?
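In any distributed system, the database is often the bottleneck. A query that has to touch disk can take 10-100 ms. Caching addresses this by keeping frequently accessed data in memory (RAM): a lookup in an in-memory cache server typically returns in a few milliseconds or less, most of which is the network round trip, and the memory access itself is orders of magnitude faster than disk.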
The Latency Hierarchy:
- L1 Cache: ~1 ns
- RAM (Memory): ~100 ns
- SSD (Disk): ~100 µs (1,000x slower than RAM)
- Network Call: ~50 ms (500,000x slower than RAM)
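To see how these numbers combine, here is a minimal sketch of the average read latency for a given cache hit ratio. The 1 ms cache lookup and 50 ms database read are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency when a fraction `hit_ratio` of reads is served by the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # A hit pays only for the cache lookup.
    # A miss pays for the failed cache lookup and then the database read.
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * (cache_ms + db_ms)


if __name__ == "__main__":
    # Assumed figures: 1 ms per cache lookup, 50 ms per database read.
    for ratio in (0.0, 0.5, 0.9, 0.99):
        latency = effective_latency_ms(ratio, cache_ms=1.0, db_ms=50.0)
        print(f"hit ratio {ratio:.2f}: {latency:.1f} ms")
```

Under these assumptions most of the gain comes from the last few percent of hit ratio, which is why read-heavy workloads with a small hot set benefit the most.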
The Core Concepts
When implementing a cache, you must make three major architectural decisions.
1. Where does the cache live?
- Client Side: Browser cache, iOS local storage. Saves network calls entirely.
- CDN: Content Delivery Network. Geographic caching for static assets (images, CSS).
- Server Side: Redis or Memcached, sitting in front of your database.
2. How do we write to it? (Strategies)
Do you write to the cache first? Or the database? Where is the source of truth?
- Should you use Cache-Aside (safest)?
- Or Write-Back (fastest)?
- Read the Deep Dive on Caching Strategies →
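As a rough illustration of Cache-Aside, the sketch below uses two plain dictionaries as stand-ins for the database and the cache. The names `database`, `cache`, `read_user`, and `write_user` are hypothetical; a real deployment would talk to a database and a cache server such as Redis or Memcached.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for a real database and a real cache server.
database: Dict[str, str] = {"user:1": "Ada"}
cache: Dict[str, str] = {}


def read_user(key: str) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit
    value = database.get(key)  # cache miss: the database is the source of truth
    if value is not None:
        cache[key] = value
    return value


def write_user(key: str, value: str) -> None:
    """Cache-aside write: update the database first, then invalidate the cached copy."""
    database[key] = value
    cache.pop(key, None)  # the next read repopulates the cache with fresh data
```

Write-Back would instead acknowledge the write once the cache is updated and flush it to the database later, which is faster but can lose data if the cache fails before the flush.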
3. What do we delete? (Eviction)
When your 16GB Redis instance is full, what gets thrown out?
- Do you evict the Least Recently Used entries (LRU)?
- Or the Least Frequently Used entries (LFU)?
- Read the Deep Dive on Eviction Policies (LRU/LFU) →
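A minimal LRU sketch, assuming a fixed entry count as the capacity limit (a real cache server usually limits by memory instead). The class name `LRUCache` is hypothetical; it relies on `collections.OrderedDict` from the Python standard library to track recency.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was touched more recently.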
Common Pitfalls
The "Stale Data" Problem
Caching introduces a new problem: consistency. If you update a user in the database but the cache still holds the old version, readers keep seeing stale data until the entry expires or is invalidated.
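One common way to bound staleness is to give every entry a time-to-live (TTL) and to invalidate explicitly on writes. The sketch below is an in-process illustration under those assumptions; the class `TTLCache` and its injectable `clock` parameter are hypothetical and exist only to keep the example self-contained and testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Entries expire `ttl_seconds` after they are set, which bounds how stale a read can be."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self._clock() + self.ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss so the caller reloads
            return None
        return value

    def invalidate(self, key: str) -> None:
        """Call after updating the database so the next read fetches fresh data."""
        self._items.pop(key, None)
```

A short TTL limits how long stale data can be served, while explicit invalidation removes it immediately. Many systems use both.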
"There are only two hard things in Computer Science: cache invalidation and naming things."
Cache Stampede (Dog-piling)
If a popular key (e.g., "homepage_news") expires, 10,000 users might hit the database simultaneously to regenerate it.
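- Solution: Locking (a per-key mutex, so only one request regenerates the value while the others wait) or probabilistic early expiration (refreshing the key slightly before its TTL runs out).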
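The locking approach can be sketched in-process with one lock per key, so that concurrent misses on the same key trigger a single load. This is an illustration only: the class `SingleFlightCache` and the `loader` callback are hypothetical, expiry is left out for brevity, and a multi-server deployment would need a distributed lock instead of `threading.Lock`.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """On a miss, only one thread per key runs the loader; the rest wait and reuse its result."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard the lock table itself so two threads never create different locks for one key.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        if key in self._values:
            return self._values[key]  # fast path: cache hit
        with self._lock_for(key):
            # Re-check after acquiring the lock: another thread may have loaded the value.
            if key in self._values:
                return self._values[key]
            value = loader()  # the expensive database call happens once per key
            self._values[key] = value
            return value
```

If many threads request "homepage_news" at once, the first one to take the lock runs the loader and the others find the value already present when they acquire the lock.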
Popular Caching Technologies
| Technology | Type | Best For |
|---|---|---|
| Redis | In-memory key-value store | Sessions, real-time features, pub/sub |
| Memcached | Simple in-memory cache | High-throughput simple caching |
| CDN (Cloudflare, Akamai) | Edge caching | Static assets, global distribution |
| Varnish | HTTP accelerator | Web page caching |
| Browser Cache | Client-side | Static assets, reducing server load |
Real-World Examples
1. Netflix
Netflix caches movie metadata, user preferences, and thumbnails at multiple layers: CDN edge, regional data centers, and in-memory on servers.
2. Twitter Timeline
Twitter's timeline is pre-computed and cached using Redis. When you tweet, it's fanned out to followers' cached timelines.
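The fan-out-on-write idea can be sketched with in-memory structures. This is not Twitter's implementation: the names `followers`, `timelines`, `post_tweet`, and `read_timeline` are hypothetical, and the per-timeline cap of 100 entries is an arbitrary assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

TIMELINE_LIMIT = 100  # assumed cap on cached entries per timeline

followers: Dict[str, Set[str]] = defaultdict(set)
timelines: Dict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)


def post_tweet(author: str, text: str) -> None:
    """Fan-out on write: push the new tweet into every follower's cached timeline."""
    for follower in followers[author]:
        # appendleft keeps newest first; a full deque drops its oldest entry.
        timelines[follower].appendleft(f"{author}: {text}")


def read_timeline(user: str, count: int = 10) -> List[str]:
    """Reads are cheap because the timeline was assembled at write time."""
    return list(timelines[user])[:count]
```

The trade-off is that writes cost work proportional to the author's follower count, while reads become a single cache lookup.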
3. E-commerce Product Pages
Amazon caches product details, reviews, and pricing. Fast-changing inventory counts are read through to the source of truth rather than served from a long-lived cached copy, to ensure accuracy.
Interview Tips 💡
When discussing caching in system design interviews:
- Identify read vs. write ratio: Caching benefits read-heavy systems most.
- Estimate cache size: How much data will you cache? Can it fit in RAM?
- Discuss invalidation strategy: How will you keep the cache fresh?
- Consider cache stampede: What happens when many requests hit an expired key simultaneously? (Solution: lock and single-fetch, or staggered TTLs)
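Staggered TTLs can be sketched as a small helper that adds random jitter to a base TTL, so keys written at the same moment do not all expire together. The function name `jittered_ttl` and the default spread of 10% are assumptions chosen for illustration.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds scaled by a random factor between 1 - spread and 1 + spread."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    source = rng if rng is not None else random
    return base_seconds * source.uniform(1.0 - spread, 1.0 + spread)


if __name__ == "__main__":
    # With a 300 s base and 10% spread, each key expires somewhere between about 270 s and 330 s.
    for _ in range(3):
        print(round(jittered_ttl(300.0), 1))
```

The resulting value would be passed as the expiry when writing each key, spreading regeneration load over a window instead of a single instant.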
Related Concepts
- Redis Internals — Deep dive into how Redis works
- CDN — Edge caching for global content delivery
- Database Replication — Another strategy for read scaling
- Rate Limiting — Often used alongside caching
- Consistent Hashing — Used to distribute cache keys across nodes