Designing a Scalable Notification Service
Sending one email is easy. Sending 10 million push notifications in 2 minutes for "Breaking News" is an architectural challenge.
1. Requirements
Functional
- Send: Support Email (SES), SMS (Twilio), Push (FCM/APNS).
- Bulk: Support "Broadcast" to all users.
- Preferences: Respect opt-outs. Never email a user who has unsubscribed.
Non-Functional
- Reliability: Never lose a notification.
- Rate Limiting: Don't get banned by Apple/Google/Twilio for spamming.
- Latency: Breaking news must be delivered fast. 10 million pushes in 2 minutes is roughly 83,000 sends per second.
2. High-Level Architecture
We need to decouple the Producer (the service that wants to send a message) from the Consumer (the worker that calls the external API).
Components
- Notification Service: The API gateway. Receives POST /send.
- Message Queue (Kafka): Buffers requests. Prevents system crash if 1M requests hit at once.
- Workers: Pull from Kafka and call external APIs (FCM/Twilio).
- Rate Limiter: Controls the speed of workers.
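The split between producer and consumer can be sketched in a few lines. This is a minimal illustration, not a production design: Python's in-memory `queue.Queue` stands in for Kafka, and `handle_send`, `call_provider` and `run_demo` are hypothetical names introduced here.

```python
import queue
import threading

# In-memory stand-in for the Kafka topic (assumption for this sketch).
buffer = queue.Queue()
sent = []  # records what the placeholder provider call received


def handle_send(request: dict) -> dict:
    """Producer side: the POST /send handler only enqueues and returns."""
    buffer.put(request)
    return {"status": "accepted"}


def call_provider(message: dict) -> None:
    """Hypothetical placeholder for an FCM/Twilio/SES call."""
    sent.append(message["id"])


def worker() -> None:
    """Consumer side: pull messages at its own pace and call the provider."""
    while True:
        message = buffer.get()
        if message is None:  # sentinel that stops the worker in this sketch
            break
        call_provider(message)


def run_demo(n: int) -> list:
    """Enqueue n requests, let one worker drain them, return what was sent."""
    thread = threading.Thread(target=worker)
    thread.start()
    for i in range(n):
        handle_send({"id": i, "channel": "push"})
    buffer.put(None)
    thread.join()
    return sent
```

The handler returns as soon as the message is buffered, so a burst of requests only grows the queue and never blocks on the external API.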
3. The Message Queue (Buffer)
Why Kafka?
- Buffering: If Apple's configured limit is 10k/sec, but we receive 100k/sec requests, Kafka holds the backlog.
- Topics:
  - topic-high-priority (OTP codes, Login alerts) -> High number of consumers.
  - topic-low-priority (Marketing emails) -> Fewer consumers.
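As a rough sketch of the two ideas above, the snippet below routes a notification to one of the two topics and computes how large the backlog grows during a spike. The type names (`"otp"`, `"login_alert"`) and both function names are assumptions made for illustration.

```python
HIGH = "topic-high-priority"
LOW = "topic-low-priority"

# Assumed type names; the article only lists OTP codes, login alerts
# and marketing emails as examples.
HIGH_PRIORITY_TYPES = {"otp", "login_alert"}


def topic_for(notification_type: str) -> str:
    """Pick the topic a notification should be published to."""
    return HIGH if notification_type in HIGH_PRIORITY_TYPES else LOW


def backlog_after(seconds: int, inbound_per_sec: int, outbound_per_sec: int) -> int:
    """Messages left waiting in the topic after a sustained spike."""
    return max(0, (inbound_per_sec - outbound_per_sec) * seconds)
```

With 100k/sec arriving and 10k/sec leaving, `backlog_after(10, 100_000, 10_000)` gives 900,000 messages held in the queue after ten seconds.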
4. Reliability & Retry Mechanisms
External services (Twilio, FCM) fail all the time.
What if a worker calls Twilio and gets a 500 Internal Server Error?
The Retry Queue
- Worker fails to send email.
- Push the message to a Retry Queue with a delay (Exponential Backoff).
- Wait 1s, then 2s, then 4s, then 8s, doubling the delay on each attempt.
- After 5 retries, move to Dead Letter Queue (DLQ) for human inspection.
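The retry flow above might look like the following sketch. It assumes a hypothetical `TransientError` for retryable failures, takes `send` and `sleep` as parameters so the logic can be exercised without a real provider, and uses a plain list as the DLQ. A real system would usually re-publish to a delayed retry queue instead of sleeping inside the worker.

```python
MAX_RETRIES = 5  # from the text: after 5 retries, go to the DLQ


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as an HTTP 500."""


def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Delay before retry number `attempt` (1-based): 1s, 2s, 4s, 8s, ..."""
    return base * 2 ** (attempt - 1)


def deliver(message, send, sleep, dead_letters) -> bool:
    """Try once, then retry up to MAX_RETRIES times with growing delays.

    Returns True on success. On final failure the message is appended
    to `dead_letters` and False is returned.
    """
    for attempt in range(MAX_RETRIES + 1):
        if attempt > 0:
            sleep(backoff_delay(attempt))
        try:
            send(message)
            return True
        except TransientError:
            continue
    dead_letters.append(message)
    return False
```

Under these assumptions the fifth retry waits 16s, continuing the doubling pattern. Adding random jitter to each delay is a common refinement so that many failed messages do not all retry at the same instant.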
5. Deduplication
- Problem: Retries might cause duplicate emails if the failure was a network timeout (the server actually sent it, but the ACK was lost).
- Fix: Check the NotificationLog database.
  - INSERT INTO logs (id) VALUES (msg_id)
  - If insert fails (Duplicate Key), stop.
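A minimal sketch of the insert-or-stop check, using an in-memory SQLite database as a stand-in for the NotificationLog store. The `logs` table and `id` column come from the text; `open_log` and `claim` are hypothetical helper names.

```python
import sqlite3


def open_log() -> sqlite3.Connection:
    """Create the log table; :memory: is a stand-in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id TEXT PRIMARY KEY)")
    return conn


def claim(conn: sqlite3.Connection, msg_id: str) -> bool:
    """Return True if msg_id was recorded for the first time.

    A duplicate key raises IntegrityError, which signals that the
    message was already handled and the worker should stop.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO logs (id) VALUES (?)", (msg_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

A worker would call `claim(conn, msg_id)` before sending and skip the message when it returns `False`. The primary key makes the check and the write a single atomic step, so two workers racing on the same `msg_id` cannot both win.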
6. Rate Limiting (Token Bucket)
Token Bucket Algorithm
We must protect third-party quotas.
- Twilio Limit: 100 SMS/sec.
- Worker Logic:
  - Before sending, Worker asks Rate Limiter (Redis): "Can I take a token?"
  - If Redis says "Yes" (a token is available, meaning fewer than 100 sends in the current second), proceed.
  - If Redis says "No", Worker sleeps or re-queues the message.
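The worker logic above can be illustrated with an in-process token bucket. This is a sketch under assumptions: the class below keeps its state in local memory rather than Redis, and the `TokenBucket` name and injectable `clock` parameter are introduced here for illustration. A limiter shared across workers would need the refill-and-take step to run atomically on the Redis side.

```python
import time


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity       # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        elapsed = now - self.updated
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For the Twilio limit in the text, a worker would create `TokenBucket(rate=100, capacity=100)` and call `try_acquire()` before each SMS, sleeping or re-queuing the message when it returns `False`.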
7. Preference Service
Before sending "Marketing Email" to User A:
- Worker calls Preference Service.
- Checks: "Did User A unsubscribe from Marketing?"
- If yes, drop message silently.
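The opt-out check above could be sketched as follows. The in-memory `OPT_OUTS` mapping is a hypothetical stand-in for the Preference Service, and the `user_id` and `category` field names are assumptions.

```python
# Hypothetical stand-in for the Preference Service:
# user id -> categories that user has unsubscribed from.
OPT_OUTS = {"user_a": {"marketing"}}


def should_send(user_id: str, category: str, opt_outs=OPT_OUTS) -> bool:
    """True unless the user has opted out of this category."""
    return category not in opt_outs.get(user_id, set())


def filter_batch(messages: list, opt_outs=OPT_OUTS) -> list:
    """Drop messages whose recipient opted out; keep the rest in order."""
    return [
        m for m in messages
        if should_send(m["user_id"], m["category"], opt_outs)
    ]
```

In this sketch a user with no stored preferences receives everything, which matches the opt-out model in the text.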
Summary
- Decouple: Use Queues (Kafka/RabbitMQ) to absorb spikes.
- Sort: Prioritize OTPs over Marketing.
- Protect: Rate limit your workers to respect external API limits.
- Retry: Use exponential backoff for resilience.