What is Database Replication?
Database replication is the practice of keeping copies of the same data on multiple database servers. When the primary database receives a write, that change is propagated to one or more replica databases.
Why Replicate?
| Goal | How Replication Helps |
|---|---|
| High Availability | If primary fails, promote a replica |
| Read Scalability | Distribute read queries across replicas |
| Geographic Distribution | Place replicas near users globally |
| Disaster Recovery | Keep an up-to-date copy in a different data center |
| Analytics Isolation | Run heavy queries on replica without affecting production |
Replication Architectures
1. Leader-Follower (Master-Slave)
One leader accepts writes; followers replicate from leader and serve reads.
Writes → [Leader]
↓ replication
[Follower 1] ← Reads
[Follower 2] ← Reads
[Follower 3] ← Reads
Pros: Simple, widely supported, no write conflicts (a single leader orders all writes)
Cons: Write bottleneck on single leader, failover complexity
Used by: MySQL, PostgreSQL, MongoDB, Redis
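The routing rule above can be sketched in a few lines. This is a minimal illustration, not any particular driver's API: the class name `LeaderFollowerRouter` and the node names are hypothetical, and round-robin is just one possible way to spread reads.

```python
import itertools


class LeaderFollowerRouter:
    """Sends writes to the leader and spreads reads across followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        # Round-robin iterator over the followers (hypothetical node names).
        self._followers = itertools.cycle(followers)

    def route(self, is_write):
        # Every write goes to the single leader; reads rotate over followers.
        return self.leader if is_write else next(self._followers)


router = LeaderFollowerRouter("leader", ["follower-1", "follower-2", "follower-3"])
read_targets = [router.route(is_write=False) for _ in range(4)]
write_target = router.route(is_write=True)
```

Four consecutive reads cycle through the three followers and wrap around, while every write lands on the leader, which is why the leader becomes the write bottleneck.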
2. Leader-Leader (Multi-Master)
Multiple nodes accept writes. Changes sync bidirectionally.
Writes → [Leader A] ←→ [Leader B] ← Writes
↓ ↓
[Follower] [Follower]
Pros: No single point of failure for writes, geographic write distribution
Cons: Conflict resolution is hard, eventual consistency
Used by: CockroachDB, Galera Cluster, Amazon Aurora Multi-Master
3. Leaderless (Dynamo-style)
No leader. Any node accepts writes. Quorum determines success.
Client writes to N nodes
If W nodes acknowledge → Success
Client reads from N nodes
If R nodes respond → Return most recent (determined by version)
W + R > N ensures every read quorum overlaps every write quorum, so a read sees the latest acknowledged write
Pros: Highly available, no leader election needed
Cons: Complex conflict resolution, eventual consistency
Used by: Cassandra, Riak, and Amazon's original Dynamo design
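To make the quorum rule concrete, here is a small in-memory sketch with N = 3, W = 2, R = 2. The `Replica` class and the `quorum_write` / `quorum_read` helpers are hypothetical names for illustration; real systems add timeouts, read repair, and more careful versioning.

```python
class Replica:
    """In-memory stand-in for one storage node."""

    def __init__(self):
        self.store = {}  # key -> (version, value)
        self.up = True


def quorum_write(replicas, w, key, version, value):
    """Send the write to every reachable replica; succeed if at least w ack."""
    acks = 0
    for replica in replicas:
        if replica.up:
            if version > replica.store.get(key, (0, None))[0]:
                replica.store[key] = (version, value)
            acks += 1
    return acks >= w


def quorum_read(replicas, r, key):
    """Use the first r responders and return the value with the highest version."""
    responses = [rep.store.get(key, (0, None)) for rep in replicas if rep.up]
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    return max(responses[:r], key=lambda vv: vv[0])[1]


nodes = [Replica() for _ in range(3)]      # N = 3
nodes[2].up = False                        # one node misses the write
write_ok = quorum_write(nodes, 2, "user:1", 1, "Alice")   # W = 2

nodes[2].up = True                         # the stale node comes back
nodes[0].up = False                        # a node that has the write goes down
read_value = quorum_read(nodes, 2, "user:1")              # R = 2
```

Because W + R = 4 > N = 3, the two nodes that answer the read must include at least one that acknowledged the write, so the read returns "Alice" even though one responder is stale.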
Synchronous vs. Asynchronous Replication
Synchronous Replication
Leader waits for followers to confirm before acknowledging write.
Write → Leader → [waits for confirmation] → Follower
↓
Client gets OK
Pros: Strong consistency, acknowledged writes survive a leader failure
Cons: Slower writes, follower failure blocks writes
Asynchronous Replication
Leader acknowledges immediately, replicates in background.
Write → Leader → Client gets OK immediately
↓
[background] → Follower (may lag)
Pros: Fast writes, follower issues don't block
Cons: Replication lag, can lose data on leader failure
Semi-Synchronous
Wait for at least one follower to confirm.
Write → Leader → [wait for 1 follower] → Client OK
↓
Other followers async
A middle ground: every acknowledged write exists on at least two nodes, and the client never waits for the slowest follower.
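The three modes differ only in how many follower confirmations the leader waits for before replying to the client. A minimal sketch, assuming hypothetical helper names `required_acks` and `commit` and treating each follower's confirmation as a boolean:

```python
def required_acks(mode, follower_count):
    """Follower confirmations the leader waits for before answering the client."""
    if mode == "sync":
        return follower_count          # every follower must confirm
    if mode == "semi-sync":
        return min(1, follower_count)  # at least one follower must confirm
    if mode == "async":
        return 0                       # reply immediately, replicate later
    raise ValueError(f"unknown replication mode: {mode}")


def commit(mode, follower_acks):
    """follower_acks: list of booleans, True if that follower confirmed in time."""
    return sum(follower_acks) >= required_acks(mode, len(follower_acks))


acks = [True, False, True]  # the second follower is down or slow
results = {mode: commit(mode, acks) for mode in ("sync", "semi-sync", "async")}
```

With one unresponsive follower, the synchronous write is blocked while the semi-synchronous and asynchronous writes still succeed, which mirrors the trade-offs listed above.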
Replication Lag
In async replication, followers may be behind the leader. This is replication lag.
Problems from Replication Lag
Read-your-writes inconsistency
1. User writes profile update to Leader
2. User reads from Follower (hasn't replicated yet)
3. User sees old data! 😱
Solutions:
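- Read from the leader for data the user has recently written
- Sticky sessions to the same replica (helps only once that replica has received the write)
- Client tracks its last write timestamp and reads only from replicas that have caught up to it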
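The first solution can be sketched as a router that remembers when each user last wrote and sends that user's reads to the leader for a short window afterwards. The class name `ReadYourWritesRouter` and the 5-second window are assumptions for illustration; in practice the window should exceed the expected replication lag.

```python
import time


class ReadYourWritesRouter:
    """Routes a user's reads to the leader shortly after that user writes."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.last_write = {}          # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def choose_node(self, user_id):
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.window:
            return "leader"           # a follower may not have the write yet
        return "follower"


fake_now = [100.0]
router = ReadYourWritesRouter(window_seconds=5.0, clock=lambda: fake_now[0])
before_write = router.choose_node("u1")
router.record_write("u1")
just_after_write = router.choose_node("u1")
fake_now[0] = 106.0                   # six seconds later
after_window = router.choose_node("u1")
```

Only the user who wrote pays the cost of reading from the leader; everyone else keeps reading from followers.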
Monotonic reads violation
1. User reads from Follower A (has new data)
2. User reads from Follower B (further behind!)
3. User sees data go backward in time! 😱
Solution: Sticky sessions or read from single replica per user session
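One simple way to pin a user to a single replica is to hash the user ID, sketched below. The function name `replica_for_user` and the replica names are hypothetical; note that with plain modulo hashing, changing the replica list remaps many users.

```python
import hashlib


def replica_for_user(user_id, replicas):
    """Deterministically map a user to one replica so their reads never go backward."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["follower-a", "follower-b", "follower-c"]
first_read = replica_for_user("user-42", replicas)
second_read = replica_for_user("user-42", replicas)
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, whose string hashing varies between processes and would break stickiness across application servers.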
Handling Failover
When the leader fails, you need to promote a follower.
Manual Failover
Operator manually promotes a follower. Safe but slow.
Automatic Failover
System detects leader failure and promotes automatically.
Challenges:
- Split brain: Two nodes think they're leader
- Data loss: With async replication, the promoted node may be missing the old leader's most recent writes
- Detection accuracy: Is leader actually dead or just slow?
Solutions:
- Consensus algorithms (Raft, Paxos)
- Fencing (prevent old leader from accepting writes)
- Require quorum for failover decision
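Two of these safeguards are easy to sketch: a majority check before promoting a follower, and a fencing token that lets shared storage reject writes from a deposed leader. The names `has_quorum` and `FencedStorage` are hypothetical, and the token here is simply a number that increases with every leadership change.

```python
def has_quorum(votes_for_failover, cluster_size):
    """Promote only if a strict majority agrees that the leader is down."""
    return votes_for_failover > cluster_size // 2


class FencedStorage:
    """Storage that rejects writes carrying an older fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.log = []

    def write(self, token, value):
        if token < self.highest_token:
            return False              # sender is a deposed leader
        self.highest_token = token
        self.log.append(value)
        return True


storage = FencedStorage()
old_leader_ok = storage.write(1, "a")     # leader elected with token 1
new_leader_ok = storage.write(2, "b")     # after failover, new leader holds token 2
stale_write_ok = storage.write(1, "c")    # old leader wakes up and tries to write
```

The stale write is rejected, so even if two nodes briefly believe they are leader, only the one with the newest token can change data.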
Conflict Resolution (Multi-Master/Leaderless)
When multiple nodes accept writes simultaneously, conflicts occur.
Common Strategies
Last Write Wins (LWW)
Most recent write (by timestamp) wins.
Node A at T1: user.name = "Alice"
Node B at T2: user.name = "Bob"
Result: user.name = "Bob" (T2 > T1)
⚠️ Can lose writes silently
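A minimal sketch of LWW, assuming each write is a hypothetical `(timestamp, node_id, value)` tuple and using the node ID as a tie-breaker so that every replica picks the same winner:

```python
def lww_merge(write_a, write_b):
    """Keep the write with the larger timestamp; break ties by node ID."""
    return max(write_a, write_b, key=lambda w: (w[0], w[1]))


alice = (1, "node-a", "Alice")   # written at T1
bob = (2, "node-b", "Bob")       # written at T2
winner = lww_merge(alice, bob)
```

The losing write is discarded with no error or notification, which is the silent data loss the warning refers to. Timestamps from different machines also depend on clock synchronization, so the "last" write is only as accurate as the clocks.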
Merge Values
For certain data types, merge is possible.
Node A: cart += "apple"
Node B: cart += "banana"
Result: cart = ["apple", "banana"]
Works for: Sets, counters, CRDTs
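As a sketch of why these types merge cleanly, here is the cart example as a set union, plus a grow-only counter (one of the simplest CRDTs). The names `merge_carts` and `GCounter` are chosen for illustration.

```python
def merge_carts(cart_a, cart_b):
    """Set union keeps every item added on either node."""
    return cart_a | cart_b


class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node_id, amount=1):
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other):
        # Taking the per-node maximum makes merging order-independent and repeatable.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter({n: max(self.counts.get(n, 0), other.counts.get(n, 0))
                         for n in nodes})

    def value(self):
        return sum(self.counts.values())


cart = merge_carts({"apple"}, {"banana"})

counter_a, counter_b = GCounter(), GCounter()
counter_a.increment("node-a", 2)
counter_b.increment("node-b", 3)
total = counter_a.merge(counter_b).value()
```

Merging in either order, or merging the same state twice, gives the same result, so replicas converge without coordinating.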
Application-Level Resolution
Present both versions to user/application to resolve.
Version Vectors
Track causality to detect true conflicts vs. sequential edits.
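A version vector holds one counter per node. Comparing two vectors tells you whether one version descends from the other or whether they were written concurrently. A minimal sketch, with the function name `compare` and the return labels chosen for illustration:

```python
def compare(vv_a, vv_b):
    """Compare two version vectors given as dicts of node_id -> counter."""
    nodes = vv_a.keys() | vv_b.keys()
    a_ahead = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_ahead = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"   # true conflict: neither version saw the other
    if a_ahead:
        return "a_newer"      # a is a sequential edit on top of b
    if b_ahead:
        return "b_newer"
    return "equal"


sequential = compare({"A": 2, "B": 1}, {"A": 1, "B": 1})
conflict = compare({"A": 2, "B": 1}, {"A": 1, "B": 2})
```

Only the "concurrent" case needs a resolution strategy such as merging or asking the application; sequential edits can simply keep the newer version.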
Replication Topologies
Star (Hub and Spoke)
[Leader]
/ | \
[F1] [F2] [F3]
Simple, but leader is bottleneck.
Circular
[A] → [B] → [C] → [A]
Each node replicates to the next. A single node failure breaks the chain.
All-to-All
[A] ↔ [B] ↔ [C] ↔ [A]
Most fault-tolerant, but complex.
Real-World Examples
Amazon RDS
Offers synchronous Multi-AZ replication (automatic failover) and async read replicas (up to 15).
PostgreSQL
Built-in streaming replication. Supports sync, async, and logical replication.
MongoDB
Replica sets with automatic leader election. Secondaries can serve reads.
Redis
Master-replica replication. Redis Sentinel provides automatic failover. Redis Cluster adds sharding.
Replication + Sharding
In practice, you often combine both:
Shard 1: [Leader] → [Replica 1] → [Replica 2]
Shard 2: [Leader] → [Replica 1] → [Replica 2]
Shard 3: [Leader] → [Replica 1] → [Replica 2]
Each shard handles a portion of data, replicated for availability.
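A rough sketch of how a request might be routed in such a layout: hash the key to pick a shard, then pick the leader or a replica inside that shard. The `SHARDS` layout, node names, and function names are hypothetical, and always reading from the first replica is a deliberate simplification.

```python
import hashlib

# Hypothetical cluster layout: three shards, each with one leader and two replicas.
SHARDS = [
    {"leader": "shard1-leader", "replicas": ["shard1-replica1", "shard1-replica2"]},
    {"leader": "shard2-leader", "replicas": ["shard2-replica1", "shard2-replica2"]},
    {"leader": "shard3-leader", "replicas": ["shard3-replica1", "shard3-replica2"]},
]


def shard_for_key(key, shard_count):
    """Pick a shard by hashing the key (sharding decides where the data lives)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def node_for(key, is_write, shards):
    """Within the shard, writes go to the leader and reads go to a replica."""
    shard = shards[shard_for_key(key, len(shards))]
    return shard["leader"] if is_write else shard["replicas"][0]


write_node = node_for("user:42", True, SHARDS)
read_node = node_for("user:42", False, SHARDS)
```

Reads and writes for the same key always land in the same shard; replication only decides which copy inside that shard serves the request.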
Interview Tips 💡
When discussing replication in system design:
- Identify the goal: "We need replication for high availability and read scaling..."
- Choose architecture: "We'd use leader-follower with async replication..."
- Address lag: "For read-your-writes consistency, we'd route reads of recently written data to the leader..."
- Failover strategy: "We'd use automatic failover with Raft consensus to avoid split-brain..."
- Combine with sharding: "For large scale, we'd shard the data and replicate each shard..."
Related Concepts
- CAP Theorem — Trade-offs in replicated systems
- Database Sharding — Horizontal partitioning + replication
- Raft Consensus — Leader election for replication
- CRDTs — Conflict-free replicated data types
- Leader Election — Choosing a new leader programmatically