Database · Availability · Distributed Systems · Intermediate

Database Replication

The process of copying and maintaining database objects in multiple databases to improve reliability, fault-tolerance, and accessibility.

What is Database Replication?

Database Replication is the practice of keeping copies of data on multiple database servers. When your primary database receives a write, that change is propagated to one or more replica databases.

Why Replicate?

Goal                     | How Replication Helps
High Availability        | If primary fails, promote a replica
Read Scalability         | Distribute read queries across replicas
Geographic Distribution  | Place replicas near users globally
Disaster Recovery        | Backup in different data center
Analytics Isolation      | Run heavy queries on replica without affecting production

Replication Architectures

1. Leader-Follower (Master-Slave)

One leader accepts writes; followers replicate from leader and serve reads.

Writes  → [Leader] 
                ↓ replication
         [Follower 1] ← Reads
         [Follower 2] ← Reads
         [Follower 3] ← Reads

Pros: Simple, widely supported, no write conflicts because a single leader orders every write
Cons: Write bottleneck on single leader, failover complexity
Used by: MySQL, PostgreSQL, MongoDB, Redis
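To make the leader-follower flow concrete, here is a minimal in-memory sketch. The `Leader` and `Follower` classes and the pull-based `pull` method are hypothetical names chosen for illustration; real databases ship a write-ahead log or binlog over the network instead.

```python
class Node:
    """One database server holding a key-value copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class Leader(Node):
    """The only node that accepts writes; it records them in an ordered log."""

    def __init__(self, name):
        super().__init__(name)
        self.log = []  # ordered (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value
        self.applied = len(self.log)


class Follower(Node):
    """Serves reads and catches up by replaying the leader's log."""

    def pull(self, leader):
        # Apply only the entries this follower has not seen yet, in order.
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)


leader = Leader("leader")
followers = [Follower(f"follower-{i}") for i in range(1, 4)]

leader.write("user:1", "Alice")
leader.write("user:1", "Alicia")
for follower in followers:
    follower.pull(leader)

print([f.data["user:1"] for f in followers])  # ['Alicia', 'Alicia', 'Alicia']
```

Because every follower replays the same log in the same order, all copies converge on the leader's state once they have caught up.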

2. Leader-Leader (Multi-Master)

Multiple nodes accept writes. Changes sync bidirectionally.

Writes → [Leader A] ←→ [Leader B] ← Writes
              ↓              ↓
         [Follower]    [Follower]

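Pros: No single point of failure for writes, geographic write distribution
Cons: Conflict resolution is hard; when leaders sync asynchronously, consistency is only eventual
Used by: Galera Cluster, Amazon Aurora Multi-Master, CockroachDB (any node accepts writes, though it orders them through Raft consensus rather than resolving conflicts afterward)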

3. Leaderless (Dynamo-style)

No leader. Any node accepts writes. Quorum determines success.

Client writes to N nodes
If W nodes acknowledge → Success
Client reads from N nodes  
If R nodes respond → Return most recent (determined by version)

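W + R > N guarantees that every read quorum overlaps every write quorum,
so a read reaches at least one node holding the latest acknowledged write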

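Pros: Highly available, no leader election needed
Cons: Complex conflict resolution, eventual consistency
Used by: Cassandra, Riak, and Amazon's original Dynamo system (the DynamoDB service itself routes writes through a per-partition leader)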
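The quorum rule can be sketched in a few lines. This is an illustrative model, not any particular database's protocol: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, versions are plain integers, and a node being down is simulated with an `up` flag.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str


class Replica:
    def __init__(self):
        self.store = {}
        self.up = True  # simulated availability

    def put(self, key, item):
        """Store the item if it is newer; return True as the acknowledgement."""
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or item.version > current.version:
            self.store[key] = item
        return True


def quorum_write(replicas, key, item, w):
    """Send the write to all N replicas; succeed if at least W acknowledge."""
    acks = sum(1 for replica in replicas if replica.put(key, item))
    return acks >= w


def quorum_read(replicas, key, r):
    """Collect R responses and return the one with the highest version."""
    responders = [replica for replica in replicas if replica.up]
    if len(responders) < r:
        raise RuntimeError("read quorum not reached")
    answers = [rep.store[key] for rep in responders[:r] if key in rep.store]
    return max(answers, key=lambda item: item.version, default=None)


# N = 3, W = 2, R = 2, so W + R > N.
replicas = [Replica() for _ in range(3)]
quorum_write(replicas, "k", Versioned(1, "old"), w=2)

replicas[0].up = False                      # node 0 misses the next write
quorum_write(replicas, "k", Versioned(2, "new"), w=2)

replicas[0].up = True                       # node 0 is back, but stale
replicas[2].up = False                      # a different node is now down
print(quorum_read(replicas, "k", r=2).value)  # new
```

The read contacts the stale node 0 and the up-to-date node 1; because the two quorums must share at least one node, the newer version is always among the answers.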

Synchronous vs. Asynchronous Replication

Synchronous Replication

Leader waits for followers to confirm before acknowledging write.

Write → Leader → [waits for confirmation] → Follower
                         ↓
                     Client gets OK

Pros: Strong consistency, every acknowledged write exists on more than one node
Cons: Slower writes, a failed or slow follower blocks writes

Asynchronous Replication

Leader acknowledges immediately, replicates in background.

Write → Leader → Client gets OK immediately
           ↓
    [background] → Follower (may lag)

Pros: Fast writes, follower problems don't block the leader
Cons: Replication lag, writes that have not yet replicated are lost if the leader fails

Semi-Synchronous

Wait for at least one follower to confirm.

Write → Leader → [wait for 1 follower] → Client OK
                        ↓
              Other followers async

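A middle ground: every acknowledged write exists on at least two nodes, and write latency depends on the fastest follower to confirm rather than on the slowest.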
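The latency difference between the three modes comes down to how many follower confirmations the leader waits for. The sketch below is a back-of-the-envelope model with made-up round-trip times; `Mode`, `acks_required`, and `write_latency_ms` are hypothetical names, and followers are assumed to be contacted in parallel.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    SEMI_SYNC = "semi-sync"
    ASYNC = "async"


def acks_required(mode, follower_count):
    """How many follower confirmations the leader waits for before replying."""
    if mode is Mode.SYNC:
        return follower_count
    if mode is Mode.SEMI_SYNC:
        return min(1, follower_count)
    return 0


def write_latency_ms(mode, local_ms, follower_rtts_ms):
    """Estimated time until the client gets OK."""
    needed = acks_required(mode, len(follower_rtts_ms))
    if needed == 0:
        return local_ms
    # The leader waits for the `needed` fastest confirmations.
    return local_ms + sorted(follower_rtts_ms)[needed - 1]


# Assumed numbers: 1 ms local commit, followers 2 ms, 40 ms and 120 ms away.
rtts = [2, 40, 120]
for mode in Mode:
    print(mode.value, write_latency_ms(mode, 1, rtts))
# sync 121
# semi-sync 3
# async 1
```

With these assumed numbers, fully synchronous writes are held back by the slowest follower, while semi-synchronous writes pay only for the nearest one.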

Replication Lag

In async replication, followers may be behind the leader. This is replication lag.

Problems from Replication Lag

Read-your-writes inconsistency

1. User writes profile update to Leader
2. User reads from Follower (hasn't replicated yet)
3. User sees old data! 😱

Solutions:

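  • Read from the leader for data the user recently wrote
  • Sticky sessions to the same replica (helps only once that replica has applied the write)
  • Client tracks the timestamp or log position of its last write and reads only from replicas that have caught up to it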
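One way to sketch the position-tracking approach is shown below. It assumes each write gets a monotonically increasing log position and that the router knows how far each replica has applied; `Router`, `record_write`, and `pick_read_target` are hypothetical names for illustration.

```python
class Router:
    """Routes reads so that a user never misses their own writes."""

    def __init__(self, leader_position, replica_positions):
        self.leader_position = leader_position      # latest log position on the leader
        self.replica_positions = replica_positions  # replica name -> applied position
        self.last_write = {}                        # user id -> position of last write

    def record_write(self, user_id):
        """Called after the leader commits a write made by this user."""
        self.leader_position += 1
        self.last_write[user_id] = self.leader_position

    def pick_read_target(self, user_id):
        """Return a replica that already has the user's last write, else the leader."""
        needed = self.last_write.get(user_id, 0)
        for name, position in self.replica_positions.items():
            if position >= needed:
                return name
        return "leader"


router = Router(10, {"replica-a": 10, "replica-b": 9})
router.record_write("alice")

print(router.pick_read_target("alice"))  # leader (no replica has position 11 yet)
print(router.pick_read_target("bob"))    # replica-a (bob has written nothing)

router.replica_positions["replica-a"] = 11  # replica-a catches up
print(router.pick_read_target("alice"))  # replica-a
```

Only users with recent writes fall back to the leader, so the leader's extra read load stays proportional to write activity.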

Monotonic reads violation

1. User reads from Follower A (has new data)
2. User reads from Follower B (further behind!)
3. User sees data go backward in time! 😱

Solution: Sticky sessions or read from single replica per user session
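A simple way to get sticky routing without server-side session state is to hash the user ID onto the replica list. This is a minimal sketch; `replica_for` and the replica names are hypothetical.

```python
import hashlib


def replica_for(user_id, replicas):
    """Map a user to one replica deterministically, so repeated reads hit the same copy."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["follower-a", "follower-b", "follower-c"]
first = replica_for("user-42", replicas)
second = replica_for("user-42", replicas)
print(first == second)  # True
```

The mapping changes whenever the replica list changes, so a user may still see one backward step after a replica is added or removed; consistent hashing reduces how many users are remapped.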

Handling Failover

When the leader fails, you need to promote a follower.

Manual Failover

Operator manually promotes a follower. Safe but slow.

Automatic Failover

System detects leader failure and promotes automatically.

Challenges:

  1. Split brain: Two nodes think they're leader
  2. Data loss: If async replication, promoted node may be behind
  3. Detection accuracy: Is leader actually dead or just slow?

Solutions:

  • Consensus algorithms (Raft, Paxos)
  • Fencing (prevent old leader from accepting writes)
  • Require quorum for failover decision
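Fencing and the quorum requirement can be sketched together. In this illustrative model each newly promoted leader gets a higher epoch number (a fencing token), and storage refuses writes carrying an older epoch; `Storage`, `has_quorum`, and the epoch scheme are assumptions for the example, not a specific product's API.

```python
class Storage:
    """Shared resource that rejects writes from a deposed leader."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False  # stale leader: fenced off
        self.highest_epoch = epoch
        self.data[key] = value
        return True


def has_quorum(votes, cluster_size):
    """A failover decision needs a strict majority of the cluster."""
    return votes > cluster_size // 2


storage = Storage()
print(storage.write(1, "k", "from old leader"))   # True

# The old leader stalls; 2 of 3 nodes vote to promote a follower with epoch 2.
print(has_quorum(2, 3))                           # True
print(storage.write(2, "k", "from new leader"))   # True

# The old leader wakes up still believing it is in charge.
print(storage.write(1, "k", "late write"))        # False
print(storage.data["k"])                          # from new leader
```

Because two disjoint groups cannot both hold a strict majority, at most one side of a network partition can promote a leader, and the epoch check stops the old leader from overwriting newer data.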

Conflict Resolution (Multi-Master/Leaderless)

When multiple nodes accept writes simultaneously, conflicts occur.

Common Strategies

Last Write Wins (LWW)

Most recent write (by timestamp) wins.

Node A at T1: user.name = "Alice"
Node B at T2: user.name = "Bob"
Result: user.name = "Bob" (T2 > T1)

⚠️ Concurrent writes are discarded silently, and clock skew between nodes can make an older write win
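A minimal LWW register shows both the rule and the silent loss. `Write` and `lww_merge` are hypothetical names; the tie-break on node name is an assumption added so that every replica picks the same winner when timestamps are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: float
    node: str
    value: str


def lww_merge(a, b):
    """Keep the write with the later timestamp; break ties by node name."""
    return max(a, b, key=lambda w: (w.timestamp, w.node))


from_a = Write(timestamp=1.0, node="A", value="Alice")
from_b = Write(timestamp=2.0, node="B", value="Bob")

winner = lww_merge(from_a, from_b)
print(winner.value)  # Bob
# "Alice" is gone; no error or conflict is reported to either client.
```

The merge is commutative, so replicas converge regardless of the order in which they exchange writes, but convergence is achieved by dropping data.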

Merge Values

For certain data types, merge is possible.

Node A: cart += "apple"
Node B: cart += "banana"  
Result: cart = ["apple", "banana"]

Works for: sets, counters, and other CRDTs (conflict-free replicated data types)
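Two of the simplest mergeable types are a grow-only set and a grow-only counter. The sketch below uses hypothetical helper names (`merge_sets`, `merge_counters`, `counter_value`) and plain Python sets and dicts as the replicated state.

```python
def merge_sets(a, b):
    """Grow-only set: the merge is a union, so no addition is ever lost."""
    return a | b


def merge_counters(a, b):
    """Grow-only counter: each node owns one slot; merge takes the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def counter_value(counter):
    return sum(counter.values())


cart_a = {"apple"}
cart_b = {"banana"}
print(sorted(merge_sets(cart_a, cart_b)))  # ['apple', 'banana']

# Node A has seen 3 of its own increments and 1 of B's; node B has seen 2 and 4.
views_a = {"A": 3, "B": 1}
views_b = {"A": 2, "B": 4}
merged = merge_counters(views_a, views_b)
print(counter_value(merged))  # 7
```

Both merges are commutative, associative, and idempotent, which is what lets replicas apply them in any order and still converge. Supporting removals (for example, taking an item out of the cart) needs a richer structure such as an observed-remove set.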

Application-Level Resolution

Present both versions to user/application to resolve.

Version Vectors

Track causality to detect true conflicts vs. sequential edits.
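Comparing two version vectors is a per-node comparison of counters. In this sketch a vector is a dict from node name to the number of updates seen from that node; `compare` and its return strings are hypothetical.

```python
def compare(a, b):
    """Classify two version vectors as ordered, equal, or concurrent."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # true conflict: neither edit saw the other
    if a_ahead:
        return "a is newer"  # sequential edit: a already includes b
    if b_ahead:
        return "b is newer"
    return "equal"


print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # a is newer
print(compare({"A": 2}, {"B": 1}))                  # concurrent
```

Only the "concurrent" case needs one of the resolution strategies above; in the ordered cases the newer version can safely replace the older one.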

Replication Topologies

Star (Hub and Spoke)

        [Leader]
       /   |   \
    [F1] [F2] [F3]

Simple, but leader is bottleneck.

Circular

[A] → [B] → [C] → [A]

Each node replicates to next. Failure breaks chain.

All-to-All

[A] ↔ [B]
 ↑ ↘ ↗ ↓
   [C]

Most fault-tolerant, but complex.

Real-World Examples

Amazon RDS

Offers synchronous Multi-AZ replication (automatic failover) and async read replicas (up to 15, depending on the database engine).

PostgreSQL

Built-in streaming replication. Supports sync, async, and logical replication.

MongoDB

Replica sets with automatic leader election. Secondaries can serve reads.

Redis

Master-replica replication. Redis Sentinel provides automatic failover. Redis Cluster adds sharding.

Replication + Sharding

In practice, you often combine both:

Shard 1: [Leader] → [Replica 1] → [Replica 2]
Shard 2: [Leader] → [Replica 1] → [Replica 2]
Shard 3: [Leader] → [Replica 1] → [Replica 2]

Each shard handles a portion of data, replicated for availability.
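Routing in a sharded, replicated setup takes two decisions: which shard owns the key, and which node inside that shard should serve the request. The sketch below uses simple hash-modulo sharding and hypothetical node names; production systems often use range-based or consistent-hash partitioning instead.

```python
import hashlib

# Hypothetical cluster layout: three shards, each with one leader and two replicas.
SHARDS = [
    {"leader": f"shard{i}-leader", "replicas": [f"shard{i}-replica1", f"shard{i}-replica2"]}
    for i in range(1, 4)
]


def _hash(key):
    """Stable integer hash of a key (Python's built-in hash() varies per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def route(key, is_write, shards=SHARDS):
    """Pick the shard by key hash, then the leader for writes or a replica for reads."""
    h = _hash(key)
    shard = shards[h % len(shards)]
    if is_write:
        return shard["leader"]
    replicas = shard["replicas"]
    return replicas[(h // len(shards)) % len(replicas)]


print(route("user:42", is_write=True))
print(route("user:42", is_write=False))
```

Both calls land on the same shard because the shard is chosen from the key alone; only the node within the shard differs between reads and writes.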

Interview Tips 💡

When discussing replication in system design:

  1. Identify the goal: "We need replication for high availability and read scaling..."
  2. Choose architecture: "We'd use leader-follower with async replication..."
  3. Address lag: "For read-your-writes consistency, we'd route reads of recently written data to the leader..."
  4. Failover strategy: "We'd use automatic failover with Raft consensus to avoid split-brain..."
  5. Combine with sharding: "For large scale, we'd shard the data and replicate each shard..."
