What is Database Replication?

Database Replication is the practice of keeping copies of data on multiple database servers. When your primary database receives a write, that change is propagated to one or more replica databases.

Why Replicate?

Goal	How Replication Helps
High Availability	If primary fails, promote a replica
Read Scalability	Distribute read queries across replicas
Geographic Distribution	Place replicas near users globally
Disaster Recovery	Backup in different data center
Analytics Isolation	Run heavy queries on replica without affecting production

Replication Architectures

1. Leader-Follower (Master-Slave)

One leader accepts writes; followers replicate from leader and serve reads.

Writes  → [Leader] 
                ↓ replication
         [Follower 1] ← Reads
         [Follower 2] ← Reads
         [Follower 3] ← Reads

Pros: Simple, widely supported, no write conflicts because a single leader orders every write
Cons: Write bottleneck on the single leader, failover complexity
Used by: MySQL, PostgreSQL, MongoDB, Redis
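
As a rough illustration of the routing rule above, the sketch below sends writes to the leader and rotates reads across followers. The class name `LeaderFollowerRouter` and the node names are hypothetical, and a real router would also need health checks and lag awareness.

```python
import itertools


class LeaderFollowerRouter:
    """Routes writes to the leader and spreads reads across followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        # Round-robin over followers; fall back to the leader if there are none.
        self._read_cycle = itertools.cycle(followers or [leader])

    def route(self, is_write):
        """Return the name of the node that should handle this request."""
        if is_write:
            return self.leader
        return next(self._read_cycle)


router = LeaderFollowerRouter("leader", ["follower-1", "follower-2", "follower-3"])
write_target = router.route(is_write=True)
read_targets = [router.route(is_write=False) for _ in range(4)]
print(write_target, read_targets)
```

Round-robin is only one possible read policy; it is used here because it is the simplest way to show reads being distributed.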

2. Leader-Leader (Multi-Master)

Multiple nodes accept writes. Changes sync bidirectionally.

Writes → [Leader A] ←→ [Leader B] ← Writes
              ↓              ↓
         [Follower]    [Follower]

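Pros: No single point of failure for writes, geographic write distribution
Cons: Conflict resolution is hard; asynchronous setups are only eventually consistent
Used by: Galera Cluster, Amazon Aurora Multi-Master, CockroachDB (every node accepts writes, but each range of data is coordinated through Raft consensus rather than after-the-fact conflict resolution)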

3. Leaderless (Dynamo-style)

No leader. Any node accepts writes. Quorum determines success.

Client writes to N nodes
If W nodes acknowledge → Success
Client reads from N nodes  
If R nodes respond → Return most recent (determined by version)

W + R > N ensures the read set overlaps the write set, so a read sees the latest acknowledged write

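Pros: Highly available, no leader election needed
Cons: Complex conflict resolution, eventual consistency
Used by: Cassandra, Riak, and Amazon's original Dynamo design (the DynamoDB service itself routes writes through a per-partition leader)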
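
The quorum rule can be seen in a small in-memory simulation. This is a minimal sketch, not how any of the databases above implement it: `QuorumStore`, the `reachable` argument (which replicas respond), and the explicit integer versions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str


class QuorumStore:
    def __init__(self, n, w, r):
        self.n, self.w, self.r = n, w, r
        self.nodes = [dict() for _ in range(n)]  # one dict per replica

    def write(self, key, value, version, reachable):
        """Write to the reachable replicas; succeed only if at least W acknowledge."""
        acks = 0
        for i in reachable:
            current = self.nodes[i].get(key)
            if current is None or version > current.version:
                self.nodes[i][key] = Versioned(version, value)
            acks += 1
        return acks >= self.w

    def read(self, key, reachable):
        """Need at least R responses; return the value with the highest version."""
        if len(reachable) < self.r:
            raise RuntimeError("not enough replicas responded")
        responses = [self.nodes[i][key] for i in reachable if key in self.nodes[i]]
        return max(responses, key=lambda v: v.version).value if responses else None


# N=3, W=2, R=2: W + R > N, so the read set must overlap the write set.
store = QuorumStore(n=3, w=2, r=2)
ok_old = store.write("k", "old", version=1, reachable=[0, 1, 2])
ok_new = store.write("k", "new", version=2, reachable=[0, 1])  # replica 2 missed it
fresh = store.read("k", reachable=[1, 2])  # replica 1 has version 2

# N=3, W=2, R=1: W + R = N, so a read can land only on the stale replica.
weak = QuorumStore(n=3, w=2, r=1)
weak.write("k", "old", version=1, reachable=[0, 1, 2])
weak.write("k", "new", version=2, reachable=[0, 1])
stale = weak.read("k", reachable=[2])
print(fresh, stale)
```

The second store shows why the inequality matters: with R = 1 the single responding replica may be the one that missed the latest write.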

Synchronous vs. Asynchronous Replication

Synchronous Replication

Leader waits for followers to confirm before acknowledging write.

Write → Leader → [waits for confirmation] → Follower
                         ↓
                     Client gets OK

Pros: Strong consistency, an acknowledged write survives leader failure
Cons: Slower writes, an unavailable follower blocks writes

Asynchronous Replication

Leader acknowledges immediately, replicates in background.

Write → Leader → Client gets OK immediately
           ↓
    [background] → Follower (may lag)

Pros: Fast writes, follower problems don't block the leader
Cons: Replication lag, writes not yet replicated are lost if the leader fails

Semi-Synchronous

Wait for at least one follower to confirm.

Write → Leader → [wait for 1 follower] → Client OK
                        ↓
              Other followers async

A middle ground: every acknowledged write exists on at least two nodes, and write latency depends on the fastest follower rather than the slowest.
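
The three modes differ only in how many follower confirmations the leader waits for before answering the client. The sketch below models that with a hypothetical `required_acks` parameter and `(name, is_up)` pairs standing in for real followers; it ignores timeouts and retries.

```python
def replicate(followers, required_acks):
    """Decide whether the leader can acknowledge a write to the client.

    followers: list of (name, is_up) pairs, hypothetical stand-ins for real nodes.
    required_acks: 0 = asynchronous, 1 = semi-synchronous,
                   len(followers) = fully synchronous.
    """
    confirmed = [name for name, is_up in followers if is_up]
    # The leader acknowledges only once enough followers have confirmed.
    acknowledged = len(confirmed) >= required_acks
    # Any remaining followers receive the write in the background.
    waited_for = confirmed[:required_acks]
    return {"acknowledged": acknowledged, "waited_for": waited_for}


followers = [("f1", True), ("f2", False), ("f3", True)]  # f2 is down
sync_result = replicate(followers, required_acks=3)
semi_result = replicate(followers, required_acks=1)
async_result = replicate(followers, required_acks=0)
print(sync_result["acknowledged"], semi_result, async_result)
```

With one follower down, the synchronous write cannot be acknowledged, while the semi-synchronous and asynchronous writes still succeed.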

Replication Lag

In async replication, followers may be behind the leader. This is replication lag.

Problems from Replication Lag

Read-your-writes inconsistency

1. User writes profile update to Leader
2. User reads from Follower (hasn't replicated yet)
3. User sees old data! 😱

Solutions:

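Read from the leader for recently written data
Use sticky sessions so a user keeps reading from the same replica
Have the client track its last write timestamp and read only from replicas that have caught up to it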
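
One way to combine these ideas is sketched below. It is an assumption-heavy illustration: it tracks a log position per user in place of a timestamp, and `ReadYourWritesRouter`, `follower_positions`, and the node names are hypothetical.

```python
class ReadYourWritesRouter:
    """Sends a user's reads to a follower only if it has caught up to their last write."""

    def __init__(self, follower_positions):
        # follower name -> highest log position that follower has applied
        self.follower_positions = follower_positions
        self.last_write = {}  # user id -> log position of that user's latest write

    def record_write(self, user_id, position):
        self.last_write[user_id] = position

    def choose_node(self, user_id):
        needed = self.last_write.get(user_id, 0)
        for name, applied in self.follower_positions.items():
            if applied >= needed:
                return name
        return "leader"  # no follower is fresh enough yet


router = ReadYourWritesRouter({"follower-1": 100, "follower-2": 104})
router.record_write("alice", position=105)
alice_before = router.choose_node("alice")  # both followers are behind 105
bob_node = router.choose_node("bob")        # bob has no recent write
router.follower_positions["follower-2"] = 105
alice_after = router.choose_node("alice")   # follower-2 has caught up
print(alice_before, bob_node, alice_after)
```

Only users with a recent write pay the cost of reading from the leader; everyone else still reads from followers.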

Monotonic reads violation

1. User reads from Follower A (has new data)
2. User reads from Follower B (further behind!)
3. User sees data go backward in time! 😱

Solution: Sticky sessions or read from single replica per user session
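
A sticky mapping can be as simple as hashing the user ID onto the replica list, as in this sketch. The function name `sticky_replica` and the replica names are hypothetical.

```python
import hashlib


def sticky_replica(user_id, replicas):
    """Map a user to one replica deterministically, so repeated reads hit the same node."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["follower-a", "follower-b", "follower-c"]
first_read = sticky_replica("user-42", replicas)
second_read = sticky_replica("user-42", replicas)
print(first_read == second_read)
```

The mapping changes if the replica list changes, so a user can still see time move backward once when a replica is added or removed.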

Handling Failover

When the leader fails, you need to promote a follower.

Manual Failover

Operator manually promotes a follower. Safe but slow.

Automatic Failover

System detects leader failure and promotes automatically.

Challenges:

Split brain: Two nodes think they're leader
Data loss: If async replication, promoted node may be behind
Detection accuracy: Is leader actually dead or just slow?

Solutions:

Consensus algorithms (Raft, Paxos)
Fencing (prevent old leader from accepting writes)
Require quorum for failover decision
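
Fencing is often explained with a monotonically increasing epoch (a fencing token) issued at each promotion. The sketch below assumes the storage layer checks that token on every write; `FencedStorage` and the epoch values are hypothetical.

```python
class FencedStorage:
    """Storage that rejects writes carrying an epoch older than the newest one seen."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False  # stale leader: fenced off
        self.highest_epoch = epoch
        self.data[key] = value
        return True


storage = FencedStorage()
old_leader_ok = storage.write(epoch=1, key="x", value="from old leader")
new_leader_ok = storage.write(epoch=2, key="x", value="from new leader")
# The old leader wakes up after a pause and still believes it is in charge.
old_leader_late = storage.write(epoch=1, key="x", value="stale write")
print(old_leader_ok, new_leader_ok, old_leader_late, storage.data["x"])
```

Even if two nodes believe they are leader, only the one holding the newest epoch can change data.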

Conflict Resolution (Multi-Master/Leaderless)

When multiple nodes accept writes simultaneously, conflicts occur.

Common Strategies

Last Write Wins (LWW)

Most recent write (by timestamp) wins.

Node A at T1: user.name = "Alice"
Node B at T2: user.name = "Bob"
Result: user.name = "Bob" (T2 > T1)

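⚠️ Can lose writes silently: the losing concurrent write is discarded without an error, and clock skew between nodes can make an older write win.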
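
A minimal sketch of an LWW merge follows. The `Write` record and the tie-break on node name are assumptions for the example; real systems choose their own tie-breakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: float
    node: str
    value: str


def lww_merge(a, b):
    """Keep the write with the larger timestamp; break ties by node name
    so every replica picks the same winner."""
    return max(a, b, key=lambda w: (w.timestamp, w.node))


alice = Write(timestamp=1.0, node="A", value="Alice")
bob = Write(timestamp=2.0, node="B", value="Bob")
winner = lww_merge(alice, bob)
print(winner.value)
```

The merge returns one write and drops the other entirely, which is exactly how the silent loss happens.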

Merge Values

For certain data types, merge is possible.

Node A: cart += "apple"
Node B: cart += "banana"  
Result: cart = ["apple", "banana"]

Works for: Sets, counters, and other CRDTs
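
Two of the simplest mergeable types are a grow-only set and a grow-only counter. The sketch below shows their merge functions under the assumption that items are only ever added and counts only ever increase; the function names are hypothetical.

```python
def merge_sets(a, b):
    """Grow-only set: the merge is a plain union."""
    return a | b


def merge_counters(a, b):
    """Grow-only counter: each node has its own slot; the merge takes the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def counter_value(counter):
    return sum(counter.values())


cart = merge_sets({"apple"}, {"banana"})
likes = merge_counters({"A": 2}, {"A": 1, "B": 3})
print(sorted(cart), counter_value(likes))
```

Both merges give the same result regardless of the order in which replicas exchange state. Supporting removals or decrements needs a richer type than the two shown here.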

Application-Level Resolution

Present both versions to the user or application to resolve.

Version Vectors

Track causality to distinguish true conflicts (concurrent writes) from sequential edits.
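
The comparison at the heart of version vectors can be sketched as below, assuming each vector is a dict mapping a node name to the number of writes seen from that node. The function name `compare` and its return labels are hypothetical.

```python
def compare(a, b):
    """Compare two version vectors (dicts of node -> counter)."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither saw the other's write: a true conflict
    if a_ahead:
        return "a_newer"     # a includes everything in b: a sequential edit
    if b_ahead:
        return "b_newer"
    return "equal"


sequential = compare({"A": 2, "B": 1}, {"A": 1, "B": 1})
conflict = compare({"A": 2, "B": 0}, {"A": 1, "B": 1})
same = compare({"A": 1}, {"A": 1})
print(sequential, conflict, same)
```

Only the "concurrent" case needs a resolution strategy; in the other cases the newer version can simply replace the older one.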

Replication Topologies

Star (Hub and Spoke)

        [Leader]
       /   |   \
    [F1] [F2] [F3]

Simple, but the leader is a bottleneck.

Circular

[A] → [B] → [C] → [A]

Each node replicates to the next. A single node failure breaks the chain.

All-to-All

[A] ↔ [B]
 ↑ ↘ ↗ ↓
   [C]

Most fault-tolerant, but complex.

Real-World Examples

Amazon RDS

Offers synchronous Multi-AZ replication with automatic failover, plus asynchronous read replicas (up to 15 for some engines).

PostgreSQL

Built-in streaming replication. Supports sync, async, and logical replication.

MongoDB

Replica sets with automatic leader election. Secondaries can serve reads.

Redis

Master-replica replication. Redis Sentinel provides automatic failover. Redis Cluster adds sharding.

Replication + Sharding

In practice, you often combine both:

Shard 1: [Leader] → [Replica 1] → [Replica 2]
Shard 2: [Leader] → [Replica 1] → [Replica 2]
Shard 3: [Leader] → [Replica 1] → [Replica 2]

Each shard handles a portion of data, replicated for availability.
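
Request routing in such a setup has two steps: pick the shard from the key, then pick the leader or a replica within that shard. The sketch below assumes simple hash-modulo sharding and a hypothetical `CLUSTER` layout with made-up node names.

```python
import hashlib

# Hypothetical layout: three shards, each with one leader and two replicas.
CLUSTER = [
    {"leader": f"shard{i}-leader", "replicas": [f"shard{i}-replica1", f"shard{i}-replica2"]}
    for i in range(3)
]


def stable_hash(text):
    """Hash that is identical across processes (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def route(key, is_write):
    """Pick the shard by key, then the leader for writes or a replica for reads."""
    shard = CLUSTER[stable_hash(key) % len(CLUSTER)]
    if is_write:
        return shard["leader"]
    replicas = shard["replicas"]
    return replicas[stable_hash("read:" + key) % len(replicas)]


write_node = route("user:42", is_write=True)
read_node = route("user:42", is_write=False)
print(write_node, read_node)
```

Reads and writes for the same key always land in the same shard; only the node within the shard differs.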

Interview Tips 💡

When discussing replication in system design:

Identify the goal: "We need replication for high availability and read scaling..."
Choose architecture: "We'd use leader-follower with async replication..."
Address lag: "For read-your-writes consistency, we'd route reads of recently written data to the leader..."
Failover strategy: "We'd use automatic failover with Raft consensus to avoid split-brain..."
Combine with sharding: "For large scale, we'd shard the data and replicate each shard..."

Related Concepts

CAP Theorem — Trade-offs in replicated systems
Database Sharding — Horizontal partitioning + replication
Raft Consensus — Leader election for replication
CRDTs — Conflict-free replicated data types
Leader Election — Choosing a new leader programmatically

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
