The Most Important Decision

Every system starts with: "How do we store data?"

Wrong choice:
  - Slow queries
  - Expensive rewrites
  - Scalability bottlenecks
  - Engineering frustration

Right choice:
  - Natural data model
  - Fast queries
  - Easy scaling
  - Happy team

Click to expand code...

The Two Worlds

1. Relational (SQL)

Examples: PostgreSQL, MySQL, Oracle, SQL Server

Philosophy: Structure first, flexibility later
Data model: Tables with rows/columns
Schema: Strict (defined upfront)
Relationships: Foreign keys + JOINs
Query language: SQL
Scaling: Primarily vertical
Consistency: ACID guarantees

Click to expand code...

2. Non-Relational (NoSQL)

Examples: MongoDB, Cassandra, Redis, Neo4j

Philosophy: Flexibility first, structure optional
Data model: Varies (documents, key-value, graphs, columns)
Schema: Dynamic or schema-less
Relationships: Embedded or denormalized
Query language: Varies by database
Scaling: Primarily horizontal
Consistency: Eventually consistent (BASE)

Click to expand code...

SQL Databases In-Depth

Strengths

✅ ACID Transactions

sql

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Guaranteed: Both happen or neither happens

Click to expand code...

✅ Complex Queries

sql

-- Find top customers who bought products in multiple categories
SELECT 
    c.name,
    COUNT(DISTINCT p.category) AS categories_purchased,
    SUM(o.total) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
GROUP BY c.id
HAVING COUNT(DISTINCT p.category) >= 3
ORDER BY total_spent DESC
LIMIT 10;

Click to expand code...

✅ Data Integrity

sql

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    age INTEGER CHECK (age >= 18),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Database enforces:
-- - Unique emails
-- - Non-null emails
-- - Age >= 18
-- - Auto-generated IDs

Click to expand code...

Weaknesses

❌ Rigid Schema

sql

-- Adding a column requires migration
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Affects entire table (millions of rows!)
-- Requires downtime or careful migration

Click to expand code...

❌ Vertical Scaling Limits

Single server limitations:
  - Max CPU: 128 cores
  - Max RAM: 4TB
  - Max storage: ~100TB
  - Cost: $$$$$

Beyond this: Must shard manually (complex!)

Click to expand code...

❌ JOIN Performance

sql

-- 5-way JOIN on large tables = slow
SELECT *
FROM table1
JOIN table2 ON ...
JOIN table3 ON ...
JOIN table4 ON ...
JOIN table5 ON ...
WHERE ...;

-- Time: can be seconds on large datasets

Click to expand code...

NoSQL Families

A. Document Stores (MongoDB, CouchDB)

Model: JSON-like documents

javascript

// User document
{
  "_id": "user123",
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "NYC"
  },
  "orders": [
    { "id": "order1", "total": 99.99, "date": "2024-01-01" },
    { "id": "order2", "total": 149.99, "date": "2024-01-15" }
  ]
}

// Query
db.users.find({ "address.city": "NYC", "orders.total": { $gt: 100 } })

Click to expand code...

When to use:

Flexible schemas
Nested/hierarchical data
Rapid prototyping
Content management

Example: Blog platform with varying article structures

B. Key-Value Stores (Redis, DynamoDB)

Model: Simple key → value mapping

python

# Redis
redis.set("user:123:name", "Alice")
redis.set("user:123:email", "alice@example.com")

# Retrieve
name = redis.get("user:123:name")

# Atomic operations
redis.incr("page:views")
redis.expire("session:abc", 3600)  # TTL

Click to expand code...

When to use:

Caching
Session storage
Real-time counters
Simple lookups
Sub-millisecond latency required

Example: Session store for web app

C. Wide-Column Stores (Cassandra, HBase)

Model: Rows with dynamic columns

Column Family: user_events
Row Key: user_123

Column: event_2024-01-01-10:00:00 → "login"
Column: event_2024-01-01-10:05:23 → "click_button"
Column: event_2024-01-01-10:10:15 → "purchase"
...
Column: event_2024-01-15-09:30:00 → "logout"

// Can have billions of columns per row!

Click to expand code...

When to use:

Time-series data
Massive write throughput
Log aggregation
IoT sensor data

Example: Netflix viewing history (billions of events)

D. Graph Databases (Neo4j, Neptune)

Model: Nodes + relationships

cypher

// Social network
(Alice:Person)-[:FRIEND]->(Bob:Person)-[:FRIEND]->(Carol:Person)
(Alice)-[:BOUGHT]->(Product:Laptop)
(Bob)-[:BOUGHT]->(Product)

// Query: Friends who bought same product
MATCH (me:Person)-[:FRIEND]-(friend)-[:BOUGHT]->(product)
WHERE me.name = 'Alice'
  AND (me)-[:BOUGHT]-> (product)
RETURN friend.name, product.name

Click to expand code...

When to use:

Social networks
Recommendation engines
Fraud detection
Knowledge graphs

Example: LinkedIn "People you may know"

ACID vs BASE

ACID (SQL)

Atomicity:   All or nothing
Consistency: Valid state always
Isolation:   Transactions don't interfere
Durability:  Committed = persisted

Example:
  Bank transfer: $100 from A to B
  Either both happen or neither
  Database never shows inconsistent state

Click to expand code...

BASE (NoSQL)

Basically Available:   System responds even if some nodes down
Soft state:            State may change over time
Eventually consistent: Reads may be stale, but will converge

Example:
  Twitter follower count
  Might show 1,000 on one server, 1,001 on another
  Eventually both show 1,001
  Acceptable trade-off for speed

Click to expand code...

Decision Framework

Choose SQL when:

✅ Structured data with clear relationships

E-commerce:
  - Users, Orders, Products, Inventory
  - Clear foreign keys
  - Need JOINs (order → order_items → products)

Click to expand code...

✅ Data integrity is critical

Banking, Healthcare, Payroll:
  - Cannot lose transactions
  - Must maintain constraints
  - ACID guarantees required

Click to expand code...

✅ Complex queries required

Analytics, Reporting:
  - Multi-table aggregations
  - Ad-hoc queries
  - Business intelligence

Click to expand code...

✅ Moderate scale

 < 10M active rows
- Single datacenter
- Vertical scaling acceptable

Click to expand code...

Choose NoSQL when:

✅ Flexible/evolving schema

Content platforms, catalogs:
  - Product attributes vary
  - User profiles with custom fields
  - Rapid feature development

Click to expand code...

✅ Massive scale required

Social media, IoT, logs:
  - Billions of records
  - Petabytes of data
  - Horizontal scaling needed

Click to expand code...

✅ High write throughput

Time-series, events, metrics:
  - 100k+ writes/second
  - Append-only workload
  - Eventual consistency OK

Click to expand code...

✅ Geographic distribution

Global apps:
  - Low-latency worldwide
  - Multi-region replication
  - Partition tolerance critical

Click to expand code...

✅ Specific access patterns

Key-value: Cache, sessions (Redis)
Graph: Social networks (Neo4j)
Document: CMS, catalogs (MongoDB)
Wide-column: Time-series (Cassandra)

Click to expand code...

Real-World Examples

1. Instagram (PostgreSQL → Cassandra)

Phase 1 (Early): PostgreSQL
  - Simple schema (users, photos, likes)
  - ACID transactions
  - <1M users

Phase 2 (Growth): Still PostgreSQL
  - Vertical scaling to largest machines
  - Read replicas
  - ~10M users

Phase 3 (Scale): Cassandra
  - Billions of photos
  - Petabytes of data
  - Global distribution
  - Horizontal scaling
  - Eventual consistency acceptable

Click to expand code...

2. Uber (MySQL + Redis + Cassandra)

MySQL:
  - Driver/rider accounts
  - Trip transactions
  - Payment processing
  (ACID critical)

Redis:
  - Driver location cache
  - Surge pricing calculations
  - Real-time matching
  (Speed critical)

Cassandra:
  - Trip history
  - Location logs
  - Analytics events
  (Scale + writes critical)

Click to expand code...

3. Netflix (Multiple Databases)

MySQL:
  - User accounts
  - Billing
  - Subscription management

Cassandra:
  - Viewing history (billions of events)
  - Recommendations pre-computations
  - A/B test results

EVCache (Memcached):
  - Movie metadata
  - User preferences
  - API response caching

Elasticsearch:
  - Search and discovery
  - Content recommendations

Click to expand code...

Polyglot Persistence

Modern approach: Use multiple databases for different needs.

mermaid

graph TB
    App[Application]
    
    App -->|User Auth| PG[PostgreSQL<br/>ACID transactions]
    App -->|Caching| Redis[Redis<br/>Sub-ms reads]
    App -->|Events| Kafka[Kafka<br/>Stream processing]
    App -->|Analytics| Cassandra[Cassandra<br/>Time-series]
    App -->|Search| ES[Elasticsearch<br/>Full-text search]
    App -->|Recommendations| Neo4j[Neo4j<br/>Graph queries]

Click to expand code...

Example: E-commerce platform

PostgreSQL:
  - Orders, payments, inventory
  - ACID for financial accuracy

Redis:
  - Shopping cart (session data)
  - Product catalog cache
  - Real-time inventory count

Elasticsearch:
  - Product search
  - Faceted navigation
  - Autocomplete

Cassandra:
  - User click streams
  - Product view history
  - Recommendation training data

Neo4j:
  - Product recommendations
  - "Customers also bought..."
  - Fraud detection rules

Click to expand code...

Migration Considerations

SQL → NoSQL

Reasons:

Outgrown vertical scaling
Need horizontal scaling
Schema too rigid
Geographic distribution required

Challenges:

1. No JOINs → Denormalize data
2. No transactions → Application-level consistency
3. Different query patterns → Rethink access patterns
4. Data migration → Dual-write period

Click to expand code...

Strategy:

1. Identify bounded contexts
2. Migrate one service/table at a time
3. Use dual-write during transition
4. Validate data consistency
5. Gradual traffic shift

Click to expand code...

NoSQL → SQL

Reasons:

Need complex queries
ACID transactions required
Data model stabilized
JOIN performance needed

Example: Early startup with MongoDB

Phase 1: MongoDB (rapid prototyping)
Phase 2: PostgreSQL (after PMF, stable schema)

Reason: Complex reporting needs emerged

Click to expand code...

Performance Comparison

Scenario: 1M user profiles

SQL (PostgreSQL):
  - Read: ~1ms (indexed)
  - Write: ~2ms (ACID overhead)
  - JOIN (3 tables): ~10-50ms
  - Analytics query: ~100ms-1s
  - Max throughput: ~10k writes/sec (single node)

NoSQL (MongoDB):
  - Read: ~0.5ms (embedded docs, no JOINs)
  - Write: ~0.8ms (eventual consistency)
  - JOIN equivalent: N/A (denormalized)
  - Analytics query: ~1-10s (MapReduce)
  - Max throughput: ~100k writes/sec (sharded)

Key-Value (Redis):
  - Read: ~0.1ms (in-memory)
  - Write: ~0.1ms (in-memory)
  - JOIN equivalent: N/A
  - Analytics query: N/A (not designed for this)
  - Max throughput: ~1M ops/sec

Click to expand code...

Interview Tips 💡

When discussing SQL vs NoSQL in interviews:

It's not either/or: "Modern apps often use both - PostgreSQL for transactions, Redis for cache, Cassandra for events..."
Start with use case: "For banking? SQL (ACID critical). For social feed? NoSQL (scale critical)..."
Mention trade-offs: "SQL gives ACID but limited horizontal scaling. NoSQL scales but eventual consistency..."
Real examples: "Instagram started PostgreSQL, migrated to Cassandra at scale. Uber uses both..."
Schema evolution: "SQL: rigid schema, migrations. NoSQL: flexible, but must handle multiple versions in code..."
Query patterns: "SQL excels at ad-hoc queries. NoSQL requires known access patterns upfront..."

Related Concepts

ACID vs BASE — Consistency models
CAP Theorem — Distributed system trade-offs
Database Sharding — Horizontal partitioning
Replication — Data redundancy
Indexing — Query optimization

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Intermediate

ACID vs BASE: Consistency Models

The two philosophies of database transaction handling: Strict guarantees (ACID) versus flexible availability (BASE). Deep dive into isolation levels, transaction anomalies, and hybrid approaches.

DatabaseTransactionsConsistency

Intermediate

Database Replication

The process of copying and maintaining database objects in multiple databases to improve reliability, fault-tolerance, and accessibility.

DatabaseAvailabilityDistributed Systems

Advanced

Database Sharding

How to split a massive database across multiple servers. Horizontal scaling strategies, challenges (Joins, ACID), and real-world algorithms used by Instagram, Vitess, and CockroachDB.

DatabasesScalabilitySystem Design

SQL vs NoSQL: The Database Guide

The Most Important Decision

The Two Worlds

1. Relational (SQL)

2. Non-Relational (NoSQL)

SQL Databases In-Depth

Strengths

Weaknesses

NoSQL Families

A. Document Stores (MongoDB, CouchDB)

B. Key-Value Stores (Redis, DynamoDB)

C. Wide-Column Stores (Cassandra, HBase)

D. Graph Databases (Neo4j, Neptune)

ACID vs BASE

ACID (SQL)

BASE (NoSQL)

Decision Framework

Choose SQL when:

Choose NoSQL when:

Real-World Examples

1. Instagram (PostgreSQL → Cassandra)

2. Uber (MySQL + Redis + Cassandra)

3. Netflix (Multiple Databases)

Polyglot Persistence

Migration Considerations

SQL → NoSQL

NoSQL → SQL

Performance Comparison

Interview Tips 💡

Related Concepts

About ScaleWiki

Related Articles

ACID vs BASE: Consistency Models

Database Replication

Database Sharding