Back to All Concepts
DatabaseData ModelingArchitectureBeginner

SQL vs NoSQL: The Database Guide

Complete database selection guide covering relational vs non-relational systems, ACID vs BASE, when to use PostgreSQL vs MongoDB vs Redis vs Neo4j, and production decision frameworks from Netflix, Uber, and Instagram choosing databases for specific use cases.

The Most Important Decision

Every system starts with: "How do we store data?"

Wrong choice:
  - Slow queries
  - Expensive rewrites
  - Scalability bottlenecks
  - Engineering frustration

Right choice:
  - Natural data model
  - Fast queries
  - Easy scaling
  - Happy team
Click to expand code...

The Two Worlds

1. Relational (SQL)

Examples: PostgreSQL, MySQL, Oracle, SQL Server

Philosophy: Structure first, flexibility later
Data model: Tables with rows/columns
Schema: Strict (defined upfront)
Relationships: Foreign keys + JOINs
Query language: SQL
Scaling: Primarily vertical
Consistency: ACID guarantees
Click to expand code...

2. Non-Relational (NoSQL)

Examples: MongoDB, Cassandra, Redis, Neo4j

Philosophy: Flexibility first, structure optional
Data model: Varies (documents, key-value, graphs, columns)
Schema: Dynamic or schema-less
Relationships: Embedded or denormalized
Query language: Varies by database
Scaling: Primarily horizontal
Consistency: Eventually consistent (BASE)
Click to expand code...

SQL Databases In-Depth

Strengths

ACID Transactions

sql
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Guaranteed: Both happen or neither happens
Click to expand code...

Complex Queries

sql
-- Find top customers who bought products in multiple categories
SELECT 
    c.name,
    COUNT(DISTINCT p.category) AS categories_purchased,
    SUM(o.total) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
GROUP BY c.id
HAVING COUNT(DISTINCT p.category) >= 3
ORDER BY total_spent DESC
LIMIT 10;
Click to expand code...

Data Integrity

sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    age INTEGER CHECK (age >= 18),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Database enforces:
-- - Unique emails
-- - Non-null emails
-- - Age >= 18
-- - Auto-generated IDs
Click to expand code...

Weaknesses

Rigid Schema

sql
-- Adding a column requires migration
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Affects entire table (millions of rows!)
-- Requires downtime or careful migration
Click to expand code...

Vertical Scaling Limits

Single server limitations:
  - Max CPU: 128 cores
  - Max RAM: 4TB
  - Max storage: ~100TB
  - Cost: $$$$$

Beyond this: Must shard manually (complex!)
Click to expand code...

JOIN Performance

sql
-- 5-way JOIN on large tables = slow
SELECT *
FROM table1
JOIN table2 ON ...
JOIN table3 ON ...
JOIN table4 ON ...
JOIN table5 ON ...
WHERE ...;

-- Time: can be seconds on large datasets
Click to expand code...

NoSQL Families

A. Document Stores (MongoDB, CouchDB)

Model: JSON-like documents

javascript
// User document
{
  "_id": "user123",
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "NYC"
  },
  "orders": [
    { "id": "order1", "total": 99.99, "date": "2024-01-01" },
    { "id": "order2", "total": 149.99, "date": "2024-01-15" }
  ]
}

// Query
db.users.find({ "address.city": "NYC", "orders.total": { $gt: 100 } })
Click to expand code...

When to use:

  • Flexible schemas
  • Nested/hierarchical data
  • Rapid prototyping
  • Content management

Example: Blog platform with varying article structures

B. Key-Value Stores (Redis, DynamoDB)

Model: Simple key → value mapping

python
# Redis
redis.set("user:123:name", "Alice")
redis.set("user:123:email", "alice@example.com")

# Retrieve
name = redis.get("user:123:name")

# Atomic operations
redis.incr("page:views")
redis.expire("session:abc", 3600)  # TTL
Click to expand code...

When to use:

  • Caching
  • Session storage
  • Real-time counters
  • Simple lookups
  • Sub-millisecond latency required

Example: Session store for web app

C. Wide-Column Stores (Cassandra, HBase)

Model: Rows with dynamic columns

Column Family: user_events
Row Key: user_123

Column: event_2024-01-01-10:00:00 → "login"
Column: event_2024-01-01-10:05:23 → "click_button"
Column: event_2024-01-01-10:10:15 → "purchase"
...
Column: event_2024-01-15-09:30:00 → "logout"

// Can have billions of columns per row!
Click to expand code...

When to use:

  • Time-series data
  • Massive write throughput
  • Log aggregation
  • IoT sensor data

Example: Netflix viewing history (billions of events)

D. Graph Databases (Neo4j, Neptune)

Model: Nodes + relationships

cypher
// Social network
(Alice:Person)-[:FRIEND]->(Bob:Person)-[:FRIEND]->(Carol:Person)
(Alice)-[:BOUGHT]->(Product:Laptop)
(Bob)-[:BOUGHT]->(Product)

// Query: Friends who bought same product
MATCH (me:Person)-[:FRIEND]-(friend)-[:BOUGHT]->(product)
WHERE me.name = 'Alice'
  AND (me)-[:BOUGHT]-> (product)
RETURN friend.name, product.name
Click to expand code...

When to use:

  • Social networks
  • Recommendation engines
  • Fraud detection
  • Knowledge graphs

Example: LinkedIn "People you may know"


ACID vs BASE

ACID (SQL)

Atomicity:   All or nothing
Consistency: Valid state always
Isolation:   Transactions don't interfere
Durability:  Committed = persisted

Example:
  Bank transfer: $100 from A to B
  Either both happen or neither
  Database never shows inconsistent state
Click to expand code...

BASE (NoSQL)

Basically Available:   System responds even if some nodes down
Soft state:            State may change over time
Eventually consistent: Reads may be stale, but will converge

Example:
  Twitter follower count
  Might show 1,000 on one server, 1,001 on another
  Eventually both show 1,001
  Acceptable trade-off for speed
Click to expand code...

Decision Framework

Choose SQL when:

Structured data with clear relationships

E-commerce:
  - Users, Orders, Products, Inventory
  - Clear foreign keys
  - Need JOINs (order → order_items → products)
Click to expand code...

Data integrity is critical

Banking, Healthcare, Payroll:
  - Cannot lose transactions
  - Must maintain constraints
  - ACID guarantees required
Click to expand code...

Complex queries required

Analytics, Reporting:
  - Multi-table aggregations
  - Ad-hoc queries
  - Business intelligence
Click to expand code...

Moderate scale

 < 10M active rows
- Single datacenter
- Vertical scaling acceptable
Click to expand code...

Choose NoSQL when:

Flexible/evolving schema

Content platforms, catalogs:
  - Product attributes vary
  - User profiles with custom fields
  - Rapid feature development
Click to expand code...

Massive scale required

Social media, IoT, logs:
  - Billions of records
  - Petabytes of data
  - Horizontal scaling needed
Click to expand code...

High write throughput

Time-series, events, metrics:
  - 100k+ writes/second
  - Append-only workload
  - Eventual consistency OK
Click to expand code...

Geographic distribution

Global apps:
  - Low-latency worldwide
  - Multi-region replication
  - Partition tolerance critical
Click to expand code...

Specific access patterns

Key-value: Cache, sessions (Redis)
Graph: Social networks (Neo4j)
Document: CMS, catalogs (MongoDB)
Wide-column: Time-series (Cassandra)
Click to expand code...

Real-World Examples

1. Instagram (PostgreSQL → Cassandra)

Phase 1 (Early): PostgreSQL
  - Simple schema (users, photos, likes)
  - ACID transactions
  - <1M users

Phase 2 (Growth): Still PostgreSQL
  - Vertical scaling to largest machines
  - Read replicas
  - ~10M users

Phase 3 (Scale): Cassandra
  - Billions of photos
  - Petabytes of data
  - Global distribution
  - Horizontal scaling
  - Eventual consistency acceptable
Click to expand code...

2. Uber (MySQL + Redis + Cassandra)

MySQL:
  - Driver/rider accounts
  - Trip transactions
  - Payment processing
  (ACID critical)

Redis:
  - Driver location cache
  - Surge pricing calculations
  - Real-time matching
  (Speed critical)

Cassandra:
  - Trip history
  - Location logs
  - Analytics events
  (Scale + writes critical)
Click to expand code...

3. Netflix (Multiple Databases)

MySQL:
  - User accounts
  - Billing
  - Subscription management

Cassandra:
  - Viewing history (billions of events)
  - Recommendations pre-computations
  - A/B test results

EVCache (Memcached):
  - Movie metadata
  - User preferences
  - API response caching

Elasticsearch:
  - Search and discovery
  - Content recommendations
Click to expand code...

Polyglot Persistence

Modern approach: Use multiple databases for different needs.

mermaid
graph TB
    App[Application]
    
    App -->|User Auth| PG[PostgreSQL<br/>ACID transactions]
    App -->|Caching| Redis[Redis<br/>Sub-ms reads]
    App -->|Events| Kafka[Kafka<br/>Stream processing]
    App -->|Analytics| Cassandra[Cassandra<br/>Time-series]
    App -->|Search| ES[Elasticsearch<br/>Full-text search]
    App -->|Recommendations| Neo4j[Neo4j<br/>Graph queries]
Click to expand code...

Example: E-commerce platform

PostgreSQL:
  - Orders, payments, inventory
  - ACID for financial accuracy

Redis:
  - Shopping cart (session data)
  - Product catalog cache
  - Real-time inventory count

Elasticsearch:
  - Product search
  - Faceted navigation
  - Autocomplete

Cassandra:
  - User click streams
  - Product view history
  - Recommendation training data

Neo4j:
  - Product recommendations
  - "Customers also bought..."
  - Fraud detection rules
Click to expand code...

Migration Considerations

SQL → NoSQL

Reasons:

  • Outgrown vertical scaling
  • Need horizontal scaling
  • Schema too rigid
  • Geographic distribution required

Challenges:

1. No JOINs → Denormalize data
2. No transactions → Application-level consistency
3. Different query patterns → Rethink access patterns
4. Data migration → Dual-write period
Click to expand code...

Strategy:

1. Identify bounded contexts
2. Migrate one service/table at a time
3. Use dual-write during transition
4. Validate data consistency
5. Gradual traffic shift
Click to expand code...

NoSQL → SQL

Reasons:

  • Need complex queries
  • ACID transactions required
  • Data model stabilized
  • JOIN performance needed

Example: Early startup with MongoDB

Phase 1: MongoDB (rapid prototyping)
Phase 2: PostgreSQL (after PMF, stable schema)

Reason: Complex reporting needs emerged
Click to expand code...

Performance Comparison

Scenario: 1M user profiles

SQL (PostgreSQL):
  - Read: ~1ms (indexed)
  - Write: ~2ms (ACID overhead)
  - JOIN (3 tables): ~10-50ms
  - Analytics query: ~100ms-1s
  - Max throughput: ~10k writes/sec (single node)

NoSQL (MongoDB):
  - Read: ~0.5ms (embedded docs, no JOINs)
  - Write: ~0.8ms (eventual consistency)
  - JOIN equivalent: N/A (denormalized)
  - Analytics query: ~1-10s (MapReduce)
  - Max throughput: ~100k writes/sec (sharded)

Key-Value (Redis):
  - Read: ~0.1ms (in-memory)
  - Write: ~0.1ms (in-memory)
  - JOIN equivalent: N/A
  - Analytics query: N/A (not designed for this)
  - Max throughput: ~1M ops/sec
Click to expand code...

Interview Tips 💡

When discussing SQL vs NoSQL in interviews:

  1. It's not either/or: "Modern apps often use both - PostgreSQL for transactions, Redis for cache, Cassandra for events..."

  2. Start with use case: "For banking? SQL (ACID critical). For social feed? NoSQL (scale critical)..."

  3. Mention trade-offs: "SQL gives ACID but limited horizontal scaling. NoSQL scales but eventual consistency..."

  4. Real examples: "Instagram started PostgreSQL, migrated to Cassandra at scale. Uber uses both..."

  5. Schema evolution: "SQL: rigid schema, migrations. NoSQL: flexible, but must handle multiple versions in code..."

  6. Query patterns: "SQL excels at ad-hoc queries. NoSQL requires known access patterns upfront..."

Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles