Relationships Are Data Too

Traditional Database:
User Table:        Friend Table:
┌────┬──────┐     ┌──────┬──────┐
│ ID │ Name │     │ User │Friend│
├────┼──────┤     ├──────┼──────┤
│ 1  │Alice │     │  1   │  2   │
│ 2  │ Bob  │     │  1   │  3   │
│ 3  │Carol │     │  2   │  3   │
└────┴──────┘     └──────┴──────┘

Query "Friends of friends":
SELECT f2.friend 
FROM friends f1
JOIN friends f2 ON f1.friend = f2.user
WHERE f1.user = 1
Time: O(N²) with JOINs

Graph Database:
(Alice)-[:FRIEND]->(Bob)-[:FRIEND]->(Carol)

Query "Friends of friends":
MATCH (alice:Person {name:'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof)
RETURN fof.name
Time: O(1) per hop (index-free adjacency!)

Click to expand code...

What are Graph Databases?

Graph databases store data as nodes (entities) and edges (relationships), optimizing for relationship traversal and connected data queries.

Core Concepts

1. Nodes (Vertices)

Entities in the graph
Have properties (key-value pairs)
Can have labels (types)

Node: Person
Properties: {name: "Alice", age: 30}
Label: :Person

Click to expand code...

2. Edges (Relationships)

Connections between nodes
Have direction
Have type
Can have properties

Relationship: FRIEND
Direction: Alice -> Bob
Properties: {since: "2020-01-01"}
Type: :FRIEND

Click to expand code...

3. Properties

Key-value pairs on nodes/edges
Schema-flexible

Property Graph Model

mermaid

graph LR
    A["(Alice)<br/>:Person<br/>{age:30}"] -->|":FRIEND<br/>{since:2020}"| B["(Bob)<br/>:Person<br/>{age:28}"]
    A -->|":BOUGHT<br/>{price:99}"| C["(Laptop)<br/>:Product<br/>{brand:'Dell'}"]
    B -->|":BOUGHT"| C
    B -->|":LIVES_IN"| D["(NYC)<br/>:City"]
    A -->|":LIVES_IN"| D

Click to expand code...

Why Use Graph Databases?

The JOIN Problem

Relational Database:

sql

-- Find friends of friends of friends
SELECT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
WHERE u1.name = 'Alice';

-- Problems:
-- - 3 JOINs (slow!)
-- - Performance degrades: O(N³)
-- - Can't easily go 4, 5, 6 hops deep

Click to expand code...

Graph Database:

cypher

// Find friends of friends of friends
MATCH (alice:Person {name:'Alice'})-[:FRIEND*3]->(distant)
RETURN distant.name

// Advantages:
// - One line!
// - O(1) per hop (index-free adjacency)
// - Easy to go N hops: -[:FRIEND*N]->

Click to expand code...

Cypher Query Language (Neo4j)

Basic Queries

Create nodes:

cypher

CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (laptop:Product {brand: 'Dell', price: 999})
CREATE (nyc:City {name: 'New York'})

Click to expand code...

Create relationships:

cypher

MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
CREATE (alice)-[:FRIEND {since: '2020-01-01'}]->(bob)

MATCH (alice:Person {name: 'Alice'})
MATCH (laptop:Product {brand: 'Dell'})
CREATE (alice)-[:BOUGHT {date: '2024-01-15'}]->(laptop)

Click to expand code...

Find patterns:

cypher

// Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name

// Mutual friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]-(friend)-[:FRIEND]-(bob:Person {name: 'Bob'})
RETURN friend.name

// People who bought same product
MATCH (p1:Person)-[:BOUGHT]->(product:Product)<-[:BOUGHT]-(p2:Person)
WHERE p1.name = 'Alice'
RETURN p2.name, product.brand

Click to expand code...

Advanced Queries

Variable-length paths:

cypher

// Friends within 3 hops
MATCH (alice:Person {name: 'Alice'})-[:FRIEND*1..3]->(distant)
RETURN DISTINCT distant.name

// Shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN length(path), nodes(path)

Click to expand code...

Aggregation:

cypher

// Most popular products
MATCH (p:Person)-[:BOUGHT]->(product:Product)
RETURN product.brand, COUNT(p) AS buyers
ORDER BY buyers DESC
LIMIT 10

// People with most friends
MATCH (p:Person)-[:FRIEND]->(friend)
RETURN p.name, COUNT(friend) AS friend_count
ORDER BY friend_count DESC
LIMIT 10

Click to expand code...

Implementation Examples

Python (Neo4j Driver)

python

from neo4j import GraphDatabase

class SocialGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
    
    def close(self):
        self.driver.close()
    
    def create_person(self, name, age):
        with self.driver.session() as session:
            result = session.run(
                "CREATE (p:Person {name: $name, age: $age}) RETURN p",
                name=name, age=age
            )
            return result.single()['p']
    
    def add_friendship(self, person1_name, person2_name, since):
        with self.driver.session() as session:
            session.run("""
                MATCH (p1:Person {name: $name1})
                MATCH (p2:Person {name: $name2})
                CREATE (p1)-[:FRIEND {since: $since}]->(p2)
                CREATE (p2)-[:FRIEND {since: $since}]->(p1)
            """, name1=person1_name, name2=person2_name, since=since)
    
    def find_friends(self, person_name):
        with self.driver.session() as session:
            result = session.run("""
                MATCH (p:Person {name: $name})-[:FRIEND]->(friend)
                RETURN friend.name AS name, friend.age AS age
            """, name=person_name)
            
            return [{"name": record["name"], "age": record["age"]} 
                    for record in result]
    
    def recommend_friends(self, person_name, limit=5):
        """Friend of friends not yet friends with"""
        with self.driver.session() as session:
            result = session.run("""
                MATCH (person:Person {name: $name})-[:FRIEND]->(friend)
                      -[:FRIEND]->(fof:Person)
                WHERE NOT (person)-[:FRIEND]-(fof)
                  AND person <> fof
                RETURN fof.name AS name, COUNT(*) AS mutual_friends
                ORDER BY mutual_friends DESC
                LIMIT $limit
            """, name=person_name, limit=limit)
            
            return [{"name": record["name"], 
                     "mutual_friends": record["mutual_friends"]} 
                    for record in result]
    
    def shortest_path(self, person1, person2):
        """Find shortest friendship path between two people"""
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                  (p1:Person {name: $name1})-[:FRIEND*]-(p2:Person {name: $name2})
                )
                RETURN [node IN nodes(path) | node.name] AS path,
                       length(path) AS hops
            """, name1=person1, name2=person2)
            
            record = result.single()
            if record:
                return {
                    "path": record["path"],
                    "hops": record["hops"]
                }
            return None

# Usage
graph = SocialGraph("bolt://localhost:7687", "neo4j", "password")

graph.create_person("Alice", 30)
graph.create_person("Bob", 28)
graph.create_person("Carol", 32)

graph.add_friendship("Alice", "Bob", "2020-01-01")
graph.add_friendship("Bob", "Carol", "2021-05-15")

friends = graph.find_friends("Alice")
print(friends)  # [{'name': 'Bob', 'age': 28}]

recommendations = graph.recommend_friends("Alice")
print(recommendations)  # [{'name': 'Carol', 'mutual_friends': 1}]

path = graph.shortest_path("Alice", "Carol")
print(path)  # {'path': ['Alice', 'Bob', 'Carol'], 'hops': 2}

graph.close()

Click to expand code...

Graph Algorithms

1. PageRank (Importance)

cypher

// Find most influential people
CALL gds.pageRank.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10

Click to expand code...

2. Community Detection (Louvain)

cypher

// Find friend communities
CALL gds.louvain.stream('socialNetwork')
YIELD nodeId, communityId
RETURN communityId, COLLECT(gds.util.asNode(nodeId).name) AS members
ORDER BY SIZE(members) DESC

Click to expand code...

3. Centrality (Betweenness)

cypher

// Find connectors (bridge between groups)
CALL gds.betweenness.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10

Click to expand code...

Real-World Use Cases

1. Social Networks (LinkedIn)

cypher

// Degrees of connection
MATCH path = shortestPath(
  (you:Person {id: 'user123'})-[:CONNECTION*]-(them:Person {id: 'target456'})
)
RETURN length(path) AS degrees

// Result: "2nd degree connection"

// People you may know
MATCH (you:Person {id: 'user123'})-[:CONNECTION]->(friend)
      -[:CONNECTION]->(suggestion)
WHERE NOT (you)-[:CONNECTION]-(suggestion)
  AND suggestion.industry = you.industry
RETURN suggestion.name, COUNT(friend) AS mutual_connections
ORDER BY mutual_connections DESC
LIMIT 10

Click to expand code...

2. Fraud Detection (Banks)

cypher

// Find fraud rings
// Multiple accounts sharing devices/IPs
MATCH (account1:Account)-[:USED_DEVICE]->(device:Device)
      <-[:USED_DEVICE]-(account2:Account)
WHERE account1.flagged = true
  AND account2 <> account1
RETURN account2.id AS suspicious_account, device.id

// Money laundering circles
MATCH path = (account:Account)-[:TRANSFER*4..10]->(account)
WHERE ALL(rel IN relationships(path) WHERE rel.amount > 10000)
RETURN path

Click to expand code...

3. Recommendation Engine (Amazon)

cypher

// Product recommendations
// "Customers who bought X also bought Y"
MATCH (product:Product {id: 'product123'})<-[:BOUGHT]-(customer)
      -[:BOUGHT]->(recommendation:Product)
WHERE recommendation <> product
RETURN recommendation.name, COUNT(*) AS purchased_together
ORDER BY purchased_together DESC
LIMIT 10

// Personalized recommendations
MATCH (user:User {id: 'user456'})-[:INTERESTED_IN]->(category:Category)
      <-[:BELONGS_TO]-(product:Product)
WHERE NOT (user)-[:BOUGHT]->(product)
RETURN product.name, product.rating
ORDER BY product.rating DESC
LIMIT 20

Click to expand code...

4. Knowledge Graphs (Google)

cypher

// Entity relationships
MATCH (actor:Person {name: 'Leonardo DiCaprio'})-[:ACTED_IN]->(movie:Movie)
      -[:DIRECTED_BY]->(director:Director)
RETURN movie.title, director.name

// Multi-hop queries
MATCH (person:Person {name: 'Kevin Bacon'})-[:ACTED_IN*1..6]-(other:Person)
RETURN other.name, length(path) AS bacon_number
ORDER BY bacon_number

Click to expand code...

Performance: Index-Free Adjacency

Key advantage: Traversal is O(1) per relationship.

Traditional Database (with indexes):
- Look up User 1 in index: O(log N)
- Scan friends table: O(log M)
- For each friend, repeat: O(K * log N)
Total: O(K * log N * log M)

Graph Database:
- Node directly points to relationships in memory
- Follow pointer: O(1)
- For each friend: O(K)
Total: O(K)

Example: 3 hops in million-user graph
SQL: ~60ms
Graph: ~3ms (20x faster!)

Click to expand code...

Graph DB Comparison

Database	Type	Best For	Language
Neo4j	Native graph	OLTP, general purpose	Cypher
Amazon Neptune	Managed	AWS ecosystem	Gremlin/SPARQL
JanusGraph	Distributed	Large scale	Gremlin
ArangoDB	Multi-model	Flexible schemas	AQL
OrientDB	Multi-model	Document+Graph	SQL-like

When to Use Graph Databases

✅ Good fit:

Social networks
Recommendation engines
Fraud detection
Network/IT infrastructure
Knowledge graphs
Access control (who can access what)

❌ Poor fit:

Analytical queries on entire dataset
Simple CRUD applications
Transactional applications (ACID critical)
Time-series data
Large aggregations

Interview Tips 💡

When discussing graph databases in interviews:

Problem: "In SQL, friend-of-friend query needs 2 JOINs - expensive at scale..."
Index-free adjacency: "Nodes directly point to neighbors in memory - O(1) traversal..."
Cypher: "Pattern matching with ASCII art: (alice)-[:FRIEND]->(bob)..."
Use cases: "LinkedIn connections, fraud rings, product recommendations..."
Trade-offs: "Fast traversal but slower for aggregations across all data..."
Examples: "Neo4j most popular, Amazon Neptune for AWS..."

Related Concepts

NoSQL Databases — Database categories
Graph Algorithms — PageRank, community detection
Social Networks — Application patterns
Recommendation Systems — Collaborative filtering
Neo4j — Specific implementation

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Beginner

Content Delivery Network (CDN)

Geographically distributed network of edge servers that cache and deliver content from locations closest to users, dramatically reducing latency and improving performance, availability, and security.

PerformanceNetworkingInfrastructure

Beginner

Document Databases

The most popular type of NoSQL database. Storing data in flexible, JSON-like documents with embedded structures and dynamic schemas.

DatabaseNoSQLMongoDB

Intermediate

ACID vs BASE: Consistency Models

The two philosophies of database transaction handling: Strict guarantees (ACID) versus flexible availability (BASE). Deep dive into isolation levels, transaction anomalies, and hybrid approaches.

DatabaseTransactionsConsistency

Relationships Are Data Too

What are Graph Databases?

Core Concepts

Property Graph Model

Why Use Graph Databases?

The JOIN Problem

Cypher Query Language (Neo4j)

Basic Queries

Advanced Queries

Implementation Examples

Python (Neo4j Driver)

Graph Algorithms

1. PageRank (Importance)

2. Community Detection (Louvain)

3. Centrality (Betweenness)

Real-World Use Cases

1. Social Networks (LinkedIn)

2. Fraud Detection (Banks)

3. Recommendation Engine (Amazon)

4. Knowledge Graphs (Google)

Performance: Index-Free Adjacency

Graph DB Comparison

When to Use Graph Databases

Interview Tips 💡

Related Concepts

About ScaleWiki

Related Articles

Content Delivery Network (CDN)

Document Databases

ACID vs BASE: Consistency Models