Relationships Are Data Too
Traditional Database:
User Table: Friend Table:
┌────┬──────┐ ┌──────┬──────┐
│ ID │ Name │ │ User │Friend│
├────┼──────┤ ├──────┼──────┤
│ 1 │Alice │ │ 1 │ 2 │
│ 2 │ Bob │ │ 1 │ 3 │
│ 3 │Carol │ │ 2 │ 3 │
└────┴──────┘ └──────┴──────┘
Query "Friends of friends":
SELECT f2.friend
FROM friends f1
JOIN friends f2 ON f1.friend = f2.user
WHERE f1.user = 1
Time: O(N²) with JOINs
Graph Database:
(Alice)-[:FRIEND]->(Bob)-[:FRIEND]->(Carol)
Query "Friends of friends":
MATCH (alice:Person {name:'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof)
RETURN fof.name
Time: O(1) per hop (index-free adjacency!)
What are Graph Databases?
Graph databases store data as nodes (entities) and edges (relationships), optimizing for relationship traversal and connected data queries.
Core Concepts
1. Nodes (Vertices)
- Entities in the graph
- Have properties (key-value pairs)
- Can have labels (types)
Node: Person
Properties: {name: "Alice", age: 30}
Label: :Person
2. Edges (Relationships)
- Connections between nodes
- Have direction
- Have type
- Can have properties
Relationship: FRIEND
Direction: Alice -> Bob
Properties: {since: "2020-01-01"}
Type: :FRIEND
3. Properties
- Key-value pairs on nodes/edges
- Schema-flexible
Property Graph Model
graph LR
A["(Alice)<br/>:Person<br/>{age:30}"] -->|":FRIEND<br/>{since:2020}"| B["(Bob)<br/>:Person<br/>{age:28}"]
A -->|":BOUGHT<br/>{price:99}"| C["(Laptop)<br/>:Product<br/>{brand:'Dell'}"]
B -->|":BOUGHT"| C
B -->|":LIVES_IN"| D["(NYC)<br/>:City"]
A -->|":LIVES_IN"| D
Why Use Graph Databases?
The JOIN Problem
Relational Database:
-- Find friends of friends of friends SELECT u3.name FROM users u1 JOIN friendships f1 ON u1.id = f1.user_id JOIN users u2 ON f1.friend_id = u2.id JOIN friendships f2 ON u2.id = f2.user_id JOIN users u3 ON f2.friend_id = u3.id WHERE u1.name = 'Alice'; -- Problems: -- - 3 JOINs (slow!) -- - Performance degrades: O(N³) -- - Can't easily go 4, 5, 6 hops deep
Graph Database:
// Find friends of friends of friends
MATCH (alice:Person {name:'Alice'})-[:FRIEND*3]->(distant)
RETURN distant.name
// Advantages:
// - One line!
// - O(1) per hop (index-free adjacency)
// - Easy to go N hops: -[:FRIEND*N]->
Cypher Query Language (Neo4j)
Basic Queries
Create nodes:
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (laptop:Product {brand: 'Dell', price: 999})
CREATE (nyc:City {name: 'New York'})
Create relationships:
MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
CREATE (alice)-[:FRIEND {since: '2020-01-01'}]->(bob)
MATCH (alice:Person {name: 'Alice'})
MATCH (laptop:Product {brand: 'Dell'})
CREATE (alice)-[:BOUGHT {date: '2024-01-15'}]->(laptop)
Find patterns:
// Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name
// Mutual friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]-(friend)-[:FRIEND]-(bob:Person {name: 'Bob'})
RETURN friend.name
// People who bought same product
MATCH (p1:Person)-[:BOUGHT]->(product:Product)<-[:BOUGHT]-(p2:Person)
WHERE p1.name = 'Alice'
RETURN p2.name, product.brand
Advanced Queries
Variable-length paths:
// Friends within 3 hops
MATCH (alice:Person {name: 'Alice'})-[:FRIEND*1..3]->(distant)
RETURN DISTINCT distant.name
// Shortest path between two people
MATCH path = shortestPath(
(alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN length(path), nodes(path)
Aggregation:
// Most popular products MATCH (p:Person)-[:BOUGHT]->(product:Product) RETURN product.brand, COUNT(p) AS buyers ORDER BY buyers DESC LIMIT 10 // People with most friends MATCH (p:Person)-[:FRIEND]->(friend) RETURN p.name, COUNT(friend) AS friend_count ORDER BY friend_count DESC LIMIT 10
Implementation Examples
Python (Neo4j Driver)
from neo4j import GraphDatabase
class SocialGraph:
def __init__(self, uri, user, password):
self.driver = GraphDatabase.driver(uri, auth=(user, password))
def close(self):
self.driver.close()
def create_person(self, name, age):
with self.driver.session() as session:
result = session.run(
"CREATE (p:Person {name: $name, age: $age}) RETURN p",
name=name, age=age
)
return result.single()['p']
def add_friendship(self, person1_name, person2_name, since):
with self.driver.session() as session:
session.run("""
MATCH (p1:Person {name: $name1})
MATCH (p2:Person {name: $name2})
CREATE (p1)-[:FRIEND {since: $since}]->(p2)
CREATE (p2)-[:FRIEND {since: $since}]->(p1)
""", name1=person1_name, name2=person2_name, since=since)
def find_friends(self, person_name):
with self.driver.session() as session:
result = session.run("""
MATCH (p:Person {name: $name})-[:FRIEND]->(friend)
RETURN friend.name AS name, friend.age AS age
""", name=person_name)
return [{"name": record["name"], "age": record["age"]}
for record in result]
def recommend_friends(self, person_name, limit=5):
"""Friend of friends not yet friends with"""
with self.driver.session() as session:
result = session.run("""
MATCH (person:Person {name: $name})-[:FRIEND]->(friend)
-[:FRIEND]->(fof:Person)
WHERE NOT (person)-[:FRIEND]-(fof)
AND person <> fof
RETURN fof.name AS name, COUNT(*) AS mutual_friends
ORDER BY mutual_friends DESC
LIMIT $limit
""", name=person_name, limit=limit)
return [{"name": record["name"],
"mutual_friends": record["mutual_friends"]}
for record in result]
def shortest_path(self, person1, person2):
"""Find shortest friendship path between two people"""
with self.driver.session() as session:
result = session.run("""
MATCH path = shortestPath(
(p1:Person {name: $name1})-[:FRIEND*]-(p2:Person {name: $name2})
)
RETURN [node IN nodes(path) | node.name] AS path,
length(path) AS hops
""", name1=person1, name2=person2)
record = result.single()
if record:
return {
"path": record["path"],
"hops": record["hops"]
}
return None
# Usage
graph = SocialGraph("bolt://localhost:7687", "neo4j", "password")
graph.create_person("Alice", 30)
graph.create_person("Bob", 28)
graph.create_person("Carol", 32)
graph.add_friendship("Alice", "Bob", "2020-01-01")
graph.add_friendship("Bob", "Carol", "2021-05-15")
friends = graph.find_friends("Alice")
print(friends) # [{'name': 'Bob', 'age': 28}]
recommendations = graph.recommend_friends("Alice")
print(recommendations) # [{'name': 'Carol', 'mutual_friends': 1}]
path = graph.shortest_path("Alice", "Carol")
print(path) # {'path': ['Alice', 'Bob', 'Carol'], 'hops': 2}
graph.close()
Graph Algorithms
1. PageRank (Importance)
// Find most influential people
CALL gds.pageRank.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10
2. Community Detection (Louvain)
// Find friend communities
CALL gds.louvain.stream('socialNetwork')
YIELD nodeId, communityId
RETURN communityId, COLLECT(gds.util.asNode(nodeId).name) AS members
ORDER BY SIZE(members) DESC
3. Centrality (Betweenness)
// Find connectors (bridge between groups)
CALL gds.betweenness.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10
Real-World Use Cases
1. Social Networks (LinkedIn)
// Degrees of connection
MATCH path = shortestPath(
(you:Person {id: 'user123'})-[:CONNECTION*]-(them:Person {id: 'target456'})
)
RETURN length(path) AS degrees
// Result: "2nd degree connection"
// People you may know
MATCH (you:Person {id: 'user123'})-[:CONNECTION]->(friend)
-[:CONNECTION]->(suggestion)
WHERE NOT (you)-[:CONNECTION]-(suggestion)
AND suggestion.industry = you.industry
RETURN suggestion.name, COUNT(friend) AS mutual_connections
ORDER BY mutual_connections DESC
LIMIT 10
2. Fraud Detection (Banks)
// Find fraud rings
// Multiple accounts sharing devices/IPs
MATCH (account1:Account)-[:USED_DEVICE]->(device:Device)
<-[:USED_DEVICE]-(account2:Account)
WHERE account1.flagged = true
AND account2 <> account1
RETURN account2.id AS suspicious_account, device.id
// Money laundering circles
MATCH path = (account:Account)-[:TRANSFER*4..10]->(account)
WHERE ALL(rel IN relationships(path) WHERE rel.amount > 10000)
RETURN path
3. Recommendation Engine (Amazon)
// Product recommendations
// "Customers who bought X also bought Y"
MATCH (product:Product {id: 'product123'})<-[:BOUGHT]-(customer)
-[:BOUGHT]->(recommendation:Product)
WHERE recommendation <> product
RETURN recommendation.name, COUNT(*) AS purchased_together
ORDER BY purchased_together DESC
LIMIT 10
// Personalized recommendations
MATCH (user:User {id: 'user456'})-[:INTERESTED_IN]->(category:Category)
<-[:BELONGS_TO]-(product:Product)
WHERE NOT (user)-[:BOUGHT]->(product)
RETURN product.name, product.rating
ORDER BY product.rating DESC
LIMIT 20
4. Knowledge Graphs (Google)
// Entity relationships
MATCH (actor:Person {name: 'Leonardo DiCaprio'})-[:ACTED_IN]->(movie:Movie)
-[:DIRECTED_BY]->(director:Director)
RETURN movie.title, director.name
// Multi-hop queries
MATCH (person:Person {name: 'Kevin Bacon'})-[:ACTED_IN*1..6]-(other:Person)
RETURN other.name, length(path) AS bacon_number
ORDER BY bacon_number
Performance: Index-Free Adjacency
Key advantage: Traversal is O(1) per relationship.
Traditional Database (with indexes): - Look up User 1 in index: O(log N) - Scan friends table: O(log M) - For each friend, repeat: O(K * log N) Total: O(K * log N * log M) Graph Database: - Node directly points to relationships in memory - Follow pointer: O(1) - For each friend: O(K) Total: O(K) Example: 3 hops in million-user graph SQL: ~60ms Graph: ~3ms (20x faster!)
Graph DB Comparison
| Database | Type | Best For | Language |
|---|---|---|---|
| Neo4j | Native graph | OLTP, general purpose | Cypher |
| Amazon Neptune | Managed | AWS ecosystem | Gremlin/SPARQL |
| JanusGraph | Distributed | Large scale | Gremlin |
| ArangoDB | Multi-model | Flexible schemas | AQL |
| OrientDB | Multi-model | Document+Graph | SQL-like |
When to Use Graph Databases
✅ Good fit:
- Social networks
- Recommendation engines
- Fraud detection
- Network/IT infrastructure
- Knowledge graphs
- Access control (who can access what)
❌ Poor fit:
- Analytical queries on entire dataset
- Simple CRUD applications
- Transactional applications (ACID critical)
- Time-series data
- Large aggregations
Interview Tips 💡
When discussing graph databases in interviews:
- Problem: "In SQL, friend-of-friend query needs 2 JOINs - expensive at scale..."
- Index-free adjacency: "Nodes directly point to neighbors in memory - O(1) traversal..."
- Cypher: "Pattern matching with ASCII art:
(alice)-[:FRIEND]->(bob)..." - Use cases: "LinkedIn connections, fraud rings, product recommendations..."
- Trade-offs: "Fast traversal but slower for aggregations across all data..."
- Examples: "Neo4j most popular, Amazon Neptune for AWS..."
Related Concepts
- NoSQL Databases — Database categories
- Graph Algorithms — PageRank, community detection
- Social Networks — Application patterns
- Recommendation Systems — Collaborative filtering
- Neo4j — Specific implementation
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
Content Delivery Network (CDN)
Geographically distributed network of edge servers that cache and deliver content from locations closest to users, dramatically reducing latency and improving performance, availability, and security.
Document Databases
The most popular type of NoSQL database. Storing data in flexible, JSON-like documents with embedded structures and dynamic schemas.
ACID vs BASE: Consistency Models
The two philosophies of database transaction handling: Strict guarantees (ACID) versus flexible availability (BASE). Deep dive into isolation levels, transaction anomalies, and hybrid approaches.