Back to All Concepts
DatabaseNoSQLGraphIntermediate

Graph Databases

Complete guide to graph databases treating relationships as first-class citizens, covering property graphs, Cypher query language, graph algorithms, and production implementations in Neo4j, Amazon Neptune powering social networks, fraud detection, and recommendation systems.

Relationships Are Data Too

Traditional Database:
User Table:        Friend Table:
┌────┬──────┐     ┌──────┬──────┐
│ ID │ Name │     │ User │Friend│
├────┼──────┤     ├──────┼──────┤
│ 1  │Alice │     │  1   │  2   │
│ 2  │ Bob  │     │  1   │  3   │
│ 3  │Carol │     │  2   │  3   │
└────┴──────┘     └──────┴──────┘

Query "Friends of friends":
SELECT f2.friend 
FROM friends f1
JOIN friends f2 ON f1.friend = f2.user
WHERE f1.user = 1
Time: O(N²) with JOINs

Graph Database:
(Alice)-[:FRIEND]->(Bob)-[:FRIEND]->(Carol)

Query "Friends of friends":
MATCH (alice:Person {name:'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof)
RETURN fof.name
Time: O(1) per hop (index-free adjacency!)
Click to expand code...

What are Graph Databases?

Graph databases store data as nodes (entities) and edges (relationships), optimizing for relationship traversal and connected data queries.

Core Concepts

1. Nodes (Vertices)

  • Entities in the graph
  • Have properties (key-value pairs)
  • Can have labels (types)
Node: Person
Properties: {name: "Alice", age: 30}
Label: :Person
Click to expand code...

2. Edges (Relationships)

  • Connections between nodes
  • Have direction
  • Have type
  • Can have properties
Relationship: FRIEND
Direction: Alice -> Bob
Properties: {since: "2020-01-01"}
Type: :FRIEND
Click to expand code...

3. Properties

  • Key-value pairs on nodes/edges
  • Schema-flexible

Property Graph Model

mermaid
graph LR
    A["(Alice)<br/>:Person<br/>{age:30}"] -->|":FRIEND<br/>{since:2020}"| B["(Bob)<br/>:Person<br/>{age:28}"]
    A -->|":BOUGHT<br/>{price:99}"| C["(Laptop)<br/>:Product<br/>{brand:'Dell'}"]
    B -->|":BOUGHT"| C
    B -->|":LIVES_IN"| D["(NYC)<br/>:City"]
    A -->|":LIVES_IN"| D
Click to expand code...

Why Use Graph Databases?

The JOIN Problem

Relational Database:

sql
-- Find friends of friends of friends
SELECT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id
JOIN users u2 ON f1.friend_id = u2.id
JOIN friendships f2 ON u2.id = f2.user_id
JOIN users u3 ON f2.friend_id = u3.id
WHERE u1.name = 'Alice';

-- Problems:
-- - 3 JOINs (slow!)
-- - Performance degrades: O(N³)
-- - Can't easily go 4, 5, 6 hops deep
Click to expand code...

Graph Database:

cypher
// Find friends of friends of friends
MATCH (alice:Person {name:'Alice'})-[:FRIEND*3]->(distant)
RETURN distant.name

// Advantages:
// - One line!
// - O(1) per hop (index-free adjacency)
// - Easy to go N hops: -[:FRIEND*N]->
Click to expand code...

Cypher Query Language (Neo4j)

Basic Queries

Create nodes:

cypher
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (laptop:Product {brand: 'Dell', price: 999})
CREATE (nyc:City {name: 'New York'})
Click to expand code...

Create relationships:

cypher
MATCH (alice:Person {name: 'Alice'})
MATCH (bob:Person {name: 'Bob'})
CREATE (alice)-[:FRIEND {since: '2020-01-01'}]->(bob)

MATCH (alice:Person {name: 'Alice'})
MATCH (laptop:Product {brand: 'Dell'})
CREATE (alice)-[:BOUGHT {date: '2024-01-15'}]->(laptop)
Click to expand code...

Find patterns:

cypher
// Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name

// Mutual friends
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]-(friend)-[:FRIEND]-(bob:Person {name: 'Bob'})
RETURN friend.name

// People who bought same product
MATCH (p1:Person)-[:BOUGHT]->(product:Product)<-[:BOUGHT]-(p2:Person)
WHERE p1.name = 'Alice'
RETURN p2.name, product.brand
Click to expand code...

Advanced Queries

Variable-length paths:

cypher
// Friends within 3 hops
MATCH (alice:Person {name: 'Alice'})-[:FRIEND*1..3]->(distant)
RETURN DISTINCT distant.name

// Shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:FRIEND*]-(bob:Person {name: 'Bob'})
)
RETURN length(path), nodes(path)
Click to expand code...

Aggregation:

cypher
// Most popular products
MATCH (p:Person)-[:BOUGHT]->(product:Product)
RETURN product.brand, COUNT(p) AS buyers
ORDER BY buyers DESC
LIMIT 10

// People with most friends
MATCH (p:Person)-[:FRIEND]->(friend)
RETURN p.name, COUNT(friend) AS friend_count
ORDER BY friend_count DESC
LIMIT 10
Click to expand code...

Implementation Examples

Python (Neo4j Driver)

python
from neo4j import GraphDatabase

class SocialGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
    
    def close(self):
        self.driver.close()
    
    def create_person(self, name, age):
        with self.driver.session() as session:
            result = session.run(
                "CREATE (p:Person {name: $name, age: $age}) RETURN p",
                name=name, age=age
            )
            return result.single()['p']
    
    def add_friendship(self, person1_name, person2_name, since):
        with self.driver.session() as session:
            session.run("""
                MATCH (p1:Person {name: $name1})
                MATCH (p2:Person {name: $name2})
                CREATE (p1)-[:FRIEND {since: $since}]->(p2)
                CREATE (p2)-[:FRIEND {since: $since}]->(p1)
            """, name1=person1_name, name2=person2_name, since=since)
    
    def find_friends(self, person_name):
        with self.driver.session() as session:
            result = session.run("""
                MATCH (p:Person {name: $name})-[:FRIEND]->(friend)
                RETURN friend.name AS name, friend.age AS age
            """, name=person_name)
            
            return [{"name": record["name"], "age": record["age"]} 
                    for record in result]
    
    def recommend_friends(self, person_name, limit=5):
        """Friend of friends not yet friends with"""
        with self.driver.session() as session:
            result = session.run("""
                MATCH (person:Person {name: $name})-[:FRIEND]->(friend)
                      -[:FRIEND]->(fof:Person)
                WHERE NOT (person)-[:FRIEND]-(fof)
                  AND person <> fof
                RETURN fof.name AS name, COUNT(*) AS mutual_friends
                ORDER BY mutual_friends DESC
                LIMIT $limit
            """, name=person_name, limit=limit)
            
            return [{"name": record["name"], 
                     "mutual_friends": record["mutual_friends"]} 
                    for record in result]
    
    def shortest_path(self, person1, person2):
        """Find shortest friendship path between two people"""
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                  (p1:Person {name: $name1})-[:FRIEND*]-(p2:Person {name: $name2})
                )
                RETURN [node IN nodes(path) | node.name] AS path,
                       length(path) AS hops
            """, name1=person1, name2=person2)
            
            record = result.single()
            if record:
                return {
                    "path": record["path"],
                    "hops": record["hops"]
                }
            return None

# Usage
graph = SocialGraph("bolt://localhost:7687", "neo4j", "password")

graph.create_person("Alice", 30)
graph.create_person("Bob", 28)
graph.create_person("Carol", 32)

graph.add_friendship("Alice", "Bob", "2020-01-01")
graph.add_friendship("Bob", "Carol", "2021-05-15")

friends = graph.find_friends("Alice")
print(friends)  # [{'name': 'Bob', 'age': 28}]

recommendations = graph.recommend_friends("Alice")
print(recommendations)  # [{'name': 'Carol', 'mutual_friends': 1}]

path = graph.shortest_path("Alice", "Carol")
print(path)  # {'path': ['Alice', 'Bob', 'Carol'], 'hops': 2}

graph.close()
Click to expand code...

Graph Algorithms

1. PageRank (Importance)

cypher
// Find most influential people
CALL gds.pageRank.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10
Click to expand code...

2. Community Detection (Louvain)

cypher
// Find friend communities
CALL gds.louvain.stream('socialNetwork')
YIELD nodeId, communityId
RETURN communityId, COLLECT(gds.util.asNode(nodeId).name) AS members
ORDER BY SIZE(members) DESC
Click to expand code...

3. Centrality (Betweenness)

cypher
// Find connectors (bridge between groups)
CALL gds.betweenness.stream('socialNetwork')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10
Click to expand code...

Real-World Use Cases

1. Social Networks (LinkedIn)

cypher
// Degrees of connection
MATCH path = shortestPath(
  (you:Person {id: 'user123'})-[:CONNECTION*]-(them:Person {id: 'target456'})
)
RETURN length(path) AS degrees

// Result: "2nd degree connection"

// People you may know
MATCH (you:Person {id: 'user123'})-[:CONNECTION]->(friend)
      -[:CONNECTION]->(suggestion)
WHERE NOT (you)-[:CONNECTION]-(suggestion)
  AND suggestion.industry = you.industry
RETURN suggestion.name, COUNT(friend) AS mutual_connections
ORDER BY mutual_connections DESC
LIMIT 10
Click to expand code...

2. Fraud Detection (Banks)

cypher
// Find fraud rings
// Multiple accounts sharing devices/IPs
MATCH (account1:Account)-[:USED_DEVICE]->(device:Device)
      <-[:USED_DEVICE]-(account2:Account)
WHERE account1.flagged = true
  AND account2 <> account1
RETURN account2.id AS suspicious_account, device.id

// Money laundering circles
MATCH path = (account:Account)-[:TRANSFER*4..10]->(account)
WHERE ALL(rel IN relationships(path) WHERE rel.amount > 10000)
RETURN path
Click to expand code...

3. Recommendation Engine (Amazon)

cypher
// Product recommendations
// "Customers who bought X also bought Y"
MATCH (product:Product {id: 'product123'})<-[:BOUGHT]-(customer)
      -[:BOUGHT]->(recommendation:Product)
WHERE recommendation <> product
RETURN recommendation.name, COUNT(*) AS purchased_together
ORDER BY purchased_together DESC
LIMIT 10

// Personalized recommendations
MATCH (user:User {id: 'user456'})-[:INTERESTED_IN]->(category:Category)
      <-[:BELONGS_TO]-(product:Product)
WHERE NOT (user)-[:BOUGHT]->(product)
RETURN product.name, product.rating
ORDER BY product.rating DESC
LIMIT 20
Click to expand code...

4. Knowledge Graphs (Google)

cypher
// Entity relationships
MATCH (actor:Person {name: 'Leonardo DiCaprio'})-[:ACTED_IN]->(movie:Movie)
      -[:DIRECTED_BY]->(director:Director)
RETURN movie.title, director.name

// Multi-hop queries
MATCH (person:Person {name: 'Kevin Bacon'})-[:ACTED_IN*1..6]-(other:Person)
RETURN other.name, length(path) AS bacon_number
ORDER BY bacon_number
Click to expand code...

Performance: Index-Free Adjacency

Key advantage: Traversal is O(1) per relationship.

Traditional Database (with indexes):
- Look up User 1 in index: O(log N)
- Scan friends table: O(log M)
- For each friend, repeat: O(K * log N)
Total: O(K * log N * log M)

Graph Database:
- Node directly points to relationships in memory
- Follow pointer: O(1)
- For each friend: O(K)
Total: O(K)

Example: 3 hops in million-user graph
SQL: ~60ms
Graph: ~3ms (20x faster!)
Click to expand code...

Graph DB Comparison

DatabaseTypeBest ForLanguage
Neo4jNative graphOLTP, general purposeCypher
Amazon NeptuneManagedAWS ecosystemGremlin/SPARQL
JanusGraphDistributedLarge scaleGremlin
ArangoDBMulti-modelFlexible schemasAQL
OrientDBMulti-modelDocument+GraphSQL-like

When to Use Graph Databases

Good fit:

  • Social networks
  • Recommendation engines
  • Fraud detection
  • Network/IT infrastructure
  • Knowledge graphs
  • Access control (who can access what)

Poor fit:

  • Analytical queries on entire dataset
  • Simple CRUD applications
  • Transactional applications (ACID critical)
  • Time-series data
  • Large aggregations

Interview Tips 💡

When discussing graph databases in interviews:

  1. Problem: "In SQL, friend-of-friend query needs 2 JOINs - expensive at scale..."
  2. Index-free adjacency: "Nodes directly point to neighbors in memory - O(1) traversal..."
  3. Cypher: "Pattern matching with ASCII art: (alice)-[:FRIEND]->(bob)..."
  4. Use cases: "LinkedIn connections, fraud rings, product recommendations..."
  5. Trade-offs: "Fast traversal but slower for aggregations across all data..."
  6. Examples: "Neo4j most popular, Amazon Neptune for AWS..."

Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles