
System Design: Instagram News Feed

Designing a scalable social feed. Fan-out on Write vs Fan-out on Read, and solving the Justin Bieber problem.

Designing a News Feed

The "News Feed" is the core of Facebook, Instagram, and Twitter. The challenge isn't storing posts; the challenge is retrieving the right posts for a user in milliseconds.

1. Requirements

Functional

  • Post: User can upload image/text.
  • Follow: User can follow others.
  • Feed: User sees a list of posts from people they follow.

Non-Functional

  • Latency: Feed generation must be < 200ms.
  • Availability: Posting must succeed even if the feed is delayed.
  • Lag: Ideally, a new post appears in followers' feeds within 5 seconds.

2. API Design

GET /feed?cursor=123&limit=10

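  • Cursor: Don't use OFFSET. It forces the database to scan and discard every skipped row, and pages shift (duplicates or gaps) when new posts arrive. Use a cursor (the timestamp or ID of the last item returned) for efficient, stable pagination.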
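A minimal sketch of cursor (keyset) pagination, using Python's built-in sqlite3 module and a hypothetical `posts(id, body)` table in which post IDs increase over time. The client sends back the smallest ID it has seen, and the query seeks directly past it with `WHERE id < ?` instead of skipping rows with OFFSET.

```python
import sqlite3


def fetch_page(conn, cursor=None, limit=10):
    """Return (rows, next_cursor) using keyset pagination on post id.

    cursor is the smallest id from the previous page; None means first page.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT id, body FROM posts ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, body FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
            (cursor, limit),
        ).fetchall()
    # The last (oldest) row's id becomes the cursor for the next request.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


# Hypothetical table with 25 posts, ids 1..25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 26)])

page1, cur = fetch_page(conn, limit=10)              # ids 25..16, cur == 16
page2, cur = fetch_page(conn, cursor=cur, limit=10)  # ids 15..6, cur == 6
```

Because the cursor is a position in the ID order rather than a row count, posts inserted at the top of the feed between requests do not shift later pages.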

3. Architecture Approaches

Approach 1: Pull Model (Fan-out on Read)

When User Bob loads his feed:

  1. Fetch Following: Get IDs of everyone Bob follows (e.g., 500 users).
  2. Fetch Posts: Query DB: SELECT * FROM posts WHERE user_id IN (500_ids) ORDER BY time DESC LIMIT 10.
  3. Merge: Return to Bob.
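  • Pros: Writes are fast (O(1)): each post is stored once. No storage overhead for precomputed feeds.
  • Cons: Reads are slow (O(N), where N is the number of accounts Bob follows). If Bob follows 5,000 users, every feed load runs a heavy query over all 5,000 users' posts. Early Twitter's frequent outages around 2010 are often attributed in part to this read-time work.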
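The three pull steps can be sketched in memory as follows. The dictionaries `following` and `posts_by_user` are hypothetical stand-ins for the follow graph and the posts table; each followee's posts are assumed to be stored newest first, so the merge step is a k-way merge (`heapq.merge`) rather than a full sort.

```python
import heapq
from itertools import islice

# Hypothetical stores: user -> followees, and user -> [(timestamp, post_id)] newest first.
following = {"bob": ["alice", "carol"]}
posts_by_user = {
    "alice": [(105, "a3"), (101, "a2"), (90, "a1")],
    "carol": [(104, "c2"), (95, "c1")],
}


def pull_feed(user, limit=10):
    """Fan-out on read: gather followees' recent posts and merge at request time."""
    followees = following.get(user, [])  # 1. fetch following
    # 2. fetch posts: at most `limit` newest posts per followee are ever needed
    streams = [posts_by_user.get(f, [])[:limit] for f in followees]
    # 3. merge: k-way merge of newest-first lists by timestamp
    merged = heapq.merge(*streams, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


print(pull_feed("bob"))  # ['a3', 'c2', 'a2', 'c1', 'a1']
```

The number of streams grows with the number of accounts the reader follows, which is where the O(N) read cost comes from.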

Approach 2: Push Model (Fan-out on Write)

We pre-compute the feed. Every user has a "Home Feed" list (Redis List) stored in memory.

When Alice posts:

  1. Fetch Followers: Get IDs of everyone following Alice (e.g., 500 users).
  2. Push: Insert Post ID into all 500 Redis lists.
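  3. Read: When Bob loads his feed, we just return a slice of his precomputed list (e.g., LRANGE on Bob's feed key).
  • Pros: Reads are fast: one lookup of a precomputed list, O(1) regardless of how many accounts Bob follows.
  • Cons: Writes are slow (O(N), where N is the author's follower count). This leads to "The Celebrity Problem".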
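A minimal in-memory sketch of fan-out on write. A `deque` with `maxlen` stands in for each user's Redis list; in Redis the same idea can be expressed as `LPUSH` followed by `LTRIM` to cap the list, and `LRANGE` to read a page. The follower map and the 800-entry cap are assumptions for illustration.

```python
from collections import defaultdict, deque
from itertools import islice

FEED_MAX_LEN = 800  # assumed cap; older entries are evicted past this length

followers = {"alice": ["bob", "dave"]}  # hypothetical: author -> follower ids
home_feeds = defaultdict(lambda: deque(maxlen=FEED_MAX_LEN))  # user -> post ids


def publish(author, post_id):
    """Fan-out on write: one insert per follower, so cost grows with follower count."""
    for follower in followers.get(author, []):
        # Newest at the front (like LPUSH); maxlen drops the oldest (like LTRIM).
        home_feeds[follower].appendleft(post_id)


def read_feed(user, limit=10):
    """Read path: slice the precomputed list (like LRANGE key 0 limit-1)."""
    return list(islice(home_feeds.get(user, ()), limit))


publish("alice", "p1")
publish("alice", "p2")
print(read_feed("bob"))  # ['p2', 'p1']
```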

The "Celebrity" Problem (Write Amplification)

Suppose Justin Bieber has 100 million followers. If he posts, Approach 2 means 100 million Redis writes for a single post. The fan-out workers fall behind, and the resulting backlog delays delivery of everyone else's posts too.
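A back-of-envelope check makes the backlog concrete. The aggregate rate of 1 million feed inserts per second is an assumption for illustration, not a measured figure.

```python
def fanout_drain_seconds(followers: int, inserts_per_sec: float) -> float:
    """Seconds needed to push one post into every follower's feed at a given aggregate rate."""
    return followers / inserts_per_sec


ASSUMED_RATE = 1_000_000  # assumed feed inserts per second across the whole cache fleet

print(fanout_drain_seconds(100_000_000, ASSUMED_RATE))  # 100.0
print(fanout_drain_seconds(500, ASSUMED_RATE))          # 0.0005
```

Even under that generous assumption, one celebrity post takes 100 seconds to fan out, 20 times the 5-second lag target, while a 500-follower post finishes in half a millisecond.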

4. The Hybrid Solution (Instagram/Twitter)

We choose the model per author, based on follower count.

  1. Normal Users: Use Push.
    • If I post (100 followers), push it to their feeds.
  2. Celebrities (VIPs): Use Pull.
    • If Bieber posts, don't push it to 100M lists. Just save it once to the posts store.
  3. Reading the Feed:
    • When Bob loads his feed, we fetch his pre-computed "Push" feed.
    • We also check: "Does Bob follow any VIPs (Bieber)?"
    • If yes, we fetch Bieber's latest posts (Pull) and merge them into the feed at runtime.
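The hybrid rules can be sketched as follows. The 10,000-follower threshold, the in-memory dictionaries standing in for Redis lists and the posts store, and all function names are hypothetical; timestamps are assumed to increase with each post so every list stays newest first.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff; tuned per system in practice

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
home_feeds = defaultdict(deque)       # user -> pushed (ts, post_id), newest first
celebrity_posts = defaultdict(deque)  # VIP author -> own (ts, post_id), newest first


def follow(user, author):
    following[user].add(author)
    followers[author].add(user)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, ts, post_id):
    if is_celebrity(author):
        # Pull path: store once; readers fetch it at read time.
        celebrity_posts[author].appendleft((ts, post_id))
    else:
        # Push path: one insert per follower.
        for user in followers[author]:
            home_feeds[user].appendleft((ts, post_id))


def read_feed(user, limit=10):
    # Precomputed "push" portion of the feed.
    pushed = list(islice(home_feeds[user], limit))
    # Latest posts from any VIPs this user follows.
    pulled = [list(islice(celebrity_posts[a], limit))
              for a in following[user] if is_celebrity(a)]
    # Merge all newest-first streams at request time.
    merged = heapq.merge(pushed, *pulled, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

For example, if Bob follows both a normal user and a VIP, the normal user's posts arrive through his pushed list, the VIP's posts are pulled on each read, and the two are interleaved by timestamp.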

5. Storage

Relational DB (Postgres/MySQL)

  • Users: Profile data.
  • Follows: Graph relationships (User A -> User B).
  • Metadata: Post text, geotags.
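One possible relational layout for these tables, sketched with Python's built-in sqlite3 module. Table and column names (for example `media_key`, which would point at the object in S3) are illustrative assumptions. The extra index on `follows` matters because the push model asks "who follows Alice?" while the pull model asks "whom does Bob follow?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
-- Follow graph: one row per edge (follower_id follows followee_id).
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
-- Reverse index so "who follows X?" (fan-out on write) is also an index lookup.
CREATE INDEX follows_by_followee ON follows (followee_id, follower_id);
-- Post metadata only; the media itself lives in object storage.
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    caption    TEXT,
    media_key  TEXT,
    created_at INTEGER NOT NULL
);
CREATE INDEX posts_by_user_time ON posts (user_id, created_at DESC);
""")

conn.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
# bob (2) follows alice (1) and carol (3); carol (3) follows alice (1).
conn.executemany("INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
                 [(2, 1), (2, 3), (3, 1)])

# Push model: who follows alice?
alice_followers = [row[0] for row in conn.execute(
    "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id", (1,))]
# Pull model: whom does bob follow?
bob_following = [row[0] for row in conn.execute(
    "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id", (2,))]
print(alice_followers, bob_following)  # [2, 3] [1, 3]
```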

NoSQL / Blob (Cassandra + S3)

  • Media: Images/Videos go to S3. CDN handles delivery.
  • UserFeed: Redis (for active users) + Cassandra (for archived feed history).

6. Feed Ranking (Algorithmic Feed)

A chronological feed is easy. An algorithmic feed (Facebook style) is harder.

  1. Candidate Generation: Get 1,000 recent posts from friends.
  2. Scoring: Weigh features.
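    • EdgeRank (Facebook's early formula) = the sum, over every interaction (edge) on the post, of Affinity * Edge_Weight * Time_Decay
    • Modern rankers instead predict engagement: probability of a click, probability of a Like, and so on.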
  3. Sorting: Return Top 10 by score.
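A minimal sketch of EdgeRank-style scoring followed by top-k selection. Each candidate post carries its interactions (edges) as `(edge_type, affinity, age_hours)` tuples; the edge weights and the 6-hour exponential half-life are assumed values, and a production ranker would replace these hand-tuned weights with predicted probabilities such as P(click) or P(like).

```python
import heapq

# Assumed weights per interaction type (edge weight).
EDGE_WEIGHTS = {"like": 1.0, "comment": 4.0, "share": 6.0}
HALF_LIFE_HOURS = 6.0  # assumed: an edge's contribution halves every 6 hours


def time_decay(age_hours):
    """Exponential decay: 1.0 for a brand-new edge, 0.5 after one half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def edgerank_score(edges):
    """Sum of affinity * weight * decay over a post's edges.

    edges: iterable of (edge_type, affinity, age_hours), where affinity is the
    viewer's closeness to whoever created that edge.
    """
    return sum(aff * EDGE_WEIGHTS[kind] * time_decay(age) for kind, aff, age in edges)


def rank(candidates, k=10):
    """candidates: dict post_id -> edges. Returns the top-k post ids, best first."""
    return heapq.nlargest(k, candidates, key=lambda pid: edgerank_score(candidates[pid]))


candidates = {
    "p1": [("like", 1.0, 0.0)],     # 1.0 * 1.0 * 1.0 = 1.0
    "p2": [("comment", 1.0, 6.0)],  # 1.0 * 4.0 * 0.5 = 2.0
    "p3": [("share", 1.0, 12.0)],   # 1.0 * 6.0 * 0.25 = 1.5
}
print(rank(candidates))  # ['p2', 'p3', 'p1']
```

`heapq.nlargest` keeps only k items in a heap, which suits the "score 1,000 candidates, return 10" shape better than sorting everything.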

Summary

  1. Pull: Good for small scale / VIPs.
  2. Push: Good for high read throughput / normal users.
  3. Hybrid: Best of both worlds. Push for most, Pull for celebrities.

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.


Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
