
System Design: Dropbox (Google Drive)

Designing a file synchronization service like Dropbox or Google Drive. Key concepts: Block-level Deduplication, Delta Sync, and Strong Consistency.

Designing a File Sync Service

Dropbox looks simple: a folder that appears everywhere. Under the hood, it is a distributed system that makes sure an edit made on your laptop shows up on your phone within seconds.

1. Requirements

Functional

  • Sync: Add/Update/Delete files.
  • History: Restore previous versions.
  • Sharing: Share file with other users.

Non-Functional

  • Consistency: All clients must converge on the same state of a file. Avoid "conflict" files where possible.
  • Bandwidth Efficiency: Don't upload the whole file if only one line changed.
  • Reliability: 99.9999% durability. Do not lose user data.

2. High-Level Architecture

We decouple the "Data" from the "Metadata".

Components

  1. Block Server: Stores raw chunks of data (Blob Storage / S3). It doesn't know what a "file" is. It just knows Hash(Chunk) -> Bytes.
  2. Metadata Server: Knows the file system structure. "Folder A contains File B. File B is made of [Chunk1, Chunk2]".
  3. Synchronization Service: Handles the conversation between client and server to figure out which blocks need to be transferred.
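
The split between data and metadata can be sketched in a few lines. This is a minimal in-memory illustration, not Dropbox's actual implementation: the class names BlockStore and MetadataStore, the helper read_file, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


class BlockStore:
    """Stand-in for the Block Server: maps Hash(chunk) -> bytes, nothing else."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op: the key is the content hash.
        self._blocks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]

    def __len__(self) -> int:
        return len(self._blocks)


class MetadataStore:
    """Stand-in for the Metadata Server: maps a path to an ordered block list."""

    def __init__(self) -> None:
        self._files: Dict[str, List[str]] = {}

    def commit(self, path: str, block_list: List[str]) -> None:
        self._files[path] = list(block_list)

    def block_list(self, path: str) -> List[str]:
        return list(self._files[path])


def read_file(path: str, meta: MetadataStore, blocks: BlockStore) -> bytes:
    """Reassemble a file: ask metadata for the hashes, then fetch each block."""
    return b"".join(blocks.get(h) for h in meta.block_list(path))
```

Note that BlockStore never sees a path and MetadataStore never sees file bytes, which is the decoupling described above.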

3. The Magic: Chunking & Deduplication

How do we save bandwidth and storage?

Naive Approach

Upload the whole 100MB file every time it saves.

  • Bad: Slow, destroys bandwidth.

Block-Level Deduplication

We split every file into fixed-size blocks (e.g., 4MB). For example: File A (10 MB) -> [Block 1 (4 MB), Block 2 (4 MB), Block 3 (2 MB)]

Scenario 1: Small Edit

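  • User changes one character in the first paragraph (an in-place overwrite, so the file length is unchanged).
  • Only Block 1 changes. Block 2 and Block 3 remain identical. (An insertion would shift every later boundary under fixed-size blocks, so the following blocks would change too.)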
  • Client Uploads: Only the new Block 1'.
  • Server Stores: Block 1' (New), Block 2 (Ref), Block 3 (Ref).
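
Scenario 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, SHA-256 is an assumed hash choice, and the comparison is positional, which matches fixed-size blocks with an in-place edit.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the example block size used in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each block independently so blocks can be compared without their bytes."""
    return [hashlib.sha256(b).hexdigest() for b in split_blocks(data, block_size)]


def changed_block_indexes(old_hashes: List[str], new_hashes: List[str]) -> List[int]:
    """Positions in the new file whose block is new or differs from the old one."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]
```

The client would upload only the blocks at the returned indexes; the server keeps references to the unchanged ones.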

Scenario 2: Cross-User Deduplication

  • User A uploads movie.mkv.
  • User B uploads movie.mkv.
  • Client B calculates hash of blocks. Sends hashes to server.
  • Server says: "I already have these blocks from User A. No need to upload."
  • Result: Near-instant upload, because only the hashes cross the network. Massive storage savings for Dropbox.
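
The hash exchange in Scenario 2 might look like the sketch below. The server is modeled as a plain dict from hash to bytes, and the names missing_hashes and upload are hypothetical; a real service would do this over the network and would also have to check that the client really holds the block it claims to have.

```python
import hashlib
from typing import Dict, List, Set


def missing_hashes(server_hashes: Set[str], client_hashes: List[str]) -> List[str]:
    """Server side: which of the offered hashes are not stored yet (order kept, no repeats)."""
    seen: Set[str] = set()
    missing: List[str] = []
    for h in client_hashes:
        if h not in server_hashes and h not in seen:
            seen.add(h)
            missing.append(h)
    return missing


def upload(server_blocks: Dict[str, bytes], blocks: List[bytes]) -> int:
    """Client side: offer hashes first, then send only the requested blocks.

    Returns the number of block bytes actually transferred.
    """
    by_hash = {hashlib.sha256(b).hexdigest(): b for b in blocks}
    sent = 0
    for h in missing_hashes(set(server_blocks), list(by_hash)):
        server_blocks[h] = by_hash[h]
        sent += len(by_hash[h])
    return sent
```

When User B runs upload against a server that already holds User A's blocks, the function returns 0: nothing but hashes was exchanged.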

4. Metadata Database (Namespace)

We need to store the file tree: directory structure, permissions, and version history.

  • SQL (MySQL): Dropbox originally used MySQL. Why? Strong ACID consistency. If you move a folder, the database must reflect the new state immediately. No "Eventual Consistency" allowed here.
  • Schema:
    • FileID: UUID
    • ParentID: UUID (Folder)
    • Version: Int
    • BlockList: List<Hash>
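
As a rough illustration, the schema above can be modeled as an immutable record, with every save producing a new row rather than overwriting the old one. The names FileVersion and next_version are hypothetical, and treating the root folder as a row with no parent is an assumption of this sketch.

```python
import uuid
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FileVersion:
    file_id: uuid.UUID
    parent_id: Optional[uuid.UUID]  # assumption: None marks the root folder
    version: int
    block_list: Tuple[str, ...]  # ordered block hashes


def next_version(current: FileVersion, new_block_list: List[str]) -> FileVersion:
    """Create the row for the next save; the previous row stays untouched."""
    return replace(
        current,
        version=current.version + 1,
        block_list=tuple(new_block_list),
    )
```

Keeping old rows intact is one simple way to support the "restore previous versions" requirement, since each version still points at its own block list.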

5. Synchronization Workflow

1. Client detects change

The Dropbox client uses inotify (Linux) or FSEvents (macOS) to watch the local file system.
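
inotify and FSEvents are OS-specific event APIs. As a portable stand-in for illustration only, the sketch below swaps in a different technique: it polls the folder and compares two snapshots. The function names are hypothetical, and a real client would prefer the event APIs named above because polling scales poorly on large folders.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (mtime in ns, size in bytes)


def snapshot(root: str) -> Snapshot:
    """Record modification time and size for every file under root."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, deleted) paths between two snapshots."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, modified, deleted
```

Any path reported as added or modified is a candidate for re-chunking and hash comparison.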

2. Client asks for instructions

Client talks to Sync Service:

"I have File A version 5. Server has version 6. What Changed?"

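One way to answer that question is for the Sync Service to return the block list of the newer version and let the client fetch only the hashes it does not already hold. The function plan_download and its parameters are hypothetical names for this sketch, not part of any real protocol.

```python
from typing import List, Set


def plan_download(
    local_version: int,
    server_version: int,
    server_block_list: List[str],
    cached_hashes: Set[str],
) -> List[str]:
    """Return the block hashes the client must download to reach the server version."""
    if local_version >= server_version:
        return []  # already up to date
    # dict.fromkeys drops repeated hashes while keeping their first-seen order.
    return [h for h in dict.fromkeys(server_block_list) if h not in cached_hashes]
```

For a client at version 5 holding blocks a, b and c, and a server at version 6 with blocks a2, b and c, the plan contains only a2.
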
3. Delta Sync (Rsync Algorithm)

Instead of transferring a whole block, sophisticated clients use a rolling hash (as in the rsync algorithm) to transfer only the changed bytes within the block. In most cases, though, block-level replacement is sufficient.
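
The rolling part is what makes this cheap: moving the window one byte forward updates the checksum in constant time instead of rescanning the window. The sketch below follows the two-sum weak checksum described for rsync, with a modulus of 2^16. It is a simplified illustration: real rsync confirms weak matches with a strong hash, whereas this sketch compares bytes directly because both sides are in memory. The function names are hypothetical.

```python
from typing import List, Tuple

M = 1 << 16  # modulus for both running sums


def weak_checksum(window: bytes) -> Tuple[int, int]:
    """a = sum of bytes; b = sum of bytes weighted by distance from the window end."""
    n = len(window)
    a = sum(window) % M
    b = sum((n - i) * byte for i, byte in enumerate(window)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a window of length n one byte: drop out_byte, append in_byte."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def find_matches(data: bytes, target: bytes) -> List[int]:
    """Offsets in data where a window matches target, found via the rolling checksum."""
    n = len(target)
    if n == 0 or len(data) < n:
        return []
    want = weak_checksum(target)
    a, b = weak_checksum(data[:n])
    offsets = []
    for start in range(len(data) - n + 1):
        # Cheap checksum test first, byte comparison only on a checksum hit.
        if (a, b) == want and data[start:start + n] == target:
            offsets.append(start)
        end = start + n
        if end < len(data):
            a, b = roll(a, b, data[start], data[end], n)
    return offsets
```

Regions of the new block that match at some offset of the old block can be referenced instead of resent, so only the unmatched bytes travel.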

4. Conflict Resolution

What if User A and User B edit the same file offline, then both go online?

  • Strategy: "Last Write Wins" is dangerous, because the earlier write is silently discarded.
  • Strategy: "Conflict File". Dropbox creates File A (User B's Conflicted Copy). We let the human resolve it.
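
A minimal sketch of the conflict-file strategy, assuming each upload carries the version the client last saw (its base version). The function names and the exact naming pattern for the copy are assumptions for illustration.

```python
import os
from typing import Tuple


def conflicted_name(filename: str, user: str) -> str:
    """Insert a conflict marker before the extension, e.g. 'a.txt' -> 'a (X's conflicted copy).txt'."""
    stem, ext = os.path.splitext(filename)
    return f"{stem} ({user}'s conflicted copy){ext}"


def resolve_upload(
    server_version: int, base_version: int, filename: str, user: str
) -> Tuple[str, str]:
    """Decide what to do with an upload that was edited on top of base_version."""
    if base_version == server_version:
        # Nobody else committed in the meantime: safe to become the next version.
        return ("commit", filename)
    # Someone else committed first: keep both edits and let a human merge them.
    return ("conflict_copy", conflicted_name(filename, user))
```

If User A commits first, the server moves from version 5 to 6; User B's upload still carries base version 5, so it is stored as a separate conflicted copy rather than overwriting User A's edit.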

6. Cold Storage

Users rarely access old versions of files.

  • Hot Storage (S3 Standard): Current file versions.
  • Cold Storage (Glacier): Older versions (v1, v2, v3). Cheaper, slower retrieval.
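
The tiering rule above reduces to a one-line decision. The function name and the "hot"/"cold" labels are hypothetical; a production policy would likely also consider age and access frequency.

```python
def storage_tier(version: int, latest_version: int) -> str:
    """Current version stays in hot storage; every older version goes to cold storage."""
    if version > latest_version:
        raise ValueError("version cannot be newer than the latest version")
    return "hot" if version == latest_version else "cold"
```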

Summary

  1. Split data into blocks (4MB).
  2. Deduplicate blocks globally (hash-based).
  3. Store structure in a consistent SQL DB (Metadata).
  4. Sync only the changed blocks.
