Designing YouTube / Netflix
Video systems are heavyweight. A single user upload can trigger thousands of CPU hours of processing. The core challenge is managing this asynchronous workflow efficiently.
1. Requirements
Functional
- Upload: Users upload raw video files (MOV, AVI, MKV).
- View: Users stream videos with zero buffering.
- Quality: Support 360p, 480p, 720p, 1080p, 4K.
Non-Functional
- Reliability: Uploads must not be lost.
- Scalability: Handle variable spikes (e.g., Viral events).
- Latency: Processing shouldn't take forever, but it's not strictly real-time.
2. API Design
Uploading a 50GB File
You cannot use a simple POST /upload. HTTP times out.
Solution: Pre-signed URLs (S3) with Resumable Uploads.
- Client:
POST /initiate-upload. - Server: Returns a session ID and a chunk offset
0. - Client: Uploads chunk 1.
- Client: Connection dies? Client retries from offset
X.
3. The Transcoding Pipeline (DAG)
We don't just convert format A to format B. We need thumbnails, captions, copyright checks, and AI content moderation. We model this as a Directed Acyclic Graph (DAG).
The Components
- Inspector: Extracts metadata (resolution, codec).
- Splitter: Breaks video into 10-minute chunks.
- Transcoder: Parallel workers convert chunks to H.264 / VP9.
- Thumbnailer: Grabs frames at timestamp
t. - Joiner: Stitches processed chunks back (if necessary for storage, though streaming uses chunks).
Orchestration
- Tech: AWS Step Functions / Airflow / Temporal.io.
- Why?: If the "Copyright Check" fails, we stop the "4K Transcode" immediately to save money.
4. Adaptive Bitrate Streaming (ABR)
You don't stream one big .mp4 file anymore.
We use HLS (HTTP Live Streaming) or MPEG-DASH.
How it works
- Break it up: Video is sliced into 4-second
.ts(Transport Stream) segments. - Variants: Each 4-second segment is encoded in multiple qualities:
segment_001_1080p.ts(High Bitrate)segment_001_360p.ts(Low Bitrate)
- Manifest File (.m3u8): A text file listing all available variants.
The Player Logic:
- Client downloads Manifest.
- Client detects bandwidth (e.g., 4G is slow).
- Client requests
segment_001_360p.ts. - Bandwidth improves to 5G?
- Client requests
segment_002_1080p.ts. - Result: Seamless quality switching without buffering.
5. Storage & Optimization
Optimization: GOP Alignment
When splitting video for parallel processing, you must split at Keyframes (I-Frames). If you split in the middle of a "P-Frame" (which relies on previous frames data), the video will glitch.
Storage Tiers
- Raw Video: Store in Cold Storage (AWS Glacier). We rarely need the raw file again, but we keep it for future re-encoding (e.g., when VR/8K comes out).
- Encoded Segments: Store in Hot Storage (S3 Standard) -> Pushed to CDN.
6. Content Delivery Network (CDN)
- Popular Content (Taylor Swift video): Cache at the Edge (ISP level). 99% cache hit ratio.
- Long Tail (Unwatched vlogs): Fetch from Origin (S3) on demand.
Summary
- Upload: Resumable upload to Object Storage (S3).
- Process: DAG Workflow using parallel workers (Split -> Transcode -> Merge).
- Delivery: HLS/DASH Adaptive Streaming via CDN.
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
System Design: Dropbox (Google Drive)
Designing a file synchronization service like Dropbox or Google Drive. Key concepts: Block-level Deduplication, Delta Sync, and Strong Consistency.
Geohashing (Location Encoding)
A geocoding system that encodes latitude/longitude coordinates into short alphanumeric strings for efficient proximity searches and spatial indexing.
Horizontal Scaling (Scaling Out)
The definitive guide to adding more servers to your infrastructure pool to handle infinite growth.