Core Concepts
When you run a program, the operating system creates a process, and that process starts with at least one thread of execution. Understanding how the two differ is critical for building high-performance systems.
| Feature | Process | Thread |
|---|---|---|
| Definition | Independent program execution unit | Lightweight unit within a process |
| Memory | Isolated memory space | Shared memory (heap, code, globals) |
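| Communication | IPC (pipes, sockets, shared memory) | Shared variables guarded by locks |
| Context Switch | Expensive (switch address space; TLB flushed or re-tagged, caches go cold) | Cheaper (save/restore registers and stack pointer; same address space) |
| Failure | Isolated (a crash doesn't affect other processes) | A crash in one thread takes down the whole process |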
Memory Architecture
The key difference lies in how memory is organized.
Process Layout
Each process has its own virtual address space.
graph TD
    subgraph ProcessA [Process A Memory Space]
        CodeA[Code Segment]
        HeapA[Heap Memory]
        StackA[Stack]
        GlobalA[Global Data]
    end
    subgraph ProcessB [Process B Memory Space]
        CodeB[Code Segment]
        HeapB[Heap Memory]
        StackB[Stack]
        GlobalB[Global Data]
    end
    Kernel[OS Kernel Space] --> ProcessA
    Kernel --> ProcessB
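The isolation in the diagram can be observed directly. The following is a minimal sketch, assuming a POSIX system where `os.fork()` is available; the function name `fork_and_mutate` is made up for this illustration. The child appends to a list and reports the length it sees through a pipe, while the parent's copy stays unchanged.

```python
import os


def fork_and_mutate():
    """Fork a child that mutates a list; return (child_len, parent_len)."""
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # assumption: POSIX only, not available on Windows
    if pid == 0:
        # Child: from here on it works on its own copy of the address space.
        exit_code = 1
        try:
            os.close(read_fd)
            data.append(4)
            os.write(write_fd, str(len(data)).encode())
            exit_code = 0
        finally:
            os._exit(exit_code)  # exit immediately, skipping parent cleanup code
    # Parent: read what the child reported, then compare with our own list.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_len = int(reader.read())
    os.waitpid(pid, 0)
    return child_len, len(data)


if __name__ == "__main__":
    child_len, parent_len = fork_and_mutate()
    print(f"child saw {child_len} items, parent still has {parent_len}")
    # prints: child saw 4 items, parent still has 3
```

The only channel between the two processes here is the pipe, which is an instance of the IPC mechanisms listed in the comparison table.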
Thread Layout
Threads live inside a process and share resources.
graph TD
    subgraph Process [Single Process Address Space]
        Code[Code Segment - SHARED]
        Heap[Heap Memory - SHARED]
        Global[Global Data - SHARED]
        subgraph T1 [Thread 1]
            Stack1[Stack]
            Reg1[Registers]
        end
        subgraph T2 [Thread 2]
            Stack2[Stack]
            Reg2[Registers]
        end
    end
[!NOTE] Threads share the heap, but each thread has its own stack. This is why objects can be shared between threads, while local variables stay private to a thread unless a reference to them is stored somewhere shared.
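A short sketch of that split, with hypothetical names (`worker`, `run_workers`, `shared_results`): each thread accumulates into a local variable that no other thread can see, then publishes its result to a shared list under a lock.

```python
import threading

shared_results = []              # one heap object, visible to every thread
results_lock = threading.Lock()  # guards writes to the shared list


def worker(worker_id, n):
    local_total = 0              # local variable: private to this call
    for i in range(n):
        local_total += i
    with results_lock:           # shared state needs synchronization
        shared_results.append((worker_id, local_total))


def run_workers(count=4, n=1000):
    shared_results.clear()
    threads = [threading.Thread(target=worker, args=(i, n)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_results)


if __name__ == "__main__":
    print(run_workers(count=3, n=10))
    # prints: [(0, 45), (1, 45), (2, 45)]
```

No lock is needed around `local_total`, because no other thread holds a reference to it.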
Context Switching Cost
Switching between execution units isn't free. The CPU must save state and load new state.
Process Context Switch (Heavy)
- Save registers
- Save stack pointer
- Switch the page table base (new virtual address space)
- Flush the TLB (Translation Lookaside Buffer), unless the CPU tags entries per address space (PCID/ASID)
- Load new state
Thread Context Switch (Light)
- Save registers
- Save stack pointer
- Load new state (No memory map change, No TLB flush)
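The cost of a single switch is hard to measure from user code, but the number of switches is easy to observe. The sketch below assumes a Unix-like system, where Python's `resource` module exposes the kernel's per-process counters; the helper names are hypothetical. It reports how many voluntary and involuntary context switches were charged to the process while a function ran. The counters do not distinguish thread switches from process switches.

```python
import resource  # assumption: Unix only
import time


def switches_during(fn):
    """Return (voluntary, involuntary) context switches charged while fn runs."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def blocking_sleeps(count=50):
    for _ in range(count):
        time.sleep(0.001)  # each sleep gives up the CPU voluntarily


def busy_loop(iterations=200_000):
    total = 0
    for i in range(iterations):
        total += i
    return total


if __name__ == "__main__":
    print("sleeping :", switches_during(blocking_sleeps))
    print("computing:", switches_during(busy_loop))
```

On a typical Linux machine the sleeping workload tends to show many more voluntary switches than the busy loop, though exact counts depend on the kernel and on system load.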
Code Examples
Python: Multiprocessing vs Threading
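In standard CPython builds, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time. Threads therefore do not speed up CPU-bound code, while separate processes do.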
import time
import threading
import multiprocessing


def cpu_bound_task():
    # Pure-Python counting loop: the running thread holds the GIL throughout
    count = 0
    while count < 100_000_000:
        count += 1


def run_threading():
    start = time.perf_counter()
    t1 = threading.Thread(target=cpu_bound_task)
    t2 = threading.Thread(target=cpu_bound_task)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(f"Threading: {time.perf_counter() - start:.2f}s")
    # Illustrative result: ~10s (concurrency, no parallelism due to the GIL)


def run_multiprocessing():
    start = time.perf_counter()
    p1 = multiprocessing.Process(target=cpu_bound_task)
    p2 = multiprocessing.Process(target=cpu_bound_task)
    p1.start(); p2.start()
    p1.join(); p2.join()
    print(f"Multiprocessing: {time.perf_counter() - start:.2f}s")
    # Illustrative result: ~5s (true parallelism); timings vary by machine


if __name__ == "__main__":
    run_threading()
    run_multiprocessing()
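The benchmark above covers the CPU-bound case. For contrast, here is a minimal sketch of the I/O-bound case, where threads do help: a blocking call such as `time.sleep` releases the GIL, so the waits overlap. The names `fake_io` and `timed_io` are hypothetical, and the sleep only stands in for a network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    time.sleep(delay)  # stands in for a blocking I/O call; releases the GIL
    return delay


def timed_io(num_tasks=4, delay=0.2):
    """Run num_tasks blocking waits on a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = list(pool.map(fake_io, [delay] * num_tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_io()
    print(f"{len(results)} waits of 0.2s finished in {elapsed:.2f}s")
```

Run one after another, four waits of 0.2s would take at least 0.8s. With four threads the total should land close to a single wait.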
Golang: Goroutines (User-Space Threads)
Go's goroutines are user-space threads that the Go runtime multiplexes onto OS threads (M:N scheduling: M goroutines on N OS threads).
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	start := time.Now()

	// Launch 100,000 goroutines
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Tiny memory footprint (initial stack of about 2KB)
			// Fast context switch in user space
		}()
	}

	wg.Wait()
	fmt.Printf("Spawned 100k goroutines in %v\n", time.Since(start))
}
Real-World Scaling Decisions
Chrome: Multi-Process Architecture
Chrome runs web content in separate renderer processes, roughly one per tab or site.
- Pros: If one tab crashes (segfault), the browser stays alive. Security isolation (sandboxing).
- Cons: High memory usage (every process carries its own runtime overhead).
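The crash-isolation benefit can be sketched in a few lines. This is an illustration of the general idea, not of Chrome's actual implementation. The helper `run_isolated` is hypothetical: it runs a snippet in a child Python interpreter and reports the exit code, so the supervising process survives even when the child aborts.

```python
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # hide the child's crash output
    )
    return completed.returncode


if __name__ == "__main__":
    healthy = run_isolated("pass")
    crashed = run_isolated("import os; os.abort()")  # simulated hard crash
    print(f"healthy child exit code: {healthy}")
    print(f"crashed child exit code: {crashed}")
    print("supervisor is still running")
```

On POSIX systems a negative return code means the child was killed by a signal. The exact value for the aborted child differs by platform, so the sketch only relies on it being non-zero.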
Redis: Single-Threaded Event Loop
Redis uses a single thread for command processing.
- Why: Avoids context switching and lock contention on the command path.
- Scale: Uses I/O multiplexing (epoll/kqueue) to handle thousands of connections efficiently.
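I/O multiplexing can be sketched with Python's `selectors` module. This is a toy illustration under stated assumptions, not Redis's implementation: `socket.socketpair()` stands in for real client connections, and the function name `serve_on_one_thread` and the PING/PONG exchange are made up for the example. A single thread waits on all connections at once and serves whichever ones are readable.

```python
import selectors
import socket


def serve_on_one_thread(num_clients=3):
    """Answer one request per connection from a single-threaded event loop."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(num_clients):
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ)
        client_end.settimeout(1.0)
        client_end.sendall(f"PING {i}".encode())  # request is now pending
        clients.append(client_end)

    pending = num_clients
    while pending:
        events = sel.select(timeout=1.0)  # one wait covers every connection
        if not events:
            break                         # nothing arrived in time; give up
        for key, _mask in events:
            conn = key.fileobj
            request = conn.recv(1024)
            conn.sendall(request.replace(b"PING", b"PONG"))
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [client.recv(1024).decode() for client in clients]
    for client in clients:
        client.close()
    sel.close()
    return replies


if __name__ == "__main__":
    print(serve_on_one_thread())
    # prints: ['PONG 0', 'PONG 1', 'PONG 2']
```

No locks are needed because only one thread ever touches the connection state.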
Apache vs Nginx
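- Apache (Prefork MPM): One single-threaded process per connection, taken from a pool of pre-forked workers. Stable but memory hungry.
- Nginx (Event-Driven): A small number of worker processes, each handling thousands of connections through a non-blocking event loop.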
Selection Guide
Choose Processes When:
- You need high isolation (security/stability).
- The task is CPU-bound and the runtime has a global interpreter lock (CPython, MRI Ruby).
- Failure in one unit shouldn't crash the system.
Choose Threads When:
- You need to share a lot of data.
- High frequency communication is required.
- You need thousands of concurrent units (though async I/O or coroutines usually scale better at that point).
- The task is I/O bound.
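For the "thousands of units" case, coroutines are the usual alternative. A minimal sketch with `asyncio`, using hypothetical names (`tiny_task`, `spawn_many`): it starts many coroutines and records which OS thread each one ran on, to show that they all share a single thread.

```python
import asyncio
import threading


async def tiny_task(i, seen_threads):
    await asyncio.sleep(0)  # yield to the event loop once
    seen_threads.add(threading.get_ident())
    return i


async def spawn_many(n=10_000):
    """Run n coroutines; return (tasks completed, distinct OS threads used)."""
    seen_threads = set()
    results = await asyncio.gather(*(tiny_task(i, seen_threads) for i in range(n)))
    return len(results), len(seen_threads)


if __name__ == "__main__":
    completed, os_threads = asyncio.run(spawn_many())
    print(f"{completed} coroutines ran on {os_threads} OS thread(s)")
    # prints: 10000 coroutines ran on 1 OS thread(s)
```

Switching between coroutines happens in user space at each `await`, so no kernel context switch is needed to move from one task to the next.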
Interview Tips
- "Is it better to use threads or processes?" → Always answer "It depends", then weigh isolation against sharing.
- "What is the GIL?" → Explain that it lets only one thread execute Python bytecode at a time, so CPU-bound threads gain nothing from extra cores, while blocking I/O releases it.
- "User vs Kernel Threads" → Kernel threads are scheduled by the OS, so creating and switching them involves the kernel (expensive). User threads (green threads, goroutines) are scheduled by a runtime in user space (cheap).
- "Copy-on-Write (COW)" → Explain how fork() is optimized: parent and child share memory pages until one of them writes, and only the written page is copied.