Core Concepts

When you run a program, the operating system creates a process, and that process starts with at least one thread. Understanding the distinction between the two is critical for building high-performance systems.

Feature	Process	Thread
Definition	Independent program execution unit	Lightweight unit within a process
Memory	Isolated memory space	Shared memory (heap, code, globals)
Communication	IPC (pipes, sockets, shm)	Shared variables, locking
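Context Switch	Expensive (switch address space, flush or re-tag TLB entries)	Cheaper (save/restore registers and stack pointer)
Failure	Isolated (a crash doesn't take down other processes)	An unhandled crash in one thread kills the entire process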
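To make the "lightweight unit within a process" row concrete, here is a minimal Python sketch. The helper name `collect_ids` is made up for illustration, and the code assumes Python 3.9 or newer. Each worker records the process ID and its native (kernel) thread ID, so all workers should report the same process ID but different thread IDs.

```python
import os
import threading


def collect_ids(num_threads: int = 3) -> list[tuple[int, int]]:
    """Return the (process id, native thread id) pair seen by each worker."""
    ids: list[tuple[int, int]] = []
    ids_lock = threading.Lock()
    # The barrier keeps every worker alive until all have reported, so the
    # OS cannot recycle a finished thread's ID for a later worker.
    all_reported = threading.Barrier(num_threads)

    def worker() -> None:
        with ids_lock:
            ids.append((os.getpid(), threading.get_native_id()))
        all_reported.wait()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ids


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"pid={pid} native_thread_id={tid}")
```

The shared process ID is what the table means by a thread living "within a process"; the distinct native IDs show that the kernel still schedules each thread separately.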

Memory Architecture

The key difference lies in how memory is organized.

Process Layout

Each process has its own virtual address space.

mermaid

graph TD
    subgraph ProcessA [Process A Memory Space]
        CodeA[Code Segment]
        HeapA[Heap Memory]
        StackA[Stack]
        GlobalA[Global Data]
    end
    
    subgraph ProcessB [Process B Memory Space]
        CodeB[Code Segment]
        HeapB[Heap Memory]
        StackB[Stack]
        GlobalB[Global Data]
    end
    
    Kernel[OS Kernel Space] --> ProcessA
    Kernel --> ProcessB
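The isolation in the diagram above can be observed directly. The following is a minimal sketch, assuming a POSIX system where `os.fork()` is available; the names `counter` and `bump_in_child` are illustrative only. The child increments its own copy of a module-level variable and reports the value through a pipe, while the parent's copy stays untouched.

```python
import os

counter = 0  # module-level variable stored in this process's address space


def bump_in_child() -> int:
    """Fork a child that increments its copy of `counter`; return the child's value."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child process: it works on a private copy of the parent's memory.
        try:
            os.close(read_fd)
            counter += 1
            os.write(write_fd, str(counter).encode())
        finally:
            os._exit(0)  # leave immediately without running the parent's cleanup
    # Parent process: read what the child reported, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_value = int(reader.read())
    os.waitpid(pid, 0)
    return child_value


if __name__ == "__main__":
    print("child saw:", bump_in_child())
    print("parent still has:", counter)
```

Note that the only way the child's result reaches the parent is through the pipe, which is a form of IPC. Nothing written to `counter` in the child is visible in the parent.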


Thread Layout

Threads live inside a process and share resources.

mermaid

graph TD
    subgraph Process [Single Process Address Space]
        Code[Code Segment - SHARED]
        Heap[Heap Memory - SHARED]
        Global[Global Data - SHARED]
        
        subgraph T1 [Thread 1]
            Stack1[Stack]
            Reg1[Registers]
        end
        
        subgraph T2 [Thread 2]
            Stack2[Stack]
            Reg2[Registers]
        end
    end


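[!NOTE] Threads share the heap, but each thread has its own stack. This is why objects can be shared between threads (and then need synchronization), while a local variable stays private to its thread as long as no reference to it is handed to another thread.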
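A small sketch of the note above, with hypothetical names (`run_workers`, `shared_totals`). Every thread appends to one shared list, which therefore needs a lock, while each thread's `local_total` belongs to that thread's own call frame and needs no protection.

```python
import threading


def run_workers(sizes: list[int]) -> list[int]:
    """Sum range(n) for each n in its own thread and collect the totals."""
    shared_totals: list[int] = []    # one object, reachable from every thread
    totals_lock = threading.Lock()   # guards the shared list

    def worker(n: int) -> None:
        local_total = 0              # local name: private to this thread's call frame
        for i in range(n):
            local_total += i
        with totals_lock:            # shared state needs synchronization
            shared_totals.append(local_total)

    threads = [threading.Thread(target=worker, args=(n,)) for n in sizes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(shared_totals)


if __name__ == "__main__":
    print(run_workers([10, 100, 1000]))  # [45, 4950, 499500]
```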

Context Switching Cost

Switching between execution units isn't free. The CPU must save state and load new state.

Process Context Switch (Heavy)

Save registers
Save stack pointer
Switch page tables (load the new process's virtual address space) 🚨
Flush the TLB (Translation Lookaside Buffer), unless the CPU tags entries per address space 🚨
Load new state

Thread Context Switch (Light)

Save registers
Save stack pointer
Load new state (same address space, so no page-table switch and no TLB flush)
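The switch itself is invisible to application code, but the kernel counts it. As a rough sketch, assuming a Unix system (the `resource` module is not available on Windows), the helper below reads the process's voluntary context-switch counter before and after a piece of work. The function names are illustrative. The counter does not distinguish process switches from thread switches; it only shows that every blocking call hands the CPU back to the scheduler.

```python
import resource  # Unix only
import time
from typing import Callable


def voluntary_switches_during(work: Callable[[], None]) -> int:
    """Return how many voluntary context switches the process made while `work` ran."""
    before = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    work()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    return after - before


def blocking_work() -> None:
    for _ in range(50):
        time.sleep(0.001)  # blocking call: the kernel schedules something else


def busy_work() -> None:
    total = 0
    for i in range(200_000):
        total += i  # pure computation: no reason to give up the CPU voluntarily


if __name__ == "__main__":
    print("blocking:", voluntary_switches_during(blocking_work))
    print("busy:    ", voluntary_switches_during(busy_work))
```

On a typical Linux machine the blocking variant should report far more switches than the busy one, though the exact numbers depend on the system and some sandboxes do not maintain the counter.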

Code Examples

Python: Multiprocessing vs Threading

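CPython's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, which makes Python a useful case study for the difference between threads and processes.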

python

import time
import threading
import multiprocessing

def cpu_bound_task():
    count = 0
    while count < 100_000_000:
        count += 1

def run_threading():
    start = time.perf_counter()
    t1 = threading.Thread(target=cpu_bound_task)
    t2 = threading.Thread(target=cpu_bound_task)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(f"Threading: {time.perf_counter() - start:.2f}s")
    # Example result: ~10s (concurrency but no parallelism because of the GIL;
    # exact timings depend on the machine and Python version)

def run_multiprocessing():
    start = time.perf_counter()
    p1 = multiprocessing.Process(target=cpu_bound_task)
    p2 = multiprocessing.Process(target=cpu_bound_task)
    p1.start(); p2.start()
    p1.join(); p2.join()
    print(f"Multiprocessing: {time.perf_counter() - start:.2f}s")
    # Example result: ~5s (true parallelism when two cores are available)

if __name__ == "__main__":
    run_threading()
    run_multiprocessing()
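
The benchmark above is CPU-bound, which is the worst case for Python threads. For I/O-bound work the picture changes, because a thread that is waiting releases the GIL. The sketch below uses `time.sleep` as a stand-in for a network or disk call; `fake_io_call` and `run_concurrently` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay_seconds: float) -> float:
    """Stand-in for a network or disk call; sleeping releases the GIL."""
    time.sleep(delay_seconds)
    return delay_seconds


def run_concurrently(delays: list[float]) -> float:
    """Run all fake I/O calls on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        list(pool.map(fake_io_call, delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 4
    elapsed = run_concurrently(delays)
    print(f"{len(delays)} waits of 0.2s finished in {elapsed:.2f}s")
```

Run back to back, the four waits would take about 0.8s. On the thread pool they overlap, so the total should stay close to a single wait.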


Golang: Goroutines (User-Space Threads)

Go uses "Goroutines" which are M:N mapped (M goroutines on N OS threads).

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	start := time.Now()

	// Launch 100,000 goroutines (not OS threads)
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine starts with a small stack (about 2KB) that grows on demand
			// Switching between goroutines happens in user space, inside the Go runtime
		}()
	}

	wg.Wait()
	fmt.Printf("Spawned 100k goroutines in %v\n", time.Since(start))
}


Real-World Scaling Decisions

Chrome: Multi-Process Architecture

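Chrome runs tabs in separate renderer processes (roughly one per tab or site, although it may share processes when memory is tight).

Pros: If one tab crashes (e.g. a segfault), the rest of the browser stays alive. Processes also provide security isolation (sandboxing).
Cons: High memory usage (every process carries its own memory overhead).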
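The crash-isolation benefit can be sketched in a few lines of Python. This is not how Chrome is implemented; it is a toy illustration in which each "tab" is a child Python interpreter and the hypothetical `run_tab` helper plays the role of the browser process. It assumes a POSIX system, where the crashing child kills itself with `SIGSEGV`.

```python
import subprocess
import sys

# Hypothetical "tab" programs, passed to a child interpreter with -c.
HEALTHY_TAB = "print('rendered')"
CRASHING_TAB = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_tab(source: str) -> int:
    """Run one tab in its own process and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
    )
    return completed.returncode


if __name__ == "__main__":
    for name, source in [("healthy", HEALTHY_TAB), ("crashing", CRASHING_TAB)]:
        status = run_tab(source)
        print(f"{name} tab exited with status {status}")
    print("browser process is still running")
```

The parent only sees a non-zero exit status for the crashed child and keeps going. Had the same fault happened in a thread of the parent, the whole process would have died.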

Redis: Single-Threaded Event Loop

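Redis executes commands on a single thread (newer versions can offload network I/O to extra threads, but command execution stays single-threaded).

Why: Avoids context switching and lock contention.
Scale: Uses I/O multiplexing (e.g. epoll, kqueue) to handle thousands of connections efficiently.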
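To illustrate the idea (not Redis's actual code), here is a minimal single-threaded event loop built on Python's `selectors` module. The function `run_event_loop` and the tiny `SET`/`GET` text protocol are assumptions made for this sketch, and it simplifies by expecting one complete command per `recv()` call. Because only one thread ever touches `store`, no locks are needed.

```python
import selectors
import socket


def run_event_loop(connections: list[socket.socket], store: dict[str, str]) -> None:
    """Serve 'SET key value' and 'GET key' on all connections from one thread."""
    selector = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        selector.register(conn, selectors.EVENT_READ)

    open_connections = len(connections)
    while open_connections:
        # Block until at least one connection has data (I/O multiplexing).
        for key, _events in selector.select():
            conn = key.fileobj
            data = conn.recv(1024)
            if not data:  # the client closed its side
                selector.unregister(conn)
                conn.close()
                open_connections -= 1
                continue
            # Simplification: assume one complete command per recv() call.
            parts = data.decode().split()
            if parts[0] == "SET":
                store[parts[1]] = parts[2]
                conn.sendall(b"OK\n")
            else:  # treat anything else as GET
                conn.sendall(store.get(parts[1], "(nil)").encode() + b"\n")
    selector.close()


if __name__ == "__main__":
    store = {"lang": "python"}
    writer_client, writer_server = socket.socketpair()
    reader_client, reader_server = socket.socketpair()
    writer_client.sendall(b"SET mode fast")
    reader_client.sendall(b"GET lang")
    # Half-close so the loop sees end-of-stream after each command.
    writer_client.shutdown(socket.SHUT_WR)
    reader_client.shutdown(socket.SHUT_WR)
    run_event_loop([writer_server, reader_server], store)
    print(writer_client.recv(1024), reader_client.recv(1024), store)
```

The demo uses `socket.socketpair()` instead of a listening port so it runs without any network setup. A real server would also register the listening socket and accept new connections inside the same loop.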

Apache vs Nginx

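Apache (Prefork): A pool of pre-forked worker processes, each handling one connection at a time. Stable but memory hungry.
Nginx (Event-Driven): A small number of worker processes, each handling thousands of connections through non-blocking, event-driven I/O.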

Selection Guide

Choose Processes When:

You need high isolation (security/stability).
The task is CPU-bound and the runtime has a global interpreter lock (e.g. CPython, MRI Ruby).
Failure in one unit shouldn't crash the system.

Choose Threads When:

You need to share a lot of data.
High-frequency communication is required.
You need thousands of concurrent units (though async/coroutines are usually a better fit at that scale).
The task is I/O-bound.
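The aside about async/coroutines in the list above can be sketched with `asyncio`. This is a rough illustration with made-up names (`tiny_task`, `spawn_many`), using `asyncio.sleep` as a stand-in for I/O. Unlike goroutines, which the Go runtime spreads over several OS threads, asyncio tasks are scheduled cooperatively on a single thread, which the sketch checks by recording the native thread ID seen by each task.

```python
import asyncio
import threading


async def tiny_task(seen_threads: set[int], delay: float) -> int:
    await asyncio.sleep(delay)  # simulated I/O wait; control returns to the event loop
    seen_threads.add(threading.get_native_id())
    return 1


async def spawn_many(count: int, delay: float = 0.01) -> tuple[int, int]:
    """Run `count` tasks; return (tasks completed, distinct OS threads used)."""
    seen_threads: set[int] = set()
    results = await asyncio.gather(
        *(tiny_task(seen_threads, delay) for _ in range(count))
    )
    return sum(results), len(seen_threads)


if __name__ == "__main__":
    completed, os_threads = asyncio.run(spawn_many(10_000))
    print(f"{completed} tasks ran on {os_threads} OS thread(s)")
```

Creating ten thousand OS threads for the same job would cost far more memory and scheduler work, which is why coroutines are the usual choice when the number of concurrent I/O waits is large.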

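Interview Tips 💡

"Is it better to use threads or processes?": Answer "It depends", then name the trade-off: isolation (processes) versus cheap sharing (threads).
"What is the GIL?": Explain that it lets only one thread execute Python bytecode at a time, so CPU-bound Python threads do not run in parallel on multi-core CPUs, while threads blocked on I/O release it.
"User vs Kernel Threads": Kernel threads are scheduled by the OS, so creating and switching them costs more. User threads (green threads, goroutines) are scheduled by the language runtime in user space, which makes them cheap.
"Copy-on-Write (COW)": Explain how fork() is optimized by sharing memory pages between parent and child until one of them writes to a page.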

