Use It, Don't Just Say It

Everyone puts "K8s" on their resume. Few understand how the scheduler actually selects a node, what happens when a pod crashes, or how networking works. This article goes under the hood.

1. The Control Plane (The Brain)

These components manage the cluster's global state. They make decisions (scheduling, scaling, self-healing) and run on dedicated "master" nodes.

A. API Server (`kube-apiserver`)

Role: The single front door to the entire cluster. Every interaction — kubectl, the dashboard, other control plane components — goes through the API Server.
Function: Authenticates callers, authorizes each request (RBAC), validates it (for example, a request to create a Pod), and persists the resulting state to Etcd. It is the only component that talks directly to Etcd.
Design: Stateless and horizontally scalable. You can run multiple API Server instances behind a load balancer for high availability.
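
The request path can be sketched as a chain of checks. This is a toy model, not kube-apiserver code: the `Request` type, the `ALLOWED` permission table, and the in-memory `store` dictionary standing in for Etcd are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    user: Optional[str]   # None models an unauthenticated caller
    verb: str
    resource: str
    body: dict


# Hypothetical RBAC-style table of permitted (user, verb, resource) tuples.
ALLOWED = {("alice", "create", "pods")}

# Stands in for Etcd in this sketch; keys are made-up storage paths.
store = {}


def handle(req: Request) -> int:
    """Return an HTTP status code for the request."""
    if req.user is None:                                   # 1. authentication
        return 401
    if (req.user, req.verb, req.resource) not in ALLOWED:  # 2. authorization
        return 403
    if "name" not in req.body:                             # 3. validation
        return 422
    store[f"/{req.resource}/{req.body['name']}"] = req.body  # 4. persist
    return 201


print(handle(Request("alice", "create", "pods", {"name": "nginx"})))
```

Because the handler keeps no state of its own beyond the backing store, any number of copies could serve requests side by side, which is the property that makes the real component horizontally scalable.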

B. Etcd

Role: The cluster's source of truth. A distributed, consistent Key-Value store (based on the Raft consensus algorithm).
Data Stored: The entire cluster state — pod specs, service definitions, secrets, ConfigMaps, namespace info, RBAC policies.
Criticality: If Etcd is corrupted or lost, the cluster is effectively dead. Production clusters always run Etcd as a 3- or 5-node HA cluster.
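Performance: All writes go through the Raft leader, and reads are linearizable by default, so they involve the leader as well. Etcd's published benchmarks cite on the order of 10,000 writes/sec, which is sufficient for most clusters but can become a bottleneck at very large scale (5,000+ nodes).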
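
The preference for 3 or 5 members follows from Raft's majority rule. The arithmetic below is a small sketch (the function names are made up for this example) showing that an even-sized cluster tolerates no more failures than the odd size just below it.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

Three members survive one failure and five survive two, while four members still survive only one, so the extra member adds replication cost without adding fault tolerance.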

C. Scheduler (`kube-scheduler`)

[Interactive diagram: K8s Scheduler Logic. Stages: 1. Predicates (Filter), 2. Priorities (Score), 3. Bind. Example: a pending Pod requesting 2 CPU and 4 GB RAM, with Worker-1 (1 CPU, 8 GB free), Worker-2 (4 CPU, 16 GB free), and Worker-3 (8 CPU, 32 GB free).]

Role: Watches for newly created Pods with no assigned Node and decides where to run them.
Algorithm (filter and score, followed by binding):
1. Filtering: Which nodes satisfy the pod's constraints?
  - Does the node have enough CPU and RAM?
  - Does it match nodeSelector or nodeAffinity rules?
  - Does the node carry taints, and if so, does the pod have matching tolerations?
2. Scoring: Among feasible nodes, which is "best"?
  - Prefer nodes with less resource utilization (spread load).
  - Prefer nodes that already have the container image cached.
  - Apply custom priority functions.
3. Binding: The scheduler binds the Pod to the winning Node by updating Etcd via the API Server.
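
The filter, score, bind sequence can be sketched in a few lines. This is a simplified model, not the kube-scheduler implementation: the `Node` and `Pod` types are hypothetical, filtering checks only CPU and RAM, and the scoring function (fraction of free resources left after placement) is an assumption standing in for the real scoring plugins. The numbers are illustrative: a Pod requesting 2 CPU and 4 GB, and three workers with 1/8, 4/16, and 8/32 CPU/GB free.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float      # cores
    free_ram_gb: float


@dataclass
class Pod:
    name: str
    cpu: float           # requested cores (assumed > 0)
    ram_gb: float        # requested memory (assumed > 0)
    node_name: Optional[str] = None   # unset until the Pod is bound


def feasible(pod: Pod, node: Node) -> bool:
    """Filter step: does the node have room for the Pod's requests?"""
    return node.free_cpu >= pod.cpu and node.free_ram_gb >= pod.ram_gb


def score(pod: Pod, node: Node) -> float:
    """Score step: share of free CPU and RAM left after placement (higher is better)."""
    return ((node.free_cpu - pod.cpu) / node.free_cpu
            + (node.free_ram_gb - pod.ram_gb) / node.free_ram_gb)


def schedule(pod: Pod, nodes: List[Node]) -> Optional[str]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None                    # no feasible node: the Pod stays Pending
    best = max(candidates, key=lambda n: score(pod, n))
    pod.node_name = best.name          # bind step
    return best.name


nodes = [Node("Worker-1", 1, 8), Node("Worker-2", 4, 16), Node("Worker-3", 8, 32)]
print(schedule(Pod("pending-pod", cpu=2, ram_gb=4), nodes))  # Worker-3
```

Worker-1 is filtered out because it has only 1 free CPU. Worker-3 scores 1.625 against Worker-2's 1.25, so the Pod is bound to Worker-3.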

D. Controller Manager

Role: Runs the cluster's controllers, such as the Node Controller and the ReplicaSet Controller. Each one is a reconciliation loop that compares desired state with actual state and acts on the difference.

2. The Data Plane (Worker Nodes)

Where the actual application containers run. Each worker node has three components:

A. Kubelet

Role: The agent that runs on every node (both master and worker).
Function: Watches the API Server for Pods assigned to its node, then tells the Container Runtime to start or stop containers so the node matches those Pod specs. It also reports node and Pod status back to the API Server.
Pod lifecycle management: Runs liveness and readiness probes. If a container's liveness probe fails, Kubelet restarts it; if a readiness probe fails, the Pod is taken out of Service endpoints until the probe passes again.
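
The liveness behaviour can be modelled as a counter of consecutive failures. The sketch below is not Kubelet code: `LivenessTracker` is a hypothetical class, and the only detail taken from Kubernetes is that a probe's `failureThreshold` field defaults to 3.

```python
class LivenessTracker:
    """Counts consecutive liveness-probe failures for one container."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.restarts = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True if the container should be restarted."""
        if probe_ok:
            self.consecutive_failures = 0   # any success resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.consecutive_failures = 0   # a restarted container starts fresh
            self.restarts += 1
            return True
        return False


tracker = LivenessTracker()
results = [True, False, False, True, False, False, False]
decisions = [tracker.record(ok) for ok in results]
print(decisions)  # only the final probe triggers a restart
```

Two failures followed by a success do not cause a restart. Only an unbroken run of failures that reaches the threshold does.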

B. Kube-proxy

Role: The networking agent. Runs on every node.
Function: Maintains network rules (iptables or IPVS) that enable Services. When you create a Kubernetes Service, kube-proxy ensures that traffic to the Service's ClusterIP is routed to one of the Service's healthy Pods.
Modes:
- iptables (default): Creates iptables rules for load balancing. Simple but can get slow with thousands of services.
- IPVS: Uses kernel-level IPVS for faster load balancing at scale.
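
In iptables mode, backend selection is typically done with a chain of rules that each match with a random probability: 1/n for the first backend, 1/(n-1) for the next, and so on, with the last rule always matching. The sketch below (helper names are made up for this example) checks that this cascade spreads traffic evenly, and it also hints at why long rule chains become slow with thousands of Services.

```python
from fractions import Fraction


def rule_probabilities(n_backends: int):
    """Per-rule match probabilities: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n_backends - i) for i in range(n_backends)]


def effective_probabilities(n_backends: int):
    """Overall chance that a new connection lands on each backend."""
    remaining = Fraction(1)   # probability that no earlier rule matched
    result = []
    for p in rule_probabilities(n_backends):
        result.append(remaining * p)
        remaining *= 1 - p
    return result


print(rule_probabilities(3))       # [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]
print(effective_probabilities(3))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```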

C. Container Runtime

Role: The engine that actually runs containers.
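containerd: A widely used runtime that Kubernetes drives through the Container Runtime Interface (CRI). K8s 1.24 removed dockershim, the built-in adapter for Docker Engine, because Kubernetes only needs a CRI-compatible runtime, not the full Docker daemon. Images built with Docker still run, since they are standard OCI images.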
CRI-O: An alternative lightweight runtime designed specifically for Kubernetes.

3. The Complete Flow: `kubectl run nginx`

What actually happens when you run this command?

kubectl run nginx --image=nginx:latest

1. kubectl sends an HTTP POST request to the API Server with the Pod spec.
2. API Server authenticates and authorizes the request (RBAC), runs admission controllers, validates the object, then stores the Pod in Etcd with status Pending.
3. Scheduler watches for unassigned Pods. It sees the new Pod, evaluates nodes (filter + score), picks "Node-1", and updates the Pod's nodeName in Etcd via the API Server.
4. Kubelet on Node-1 notices (via its API Server watch) that it has been assigned a new Pod.
5. Kubelet tells containerd to pull the nginx:latest image and start the container.
6. Kubelet reports Pod status as Running back to the API Server → Etcd is updated.
7. If a Service selects the Pod, kube-proxy on each node updates its rules once the Pod is ready, so traffic to the Service can reach it. Pod-to-Pod connectivity itself is set up by the cluster's network plugin (CNI) when the Pod starts.
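
To make the first step concrete, here is an approximation of the request that `kubectl run` produces. The helper `pod_request` is hypothetical, and the manifest is trimmed to the essentials; the real client sends additional fields. The path is the core API's Pod collection for a namespace.

```python
import json


def pod_request(name: str, image: str, namespace: str = "default"):
    """Return the (method, path, body) of a Pod-creation request."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Pod",
        # kubectl run labels the Pod with run=<name>
        "metadata": {"name": name, "labels": {"run": name}},
        # no nodeName yet: assigning one is the scheduler's job
        "spec": {"containers": [{"name": name, "image": image}]},
    }
    path = f"/api/v1/namespaces/{namespace}/pods"
    return "POST", path, json.dumps(manifest)


method, path, body = pod_request("nginx", "nginx:latest")
print(method, path)
print(body)
```

The absence of `nodeName` in the spec is what marks the Pod as unscheduled and makes the Scheduler pick it up.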

4. Self-Healing in Action

What happens when Node-1 crashes?

1. Node Controller stops receiving heartbeats from Node-1.
2. After a grace period (40 seconds has been the long-standing default), it marks Node-1 as NotReady.
3. After a further 5 minutes by default, the Pods on Node-1 are evicted.
4. ReplicaSet Controller detects that the desired replica count (3) > actual running replicas (2).
5. It creates a replacement Pod from the ReplicaSet's Pod template.
6. Scheduler assigns the new Pod to a healthy node (Node-2 or Node-3).
7. Kubelet on the chosen node starts the container.

The entire process is automatic — no human intervention needed.
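
The ReplicaSet Controller's part of this sequence is a reconciliation step: compare the desired count with what is running and act on the difference. A minimal sketch, assuming hypothetical pod names of the form `web-N` and a plain list standing in for the cluster's running Pods:

```python
def reconcile(desired_replicas, running_pods, next_id):
    """Return (pods_to_create, pods_to_delete) needed to reach the desired count."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return [f"web-{next_id + i}" for i in range(diff)], []
    if diff < 0:
        return [], running_pods[:-diff]   # surplus pods to remove
    return [], []                         # already converged: nothing to do


pods = ["web-1", "web-2", "web-3"]
pods.remove("web-1")                      # the Pod on the failed node is evicted
create, delete = reconcile(3, pods, next_id=4)
pods += create
print(create, delete, pods)  # ['web-4'] [] ['web-2', 'web-3', 'web-4']
```

Running `reconcile` again on the converged list returns two empty lists, which is why the loop can run continuously without side effects.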

5. Key Kubernetes Objects

Object	Purpose
Pod	Smallest deployable unit. One or more containers.
Deployment	Manages ReplicaSets. Handles rolling updates.
Service	Stable networking endpoint for a set of Pods.
ConfigMap	Stores non-sensitive configuration as key-value pairs.
Secret	Stores sensitive data (passwords, tokens).
Namespace	Logical isolation for multi-tenant clusters.
Ingress	HTTP routing rules (virtual hosts, paths).

Interview Tips 💡

Don't just say what Kubernetes is: Explain the control loop — "K8s continuously compares desired state (Etcd) with actual state (kubelet reports) and reconciles."
Know the scheduling algorithm: "Filter feasible nodes, score them, bind the best one."
Explain the flow: Walk through what happens when kubectl apply is run — API Server → Etcd → Scheduler → Kubelet → Container Runtime.
Discuss high availability: "API Server is stateless and load-balanced. Etcd runs as a 3-node Raft cluster. Workers are inherently redundant."
Networking: "Services provide stable IPs via kube-proxy (iptables/IPVS). Ingress controllers handle HTTP routing."

Summary

API Server: Central hub — the only gateway to Etcd.
Etcd: Source of truth — stores all cluster state.
Scheduler: Decides where to run new Pods (filter → score → bind).
Controllers: Decide what to do (self-healing via reconciliation loops).
Kubelet: The worker agent — ensures Pods are running on each node.
Kube-proxy: Networking — routes traffic to healthy Pods.
