
Kubernetes Architecture Explained

Under the hood of K8s: The Control Plane (API Server, Scheduler, Etcd, Controllers) and Data Plane (Kubelet, Kube-proxy, Container Runtime).

Use It, Don't Just Say It

Everyone puts "K8s" on their resume. Few understand how the scheduler actually selects a node, what happens when a pod crashes, or how networking works. This article goes under the hood.

1. The Control Plane (The Brain)

These components manage the cluster's global state. They make decisions (scheduling, scaling, self-healing) and typically run on dedicated control plane nodes, historically called "master" nodes.

A. API Server (kube-apiserver)

  • Role: The single front door to the entire cluster. Every interaction — kubectl, the dashboard, other control plane components — goes through the API Server.
  • Function: Authenticates the caller, authorizes the request (RBAC), runs admission control and validation, and persists state to Etcd. It is the only component that talks directly to Etcd.
  • Design: Stateless and horizontally scalable. You can run multiple API Server instances behind a load balancer for high availability.
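
The request path above can be sketched as a chain of checks that must all pass before anything is persisted. This is a toy model, not the kube-apiserver's code: the `Request` type, the `TOKENS` and `PERMISSIONS` tables, and the in-memory `store` standing in for Etcd are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins. A real cluster uses certificates or
# tokens, RBAC objects, admission plugins, and Etcd.
TOKENS = {"token-alice": "alice"}             # token -> user (authentication)
PERMISSIONS = {("alice", "create", "pods")}   # (user, verb, resource) (authorization)
store = {}                                    # stand-in for Etcd


@dataclass
class Request:
    token: str
    verb: str
    resource: str
    name: str
    spec: dict


def handle(req: Request) -> tuple[int, str]:
    """Return an HTTP-like (status, message) pair for one request."""
    user = TOKENS.get(req.token)
    if user is None:
        return 401, "Unauthorized"                  # authentication failed
    if (user, req.verb, req.resource) not in PERMISSIONS:
        return 403, "Forbidden"                     # authorization failed
    if not req.spec.get("image"):
        return 422, "Invalid: image is required"    # validation failed
    # State is written only after every check has passed.
    store[f"/{req.resource}/{req.name}"] = req.spec
    return 201, "Created"


print(handle(Request("token-alice", "create", "pods", "nginx", {"image": "nginx:latest"})))
print(handle(Request("token-mallory", "create", "pods", "nginx", {"image": "nginx:latest"})))
```

The ordering is the point of the sketch: a request that fails authentication or authorization never reaches validation, and nothing reaches the store unless every stage accepts it.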

B. Etcd

  • Role: The cluster's source of truth. A distributed, consistent Key-Value store (based on the Raft consensus algorithm).
  • Data Stored: The entire cluster state — pod specs, service definitions, secrets, ConfigMaps, namespace info, RBAC policies.
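  • Criticality: If Etcd is corrupted or lost without a backup, the cluster is effectively dead. Production clusters typically run Etcd as a 3- or 5-node HA cluster, because Raft needs a majority of members to stay available.
  • Performance: All writes go through the Raft leader and must be acknowledged by a majority of members. Etcd is commonly quoted at around 10,000 writes/sec in benchmarks, which is sufficient for most clusters, but it can become a bottleneck at very large scale (5,000+ nodes).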
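
The preference for 3 or 5 members follows from majority-quorum arithmetic. A minimal sketch, where the function names `quorum` and `tolerated_failures` are illustrative and not part of any Etcd API:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, so the extra member adds replication cost without adding fault tolerance. That is why odd sizes are the usual choice.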

C. Scheduler (kube-scheduler)

[Interactive demo: K8s scheduler logic. Steps: 1. Predicates (Filter), 2. Priorities (Score), 3. Bind. Example: a pending Pod requesting 2 CPU and 4 GB RAM, with Worker-1 (1 CPU, 8 GB free), Worker-2 (4 CPU, 16 GB free), and Worker-3 (8 CPU, 32 GB free).]

  • Role: Watches for newly created Pods with no assigned Node and decides where to run them.
  • Algorithm (filter, score, then bind):
    1. Filtering: Which nodes satisfy the pod's constraints?
      • Does the node have enough unreserved CPU and RAM for the pod's requests?
      • Does it match nodeSelector or nodeAffinity rules?
      • Does the pod tolerate every taint on the node? A taint the pod does not tolerate rules the node out.
    2. Scoring: Among feasible nodes, which is "best"?
      • Prefer nodes with lower resource utilization (spread load).
      • Prefer nodes that already have the container image cached.
      • Apply any custom scoring plugins.
    3. Binding: The scheduler binds the Pod to the winning Node by setting the Pod's nodeName through the API Server, which persists it to Etcd.
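
The filter, score, and bind steps can be sketched in a few lines. This is a simplified model under stated assumptions, not the kube-scheduler's implementation: taints and tolerations are reduced to plain strings (real ones carry a key, value, and effect), the score is just "most resources left after placement", and the `Node`, `Pod`, `feasible`, `score`, and `schedule` names are hypothetical. The sample data reuses the three workers and the pending Pod from the example in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu: float
    free_mem_gb: float
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu: float
    mem_gb: float
    node_selector: dict = field(default_factory=dict)
    tolerations: set = field(default_factory=set)
    node_name: Optional[str] = None


def feasible(pod: Pod, node: Node) -> bool:
    """Filter: resources fit, labels match, and every taint is tolerated."""
    return (
        node.free_cpu >= pod.cpu
        and node.free_mem_gb >= pod.mem_gb
        and all(node.labels.get(k) == v for k, v in pod.node_selector.items())
        and node.taints <= pod.tolerations
    )


def score(pod: Pod, node: Node) -> tuple:
    """Score: prefer the node with the most CPU, then memory, left over."""
    return (node.free_cpu - pod.cpu, node.free_mem_gb - pod.mem_gb)


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no feasible node: the Pod stays Pending
    best = max(candidates, key=lambda n: score(pod, n))
    pod.node_name = best.name          # bind
    best.free_cpu -= pod.cpu           # account for the placement
    best.free_mem_gb -= pod.mem_gb
    return best.name


nodes = [Node("Worker-1", 1, 8), Node("Worker-2", 4, 16), Node("Worker-3", 8, 32)]
pod = Pod("pending-pod", cpu=2, mem_gb=4)
print(schedule(pod, nodes))  # Worker-3
```

Worker-1 is filtered out because it has only 1 free CPU. Worker-3 wins the scoring round because it has the most capacity left after placement.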

D. Controller Manager

  • Role: Runs the cluster's controllers, such as the Node Controller and the ReplicaSet Controller.
  • Function: Each controller is a reconciliation loop. It watches the API Server, compares desired state with actual state, and acts to close the gap. This is what drives self-healing.

2. The Data Plane (Worker Nodes)

Where the actual application containers run. Each worker node has three components:

A. Kubelet

  • Role: The agent that runs on every node (both master and worker).
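  • Function: Watches the API Server for Pods assigned to its node, rather than polling for work: "These Pods should be running here." It then tells the Container Runtime to start or stop containers and reports Pod and node status back.
  • Pod lifecycle management: Runs liveness and readiness probes. If a container's liveness probe fails, Kubelet restarts it (subject to the Pod's restartPolicy). If a readiness probe fails, the Pod is removed from Service endpoints until it recovers.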
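
A liveness probe does not trigger a restart on the first failed check. The probe's `failureThreshold` field (default 3) sets how many consecutive failures are required. The sketch below models only that counting logic. The `count_restarts` function is hypothetical, and it ignores timing fields such as `periodSeconds` and restart back-off.

```python
def count_restarts(probe_results, failure_threshold=3):
    """Count restarts for a sequence of probe outcomes (True = healthy)."""
    restarts = 0
    consecutive_failures = 0
    for healthy in probe_results:
        if healthy:
            consecutive_failures = 0   # any success resets the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts += 1
            consecutive_failures = 0   # the fresh container starts clean
    return restarts


# Two failures, a recovery, then three failures in a row: one restart.
print(count_restarts([True, False, False, True, False, False, False]))  # 1
```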

B. Kube-proxy

  • Role: The networking agent. Runs on every node.
  • Function: Maintains network rules (iptables or IPVS) that enable Services. When you create a Kubernetes Service, kube-proxy ensures that traffic to the Service's ClusterIP is routed to one of the Service's healthy Pods.
  • Modes:
    • iptables (default): Creates iptables rules for load balancing. Simple but can get slow with thousands of services.
    • IPVS: Uses kernel-level IPVS for faster load balancing at scale.
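
Conceptually, kube-proxy keeps a table from each Service's ClusterIP and port to the ready Pod endpoints behind it. The sketch below models that table in memory. It is an illustration only: the `ServiceTable` class and the IP addresses are made up, and the round-robin choice mirrors the IPVS default scheduler, whereas iptables mode picks a backend at random.

```python
from itertools import cycle


class ServiceTable:
    """Toy model of ClusterIP -> ready endpoints routing."""

    def __init__(self):
        self._backends = {}   # (cluster_ip, port) -> list of "podIP:port"
        self._pickers = {}    # (cluster_ip, port) -> round-robin iterator

    def sync(self, cluster_ip, port, endpoints):
        """Replace a Service's backends, keeping only ready endpoints."""
        ready = [addr for addr, is_ready in endpoints if is_ready]
        self._backends[(cluster_ip, port)] = ready
        self._pickers[(cluster_ip, port)] = cycle(ready)

    def route(self, cluster_ip, port):
        """Return the backend for the next connection, or None if none is ready."""
        if not self._backends.get((cluster_ip, port)):
            return None
        return next(self._pickers[(cluster_ip, port)])


table = ServiceTable()
table.sync("10.96.0.10", 80, [
    ("10.244.1.5:80", True),
    ("10.244.2.7:80", False),   # failing its readiness probe, so skipped
    ("10.244.3.9:80", True),
])
print([table.route("10.96.0.10", 80) for _ in range(3)])
```

In a real cluster the equivalent of `sync` runs whenever the Service's endpoints change, which is how traffic stops flowing to a Pod that becomes unready.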

C. Container Runtime

  • Role: The engine that actually runs containers.
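  • containerd: The most widely used runtime today. Kubernetes talks to it through the Container Runtime Interface (CRI), and containerd in turn uses a low-level OCI runtime such as runc to create containers. K8s 1.24 removed dockershim, the built-in adapter for Docker Engine, because Kubernetes only needs a CRI-compatible runtime, not the full Docker daemon. Images built with Docker still run unchanged, since they are OCI images.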
  • CRI-O: An alternative lightweight runtime designed specifically for Kubernetes.

3. The Complete Flow: kubectl run nginx

What actually happens when you run this command?

kubectl run nginx --image=nginx:latest
  1. kubectl sends an HTTP POST request to the API Server with the Pod spec.
  2. API Server validates the request (authentication, RBAC, admission controllers), then stores "Pending Nginx Pod" in Etcd.
  3. Scheduler watches for unassigned Pods. It sees the new Pod, evaluates nodes (filter + score), picks "Node-1", and updates the Pod's nodeName in Etcd via the API Server.
  4. Kubelet on Node-1 notices (via its API Server watch) that it has been assigned a new Pod.
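  5. Kubelet tells containerd to create the Pod sandbox, pull the nginx:latest image, and start the container. During sandbox setup, the CNI network plugin gives the Pod its IP address.
  6. Kube-proxy is not involved in starting the Pod. It only programs rules for Services, so it comes into play once a Service selects the Nginx Pod.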
  7. Kubelet reports Pod status as Running back to the API Server → Etcd is updated.
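
The key property of this flow is that the components never call each other directly. Each one watches the API Server and reacts to state changes. A minimal in-memory sketch of that pattern follows. All names here (`ApiServer`, `write`, `watch`, the dict-based Pod) are hypothetical, and admission, image pulls, and networking are left out.

```python
class ApiServer:
    """Toy API Server: stores objects and notifies watchers on every write."""

    def __init__(self):
        self.etcd = {}       # stand-in for Etcd
        self.watchers = []
        self.log = []

    def watch(self, callback):
        self.watchers.append(callback)

    def write(self, actor, pod):
        self.etcd[pod["name"]] = dict(pod)
        self.log.append(f"{actor}: phase={pod['phase']} node={pod['nodeName']}")
        for callback in list(self.watchers):
            callback(dict(pod))   # each watcher gets its own copy


def start_scheduler(api, chosen_node="Node-1"):
    def on_event(pod):
        if pod["nodeName"] is None:           # unassigned Pod
            pod["nodeName"] = chosen_node     # filter + score omitted here
            api.write("scheduler", pod)
    api.watch(on_event)


def start_kubelet(api, node="Node-1"):
    def on_event(pod):
        if pod["nodeName"] == node and pod["phase"] == "Pending":
            # A real kubelet would ask the container runtime to start
            # the containers before reporting Running.
            pod["phase"] = "Running"
            api.write("kubelet", pod)
    api.watch(on_event)


api = ApiServer()
start_scheduler(api)
start_kubelet(api)
api.write("kubectl", {"name": "nginx", "image": "nginx:latest",
                      "phase": "Pending", "nodeName": None})
print("\n".join(api.log))
```

The log shows three writes in order, from kubectl, the scheduler, and the kubelet, and every one of them goes through the same `write` path.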

4. Self-Healing in Action

What happens when Node-1 crashes?

  1. Node Controller stops receiving heartbeats from Node-1.
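  2. After the grace period (about 40 seconds by default, depending on the Kubernetes version), it marks Node-1 as NotReady.
  3. After a further 5 minutes (the default toleration for an unreachable node), the Pods on Node-1 are evicted.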
  4. ReplicaSet Controller detects that the desired replica count (3) > actual running replicas (2).
  5. It creates a replacement Pod through the API Server. The new Pod starts out with no node assigned.
  6. Scheduler assigns the new Pod to a healthy node (Node-2 or Node-3).
  7. Kubelet on the chosen node starts the container.

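The entire process is automatic, with no human intervention needed, provided the Pods are managed by a controller such as a ReplicaSet or Deployment. A bare Pod that is not owned by a controller is not recreated.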
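
The ReplicaSet part of this sequence is a small reconciliation loop: count the Pods that exist, compare with the desired number, and create or delete the difference. A minimal sketch, assuming Pods are plain dicts. The `evict` and `reconcile` helpers and the `web-` naming are hypothetical, not the controller's real code.

```python
import uuid


def evict(pods, failed_node):
    """Drop the Pods that were running on the failed node."""
    return [p for p in pods if p["node"] != failed_node]


def reconcile(desired, pods):
    """One pass of the loop: return a Pod list that matches `desired`."""
    pods = list(pods)
    while len(pods) < desired:
        # New Pods have no node yet; the scheduler assigns one later.
        pods.append({"name": f"web-{uuid.uuid4().hex[:5]}", "node": None})
    del pods[desired:]   # scale down if there are too many
    return pods


pods = [
    {"name": "web-a", "node": "Node-1"},
    {"name": "web-b", "node": "Node-2"},
    {"name": "web-c", "node": "Node-3"},
]
pods = evict(pods, "Node-1")    # Node-1 crashed: 2 replicas remain
pods = reconcile(3, pods)       # controller restores the count to 3
print([(p["name"], p["node"]) for p in pods])
```

Running `reconcile` again on the result changes nothing. That idempotence is what lets a controller run its loop continuously without side effects.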

5. Key Kubernetes Objects

Object       Purpose
Pod          Smallest deployable unit. One or more containers.
Deployment   Manages ReplicaSets. Handles rolling updates.
Service      Stable networking endpoint for a set of Pods.
ConfigMap    Store config data as key-value pairs.
Secret       Store sensitive data (passwords, tokens).
Namespace    Logical isolation for multi-tenant clusters.
Ingress      HTTP routing rules (virtual hosts, paths).
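
To show how these objects relate, the sketch below builds a Deployment and a Service for the same set of Pods and prints them as JSON, which kubectl accepts alongside YAML. The object names, label, replica count, and port are illustrative choices, not values from this article.

```python
import json

labels = {"app": "nginx"}   # the label that ties the objects together

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {
        "replicas": 3,
        # The selector must match the labels in the Pod template.
        "selector": {"matchLabels": labels},
        "template": {
            "metadata": {"labels": labels},
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx:latest",
                    "ports": [{"containerPort": 80}],
                }],
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "nginx"},
    # The Service finds its Pods through the same label.
    "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": 80}]},
}

manifest = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [deployment, service]},
    indent=2,
)
print(manifest)
```

Saving the output to a file and applying it with `kubectl apply -f` would create both objects. The Deployment creates a ReplicaSet, which creates the Pods, and the Service selects those Pods by label.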

Interview Tips 💡

  1. Don't just say what Kubernetes is: Explain the control loop — "K8s continuously compares desired state (Etcd) with actual state (kubelet reports) and reconciles."
  2. Know the scheduling algorithm: "Filter feasible nodes, score them, bind the best one."
  3. Explain the flow: Walk through what happens when kubectl apply is run — API Server → Etcd → Scheduler → Kubelet → Container Runtime.
  4. Discuss high availability: "API Server is stateless and load-balanced. Etcd runs as a 3-node Raft cluster. Workers are inherently redundant."
  5. Networking: "Services provide stable IPs via kube-proxy (iptables/IPVS). Ingress controllers handle HTTP routing."

Summary

  • API Server: Central hub — the only gateway to Etcd.
  • Etcd: Source of truth — stores all cluster state.
  • Scheduler: Decides where to run new Pods (filter → score → bind).
  • Controllers: Decide what to do (self-healing via reconciliation loops).
  • Kubelet: The worker agent — ensures Pods are running on each node.
  • Kube-proxy: Networking — routes traffic to healthy Pods.
