Last updated: April 29, 2026
Why this module exists. Containers are isolation, not security. The Linux kernel boundary between container and host has historically had escape paths every 6-18 months. Most enterprises run Kubernetes without Pod Security enforcement, because with no namespace labels Pod Security Admission applies the “privileged” (unrestricted) level by default. Every red team checks for container-escape primitives first.
What “container escape” means
A container escape is when a process inside a container gains access equivalent to a process running directly on the host node. From there an attacker can read other pods’ secrets and the kubelet’s credentials, read the cluster-admin kubeconfig at /etc/kubernetes/admin.conf on a control-plane node, and mount and read other pods’ persistent volumes.
Escape path 1: privileged: true
The simplest escape. A container with securityContext.privileged: true gets all Linux capabilities and access to every host device, so it can mount host filesystems, reconfigure networking, and load kernel modules. That is equivalent to root on the host.
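# From inside a privileged pod
mkdir /host
# The root device name varies by node (e.g. /dev/nvme0n1p1 on many cloud VMs); check /proc/partitions
mount /dev/sda1 /host
chroot /host bash
# You're root on the node.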
Why it exists: Docker-in-Docker, networking plugins, and monitoring agents genuinely need it. Most pods that have it don’t; the setting was copy-pasted from a tutorial.
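To check from inside a pod whether it is running with privileged-level capabilities, one option is to decode the effective capability mask (CapEff) in /proc/self/status. A minimal Python sketch, assuming Python 3.9+ is available in the container image; the function names are hypothetical, and the capability table (bit numbers from linux/capability.h) covers only the capabilities this module names.

```python
"""Decode the effective capability mask of the current process (sketch)."""
from pathlib import Path

# Bit positions from linux/capability.h for the escape-relevant capabilities
# named in this module. Deliberately short, not a full capability table.
ESCAPE_RELEVANT_CAPS = {
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def parse_capeff(status_text: str) -> int:
    """Return the CapEff bitmask from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # Value is a 16-digit hex string, e.g. "00000000a80425fb".
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line in status text")


def escape_relevant_caps(mask: int) -> list[str]:
    """Names of the escape-relevant capabilities set in the mask, ordered by bit."""
    return [name for bit, name in sorted(ESCAPE_RELEVANT_CAPS.items()) if (mask >> bit) & 1]


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only present on Linux
        mask = parse_capeff(status.read_text())
        print(f"CapEff={mask:#x}")
        print("escape-relevant capabilities:", ", ".join(escape_relevant_caps(mask)) or "none")
```

For a root process, Docker's default capability set shows up as CapEff 00000000a80425fb, which contains none of the four; a privileged container's mask typically has every capability bit the kernel supports set.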
Escape path 2: hostPath mounts
A pod that mounts host directories with read-write access can:
- Mount / → game over (write to host /etc/cron.d, get RCE on next minute boundary).
- Mount /var/run/docker.sock or /run/containerd/containerd.sock → start a new container with --privileged --pid=host, then nsenter or chroot into the host.
- Mount /etc/kubernetes on a control-plane node → read kubeconfig files with cluster-admin credentials.
- Mount /var/lib/kubelet → read the secrets mounted into every pod currently running on the node.
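# Pod manifest fragment with /var/run/docker.sock mounted
volumes:
- name: docker-sock
  hostPath:
    path: /var/run/docker.sock
    type: Socket
# ...and in the container spec
volumeMounts:
- name: docker-sock
  mountPath: /var/run/docker.sock
# From inside the pod (assumes a docker CLI in the image)
docker run --rm --privileged --pid=host -it ubuntu nsenter -t 1 -m -u -n -i bash
# You now have a root shell in the host's mount, UTS, network, IPC and PID namespaces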
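The same paths can be caught before deployment by scanning manifests. A minimal Python sketch, assuming the Pod manifest has already been parsed into a dict (for example with PyYAML's yaml.safe_load); the function name is hypothetical and the path list is illustrative, not an exhaustive denylist.

```python
"""Flag hostPath volumes that expose escape-relevant host paths (sketch)."""
import posixpath

# Host paths named in this module (illustrative, not exhaustive).
SENSITIVE_HOST_PATHS = (
    "/etc/cron.d",
    "/etc/kubernetes",
    "/var/lib/kubelet",
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
)


def _overlaps(a: str, b: str) -> bool:
    """True if path a equals b, or one path is an ancestor of the other."""
    a_dir = a.rstrip("/") + "/"
    b_dir = b.rstrip("/") + "/"
    return a == b or b.startswith(a_dir) or a.startswith(b_dir)


def dangerous_hostpath_volumes(pod: dict) -> list[tuple[str, str]]:
    """Return (volume name, normalised host path) for each risky hostPath volume."""
    hits = []
    for volume in pod.get("spec", {}).get("volumes") or []:
        host_path = (volume.get("hostPath") or {}).get("path")
        if not host_path:
            continue  # not a hostPath volume
        normalized = posixpath.normpath(host_path)
        if any(_overlaps(normalized, p) for p in SENSITIVE_HOST_PATHS):
            hits.append((volume.get("name", "?"), normalized))
    return hits
```

Matching in both directions matters: a hostPath of /var, or of /, exposes /var/lib/kubelet just as surely as mounting it directly.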
Escape path 3: dangerous capabilities
- SYS_ADMIN — many escape primitives; can mount filesystems, manipulate cgroups.
- SYS_PTRACE — attach to host processes if PID namespace is shared.
- SYS_MODULE — load kernel modules. Trivial root.
- NET_ADMIN + host network — sniff/manipulate other pods’ traffic.
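The default capability set granted by Docker and containerd is restrictive and includes none of the above, but Kubernetes will pass through any capability a pod spec requests unless admission control blocks it. Audit securityContext.capabilities.add in every pod.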
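A sketch of that audit in Python, assuming Pod manifests parsed into dicts; the function name is hypothetical, and the denylist is just the capabilities listed above plus ALL (which grants every capability on runtimes that honour it). It walks initContainers and ephemeralContainers too, since each container carries its own securityContext.

```python
"""List escape-relevant capabilities added in a Pod manifest (sketch)."""
# Denylist: the capabilities named in this module, plus ALL.
DANGEROUS_CAPS = {"ALL", "SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE", "NET_ADMIN"}
CONTAINER_KINDS = ("initContainers", "containers", "ephemeralContainers")


def added_dangerous_caps(pod: dict) -> dict[str, list[str]]:
    """Map container name -> dangerous entries in securityContext.capabilities.add."""
    spec = pod.get("spec", {})
    findings = {}
    for kind in CONTAINER_KINDS:
        for container in spec.get(kind) or []:
            sc = container.get("securityContext") or {}
            caps = (sc.get("capabilities") or {}).get("add") or []
            # Normalise "cap_sys_admin" / "CAP_SYS_ADMIN" to the "SYS_ADMIN" form used in pod specs.
            normalized = {c.upper().removeprefix("CAP_") for c in caps}
            bad = sorted(normalized & DANGEROUS_CAPS)
            if bad:
                findings[container.get("name", "?")] = bad
    return findings
```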
Escape path 4: hostPID / hostNetwork / hostIPC
- hostPID: true lets the pod see and, with enough capabilities, signal host processes or nsenter into PID 1.
- hostNetwork: true puts the pod in the host’s network namespace: it can sniff all node traffic and talk to localhost-only services on the node.
- hostIPC: true shares the host’s IPC namespace; rarely useful for attacks on its own, but it widens the attack surface.
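A minimal Python sketch of a cluster-wide check for these fields, assuming pods parsed into dicts, for example the "items" list from kubectl get pods -A -o json. The function name is hypothetical, and the allowlisted namespace is an assumption; adjust it to wherever your system agents actually run.

```python
"""Report host-namespace sharing in Pod manifests (sketch)."""
HOST_FIELDS = ("hostPID", "hostNetwork", "hostIPC")
# Assumed allowlist: namespaces where system agents legitimately share host namespaces.
ALLOWED_NAMESPACES = {"kube-system"}


def host_namespace_findings(pods) -> list[tuple[str, str, str]]:
    """Return (namespace, pod name, field) for each host* field set true outside the allowlist."""
    findings = []
    for pod in pods:
        meta = pod.get("metadata", {})
        ns = meta.get("namespace", "default")
        if ns in ALLOWED_NAMESPACES:
            continue
        spec = pod.get("spec", {})
        for field in HOST_FIELDS:
            if spec.get(field) is True:
                findings.append((ns, meta.get("name", "?"), field))
    return findings
```

To run it against a live cluster, pipe kubectl get pods -A -o json into a script that calls host_namespace_findings(json.load(sys.stdin)["items"]).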
Runtime CVEs (the unpredictable risk)
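Container runtimes and the Linux kernel have shipped container-escape CVEs every year:
- CVE-2019-5736 (runC): exec-time race; a container could overwrite the runC binary on the host.
- CVE-2022-0185 (kernel): filesystem-context heap overflow; container escape via FUSE.
- CVE-2022-0492 (cgroup v1): release_agent abuse; escape via a writable cgroup.
- CVE-2024-21626 (runC): leaked file descriptor to the host’s /sys/fs/cgroup; setting a container’s working directory to /proc/self/fd/<N> (for example through a malicious image’s WORKDIR) gives access to the host filesystem. Fixed in runC 1.1.12.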
Patching cadence on Kubernetes nodes is the defence. Most enterprises lag 30-90 days.
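Where a specific fix version is known, a version check is a cheap first pass. A minimal Python sketch for CVE-2024-21626, whose upstream fix shipped in runC 1.1.12; the function names are hypothetical, and because distributions backport fixes without bumping the upstream version, a "vulnerable" result only means the vendor advisory needs checking.

```python
"""Compare a runC version against the upstream CVE-2024-21626 fix (sketch)."""
import re

FIXED_IN = (1, 1, 12)  # upstream runC release containing the fix


def parse_runc_version(text: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from `runc --version` output or a bare version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups())


def vulnerable_to_cve_2024_21626(text: str) -> bool:
    """True if the upstream release number predates 1.1.12 (ignores distro backports)."""
    return parse_runc_version(text) < FIXED_IN
```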
Real-world cases
- Tesla 2018: Kubernetes dashboard exposed without auth; attackers deployed cryptocurrency-mining pods and found AWS credentials stored in one of the pods.
- Many cloud-native ransomware operations, 2023-25: initial access via a vulnerable web app → container shell → escape via a privileged DaemonSet → kubelet credentials → cluster takeover.
Try this yourself
# Lab — Kind cluster on your laptop
kind create cluster
# Deploy a deliberately vulnerable pod
kubectl apply -f - <
Defender's checklist
- Pod Security Standards / Pod Security Admission: stable since Kubernetes 1.25. Label namespaces restricted by default; baseline at minimum.
- Audit privileged pods: should be near-zero outside specific monitoring/networking namespaces.
- Drop capabilities by default: capabilities.drop: ["ALL"], then add back only what's needed.
- Disable host* fields: hostPID, hostNetwork and hostIPC false everywhere except specific system namespaces.
- OPA Gatekeeper / Kyverno: admission controllers that block non-compliant pods at apply time.
- Runtime detection: Falco, Tetragon. Alert on suspicious syscalls and commands from inside containers (setns, unshare, mount, nsenter).
- Patch cadence: kernel + runtime within 14 days for critical CVEs.
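The “drop ALL, add back only what's needed” item is stricter than a denylist: anything not explicitly allowed is a finding. A minimal Python sketch of a subset of the Pod Security Standards restricted profile (drop ALL; add only NET_BIND_SERVICE; privileged and allowPrivilegeEscalation false), assuming Pod manifests parsed into dicts. The function name is hypothetical, and this is not a replacement for Pod Security Admission or Kyverno/Gatekeeper policies.

```python
"""Check containers against a subset of the PSS "restricted" profile (sketch)."""
ALLOWED_ADDS = {"NET_BIND_SERVICE"}  # the only add-back the restricted profile permits
CONTAINER_KINDS = ("initContainers", "containers", "ephemeralContainers")


def restricted_violations(pod: dict) -> list[str]:
    """Return human-readable violations for every container in the pod."""
    problems = []
    spec = pod.get("spec", {})
    for kind in CONTAINER_KINDS:
        for c in spec.get(kind) or []:
            name = c.get("name", "?")
            sc = c.get("securityContext") or {}
            caps = sc.get("capabilities") or {}
            drops = {d.upper() for d in caps.get("drop") or []}
            adds = {a.upper().removeprefix("CAP_") for a in caps.get("add") or []}
            if "ALL" not in drops:
                problems.append(f"{name}: capabilities.drop does not include ALL")
            for extra in sorted(adds - ALLOWED_ADDS):
                problems.append(f"{name}: adds {extra}")
            if sc.get("privileged") is True:
                problems.append(f"{name}: privileged")
            # Restricted requires an explicit false; unset counts as a violation.
            if sc.get("allowPrivilegeEscalation") is not False:
                problems.append(f"{name}: allowPrivilegeEscalation not set to false")
    return problems
```

Enforcement belongs at admission time, as the checklist says; a script like this is more useful for auditing workloads that are already running.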