Kubernetes is the most operationally complex attack surface an organization can put on the internet. A production cluster has dozens of running components, hundreds of configuration options, and a dynamic workload model that makes traditional “patch and forget” impractical. The defaults are better in 2026 than they were in 2019, but the gap between “default-installed Kubernetes” and “production-hardened Kubernetes” is still large enough that every serious cluster audit produces meaningful findings. This is the practitioner’s guide to hardening Kubernetes for production.
The layers that matter
Layer 1 — Cluster infrastructure
Before securing anything that runs on Kubernetes, secure the cluster itself. For managed clusters (EKS, GKE, AKS), much of this layer is the provider's responsibility. For self-managed clusters, all of it is yours.
- Control plane encryption at rest (etcd data)
- Control plane API access restricted to known networks (not public)
- Node images based on hardened, minimal OS (Bottlerocket, CoreOS, minimized Ubuntu)
- Node autorotation to apply patches without manual intervention
- Private node groups with no public IPs; NAT gateway for egress
- Workload identity federation configured (IRSA on EKS, Workload Identity on GKE, Azure AD Workload Identity on AKS)
Layer 2 — Access control
RBAC is Kubernetes’ access model, and it is where most finding-worthy gaps live.
- No ClusterRoleBindings granting `cluster-admin` except to named break-glass identities
- Human access via OIDC or IAM Identity Center federation, not kubeconfig files with long-lived tokens
- Service account tokens not auto-mounted into pods that do not need them (`automountServiceAccountToken: false` by default)
- Service accounts with narrow role bindings; default service account has no cluster access
- Pod Security Standards (PSS) enforcement: `restricted` or `baseline` profile on every namespace; `privileged` only where specifically required and documented
- OPA/Gatekeeper or Kyverno for policy-as-code enforcement beyond PSS
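As a concrete sketch of the last two bullets: PSS is enforced with namespace labels, and narrow RBAC means per-namespace Roles bound to specific service accounts. The namespace, role, and service account names below are illustrative.

```yaml
# Enforce the "restricted" Pod Security Standard at admission time.
apiVersion: v1
kind: Namespace
metadata:
  name: payments          # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Narrow role: read-only access to pods in one namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# Bind it to a single service account, not a group or "default".
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: app-sa          # illustrative service account
    namespace: payments
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Note there is no ClusterRole here at all: if a workload only needs to see pods in its own namespace, nothing in its RBAC should mention the cluster scope.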
Layer 3 — Network
By default, every pod in a Kubernetes cluster can reach every other pod on every port. This is usually wrong.
- NetworkPolicy enforced on every namespace; default-deny with explicit allow rules for required flows
- Service mesh (Istio, Linkerd) for mTLS between services — transport-level encryption independent of application
- Ingress controllers restricted to specific hostnames and TLS-only
- No use of `hostNetwork: true` except where absolutely required
- Egress filtering via egress gateways or network policies to specific external destinations
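The default-deny pattern from the first bullet looks like this in practice: one policy that blocks everything, then explicit allows per flow. Namespace, labels, and port are illustrative.

```yaml
# Default-deny: selects all pods in the namespace and allows nothing.
# Every permitted flow then needs its own explicit NetworkPolicy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments      # illustrative namespace
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
# Explicit allow: ingress controller -> API pods on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-api
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api             # illustrative workload label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
```

Remember that NetworkPolicy is enforced by the CNI plugin, not by Kubernetes itself; on a CNI without policy support these objects are silently ignored, which is itself an audit finding.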
Layer 4 — Workload configuration
- Pods run as non-root user (`runAsNonRoot: true`, `runAsUser` > 0)
- Read-only root filesystem where feasible (`readOnlyRootFilesystem: true`)
- All capabilities dropped, specific capabilities added only as needed
- No privileged containers except for specific infrastructure needs (CNI, storage plugins)
- No host namespace sharing (`hostPID`, `hostIPC`, `hostNetwork`) except as required
- Resource requests and limits defined to prevent noisy-neighbor and resource-exhaustion attacks
- Liveness and readiness probes configured
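Most of the bullets above map one-to-one onto fields in a pod manifest. A hardened baseline might look like the following sketch (image reference and probe paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app       # illustrative name
spec:
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001       # any non-zero UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.4.2   # pinned tag, approved registry
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]    # add back specific capabilities only if required
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
```

This spec also happens to satisfy the `restricted` Pod Security Standard, which is the point: the workload and namespace layers should agree.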
Layer 5 — Images and supply chain
- Images from approved registries only; admission controller enforces this
- Image scanning at build time (Trivy, Grype, Snyk) blocking critical vulnerabilities
- Image signing with Sigstore/cosign; admission controller verifies signatures
- SBOM generation and storage for every image; vulnerability re-scanning as new CVEs are published
- Minimal base images (distroless, scratch, alpine) rather than full OS images
- No `latest` tags in production; immutable tag or digest references
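Registry restriction and signature verification are typically enforced at admission time. A sketch using Kyverno's `verifyImages` rule, under the assumption that images are signed with cosign (registry pattern and public key are placeholders):

```yaml
# Kyverno policy sketch: block pods whose images are not from the
# approved registry or lack a valid cosign signature.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-approved-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-signed-images
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # illustrative approved registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      (cosign public key goes here)
                      -----END PUBLIC KEY-----
```

An equivalent setup is possible with OPA/Gatekeeper plus an external signature verifier; the design choice is the same either way: verification happens in the cluster at admission, not only in CI.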
Layer 6 — Secrets
- No secrets in environment variables or ConfigMaps
- External secret management (AWS Secrets Manager, Vault, GCP Secret Manager) integrated via CSI driver
- Kubernetes Secrets encrypted at rest (etcd encryption with KMS)
- Secret rotation automated; applications reload secrets without restart where possible
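On self-managed clusters, the etcd encryption bullet is configured on the API server via `--encryption-provider-config`. A sketch using the KMS v2 provider (the plugin name and socket path are illustrative and depend on your KMS plugin):

```yaml
# Encryption-at-rest for Secrets via an external KMS plugin.
# Managed providers (EKS, GKE, AKS) expose this as a cluster option instead.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - kms:
          apiVersion: v2
          name: cloud-kms                            # illustrative plugin name
          endpoint: unix:///var/run/kmsplugin/socket.sock
          timeout: 3s
      - identity: {}   # fallback so pre-existing unencrypted data stays readable
```

After enabling this, existing secrets must be rewritten (for example with `kubectl get secrets -A -o json | kubectl replace -f -`) so they are re-stored encrypted.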
Layer 7 — Observability and detection
- Audit logging enabled on the API server with comprehensive rules (metadata level for all requests, RequestResponse for sensitive operations)
- Audit logs shipped to central SIEM; retention 12 months minimum
- Runtime security (Falco, Tetragon, or equivalent) detecting anomalous process behavior, file access, network connections
- Workload observability (metrics, logs, traces) for security-relevant events
- Defined incident response playbook for common scenarios: compromised pod, credential theft, suspicious network activity
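The audit-rule bullet above translates into an API server audit policy. One caution worth encoding: never log `RequestResponse` for Secrets themselves, since that would copy secret values into the audit log. A minimal sketch:

```yaml
# API server audit policy: Metadata for secret access (never bodies),
# RequestResponse for sensitive writes such as RBAC changes,
# Metadata for everything else.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"]
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
  - level: Metadata
```

Rules are evaluated in order, so the secrets rule must come before the catch-all; the policy file is passed to the API server with `--audit-policy-file`.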
The attacks we most often see succeed
- Compromised developer credentials — engineer kubeconfig leaked via phishing or malware; attacker uses it to pivot into the cluster. Federated auth and short-lived tokens prevent this.
- Misconfigured RBAC — a service account with broader permissions than needed, compromised via a vulnerable workload, used to access secrets or deploy malicious pods.
- Privileged containers as an escape vector — a container running with host namespace access, compromised via application vulnerability, used to escape to the node and access other workloads.
- Exposed kubelet or etcd — less common on managed clusters but still appears on self-managed. Unauthenticated kubelet API reachable from the network grants pod-level access.
- Image supply chain — a base image pulled from a public registry contains malware or a backdoor. Image scanning and signed base images from controlled registries prevent this.
The Kubernetes audit in practice
A hardening-focused Kubernetes audit typically covers:
- Cluster configuration review (control plane, nodes, networking)
- RBAC analysis — every role, every binding, every service account
- Pod Security Standards enforcement, including any residual PodSecurityPolicy-to-PSS migration on older clusters (PSP was removed in Kubernetes 1.25)
- NetworkPolicy review against intended traffic flows
- Workload manifest review for security context, capabilities, secrets handling
- Image registry and scanning pipeline review
- Secrets management integration review
- Runtime security tooling configuration
- Audit log and detection pipeline review
Typical engagement: 2–3 weeks for a production cluster of moderate size, producing 30–80 findings categorized by severity and with specific remediation guidance (including the YAML patches to apply).
Related reading
- Cloud Security for Indian Businesses: The Complete Guide
- AWS Security Audit: The 47-Point Checklist
- AWS IAM Best Practices for Indian SaaS
For a Kubernetes security audit of your production cluster, book a scoping call.