Cloud Security

Kubernetes Security: Hardening Guide for Production (2026)

Manish Garg, Associate CISSP · RingSafe
April 19, 2026
4 min read

Kubernetes is the most operationally complex attack surface an organization can put on the internet. A production cluster has dozens of running components, hundreds of configuration options, and a dynamic workload model that makes traditional “patch and forget” impractical. The defaults are better in 2026 than they were in 2019, but the gap between “default-installed Kubernetes” and “production-hardened Kubernetes” is still large enough that every serious cluster audit produces meaningful findings. This is the practitioner’s guide to hardening Kubernetes for production.

The layers that matter

Layer 1 — Cluster infrastructure

Before hardening anything that runs on Kubernetes, harden the cluster itself. For managed clusters (EKS, GKE, AKS), much of this layer is handled by the provider. For self-managed clusters, all of it is the operator's responsibility.

  • Control plane encryption at rest (etcd data)
  • Control plane API access restricted to known networks (not public)
  • Node images based on hardened, minimal OS (Bottlerocket, CoreOS, minimized Ubuntu)
  • Node autorotation to apply patches without manual intervention
  • Private node groups with no public IPs; NAT gateway for egress
  • Workload identity federation configured (IRSA on EKS, Workload Identity on GKE, Azure AD Workload Identity on AKS)
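As a concrete sketch of the last item: on EKS, IRSA binds a Kubernetes service account to an IAM role through OIDC federation, so pods get short-lived AWS credentials without node-level keys. The names and role ARN below are placeholders for illustration.

```yaml
# Sketch: EKS IRSA — the annotation ties this service account to an IAM
# role; pods using it receive short-lived credentials via the OIDC provider.
# Account ID, role name, and namespace are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payments-api-role
```

GKE Workload Identity and Azure AD Workload Identity follow the same pattern with their own annotations.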

Layer 2 — Access control

RBAC is Kubernetes’ access model, and it is where most finding-worthy gaps live.

  • No ClusterRoleBindings granting cluster-admin except to named break-glass identities
  • Human access via OIDC or IAM Identity Center federation, not kubeconfig files with long-lived tokens
  • Service account tokens not auto-mounted into pods that do not need them (automountServiceAccountToken: false by default)
  • Service accounts with narrow role bindings; default service account has no cluster access
  • Pod Security Standards (PSS) enforcement: restricted or baseline profile on every namespace; privileged only where specifically required and documented
  • OPA/Gatekeeper or Kyverno for policy-as-code enforcement beyond PSS
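Several of these controls are plain manifest changes. A minimal sketch, with hypothetical names, showing PSS enforcement via namespace labels, a service account that does not auto-mount its token, and a deliberately narrow Role binding:

```yaml
# Sketch: enforce the "restricted" Pod Security Standard on a namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: payments                     # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Service account token is mounted only where a pod explicitly opts in.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
automountServiceAccountToken: false
---
# A narrow Role: read-only access to ConfigMaps in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-configmaps
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-api-read-configmaps
  namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-configmaps
subjects:
  - kind: ServiceAccount
    name: payments-api
    namespace: payments
```

The point of the Role is what it omits: no secrets access, no write verbs, no cluster scope.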

Layer 3 — Network

By default, every pod in a Kubernetes cluster can reach every other pod on every port. This is usually wrong.

  • NetworkPolicy enforced on every namespace; default-deny with explicit allow rules for required flows
  • Service mesh (Istio, Linkerd) for mTLS between services — transport-level encryption independent of application
  • Ingress controllers restricted to specific hostnames and TLS-only
  • No use of hostNetwork: true except where absolutely required
  • Egress filtering via egress gateways or network policies to specific external destinations
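The default-deny-plus-explicit-allow pattern can be sketched in two policies (pod labels and ports are illustrative):

```yaml
# Sketch: deny all ingress and egress for every pod in the namespace.
# Note that denying egress also blocks DNS — kube-dns must be allowed
# explicitly in a follow-up policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}                 # empty selector = every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
# Explicit allow for one required flow: api pods -> db pods on 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-db
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: payments-api
      ports:
        - protocol: TCP
          port: 5432
```

NetworkPolicy enforcement depends on the CNI plugin; a policy applied on a CNI without enforcement support silently does nothing, which is itself an audit finding.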

Layer 4 — Workload configuration

  • Pods run as a non-root user (runAsNonRoot: true, runAsUser set to a non-zero UID)
  • Read-only root filesystem where feasible (readOnlyRootFilesystem: true)
  • All capabilities dropped, specific capabilities added only as needed
  • No privileged containers except for specific infrastructure needs (CNI, storage plugins)
  • No host namespace sharing (hostPID, hostIPC, hostNetwork) except as required
  • Resource requests and limits defined to prevent noisy-neighbor and resource-exhaustion attacks
  • Liveness and readiness probes configured
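The checklist above maps to a handful of fields in the pod spec. A sketch with illustrative image, UID, and resource values:

```yaml
# Sketch: a pod spec applying the workload-hardening checklist.
# Image reference, UID, ports, and resource values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001              # non-zero UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      image: registry.example.com/payments-api@sha256:<digest>  # digest-pinned
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]           # add back specific capabilities only if needed
      resources:
        requests: { cpu: 100m, memory: 128Mi }
        limits:   { cpu: 500m, memory: 256Mi }
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }
      readinessProbe:
        httpGet: { path: /ready, port: 8080 }
```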

Layer 5 — Images and supply chain

  • Images from approved registries only; admission controller enforces this
  • Image scanning at build time (Trivy, Grype, Snyk) blocking critical vulnerabilities
  • Image signing with Sigstore/cosign; admission controller verifies signatures
  • SBOM generation and storage for every image; vulnerability re-scanning as new CVEs are published
  • Minimal base images (distroless, scratch, alpine) rather than full OS images
  • No latest tag in production; immutable tags or digest references only
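Signature verification at admission can be expressed as policy-as-code. A sketch of a Kyverno policy using the verifyImages rule — syntax here assumes a recent Kyverno release, and the registry pattern and key are placeholders; verify against the version you run:

```yaml
# Sketch: require cosign signatures on images from an approved registry
# before pods are admitted. Registry and public key are placeholders.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce    # block, don't just warn
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key material>
                      -----END PUBLIC KEY-----
```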

Layer 6 — Secrets

  • No secrets in environment variables or ConfigMaps
  • External secret management (AWS Secrets Manager, Vault, GCP Secret Manager) integrated via CSI driver
  • Kubernetes Secrets encrypted at rest (etcd encryption with KMS)
  • Secret rotation automated; applications reload secrets without restart where possible
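On self-managed clusters, etcd encryption for Secrets is configured via an API server EncryptionConfiguration file; managed providers expose the same capability as a cluster option. A sketch assuming a KMS v2 plugin — provider name and socket path are placeholders:

```yaml
# Sketch: encrypt Secrets at rest in etcd through an external KMS plugin.
# Passed to the API server via --encryption-provider-config.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - kms:
          apiVersion: v2
          name: my-kms-provider                       # hypothetical plugin name
          endpoint: unix:///var/run/kmsplugin/socket.sock
          timeout: 3s
      - identity: {}    # fallback so pre-existing plaintext secrets stay readable
```

After enabling encryption, existing secrets must be rewritten (e.g. by re-applying them) before they are actually stored encrypted.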

Layer 7 — Observability and detection

  • Audit logging enabled on the API server with comprehensive rules (metadata level for all requests, RequestResponse for sensitive operations)
  • Audit logs shipped to central SIEM; retention 12 months minimum
  • Runtime security (Falco, Tetragon, or equivalent) detecting anomalous process behavior, file access, network connections
  • Workload observability (metrics, logs, traces) for security-relevant events
  • Defined incident response playbook for common scenarios: compromised pod, credential theft, suspicious network activity
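The audit-rule split described above — RequestResponse for sensitive objects, Metadata for everything else — can be sketched as an API server audit policy:

```yaml
# Sketch: audit policy with full request/response bodies for Secret and
# ConfigMap operations, and metadata-level logging for all other requests.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"]     # skip the noisy pre-processing stage
rules:
  - level: RequestResponse
    resources:
      - group: ""                   # core API group
        resources: ["secrets", "configmaps"]
  - level: Metadata                 # catch-all for everything else
```

Rule order matters: the first matching rule wins, so the catch-all goes last.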

The attacks we most often see succeed

  1. Compromised developer credentials — an engineer's kubeconfig is leaked via phishing or malware, and the attacker uses it to pivot into the cluster. Federated auth and short-lived tokens prevent this.
  2. Misconfigured RBAC — a service account with broader permissions than needed, compromised via a vulnerable workload, used to access secrets or deploy malicious pods.
  3. Privileged containers as an escape vector — a container running with host namespace access, compromised via application vulnerability, used to escape to the node and access other workloads.
  4. Exposed kubelet or etcd — less common on managed clusters but still appears on self-managed. Unauthenticated kubelet API reachable from the network grants pod-level access.
  5. Image supply chain — a base image pulled from a public registry contains malware or a backdoor. Image scanning and signed base images from controlled registries prevent this.

The Kubernetes audit in practice

A hardening-focused Kubernetes audit typically covers:

  • Cluster configuration review (control plane, nodes, networking)
  • RBAC analysis — every role, every binding, every service account
  • Pod Security Standards enforcement (and, on legacy clusters, PodSecurityPolicy-to-PSS migration status)
  • NetworkPolicy review against intended traffic flows
  • Workload manifest review for security context, capabilities, secrets handling
  • Image registry and scanning pipeline review
  • Secrets management integration review
  • Runtime security tooling configuration
  • Audit log and detection pipeline review

Typical engagement: 2–3 weeks for a production cluster of moderate size, producing 30–80 findings categorized by severity and with specific remediation guidance (including the YAML patches to apply).


For a Kubernetes security audit of your production cluster, book a scoping call.