Container Forensics: Investigating Kubernetes Compromises Like a Pro

By the Basilisk Team

How the Basilisk team collects evidence from pods, runtime, and control plane after a suspected incident in production Kubernetes clusters.

Three in the morning, a Falco alert fires: a pod in the payments namespace spawned /bin/sh after opening a reverse connection to an Egyptian IP. The on-call team cordoned the node, but the CISO's first question was brutal: "Do you have evidence that survives kubectl delete pod?" In most clusters Basilisk audits, the honest answer is no. Container forensics in Kubernetes demands a collection chain that begins before the incident, because the runtime recycles layers, cgroups, and namespaces faster than any analyst can launch Wireshark.

The first evidence layer lives inside the pod itself, and it is volatile by design. Before any kubectl cordon, we mark the node with a custom NoSchedule taint (a NoExecute taint would evict the suspect pod and destroy the very evidence we are after), snapshot the disk (a CSI VolumeSnapshot where the driver supports it; otherwise an EBS snapshot from the console on EKS, or gcloud compute disks snapshot on GKE), and only then enter the target container with kubectl debug, attaching an ephemeral container whose image carries static binaries of busybox, lsof, ss, and tcpdump. We capture /proc/[pid]/exe, /proc/[pid]/maps, and the contents of /tmp before any restart. Anyone who has followed "DFIR on Linux: Live Triage with UAC and Velociraptor" on traditional VMs needs to shift mindset: here the filesystem is an overlay, and the writable top layer vanishes the moment the pod dies.
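As a rough illustration of the /proc capture step, the sketch below copies the executable and the memory map of one process and hashes both copies. It assumes Python is available wherever the collection runs (on the node, or in a debug container that shares the target's process namespace). The function name, the proc_root parameter, and the output layout are hypothetical choices for this example, not part of any tool named above, and copying /tmp is left out for brevity.

```python
import hashlib
from pathlib import Path


def _copy_and_hash(src: Path, dst: Path) -> str:
    """Stream src into dst and return the SHA-256 of the copied bytes."""
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(1 << 20), b""):
            fout.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def collect_proc_artifacts(pid: int, out_dir: str, proc_root: str = "/proc") -> dict:
    """Copy <proc_root>/<pid>/exe and maps into out_dir/pid-<pid>/.

    proc_root is a parameter (hypothetical, for this sketch) so the function
    can be pointed at a test directory instead of the live /proc.
    """
    src = Path(proc_root) / str(pid)
    dst = Path(out_dir) / f"pid-{pid}"
    dst.mkdir(parents=True, exist_ok=True)

    # Opening /proc/<pid>/exe reads the running binary itself, which still
    # works when the file was already deleted from disk.
    exe_hash = _copy_and_hash(src / "exe", dst / "exe")
    maps_hash = _copy_and_hash(src / "maps", dst / "maps")

    return {"pid": pid, "exe_sha256": exe_hash, "maps_sha256": maps_hash}
```

Writing out_dir to a volume outside the container's overlay keeps the copies alive after the pod is gone.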

Container memory is the real treasure. With containerd or CRI-O, processes inside the pod are visible on the host under host PIDs, so we run avml on the node, or load LiME compiled against the node kernel, to produce a full memory dump. That dump feeds directly into the workflow from "Memory Forensics with Volatility 3: Analyzing Dumps in a Reproducible Lab", where plugins like linux.pslist and linux.malfind reveal injections the host EDR missed because they were confined to the container namespace. We log every SHA-256 hash on a chain-of-custody sheet signed with Sigstore, tying back to the discipline outlined in "Supply Chain Security: Sigstore Signing and Real SBOMs in CI/CD".
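The hashing half of that chain-of-custody sheet can be sketched in a few lines. This is a minimal example under stated assumptions: the record fields, the JSON Lines sheet format, and the function names are illustrative choices, not the team's actual template, and the Sigstore signing step is deliberately left out.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def custody_record(path: str, collector: str, chunk_size: int = 1 << 20) -> dict:
    """Hash an evidence file in chunks so multi-gigabyte dumps fit in memory."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {
        "file": Path(path).name,
        "size_bytes": size,
        "sha256": digest.hexdigest(),
        "collector": collector,  # hypothetical field: who gathered the evidence
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def append_to_sheet(sheet_path: str, record: dict) -> None:
    """Append one record per line (JSON Lines) so the sheet only ever grows."""
    with open(sheet_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Signing the finished sheet as a single blob, rather than each record, keeps one signature covering the whole collection.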

Moving up the stack, the control plane is where the investigation gets juicy. A kube-apiserver with an audit policy at the RequestResponse level records every exec, attach, and portforward as structured JSON. In a real 2025 case, we recovered the smoking gun: the attacker created a ServiceAccount named 'monitoring-helper' bound to cluster-admin via a kubectl apply -f - heredoc; the audit log showed user-agent kubectl/v1.29.2 from a residential IP at 3:47 a.m. We cross-referenced hourly etcd snapshots (etcdctl snapshot save) to reconstruct RBAC state before and after. This kind of timeline echoes the approach in "Timeline Forensics on Windows: Plaso, Log2Timeline and KAPE in Practice", only applied to declarative resources.
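A first pass over the audit log can be sketched as a small filter. It assumes the log is stored as one JSON event per line and reads only standard audit.k8s.io/v1 event fields (verb, objectRef, user, sourceIPs, userAgent, requestReceivedTimestamp, stage). The choice of which resources count as suspicious is an assumption made for this example, not a complete detection rule.

```python
import json
from typing import Iterable, Iterator

# Subresources that give an actor an interactive foothold in a pod.
INTERACTIVE_SUBRESOURCES = {"exec", "attach", "portforward"}
# Resources whose creation or change can grant privileges (illustrative set).
PRIVILEGE_RESOURCES = {"serviceaccounts", "rolebindings", "clusterrolebindings"}
WRITE_VERBS = {"create", "update", "patch"}


def suspicious_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a flat summary for each audit event worth a closer look."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        interactive = (
            ref.get("resource") == "pods"
            and ref.get("subresource") in INTERACTIVE_SUBRESOURCES
        )
        privilege_write = (
            ref.get("resource") in PRIVILEGE_RESOURCES
            and event.get("verb") in WRITE_VERBS
        )
        if interactive or privilege_write:
            yield {
                "time": event.get("requestReceivedTimestamp"),
                "stage": event.get("stage"),
                "user": (event.get("user") or {}).get("username"),
                "source_ips": event.get("sourceIPs", []),
                "user_agent": event.get("userAgent"),
                "verb": event.get("verb"),
                "resource": ref.get("resource"),
                "subresource": ref.get("subresource"),
                "namespace": ref.get("namespace"),
                "name": ref.get("name"),
            }
```

The same request can appear once per audit stage, so deduplicating on the event's auditID is a sensible next step before building a timeline.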

Runtime forensics requires specific tooling. We deploy Tetragon, or keep Falco with custom rules, exporting to an external SIEM and never to one inside the same compromised cluster. For live capture, Aqua's tracee-ebpf records syscalls with container_id context, and Sysdig Inspect reads scap files as if they were kernel pcaps. Beware of false negatives: if the attacker used techniques adapted for Linux from "EDR Evasion for Research: Direct Syscalls Explained Without the Hype", any sensor that hooks userland library calls rather than the syscalls themselves will miss them. Combining this with KQL or Sigma hunting, in the spirit of "Threat Hunting with Sigma and Elastic: From Indicator to Detection Rule", multiplies your odds of catching the pivot.

Compromised images deserve their own chapter. Before destroying anything, we preserve the suspect image (docker save to a tarball, or skopeo copy into an isolated forensic registry), then run dive on the image and trivy fs on an exported copy of the container filesystem to separate the original layers from files added at runtime via kubectl cp or docker exec. In 60% of the cases we have seen, the attacker did not modify the original image; they dropped binaries in /tmp or /dev/shm, betting nobody would snapshot the overlay. When upstream compromise is suspected, we ship the layers to an ELF-adapted version of the lab from "Malware Analysis in an Isolated Lab: Safe Setup with FlareVM and REMnux", with REMnux running on an air-gapped VM.
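Hunting for dropped binaries in those two directories is easy to script. The sketch below walks an exported copy of the container's filesystem and flags files that start with the ELF magic bytes. The function name and the root_dir layout are assumptions for this example. Note that /dev/shm is normally a tmpfs mount rather than part of the overlay, so its contents have to be copied out while the container is still alive for this scan to see them.

```python
import os
from pathlib import Path
from typing import List

ELF_MAGIC = b"\x7fELF"
# Directories, relative to the exported root, where droppers tend to land.
WATCHED_DIRS = ("tmp", "dev/shm")


def find_dropped_elfs(root_dir: str) -> List[str]:
    """Return sorted relative paths of ELF files under the watched directories."""
    root = Path(root_dir)
    hits = []
    for watched in WATCHED_DIRS:
        base = root / watched
        if not base.is_dir():
            continue
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in filenames:
                path = Path(dirpath) / name
                if path.is_symlink():
                    continue  # do not follow links out of the evidence copy
                try:
                    with open(path, "rb") as fh:
                        if fh.read(4) == ELF_MAGIC:
                            hits.append(path.relative_to(root).as_posix())
                except OSError:
                    continue  # unreadable entries are skipped, not fatal
    return sorted(hits)
```

Checking magic bytes instead of file names or the executable bit catches binaries that were renamed or had their permissions stripped.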

Network forensics in Kubernetes differs from traditional networking because the CNI may perform NAT and encapsulation. We capture traffic at the level of the pod's veth pair with tcpdump -i any on the host, filtered by the pod IP assigned by IPAM. If the cluster runs Cilium, hubble observe --pod payments/checkout-7f4 shows decoded L7 flows. For persistent C2, we compare against IOCs from the lab described in "Building C2 Infra with Sliver in an Isolated Lab for Defensive Research" and inspect DNS at CoreDNS via its query log. We always export pcaps to write-once storage, because lawyers love to challenge integrity.
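The host-side capture can be wrapped so the pod IP is validated before anything runs. This is a small sketch: the function name and the output path are hypothetical, and the tcpdump options used (-i, -n, -U, -w, plus a host filter) are standard ones. It only builds the argument list; running it requires root on the node.

```python
import ipaddress
from typing import List


def tcpdump_argv(pod_ip: str, pcap_path: str, interface: str = "any") -> List[str]:
    """Build a tcpdump command line that captures one pod's traffic to a file."""
    # ip_address() raises ValueError on anything that is not a valid IPv4 or
    # IPv6 address, which keeps stray input out of the capture filter.
    ip = ipaddress.ip_address(pod_ip)
    return [
        "tcpdump",
        "-i", interface,   # "any" sees the pod's veth as well as the uplink
        "-n",              # no DNS lookups from the forensic host
        "-U",              # flush each packet to disk as it arrives
        "-w", pcap_path,   # raw pcap output
        "host", str(ip),   # capture filter: traffic to or from the pod IP
    ]
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues, and the resulting pcap can be hashed before it is moved to write-once storage.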

Practical takeaway: rehearse the procedure before the incident. Spin up a kind or k3d cluster, simulate an attacker pod that opens a reverse shell, and time how long your team takes from alert to signed memory dump. If it exceeds 20 minutes, automate it with an operator that reacts to Falco events by applying the taint, triggering a CSI snapshot, and launching a collection job. Container forensics is not about exotic tools; it is about choreography rehearsed before the stage catches fire.
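Scoring a drill against that 20-minute mark takes only two timestamps. The sketch below assumes both are recorded as ISO 8601 strings with an explicit UTC offset; the function name and the result fields are illustrative.

```python
from datetime import datetime, timedelta

BUDGET = timedelta(minutes=20)  # threshold taken from the takeaway above


def drill_result(alert_at: str, signed_dump_at: str, budget: timedelta = BUDGET) -> dict:
    """Compare alert-to-signed-dump time against the budget."""
    start = datetime.fromisoformat(alert_at)
    end = datetime.fromisoformat(signed_dump_at)
    elapsed = end - start
    if elapsed < timedelta(0):
        raise ValueError("signed dump timestamp precedes the alert")
    return {
        "elapsed_seconds": int(elapsed.total_seconds()),
        "within_budget": elapsed <= budget,
    }
```

Keeping the results of each rehearsal side by side shows whether automation is actually shortening the run.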
