June 10, 2026 · Red Team

Adversary Emulation with Caldera and MITRE ATT&CK in a Corporate Lab

By the Basilisk Team · June 10, 2026

How Basilisk uses Caldera, Atomic Red Team and MITRE ATT&CK to simulate real TTPs in a closed lab and measure SOC maturity without breaking production.

When a client calls saying they bought the top-shelf EDR and want to know if 'it is any good', the honest answer is never a PowerPoint report. It is an adversary emulation engagement with a written scenario, ATT&CK mapping and cold metrics: how many steps went silent, how many produced an alert, how many escalated into human response within 60 minutes. At Basilisk OffSec we standardised this around Caldera 5.x, Atomic Red Team and a Windows domain lab with 8 hosts that mirrors the client environment without ever touching their production. This piece describes the stack, the flow and the mistakes we held ourselves accountable for so we would not repeat them.

The base lab is simple and reproducible: one Windows Server 2022 as DC, two Server 2019 boxes (file and SQL), three Windows 11 Enterprise workstations running Defender for Endpoint in block mode, and two Ubuntu 24.04 hosts with auditd and a Wazuh agent. Everything is orchestrated on Proxmox via Terraform with snapshots named per scenario phase. Anyone bootstrapping the Windows side from scratch should start with Active Directory Pentest: Step-by-Step Kerberoasting in a GOAD Lab, which already ships intentionally messy ACLs, and then plug in the pivoting setup from [[pivoting-chisel-ligolo-rede-pentest]] to model realistic VLAN segmentation. Without that segmentation, any lateral-movement detection metric is biased from the start.

Caldera becomes the brain of the operation. We run the server on an isolated Debian box and load the 'atomic' and 'stockpile' plugins plus our internal fork 'basilisk-ttps', which packages region-specific behaviour like AnyDesk abuse in SMBs. Each Caldera adversary is a YAML file that cites ATT&CK technique IDs: T1059.001 for PowerShell, T1021.006 for WinRM, T1003.001 for LSASS dumping. We never run off-the-shelf adversaries unreviewed; they are useful as benchmarks, but the actual value comes from writing scenarios that reproduce the client threat model. A retailer with an exposed POS does not deserve the same adversary as a fintech with federated Azure AD and strong MFA.
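A quick sanity check on the technique IDs in a scenario can catch typos before an operation runs. The sketch below is only an illustration: the `scenario` dictionary and its `steps` and `technique` keys are a hypothetical, already-parsed structure, not Caldera's actual adversary schema, and the function name is ours.

```python
import re

# ATT&CK technique IDs look like T1059 or, for sub-techniques, T1059.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")

# Hypothetical, already-parsed scenario (NOT Caldera's real adversary schema).
scenario = {
    "name": "example-scenario",
    "steps": [
        {"technique": "T1059.001", "note": "PowerShell"},
        {"technique": "T1021.006", "note": "WinRM"},
        {"technique": "T1003.001", "note": "LSASS dumping"},
    ],
}


def invalid_technique_ids(scenario: dict) -> list[str]:
    """Return every technique value that is not a well-formed ATT&CK ID."""
    bad = []
    for step in scenario.get("steps", []):
        technique = step.get("technique", "")
        # fullmatch rejects trailing characters such as "T1059.001x".
        if not TECHNIQUE_ID.fullmatch(technique):
            bad.append(technique)
    return bad


if __name__ == "__main__":
    print("malformed technique IDs:", invalid_technique_ids(scenario))
```

This only checks the shape of each ID. Confirming that an ID exists in the current ATT&CK release would need a lookup against the published matrix.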

Execution follows a four-phase loop we learned to respect. First, simulated initial access on a controlled endpoint, usually following the playbook from Simulated Initial Access: Macros, LNK and ISO in an Isolated Windows 11 Lab with a payload signed by an internal CA. Second, execution and discovery leaning on LOLBins, where the material in Hunting Living-off-the-Land Binaries on Windows with KQL helps the blue team have KQL ready before the exercise starts. Third, lateral movement over SMB and WinRM abusing weak service accounts. Fourth, actions on objectives, which can mean SQL data exfiltration or simulated share encryption. Each phase has a clear stop criterion: if the EDR kills the process in under 90 seconds, we log it as detected and try another path.
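The stop criterion can be written down as a small decision function so that every operator logs it the same way. This is a minimal sketch under stated assumptions: the `StepResult` record, the function names and the `"continue_phase"` outcome are hypothetical. The text only specifies what happens when the EDR kills the process in under 90 seconds.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the stop criterion above: a kill in under 90 seconds counts as detected.
KILL_WINDOW_SECONDS = 90.0


@dataclass
class StepResult:
    """Hypothetical record for one emulated step."""
    phase: str
    technique: str
    seconds_until_killed: Optional[float]  # None means the EDR never killed the process


def is_detected(result: StepResult) -> bool:
    """True when the EDR killed the process strictly inside the window."""
    killed = result.seconds_until_killed
    return killed is not None and killed < KILL_WINDOW_SECONDS


def next_action(result: StepResult) -> str:
    """Detected steps are logged and another path is tried.

    "continue_phase" is an assumption for the undetected case.
    """
    return "try_another_path" if is_detected(result) else "continue_phase"


if __name__ == "__main__":
    step = StepResult("lateral movement", "T1021.006", seconds_until_killed=42.0)
    print(step.technique, is_detected(step), next_action(step))
```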

The boring but decisive part is blue-side instrumentation. Without comparable telemetry, adversary emulation turns into theatre. We deploy Sysmon with Olaf Hartong's config, ship to an Elastic 8.14 cluster and apply converted Sigma rules, a workflow we covered in Threat Hunting with Sigma and Elastic: From Indicator to Detection Rule. Every event Caldera triggers gets a custom header x-caldera-op-id, so in Kibana we can query which technique produced which events and how long the analyst took to respond. The three metrics we always report: MTTD per technique, detection rate by ATT&CK tactic and number of new Sigma rules authored as a direct result. Without those three numbers, the client thinks they bought a pentest and walks away frustrated.
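The first two metrics reduce to simple arithmetic once each execution is paired with its first alert. The sketch below assumes a hypothetical `Run` record with timestamps in seconds. The sample values are made up for illustration and are not results from any engagement, and how the records are exported from Elastic is outside its scope.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple, Optional


class Run(NamedTuple):
    """Hypothetical record: one technique execution and its first alert, if any."""
    technique: str
    tactic: str
    executed_at: float            # seconds since the start of the operation
    detected_at: Optional[float]  # None when no alert fired


# Made-up sample data, for illustration only.
RUNS = [
    Run("T1059.001", "execution", 0.0, 120.0),
    Run("T1059.001", "execution", 1000.0, 1060.0),
    Run("T1021.006", "lateral-movement", 2000.0, None),
    Run("T1003.001", "credential-access", 3000.0, 3030.0),
]


def mttd_per_technique(runs: list[Run]) -> dict[str, float]:
    """Mean time to detect in seconds, over detected runs only."""
    delays = defaultdict(list)
    for run in runs:
        if run.detected_at is not None:
            delays[run.technique].append(run.detected_at - run.executed_at)
    return {technique: mean(values) for technique, values in delays.items()}


def detection_rate_by_tactic(runs: list[Run]) -> dict[str, float]:
    """Fraction of runs per tactic that produced at least one alert."""
    total = defaultdict(int)
    detected = defaultdict(int)
    for run in runs:
        total[run.tactic] += 1
        if run.detected_at is not None:
            detected[run.tactic] += 1
    return {tactic: detected[tactic] / total[tactic] for tactic in total}


if __name__ == "__main__":
    print(mttd_per_technique(RUNS))
    print(detection_rate_by_tactic(RUNS))
```

Techniques that were never detected are absent from the MTTD result by design. They show up instead as a lower detection rate for their tactic, which keeps silent steps from being averaged away.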

There are ethical and operational traps worth naming. Adversary emulation is not a license to run EDR evasion against someone else's production; anything involving direct syscalls or AMSI patching stays inside the lab, and whoever wants the reasoning can read EDR Evasion for Research: Direct Syscalls Explained Without the Hype and AMSI and ETW Bypass for Defensive Research: What Blue Teams Should Know. We also never share C2 infrastructure across clients; each engagement gets a dedicated Sliver teamserver built per Building C2 Infra with Sliver in an Isolated Lab for Defensive Research. And the final report always ties each TTP to a concrete countermeasure, feeding the feedback loop from Purple Team in Practice: Building a Red vs Blue Feedback Loop and keeping the exercise out of the PDF graveyard.

Practical takeaway: start small. Spin up Caldera, write one adversary with five ATT&CK techniques aligned to your real threat model, run it against three monitored endpoints and measure MTTD per technique. Repeat monthly, swapping one technique each cycle. In six months you will have a defensible maturity curve to show executives, without buying any new expensive tool, and your SOC will stop complaining that 'nothing ever happens' during training.
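The monthly rotation can be planned ahead of time. This is a minimal sketch, assuming a hypothetical starting adversary of five techniques and a backlog of replacements. The technique choices are placeholders, not a recommendation, and the slot-by-slot replacement order is our own assumption.

```python
# Hypothetical starting adversary: five ATT&CK technique IDs.
INITIAL = ["T1566.001", "T1059.001", "T1021.006", "T1003.001", "T1486"]

# Hypothetical backlog of techniques to rotate in, one per monthly cycle.
BACKLOG = ["T1047", "T1021.002", "T1018", "T1053.005", "T1547.001"]


def plan_cycles(initial: list[str], backlog: list[str], months: int) -> list[list[str]]:
    """Return one technique list per month, swapping a single slot each cycle.

    Slots are replaced in order (an assumption); once the backlog is
    exhausted, the last plan simply repeats.
    """
    if not initial or months <= 0:
        return []
    plans = [list(initial)]
    for month in range(1, months):
        nxt = list(plans[-1])
        swap = month - 1  # index into the backlog for this cycle
        if swap < len(backlog):
            nxt[swap % len(nxt)] = backlog[swap]
        plans.append(nxt)
    return plans


if __name__ == "__main__":
    for month, techniques in enumerate(plan_cycles(INITIAL, BACKLOG, 6), start=1):
        print(f"month {month}: {', '.join(techniques)}")
```

Keeping four of the five techniques stable between cycles means month-over-month MTTD numbers stay comparable while coverage still widens.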
