Purple Team in Practice: Building a Red vs Blue Feedback Loop
How to integrate adversarial emulation with the SOC, close detection gaps in short sprints, and turn exercises into versioned Sigma rules.
Purple Team is not a quarterly workshop with pizza and pretty slides. It is an engineering cadence where every TTP executed by Red becomes a detection hypothesis for Blue within 72 hours. At Basilisk OffSec we run two-week sprints: 10 techniques cherry-picked from ATT&CK, controlled execution in a corporate lab, and closure with a Sigma rule shipped to production. The KPI is not how many shells Red popped; it is how many techniques moved from 'undetected' to 'alerted with a low false-positive rate'. Anyone not measuring that delta is running expensive security theater.
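To make the KPI concrete, here is a minimal Python sketch of the per-sprint delta. It assumes each technique's state is recorded at the start and end of a sprint under three hypothetical labels (undetected, alerted_noisy, alerted_low_fp). Only techniques that start the sprint undetected and end it alerted with a low false-positive rate count toward the delta.

```python
from typing import Dict

# Hypothetical status labels; the KPI only needs to tell "undetected"
# apart from "alerted with a low false-positive rate".
UNDETECTED = "undetected"
ALERTED_NOISY = "alerted_noisy"
ALERTED_LOW_FP = "alerted_low_fp"


def detection_delta(before: Dict[str, str], after: Dict[str, str]) -> int:
    """Count techniques that moved from UNDETECTED to ALERTED_LOW_FP.

    Both dicts map ATT&CK technique IDs (e.g. "T1558.003") to a status label.
    A technique absent from `before` is treated as UNDETECTED. A noisy alert
    that becomes quiet is an improvement, but it is not part of this KPI.
    """
    return sum(
        1
        for technique, status in after.items()
        if status == ALERTED_LOW_FP and before.get(technique, UNDETECTED) == UNDETECTED
    )


if __name__ == "__main__":
    sprint_start = {"T1558.003": UNDETECTED, "T1059.001": ALERTED_NOISY}
    sprint_end = {"T1558.003": ALERTED_LOW_FP, "T1059.001": ALERTED_LOW_FP}
    print(detection_delta(sprint_start, sprint_end))  # prints 1
```

Counting strictly from 'undetected' keeps the number honest: tuning an already-noisy rule is tracked through the FP metrics, not credited as new coverage.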
The starting point is a technique catalog prioritized by real threat intel, not DEF CON hype. We pull fresh reports (Mandiant M-Trends, CrowdStrike OverWatch, CISA advisories) and cross-reference them with the ATT&CK Enterprise v15 matrix. For a financial operation, for example, T1078.004 (Valid Accounts: Cloud Accounts), T1558.003 (Kerberoasting) and T1059.001 (PowerShell) end up at the top. Before execution, Red documents the exact procedure, following the approach in 'Adversary Emulation with Caldera and MITRE ATT&CK in a Corporate Lab', and Blue sketches which telemetry should capture each step. This written contract kills the classic excuse 'we did not see it because Splunk was not ingesting that index'.
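The contract can be checked mechanically before anyone executes anything. The sketch below assumes a hypothetical ContractStep structure and hypothetical source labels such as win_security_4769 (Windows Security event 4769, "A Kerberos service ticket was requested") and win_powershell_4104 (PowerShell Script Block Logging). It flags steps with no telemetry defined and steps whose telemetry the SIEM does not currently ingest.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ContractStep:
    """One row of the Red/Blue contract (hypothetical structure)."""
    technique: str                      # ATT&CK ID, e.g. "T1558.003"
    procedure: str                      # what Red will execute
    expected_sources: List[str] = field(default_factory=list)  # telemetry Blue expects


def telemetry_gaps(steps: List[ContractStep], ingested: Set[str]) -> List[str]:
    """List the steps Blue cannot possibly see, before anything is executed.

    A step is a gap if it names no telemetry at all, or if any of its
    expected sources is not currently ingested by the SIEM.
    """
    gaps = []
    for step in steps:
        if not step.expected_sources:
            gaps.append(f"{step.technique}: no telemetry defined")
            continue
        missing = [s for s in step.expected_sources if s not in ingested]
        if missing:
            gaps.append(f"{step.technique}: not ingested: {', '.join(missing)}")
    return gaps


if __name__ == "__main__":
    contract = [
        ContractStep("T1558.003", "Rubeus kerberoast", ["win_security_4769"]),
        ContractStep("T1059.001", "encoded PowerShell command", ["win_powershell_4104"]),
        ContractStep("T1078.004", "sign-in with a stolen cloud account", []),
    ]
    # Hypothetical snapshot of what the SIEM ingests today.
    for gap in telemetry_gaps(contract, ingested={"win_security_4769"}):
        print(gap)
```

Running this as a gate on the contract turns the 'Splunk was not ingesting that index' conversation into a pre-sprint ticket instead of a post-mortem finding.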
Execution happens in an agreed window, with an exercise flag in the logs and a #purple-live Slack channel open. Every Red action gets a UTC timestamp, the target hostname and the binary hash. When we run Kerberoasting via Rubeus, the operator records the exact ticket extracted and the targeted service account, matching the workflow in 'Active Directory Pentest: Step-by-Step Kerberoasting in a GOAD Lab'. In parallel, a SOC analyst tries to detect the activity in real time without knowing which step comes next, mimicking a real intrusion. If they catch it within 4 minutes, we mark it green. If it slips through, it becomes a Jira ticket whose priority is set by the criticality of the affected asset.
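A minimal sketch of the action log and the green/ticket decision follows. The 4-minute window comes from the text; the RedAction fields, the criticality labels and their mapping onto Jira's default priority names are assumptions for illustration, not a fixed schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DETECTION_SLA = timedelta(minutes=4)  # "catch it within 4 minutes"

# Hypothetical mapping from asset criticality to Jira priority names.
PRIORITY_BY_CRITICALITY = {"critical": "Highest", "high": "High", "medium": "Medium", "low": "Low"}


def sha256_of(path: str) -> str:
    """Hash the binary Red executed so Blue can pivot on it."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class RedAction:
    executed_at: datetime    # timezone-aware UTC timestamp
    hostname: str            # target host
    binary_sha256: str       # from sha256_of()
    technique: str           # ATT&CK ID
    asset_criticality: str   # "critical", "high", "medium" or "low"


def grade(action: RedAction, alerted_at: Optional[datetime]) -> str:
    """Return "green" if the SOC alerted within the SLA, else a ticket priority."""
    if alerted_at is not None and timedelta(0) <= alerted_at - action.executed_at <= DETECTION_SLA:
        return "green"
    return PRIORITY_BY_CRITICALITY[action.asset_criticality]
```

For example, a Kerberoasting action against a high-criticality SQL server that the analyst alerts on 9 minutes later grades as "High", which becomes the priority of the resulting Jira ticket.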
Post-execution, the hard work begins: turning a finding into a durable rule. We use the pipeline described in 'Threat Hunting with Sigma and Elastic: From Indicator to Detection Rule' to convert hypotheses into Sigma, then into EQL for Elastic and KQL for Sentinel. A rule merges to main only if it meets three criteria: it covers the exercise technique, it generates fewer than 5 false positives per week in the staging environment, and it has a linked response playbook. Evasion techniques like those in 'EDR Evasion for Research: Direct Syscalls Explained Without the Hype' and 'AMSI and ETW Bypass for Defensive Research: What Blue Teams Should Know' force the team to move past signatures into behavioral detection, watching NtAllocateVirtualMemory calls and anomalous parent-child process patterns.
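The three merge criteria can run as a CI gate. This sketch works on a Sigma rule already parsed into a dict (for instance with yaml.safe_load) and uses Sigma's convention of tagging techniques as attack.tNNNN.NNN. Treating the tag as proof of coverage is a simplification, since real coverage is confirmed by replaying the exercise; the playbook URL and the staging FP figure are hypothetical inputs your pipeline would supply. The Kerberoasting rule shown is deliberately simplified: it keys on event 4769 with RC4 ticket encryption (0x17) and skips machine accounts.

```python
from typing import Any, Dict, List

MAX_FP_PER_WEEK = 5  # "fewer than 5 false positives per week in staging"

# Simplified Sigma rule as a parsed dict; production rules usually filter more.
KERBEROAST_RULE: Dict[str, Any] = {
    "title": "Kerberos service ticket requested with RC4 encryption",
    "logsource": {"product": "windows", "service": "security"},
    "detection": {
        "selection": {"EventID": 4769, "TicketEncryptionType": "0x17"},  # 0x17 = RC4-HMAC
        "filter_machine_accounts": {"ServiceName|endswith": "$"},
        "condition": "selection and not filter_machine_accounts",
    },
    "level": "medium",
    "tags": ["attack.t1558.003"],
}


def merge_blockers(rule: Dict[str, Any], technique: str,
                   staging_fp_per_week: float, playbook_url: str) -> List[str]:
    """Return the reasons a rule may not merge to main; an empty list means mergeable."""
    blockers = []
    expected_tag = "attack." + technique.lower()
    if expected_tag not in rule.get("tags", []):
        blockers.append(f"missing tag {expected_tag}")
    if staging_fp_per_week >= MAX_FP_PER_WEEK:
        blockers.append(f"{staging_fp_per_week} FP/week in staging (limit: fewer than {MAX_FP_PER_WEEK})")
    if not playbook_url:
        blockers.append("no linked response playbook")
    return blockers


if __name__ == "__main__":
    # Hypothetical playbook location.
    print(merge_blockers(KERBEROAST_RULE, "T1558.003", 2, "https://wiki.example/playbooks/kerberoasting"))
```

Returning a list of reasons rather than a boolean lets the CI job post every blocker on the PR at once.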
Exercise infra must be auditable. C2 runs in an isolated VLAN with full PCAP capture, following the model in 'Building C2 Infra with Sliver in an Isolated Lab for Defensive Research', and a port mirror copies the traffic to the staging SIEM. Lateral movement follows the playbook in 'Lateral Movement in the Lab: SMB, WMI and WinRM with a Detection Focus', using Impacket and Evil-WinRM and deliberately skipping any OPSEC-safe options so that Blue sees the artifacts. Internal pivoting uses Chisel, per 'Pivoting with Chisel and Ligolo-ng: Segmented Networks in a Pentest Lab'. Everything is logged in a private Git repo: every Red commit is referenced by the corresponding Blue rule PR, creating traceability that auditors appreciate and managers like to show the board.
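The commit-to-PR traceability is easy to audit automatically. This sketch assumes Blue PR descriptions cite Red commits by SHA, either in full or abbreviated to at least 7 hex characters; the PR numbers and descriptions are hypothetical and in practice would be fetched from your Git host's API.

```python
import re
from typing import Dict, Iterable, List, Set

# Hex runs of 7 to 40 characters: full or abbreviated Git commit SHAs.
SHA_PATTERN = re.compile(r"\b[0-9a-f]{7,40}\b")


def unreferenced_red_commits(red_commits: Iterable[str], pr_bodies: Dict[int, str]) -> List[str]:
    """Return Red commit SHAs that no Blue rule PR description mentions.

    A PR references a commit if its description contains the full SHA or
    any prefix of it that is at least 7 characters long.
    """
    mentioned: Set[str] = set()
    for body in pr_bodies.values():
        mentioned.update(SHA_PATTERN.findall(body.lower()))
    missing = []
    for sha in red_commits:
        sha = sha.lower()
        if not any(sha.startswith(ref) for ref in mentioned):
            missing.append(sha)
    return missing


if __name__ == "__main__":
    red = ["a1b2c3d4e5f60718293a4b5c6d7e8f9012345678", "f" * 40]
    prs = {42: "Detects Kerberoasting from Red commit a1b2c3d"}  # hypothetical PR
    print(unreferenced_red_commits(red, prs))  # the all-f commit has no Blue PR yet
```

Any SHA this returns is a Red action with no detection work attached, which is exactly the gap an auditor would ask about.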
Communication kills more Purple Team programs than tooling gaps do. We establish a shared vocabulary: 'detected' means an alert was generated and triaged, not just that a log sits in some cold index. Retros run 60 minutes with three slides: techniques executed, detections shipped, open technical debt. The metrics we track are MTTD per ATT&CK tactic, the percentage of in-scope techniques covered for each tactic, and the count of rules with an FP rate above threshold. In six months, one customer went from 23% Credential Access coverage to 71%, with a 40% drop in noisy alerts.
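The three retro metrics reduce to a few lines of arithmetic. This is a sketch under two assumptions: detections arrive as (tactic, minutes to triaged alert) pairs, and the noisy-rule threshold reuses the merge criterion, so a rule at or above 5 FP per week counts as over the limit.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

FP_LIMIT_PER_WEEK = 5.0  # same bar as the merge criterion: fewer than 5 FP/week


def mttd_by_tactic(detections: List[Tuple[str, float]]) -> Dict[str, float]:
    """Mean time to detect, in minutes, per ATT&CK tactic.

    Each item is (tactic, minutes from Red execution to triaged alert).
    Only detected techniques belong here; misses feed coverage instead.
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for tactic, minutes in detections:
        buckets[tactic].append(minutes)
    return {tactic: mean(values) for tactic, values in buckets.items()}


def coverage_pct(detected: int, in_scope: int) -> float:
    """Percentage of in-scope techniques for one tactic that raise a triaged alert."""
    return 100.0 * detected / in_scope if in_scope else 0.0


def rules_over_fp_limit(fp_per_week: Dict[str, float], limit: float = FP_LIMIT_PER_WEEK) -> int:
    """Count rules whose weekly false-positive rate is at or above the limit."""
    return sum(1 for rate in fp_per_week.values() if rate >= limit)
```

Keeping misses out of MTTD matters: averaging in undetected techniques would require an arbitrary "infinite" time, while the coverage percentage already captures them.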
Practical takeaway: start small and measurable. Pick five techniques relevant to your sector, write a contract with the SOC, execute in a short window with full logging, and do not close the sprint without a Sigma rule versioned in Git. A Purple Team that leaves no versioned artifact behind did not scale; it only entertained. The full cycle (Red builds the hypothesis, Blue validates the telemetry, the team merges the rule to production) must fit in two weeks. If it takes longer, you are managing a project, not operating continuous detection.
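The closing rule can be expressed as a small sprint-exit check. This sketch assumes you record, per technique, the execution date and the date its rule merged to main (None if it never merged); the verdict strings are illustrative. The sprint closes only when every technique reports "ok".

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

SPRINT_LENGTH = timedelta(days=14)  # the two-week cycle


def sprint_verdicts(techniques: Dict[str, Tuple[date, Optional[date]]]) -> Dict[str, str]:
    """Map each technique's (executed_on, rule_merged_on) to a closing verdict."""
    verdicts = {}
    for technique, (executed_on, merged_on) in techniques.items():
        if merged_on is None:
            verdicts[technique] = "open: no versioned rule"
        elif merged_on - executed_on > SPRINT_LENGTH:
            verdicts[technique] = "late: cycle exceeded two weeks"
        else:
            verdicts[technique] = "ok"
    return verdicts


def sprint_can_close(techniques: Dict[str, Tuple[date, Optional[date]]]) -> bool:
    """True only if every technique shipped a merged rule within the sprint."""
    return all(v == "ok" for v in sprint_verdicts(techniques).values())
```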