
STRIDE Threat Modeling in Sprints: A Full Microservice Walkthrough

By Equipe Basilisk

How to apply STRIDE to a real payments microservice inside a two-week sprint, with a clean DFD, prioritized threats, and actionable mitigations.

Threat modeling dies in a drawer when it turns into a four-hour meeting with no owner. At Basilisk OffSec we wire STRIDE into two-week sprints using a payments microservice as the guinea pig: 1 backend dev, 1 SRE, 1 offensive researcher, 90 minutes at kickoff, 30 minutes of review mid-sprint. The output is not a 40-page PDF, it is 12 Jira issues with verifiable mitigations. This post shows exactly how we ran it on the payments-api service, which receives PSP webhooks, talks to Postgres, Redis, and a KMS, and how every STRIDE letter became a real patch in code.

Before modeling, we drew the DFD (Data Flow Diagram) in draw.io with four element types: external entities (PSP, frontend), processes (payments-api, worker-reconciliation), datastores (Postgres tx_db, Redis idempotency_cache), and the flows between them. We marked the trust boundaries: internet -> Cloudflare -> ingress -> internal mesh -> KMS. The diagram does not need to be pretty; it needs to be correct. Everyone signed off in 25 minutes. Anyone who has built a pentest lab knows that a wrong diagram leads to wrong tests (see "Web Pentesting From Scratch: Building a Safe Lab with DVWA, Juice Shop and Burp Suite"). Same rule here: if your webhook flow does not show HMAC validation happening before the JSON parse, you are modeling the service that lives in your head, not the one in production.
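One way to keep the diagram honest is to mirror it as data and let a script list the flows that cross a trust boundary. The sketch below is illustrative only: the zone names, the flow labels, and the frontend and worker flows are assumptions rather than a copy of our draw.io file, and the Cloudflare and ingress hops are collapsed into a single internet zone.

```python
from dataclasses import dataclass
from itertools import product

STRIDE = (
    "Spoofing", "Tampering", "Repudiation",
    "Information Disclosure", "Denial of Service", "Elevation of Privilege",
)

# Trust zone of each DFD element (zone names are assumptions for this sketch).
ZONES = {
    "PSP": "internet",
    "frontend": "internet",
    "payments-api": "internal-mesh",
    "worker-reconciliation": "internal-mesh",
    "tx_db": "internal-mesh",
    "idempotency_cache": "internal-mesh",
    "KMS": "kms",
}


@dataclass(frozen=True)
class Flow:
    source: str
    dest: str
    label: str


# Flow labels are hypothetical; only the endpoints come from the DFD elements.
FLOWS = [
    Flow("PSP", "payments-api", "webhook"),
    Flow("frontend", "payments-api", "payment request"),
    Flow("payments-api", "tx_db", "transaction write"),
    Flow("payments-api", "idempotency_cache", "idempotency lookup"),
    Flow("payments-api", "KMS", "key fetch"),
    Flow("worker-reconciliation", "tx_db", "reconciliation read"),
]


def crosses_boundary(flow: Flow) -> bool:
    """A flow crosses a trust boundary when its endpoints sit in different zones."""
    return ZONES[flow.source] != ZONES[flow.dest]


def stride_worklist(flows):
    """Pair every boundary-crossing flow with each of the six STRIDE letters."""
    crossing = [f for f in flows if crosses_boundary(f)]
    return list(product(crossing, STRIDE))


if __name__ == "__main__":
    for flow, letter in stride_worklist(FLOWS):
        print(f"{flow.source} -> {flow.dest} ({flow.label}): {letter}?")
```

The worklist is a floor, not a ceiling: flows inside a single zone, such as payments-api to Redis, can still deserve a look when an attacker with internal network access is part of the threat model.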

Spoofing showed up first on the PSP -> payments-api flow. The webhook arrived with an X-Signature header, but verification ran after json.loads(body), so unauthenticated bytes reached the parser and left a parser-differential window. Fix: validate an HMAC-SHA-256 over the raw body with a KMS-rotated key before touching the body, using constant-time comparison via hmac.compare_digest. Tampering showed up in Redis: the idempotency cache had no fixed TTL or signature, so an attacker with internal network access could plant entries and trigger charge replays. We added a namespaced prefix plus a short HMAC on the key, and enabled ACL with requirepass + TLS on Redis 7. Repudiation was handled with an append-only audit_log table using chained hashes, a pattern we also use when documenting pivoting across segmented networks (see "Pivoting with Chisel and Ligolo-ng: Segmented Networks in a Pentest Lab").
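A minimal sketch of the verify-then-parse order, assuming the PSP sends a hex-encoded HMAC-SHA-256 of the raw request body in X-Signature (the encoding is an assumption; some PSPs use base64 or add a timestamp). The function name verify_and_parse is hypothetical, and the key is passed in as bytes here instead of being fetched from KMS.

```python
import hashlib
import hmac
import json


def verify_and_parse(raw_body: bytes, signature_hex: str, key: bytes) -> dict:
    """Check the webhook signature over the raw bytes, then parse the JSON."""
    expected = hmac.new(key, raw_body, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError;
    # compare_digest runs in constant time with respect to the contents.
    provided = signature_hex.strip().lower().encode("utf-8")
    if not hmac.compare_digest(expected.encode("ascii"), provided):
        raise PermissionError("invalid webhook signature")
    # The parser only ever sees bytes that passed authentication.
    return json.loads(raw_body)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    body = b'{"event": "charge.succeeded", "amount": 1000}'
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    print(verify_and_parse(body, sig, key))
```

The important detail is that the signature is computed over the exact bytes received, not over a re-serialized object, so the parser and the verifier cannot disagree about what was signed.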

Information Disclosure was the heaviest category, with eight findings. Stack traces leaked via FastAPI 500s in a staging environment that mirrors prod, secrets surfaced in /debug behind a magic header that a 2024 intern forgot to remove, and the Prometheus /metrics endpoint exposed labels carrying card_bin. We fixed it with middleware that only serializes {error_id, code} to the client, killed /debug, and applied metric_relabel_configs with a labeldrop action in Prometheus to drop sensitive labels. To make impact concrete for the team, we showed a POC equivalent to an SSRF pulling cloud metadata, an exercise we documented in another lab (see "SSRF Demystified: Exploiting Cloud Metadata in a Local AWS Lab"). We also ran SAST with custom Semgrep rules and SCA with osv-scanner; both became blocking gates for High and Critical findings in the pipeline.
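The core of the error middleware can be sketched without the framework: log the full exception server-side under a random error_id and hand the client nothing beyond {error_id, code}. The function name, logger name, and default code below are hypothetical; in the service this logic would sit inside a FastAPI exception handler.

```python
import logging
import uuid

logger = logging.getLogger("payments-api.errors")  # logger name is an assumption


def client_error_payload(exc: BaseException, code: str = "internal_error") -> dict:
    """Log the exception with its traceback; return only what the client may see."""
    error_id = uuid.uuid4().hex
    # The traceback and exception message go to the server log, keyed by error_id.
    logger.error("unhandled error error_id=%s", error_id, exc_info=exc)
    # Nothing derived from the exception itself is included in the response body.
    return {"error_id": error_id, "code": code}


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    try:
        raise RuntimeError("db password=hunter2 rejected")
    except RuntimeError as err:
        print(client_error_payload(err))
```

Support can then correlate a customer-reported error_id with the server log without the response ever carrying the stack trace.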

Denial of Service was not treated as just rate limiting. We mapped algorithmic amplification: a /search endpoint accepted client regex and hit Postgres with LIKE %term%. We replaced it with tsvector + a GIN index and a 64-char cap on the term. We added a per-API-key token bucket in Envoy (100 rps, burst 200) and a circuit breaker on the KMS client using pybreaker, because the managed KMS has a 1200 ops/s quota per key and we had already caused a 14-minute incident in January. Elevation of Privilege closed the list: the internal API JWT used HS256 with a secret shared across 6 services. We migrated to RS256 with per-service keys in KMS, audience-specific claims, and exp + nbf + iss validation in a single middleware shipped as the internal library basilisk-authz==2.3.0.
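To make the "100 rps, burst 200" numbers concrete, here is a token bucket sketched in plain Python. It is only an illustration of the semantics: in production the limit is enforced by Envoy, not by application code, and the TokenBucket class and its injectable clock are assumptions made for this example.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller is over limit."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # deterministic clock for the demo
    bucket = TokenBucket(rate=100, burst=200, clock=lambda: fake_now[0])
    print(sum(bucket.allow() for _ in range(250)))  # requests admitted at t=0
    fake_now[0] = 1.0
    print(sum(bucket.allow() for _ in range(250)))  # requests admitted one second later
```

A per-API-key limiter would keep one bucket per key, for example in a dict keyed by the API key, so that one noisy client cannot drain the allowance of the others.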

By the end of the sprint, every item became an issue titled STRIDE--: with a mitigation: tag. Of the 12 threats raised, 9 shipped as merged PRs inside the same sprint, 2 were accepted as residual risk with a 90-day review, and 1 turned into an epic to refactor the webhook module. Total cost: 4 hours of distributed meetings plus the code work already on the sprint. STRIDE is not an audit checklist; it is a shared language between dev, SRE, and offensive research. Practical takeaway: start with a one-page DFD, force the 6 letters against every flow crossing a trust boundary, and require every threat to land as an issue with a verifiable acceptance criterion. If it does not turn into code this sprint, it was not threat modeling, it was theater.
