
Linux Server Hardening: Applying CIS Benchmark Without Breaking Production

By the Basilisk Team

How to apply the CIS Benchmark on production Debian and Ubuntu hosts by validating each control, measuring impact, and keeping the SLA intact without an all-night rollback.

Applying the full CIS Benchmark in one shot to a Debian 12 box serving 40k requests per minute is the fastest way to turn a Friday into a sev 1. The Basilisk team has watched more than one shop run the ansible-lockdown playbook straight against production and discover at 3 a.m. that control 5.2.16 disabled the service account orchestrating the Postgres backup. Serious hardening is not copying 380 controls from a PDF: it is picking the 60 that are worth the risk, replaying them in staging under the same load, and wiring telemetry so you know how many minutes you have before customer impact forces a rollback.

The right starting point is splitting the controls into four buckets before you touch a single server. Bucket 1: kernel and boot (sysctl, GRUB, modules), low blast radius and high payoff. Bucket 2: network and firewall (nftables, IPv6, ICMP), medium risk if you have not mapped every port. Bucket 3: authentication, PAM, and SSH, where most post-hardening incidents live. Bucket 4: auditd, syslog, and AIDE integrity, the lowest operational risk of the four as long as the audit ruleset is tuned for event volume. Work bucket 4 first, then 1, then 2, and leave bucket 3 (SSH and PAM) for last. If you are already touching SSH, read SSH Hardening 2026: Algorithms, Certificates and Bastion Hosts before editing sshd_config, because the algorithm story shifted again in 2026.

To inventory the current state, run oscap (from the openscap-scanner package) with the xccdf_org.ssgproject.content_profile_cis_level2_server profile from the ComplianceAsCode project. On a clean Ubuntu 24.04 you will see 110 to 140 controls in fail state, and that is expected. Export the HTML report, import it into Jira as one epic per bucket, and estimate effort in risk points rather than hours. The rule is to never apply a remediation without reading its rationale, because half of CIS Level 2 breaks modern workloads. Disabling usb-storage makes sense on a bastion, not on a host that writes encrypted dumps to a hardware key for an air-gapped regulator handoff.
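
A fail count in that range is easier to triage once it is machine-readable. The sketch below is one possible way to tally an XCCDF results file by status. It assumes the scan was run with oscap xccdf eval and its --results option and that the file uses the XCCDF 1.2 namespace; the SAMPLE document and its rule IDs are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: results come from `oscap xccdf eval --results results.xml ...`
# and use the XCCDF 1.2 namespace.
NS = {"x": "http://checklists.nist.gov/xccdf/1.2"}

# Hypothetical, heavily trimmed results document used only for illustration.
SAMPLE = """<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.2">
  <TestResult id="demo">
    <rule-result idref="rule_a"><result>pass</result></rule-result>
    <rule-result idref="rule_b"><result>fail</result></rule-result>
    <rule-result idref="rule_c"><result>notselected</result></rule-result>
    <rule-result idref="rule_d"><result>fail</result></rule-result>
  </TestResult>
</Benchmark>"""


def summarize(xml_text):
    """Return (Counter of result statuses, list of failed rule IDs)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    failed = []
    for rule_result in root.iterfind(".//x:rule-result", NS):
        status = rule_result.findtext("x:result", default="unknown", namespaces=NS)
        counts[status] += 1
        if status == "fail":
            failed.append(rule_result.get("idref"))
    return counts, failed


counts, failed = summarize(SAMPLE)
print(dict(counts))  # status tally for the sample document
print(failed)        # rule IDs to turn into tickets
```

Against a real scan you would pass the contents of the results file instead of SAMPLE, then group the failed rule IDs by bucket before creating the epics.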

The kernel layer pays off enormously with low risk if you understand what you are tuning. kernel.kptr_restrict=2, kernel.dmesg_restrict=1, kernel.yama.ptrace_scope=2, and fs.protected_hardlinks=1 are free wins. kernel.unprivileged_userns_clone=0, on the other hand, kills rootless Docker, Podman, Bubblewrap, and every application sandbox you might rely on, so if you run containers or use the techniques from Linux Application Sandboxing with Bubblewrap, Firejail and Flatpak, keep it at 1 and formally document the deviation. For high-exposure services, push SELinux into enforcing with a custom targeted policy, exactly as we walk through in SELinux Without Fear: Custom Policies for Critical Services using an internet-facing Nginx as the worked example.
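
To keep those values from drifting silently, a small check can compare the running kernel against the intended baseline. This is a minimal sketch, assuming sysctl keys map to files under /proc/sys by replacing dots with slashes (true for the keys listed here, not for keys that embed dotted interface names). The function names are hypothetical, and the userns entry encodes the documented deviation for container hosts rather than the CIS value.

```python
from pathlib import Path

# Values taken from the paragraph above. The last entry records the
# documented deviation for container hosts, not the CIS recommendation.
BASELINE = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
    "kernel.yama.ptrace_scope": "2",
    "fs.protected_hardlinks": "1",
    "kernel.unprivileged_userns_clone": "1",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None  # key missing on this kernel, or not readable


def find_drift(baseline, reader=read_sysctl):
    """Return {key: (wanted, actual)} for every key that differs."""
    drift = {}
    for key, wanted in baseline.items():
        actual = reader(key)
        if actual != wanted:
            drift[key] = (wanted, actual)
    return drift


for key, (wanted, actual) in find_drift(BASELINE).items():
    print(f"{key}: want {wanted}, have {actual}")
```

A key reported with an actual value of None simply does not exist on that kernel, which is itself worth recording next to the deviation.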

Validation is where most teams cut corners. Spin up a mirrored environment in LXD or Proxmox with the same kernel, the same glibc, and the same service versions, then capture 30 minutes of real traffic with tcpdump and replay it against the mirror using k6 or wrk. Apply controls in batches of ten, rerun the test after each batch, and compare p95 latency and error rate against the unhardened baseline. If either metric regresses by more than 3%, bisect the batch to identify which control caused it. Also run atomic-red-team with techniques mapped to MITRE ATT&CK to confirm that hardening actually shrinks the attack surface: the logic is the same as in Adversary Emulation with Caldera and MITRE ATT&CK in a Corporate Lab, but aimed at post-exploitation on the hardened host.
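
The batch-and-bisect loop can be sketched in a few lines. This is an illustration under stated assumptions, not a finished harness: it assumes exactly one control in the failing batch is responsible and that measurements are repeatable. The regresses callback is hypothetical; in practice it would reset the staging snapshot, apply the given controls, replay the traffic, and compare p95 latency against the baseline.

```python
def batches(controls, size=10):
    """Split the control list into batches of `size` (ten, as in the text)."""
    return [controls[i:i + size] for i in range(0, len(controls), size)]


def exceeds_budget(baseline_p95, candidate_p95, budget_pct=3.0):
    """True if p95 latency regressed by more than the budget (3% in the text)."""
    regression_pct = (candidate_p95 - baseline_p95) / baseline_p95 * 100.0
    return regression_pct > budget_pct


def bisect_culprit(batch, regresses):
    """Find the single control in `batch` that causes the regression.

    `regresses(prefix)` is a hypothetical callback that applies the controls
    in `prefix` to a fresh staging host and reports whether the test fails.
    Assumes regresses(batch) is True and exactly one control is at fault.
    """
    lo, hi = 0, len(batch)  # culprit index lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if regresses(batch[:mid]):
            hi = mid  # culprit is among the first `mid` controls
        else:
            lo = mid  # culprit comes at or after index `mid`
    return batch[lo]


# Demo with a simulated culprit; the control names are placeholders.
controls = [f"control-{i:02d}" for i in range(10)]
print(bisect_culprit(controls, lambda prefix: "control-06" in prefix))
```

With batches of ten, bisection needs at most four staging runs to isolate the control, which is what makes the 30-minute replay affordable.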

Auditd almost always becomes a bottleneck if you copy the CIS ruleset blindly. Default rules generate 8,000 to 15,000 events per minute on an average host, fill /var/log in six hours, and force journald to start dropping. The recipe we use at Basilisk is trimming execve rules for known service users (postgres, nginx, app) and keeping aggressive monitoring only for uid 0, sudo, and interactive shells. Ship the events through an audisp plugin into a Sigma pipeline as we describe in Threat Hunting with Sigma and Elastic: From Indicator to Detection Rule, otherwise you are generating expensive noise with zero actionable detection on the consuming end.
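
Before trimming rules it helps to measure the actual event rate. The sketch below counts distinct audit events per minute from raw audit log lines, assuming the usual record header of the form type=... msg=audit(seconds.millis:serial). Records that share a timestamp and serial are treated as one event. The 8,000 cutoff reuses the lower bound quoted above and is only a starting assumption.

```python
import re
from collections import Counter

# Matches the header of an audit record, for example:
#   type=SYSCALL msg=audit(1700000000.123:456): ...
RECORD = re.compile(r"type=(\S+) msg=audit\((\d+)\.(\d+):(\d+)\)")


def events_per_minute(lines):
    """Count distinct audit events (not records) per minute of epoch time."""
    seen = set()
    per_minute = Counter()
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue  # not an audit record header
        _record_type, secs, millis, serial = match.groups()
        event_id = (secs, millis, serial)
        if event_id in seen:
            continue  # another record belonging to an event already counted
        seen.add(event_id)
        per_minute[int(secs) // 60] += 1
    return per_minute


def noisy_minutes(per_minute, threshold=8000):
    """Return the minute buckets whose event count reaches the threshold."""
    return sorted(m for m, n in per_minute.items() if n >= threshold)


# Made-up sample lines; a real run would iterate over the audit log file.
sample = [
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=59',
    'type=EXECVE msg=audit(1700000000.123:456): argc=1 a0="ls"',
    'type=SYSCALL msg=audit(1700000065.000:457): arch=c000003e syscall=59',
]
print(events_per_minute(sample))
```

Running the same count before and after trimming the service-user execve rules gives a concrete number to put next to the ruleset change.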

For servers that touch sensitive data or that you cannot physically reach, pair hardening with coercion-resistant disk encryption and verified backups, following the playbook in Disk Crypto and Backups: VeraCrypt, LUKS and a Resilient 3-2-1 Strategy. LUKS2 with Argon2id, a TPM2-sealed key bound to PCR0+PCR7, and an encrypted offsite snapshot solve the physical theft scenario at the datacenter. Combined with Secure Boot, signed modules, and a password-protected GRUB (CIS 1.4.x controls), you raise the cost of a hands-on attack to something only worth it against very specific targets.

Practical takeaway: build a spreadsheet listing every CIS control you applied, the package version at the time, the openscap result before and after, and the latency delta measured in staging. Re-run the scans and measurements behind that spreadsheet on every OS release (Debian point release, Ubuntu LTS upgrade), because sysctl defaults shift and auditd adds new fields. Hardening is not a project, it is a process: if six months from now you cannot prove via automated scan that those 60 controls remain in effect on the host, you do not have hardening, you have faith.
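
Kept as CSV, that spreadsheet can be checked against each new scan automatically. The sketch below is one way to do it; the column names, control IDs, and package versions are hypothetical placeholders, and latest_scan stands in for whatever mapping of control to result you extract from the most recent openscap run.

```python
import csv
import io

# Hypothetical ledger layout and rows; the column names are assumptions.
LEDGER_CSV = """control_id,package_version,result_before,result_after,p95_delta_pct
ctl-a,pkg-a 1.0,fail,pass,0.0
ctl-b,pkg-b 2.3,fail,pass,0.4
ctl-c,pkg-c 9.2,fail,pass,1.1
"""


def load_ledger(csv_text):
    """Parse the ledger into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def controls_no_longer_in_effect(ledger_rows, latest_scan):
    """Return applied controls that the latest scan does not report as pass.

    `latest_scan` maps control_id -> result string from the newest scan.
    A control missing from the scan counts as lapsed, since it cannot be proven.
    """
    lapsed = []
    for row in ledger_rows:
        if row["result_after"] != "pass":
            continue  # was never successfully applied in the first place
        if latest_scan.get(row["control_id"]) != "pass":
            lapsed.append(row["control_id"])
    return lapsed


rows = load_ledger(LEDGER_CSV)
print(controls_no_longer_in_effect(rows, {"ctl-a": "pass", "ctl-b": "fail"}))
```

Treating a control that is absent from the scan as lapsed is a deliberate choice here: the goal is proof that the control is in effect, so missing evidence fails the check.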
