In IBM's Cost of a Data Breach Report 2024, the average time to identify a breach was 194 days, and containing it after detection took another 64 days. During those 258 days, attackers can exfiltrate data, establish persistence, and move laterally through the network. Companies with an incident response (IR) team and a tested IR plan saved an average of $2.5 million per breach compared to those without.
An incident response plan isn't something you create during an incident. It's a document you create, drill, and refine before the incident so that when the alarm goes off at 3 AM, your team knows exactly what to do, in what order, and who is responsible for each step.
Phase 1: Preparation (Before the Incident)
Preparation is the most important phase because it determines how effectively you handle everything that follows. Essential preparation activities:
Define your IR team and roles. Incident Commander (makes decisions, coordinates response), Technical Lead (leads technical investigation and containment), Communications Lead (handles internal and external communications), Legal/Compliance (advises on notification requirements, evidence preservation), and Subject Matter Experts (database admin, network engineer, application developers — called in as needed).
Define severity levels.
// Incident Severity Classification
SEV-1 (Critical): Active data breach, ransomware, complete service outage
- Response time: < 15 minutes
- War room: immediately
- Escalation: CTO, CEO, Legal
- Example: Attacker actively exfiltrating customer PII
SEV-2 (High): Confirmed intrusion without active data access,
partial service degradation affecting customers
- Response time: < 1 hour
- War room: within 1 hour
- Escalation: CTO, Engineering Lead
- Example: Compromised employee credentials with access to production
SEV-3 (Medium): Suspicious activity requiring investigation,
security policy violation, vulnerability exploitation attempt
- Response time: < 4 hours
- Investigation: next business day
- Example: Unusual login patterns from multiple countries
SEV-4 (Low): Informational security events, minor policy violations
- Response time: < 24 hours
- Investigation: within 1 week
- Example: Employee accessed unauthorized but non-sensitive resource
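The classification above can be wired directly into alerting tooling so responders never have to look up the policy mid-incident. A minimal sketch in shell — the SEV labels and response times mirror the table above, while the SEV-3 escalation target (`SecurityLead`) is an assumption, since the table doesn't specify one:

```shell
#!/usr/bin/env bash
# Map a severity label to its response-time target and escalation list.
# SEV-1/2/4 values mirror the classification table; SEV-3's escalation
# target is an assumed placeholder -- adapt to your own org chart.
sev_policy() {
  case "$1" in
    SEV-1) echo "response=15m escalate=CTO,CEO,Legal" ;;
    SEV-2) echo "response=1h escalate=CTO,EngLead" ;;
    SEV-3) echo "response=4h escalate=SecurityLead" ;;   # assumed escalation
    SEV-4) echo "response=24h escalate=none" ;;
    *)     echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

sev_policy SEV-2   # → response=1h escalate=CTO,EngLead
```

Keeping this in a script (or a paging-tool routing rule) means the on-call engineer gets the policy pushed to them rather than having to remember it at 3 AM.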
Prepare your toolkit. The middle of an active incident is not the time to figure out how to capture a memory dump or analyze a log file. Have these ready: forensic imaging tools (dd, FTK Imager), log analysis tools (Splunk, ELK, grep scripts), network capture tools (tcpdump, Wireshark), a dedicated incident response Slack channel (pre-created, invite-ready), evidence storage (encrypted, access-controlled, tamper-evident), and pre-approved communication templates (customer notification, press statement, regulatory notification).
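For the "tamper-evident" part of evidence storage, a simple approach is a SHA-256 manifest recorded the moment artifacts are collected. A sketch, assuming an evidence directory (the path and stand-in artifact are illustrative):

```shell
# Record a SHA-256 manifest of every collected artifact so later
# tampering is detectable; re-verify at any time with `sha256sum -c`.
EVIDENCE_DIR="${EVIDENCE_DIR:-/tmp/ir-evidence}"   # illustrative path
mkdir -p "$EVIDENCE_DIR"
echo "sample artifact" > "$EVIDENCE_DIR/auth.log.copy"   # stand-in artifact

# Hash everything except the manifest itself.
( cd "$EVIDENCE_DIR" && \
  find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )

# Verification: any modified, truncated, or replaced file fails the check.
( cd "$EVIDENCE_DIR" && sha256sum -c MANIFEST.sha256 )
```

In practice you would also store the manifest (or a signature over it) somewhere the responder handling the evidence cannot write to, so the manifest itself can't be silently regenerated.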
Phase 2: Detection and Analysis
Detection happens through: automated alerts (SIEM, EDR, WAF, IDS/IPS), user reports ("I received a suspicious email and clicked the link"), third-party notifications (a security researcher reports a vulnerability, a partner reports unusual API traffic), or threat intelligence feeds (your domain appears on a phishing list).
When an alert fires:
Step 1: Triage (5 minutes). Is this a real incident or a false positive? Check: Is the alert from a known-noisy source? Is the affected system a production system? Can you quickly verify the activity? Assign an initial severity.
Step 2: Scope assessment (15-30 minutes). What systems are affected? What data is at risk? Is the attacker still active? Check logs for: authentication events (failed logins, new sessions, privilege escalation), network connections (unusual outbound traffic, connections to known malicious IPs), file system changes (new files, modified configurations, encrypted files), and database queries (unusual data access patterns, bulk exports).
# Quick forensic commands for initial assessment
# Check for unusual processes
ps aux --sort=-%mem | head -20
ps aux --sort=-%cpu | head -20
# Check for new user accounts
getent passwd | awk -F: '$3 >= 1000'
last -20
# Check for unusual network connections
ss -tunap | grep ESTAB
netstat -tunap | grep -v '127.0.0.1'
# Check for recently modified files
find /etc -mtime -1 -type f 2>/dev/null
find /var/www -mtime -1 -type f 2>/dev/null
# Check authentication logs
grep "Failed password" /var/log/auth.log | tail -50
grep "Accepted publickey" /var/log/auth.log | tail -20
# Check for cron jobs (persistence mechanism)
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null; done
ls -la /etc/cron.d/ /etc/cron.daily/ /etc/cron.hourly/
find /var/spool/cron -type f
# Check for unusual systemd services (another persistence mechanism)
systemctl list-units --type=service --state=running
find /etc/systemd/system -name "*.service" -mtime -7
# Capture network traffic for analysis
tcpdump -i any -w /tmp/capture_$(date +%s).pcap -c 10000 &
Phase 3: Containment
Containment prevents the incident from spreading. There are two containment strategies:
Short-term containment (immediate): Stop the bleeding without destroying evidence. Isolate the compromised system from the network (disable network interface, add firewall rules, or move to an isolated VLAN). Block the attacker's IP addresses and known C2 domains at the firewall. Revoke compromised credentials. If a web application is compromised, put it behind a WAF rule or take it offline.
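The isolation and blocking steps above might look like the following. This is a sketch, not a runbook: the interface name and IP addresses are illustrative (the IPs are from documentation ranges), and the real commands require root. It defaults to dry-run so the commands can be reviewed before execution:

```shell
# Short-term containment sketch. RUN="echo" prints each command for
# review; set RUN="" (as root) to execute for real.
RUN="echo"
IFACE="eth0"                                 # illustrative interface name
ATTACKER_IPS="203.0.113.7 198.51.100.23"     # illustrative attacker IPs

# 1. Cut the compromised host off the network. Unlike powering it off,
#    this preserves in-memory evidence (processes, network state).
$RUN ip link set dev "$IFACE" down

# 2. At the firewall, drop traffic to and from attacker infrastructure.
for ip in $ATTACKER_IPS; do
  $RUN iptables -I INPUT  -s "$ip" -j DROP   # block inbound from attacker
  $RUN iptables -I OUTPUT -d "$ip" -j DROP   # block outbound C2 callbacks
done
```

The dry-run default is deliberate: containment commands executed on the wrong host or interface can cause their own outage, so having a second responder eyeball the generated commands is cheap insurance.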
Long-term containment (hours): Implement sustainable controls while the investigation continues. Patch the exploited vulnerability. Rotate all credentials that may have been exposed. Add enhanced monitoring to detect if the attacker returns. If a server is compromised, don't rebuild it yet — keep it isolated for forensic analysis and provision a clean replacement.
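For the "enhanced monitoring" step, one common option on Linux is a set of auditd watch rules covering the paths an attacker is most likely to revisit. A sketch — the file path and key names are assumptions, and loading the rules (`auditctl -R` or `augenrules --load`) requires root:

```shell
# Enhanced-monitoring sketch: auditd watch rules for account files and
# common persistence locations. Written to /tmp here for illustration;
# in production this would live in /etc/audit/rules.d/.
RULES=/tmp/ir-enhanced.rules
cat > "$RULES" <<'EOF'
-w /etc/passwd -p wa -k ir-accounts
-w /etc/shadow -p wa -k ir-accounts
-w /etc/cron.d -p wa -k ir-persistence
-w /etc/systemd/system -p wa -k ir-persistence
-w /root/.ssh -p wa -k ir-keys
EOF
```

After loading, `ausearch -k ir-persistence` surfaces any write or attribute change to those paths — exactly the signal you want if the attacker returns through a backdoor you missed.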
Critical: Do NOT alert the attacker. If you detect an active intrusion, don't change the compromised user's password immediately — the attacker will know they've been detected and may accelerate data exfiltration or deploy destructive payloads. First isolate the network path, then revoke credentials.
Phase 4: Eradication
Eradication removes the attacker's presence from your environment. This means: removing all malware, backdoors, and persistence mechanisms. Rebuilding compromised systems from clean images (do NOT try to "clean" a compromised system — you can never be sure you found everything). Closing the vulnerability that was exploited. Rotating all credentials, API keys, and certificates that may have been exposed.
For compromised servers: capture a forensic image of the disk (for investigation), then rebuild from a known-good image, configuration management (Ansible/Terraform), and the most recent clean backup. Never restore from a backup taken during the compromise — the backup may contain the attacker's backdoors.
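Capturing and hash-verifying that forensic image might look like the following. Shown against a small stand-in file so it runs anywhere; in a real capture `SRC` is the compromised block device (e.g. `/dev/sdb`) and `DST` lives on dedicated evidence media:

```shell
# Forensic imaging sketch. SRC would be the compromised device in a real
# capture; a generated file stands in here so the flow is demonstrable.
SRC=/tmp/source.bin
DST=/tmp/evidence.img
head -c 1048576 /dev/urandom > "$SRC"    # stand-in for the source device

# conv=noerror,sync keeps going past read errors, padding bad blocks with
# zeros, so one failing sector doesn't abort the whole capture.
dd if="$SRC" of="$DST" bs=4M conv=noerror,sync status=progress

# Hash the image immediately and record the digest alongside it -- this
# becomes the chain-of-custody baseline for the investigation.
sha256sum "$DST" | tee "$DST.sha256"
```

Note that `conv=sync` pads the final partial block, so the image hash is the reference going forward — record it at capture time, and any later re-hash of the image must match it exactly.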
Phase 5: Recovery
Recovery restores normal operations. Bring systems back online in order of business criticality. Monitor recovered systems intensively for 48-72 hours for signs of re-compromise. Verify data integrity — compare database checksums, file hashes, and transaction logs against known-good baselines.
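The file-hash comparison can reuse the same manifest technique: capture a baseline from a known-good system, then check restored files against it. A sketch with illustrative paths and a stand-in config file:

```shell
# Integrity-verification sketch: compare restored files against a
# baseline manifest from a known-good system. Paths are illustrative.
BASELINE=/tmp/baseline.sha256
RESTORED=/tmp/restored
mkdir -p "$RESTORED"
echo "config-v1" > "$RESTORED/app.conf"   # stand-in restored file

# On the known-good system: record the expected hashes.
( cd "$RESTORED" && sha256sum app.conf > "$BASELINE" )

# After recovery: any drift from the baseline fails the check.
( cd "$RESTORED" && sha256sum -c "$BASELINE" )
```

Database checksums follow the same pattern at a different layer — e.g. comparing per-table checksums or row counts against a pre-incident baseline rather than file hashes.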
Phase 6: Post-Incident Review
The post-incident review (also called a "retrospective" or "post-mortem") is where you turn a bad experience into improved security. Hold it within 1-2 weeks of the incident while memories are fresh.
The review must answer: What happened? (Timeline of events from initial compromise to full recovery.) How did we detect it? (Was it automated or manual? How can we detect it faster?) How did they get in? (Root cause analysis of the exploited vulnerability.) What worked well in our response? What didn't work? What will we change? (Specific, actionable improvements with owners and deadlines.)
Blameless culture: The post-incident review must be blameless. The goal is to improve systems and processes, not to punish individuals. If a developer deployed a vulnerable configuration, the question is "why did our processes allow a vulnerable configuration to be deployed?" not "who deployed the vulnerability?"
Notification Requirements
Depending on the type of data affected and your jurisdiction, you may be legally required to notify regulators, affected individuals, or both:
GDPR (EU): Notify the supervisory authority within 72 hours of becoming aware. Notify affected individuals "without undue delay" if the breach is likely to result in a high risk to their rights.
US state laws: All 50 states have their own breach notification statutes with varying requirements; most require notification within 30-60 days of discovery. California's rules are among the strictest, and the CCPA adds a private right of action for breaches of unencrypted personal information.
Industry-specific: Card brand rules under PCI DSS generally require notifying the card brands and your acquiring bank within 72 hours of a confirmed compromise. HIPAA requires notifying affected individuals within 60 days of discovery; breaches affecting 500+ individuals must also be reported to HHS and prominent media outlets within the same window.
ZeonEdge provides incident response planning, tabletop exercises, and active incident support. We help companies build IR capabilities before incidents happen and provide expert support during active incidents. Contact our security team.
Sarah Chen
Senior Cybersecurity Engineer with 12+ years of experience in penetration testing and security architecture.