P2 Incident Cost: The Frequent Tier That Usually Costs More in Aggregate
P2 (Severity 2) incidents are the underappreciated cost driver in modern incident management. Each P2 costs less than a P1, but P2s occur 5-15x more frequently, so the annual aggregate cost of P2 events frequently exceeds the aggregate cost of P1 events at well-instrumented organisations. The economic implication: investment in reducing P2 frequency typically yields higher returns than further investment in P1 cost reduction. This page covers the P2 cost stack, the aggregate arithmetic that makes P2 management a high-leverage discipline, and the engineering investments that compress P2 frequency.
What Counts as P2
The P1 versus P2 boundary is the most consequential incident-classification distinction. A P2 covers significant degradation that does not amount to full outage. Standard triggers for P2 declaration include the following.
- Meaningful customer-impacting performance degradation. Latency increase, partial feature loss, error rate elevation that affects customer experience without causing complete failure.
- Single-region or single-availability-zone failure. Service is degraded but functional overall thanks to multi-region or multi-AZ design; failover may be partial or graceful.
- Security incident under investigation but not actively exfiltrating. Suspicious activity detected, containment in progress, no confirmed data exit.
- Single major customer affected. A specific enterprise customer is impacted but the broader customer base is unaffected.
- Critical batch job failure during business hours. Reporting, billing, or operational batch process failure that does not immediately disrupt customer-facing service but will if not resolved.
- Supporting-system failure. Internal tooling failure (CI/CD, monitoring stack, identity provider, or similar) that affects engineering productivity but not customer service.
The grey area is when to escalate a slow-developing P2 to P1 as it deteriorates. Mature organisations define explicit time-and-impact triggers for automatic upgrade (for example, a P2 unresolved for 90 minutes with growing impact upgrades to P1; a P2 affecting 15% or more of the customer base upgrades to P1). This escalation discipline keeps response intensity proportional to actual impact.
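The example triggers can be sketched as a small rule check. This is a minimal illustration rather than a prescribed policy: the `P2Incident` fields, the `should_upgrade_to_p1` function, and the representation of "growing impact" as a boolean are all assumptions made for the sketch. Only the 90-minute and 15% thresholds come from the example in the text.

```python
from dataclasses import dataclass

# Thresholds from the example triggers in the text; real values are set per organisation.
MAX_UNRESOLVED_MINUTES = 90
MAX_AFFECTED_CUSTOMER_SHARE = 0.15


@dataclass
class P2Incident:
    """Hypothetical snapshot of an open P2 (field names are illustrative)."""
    minutes_open: float
    impact_growing: bool             # assumed to be judged by the incident commander
    affected_customer_share: float   # fraction of the customer base, 0.0 to 1.0


def should_upgrade_to_p1(incident: P2Incident) -> bool:
    """Return True when either example trigger fires."""
    # Time trigger: unresolved past the limit AND impact still growing.
    time_trigger = (
        incident.minutes_open >= MAX_UNRESOLVED_MINUTES
        and incident.impact_growing
    )
    # Impact trigger: affected share of the customer base at or above the limit.
    impact_trigger = (
        incident.affected_customer_share >= MAX_AFFECTED_CUSTOMER_SHARE
    )
    return time_trigger or impact_trigger
```

Keeping the two triggers as separate named conditions makes it easy to log which one caused the upgrade, which helps when the thresholds are reviewed later.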
The P2 Cost Stack
P2 cost components mirror P1 components but at lower magnitudes. The cost shape is meaningfully different because revenue impact is typically much smaller (no full outage) while labor cost is roughly proportional to incident duration (which can be similar to or longer than P1).
| Cost Component | Range (mid-market) | Driver |
|---|---|---|
| Direct response labor | $2.5K-$15K | 3-7 responders * 1-4 hours * $200/hr loaded |
| Customer-facing revenue impact | $5K-$100K | Partial revenue loss * affected user share; varies dramatically |
| Productivity loss (internal tooling P2) | $10K-$200K | CI/CD or developer-tooling outage * affected engineer count |
| Customer-trust impact | <$50K typical | Smaller than P1 because impact is usually invisible to executives |
| Post-incident review | $1K-$10K | Lighter than P1 PIR; sometimes consolidated or skipped (a known anti-pattern) |
The single-event range of $50K-$200K is a triangulation of these components for a typical mid-market technology firm. Variability is high because customer-impact revenue varies wildly with the affected functionality and the customer base size.
Why Aggregate P2 Cost Usually Wins
The annual cost arithmetic favours P2 reduction at most organisations. Consider a representative mid-market SaaS team profile.
| Incident tier | Per-event cost | Annual frequency | Annual aggregate |
|---|---|---|---|
| P1 incidents | $794K (PagerDuty avg) | 4 per year | $3.18M/yr |
| P2 incidents | $150K (mid-range) | 60 per year | $9.0M/yr |
| P3 incidents | $15K | 240 per year | $3.6M/yr |
| Total annual incident cost | | | $15.78M/yr |
In this representative profile, P2 events are the largest single contributor to annual incident cost despite each one costing roughly 19% of a P1. Reducing P2 frequency by 30% (achievable at most organisations through targeted reliability investment) saves $2.7M annually. Reducing P1 frequency by 30% saves only about $953K annually (30% of 4 x $794K). The arithmetic is robust across organisation sizes: the absolute numbers change but the P2 dominance pattern persists.
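The aggregate arithmetic is easy to rerun with an organisation's own numbers. The sketch below reproduces the representative profile from the table. The `PROFILE` dictionary and the function names are illustrative, and the model assumes cost scales linearly with frequency, as the text does. Because it works from the unrounded per-event figures, results can differ from the rounded table values by a thousand dollars or so.

```python
# Per-event cost (USD) and annual frequency, copied from the representative profile.
PROFILE = {
    "P1": (794_000, 4),
    "P2": (150_000, 60),
    "P3": (15_000, 240),
}


def annual_aggregate(tier: str) -> int:
    """Annual cost for a tier: per-event cost times events per year."""
    cost_per_event, events_per_year = PROFILE[tier]
    return cost_per_event * events_per_year


def savings_from_frequency_cut(tier: str, reduction: float) -> float:
    """Annual savings from cutting a tier's frequency by `reduction` (0.0 to 1.0).

    Assumes per-event cost stays constant as frequency falls.
    """
    return annual_aggregate(tier) * reduction


if __name__ == "__main__":
    total = sum(annual_aggregate(tier) for tier in PROFILE)
    for tier in PROFILE:
        aggregate = annual_aggregate(tier)
        saved = savings_from_frequency_cut(tier, 0.30)
        print(f"{tier}: ${aggregate:,}/yr "
              f"({aggregate / total:.0%} of total); "
              f"30% frequency cut saves ${saved:,.0f}")
```

Swapping in measured per-event costs and incident counts shows quickly whether the P2 dominance pattern holds for a given organisation.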
Where P2 Frequency Comes From, and How to Compress It
P2 frequency in mature organisations clusters around four common root-cause categories. Each has a characteristic engineering response with measurable cost impact.
| Root-Cause Category | Share of P2s | Reduction Investment |
|---|---|---|
| Deployment / change-related | 30-50% | Progressive delivery, feature flags, canary deployment, automated rollback |
| Capacity / scaling-related | 15-25% | Auto-scaling, capacity planning, load testing, SLO budget enforcement |
| Third-party / dependency-related | 15-25% | Circuit breakers, retries with backoff, vendor-tier observability, dependency-failure runbooks |
| Configuration drift / human error | 10-20% | Infrastructure-as-code with mandatory review, policy-as-code (OPA), drift detection |
Across these categories, the median ROI of dedicated platform-reliability investment runs 3-8x within 12 months for organisations with high baseline P2 frequency. The investment is typically structured as a dedicated platform-reliability team (5-15 engineers) rather than as feature-team time, which avoids the common failure mode of reliability work losing priority against feature pressure.
The False-Positive Tax
Alert noise is the most under-recognised cost in P2 management. PagerDuty data from around 2024 and similar industry surveys suggest that as many as 30-40% of P2-tier alerts in poorly tuned organisations are false positives. At $2,500-$15,000 per response cycle (the direct response labor range in the cost stack), an organisation with 60 declared P2s per year and a 30% false-positive rate has 18 false alarms and pays $45K-$270K annually for noise alone. The real cost is higher: alert fatigue degrades responder discipline on real events, and attrition risk among on-call engineers rises with noise levels.
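The noise figure follows from three inputs, which the sketch below makes explicit. The function name and parameters are illustrative assumptions. It counts direct response labor only, so it understates the total for the reasons given above.

```python
def false_positive_tax(
    declared_per_year: int,
    false_positive_rate: float,
    cost_low: float,
    cost_high: float,
) -> tuple[float, float]:
    """Return the (low, high) annual direct cost of responding to false positives.

    Covers response labor only; alert fatigue and attrition are not modelled.
    """
    noise_events = declared_per_year * false_positive_rate
    return noise_events * cost_low, noise_events * cost_high


if __name__ == "__main__":
    # Inputs from the text: 60 declared P2s, 30% false positives,
    # $2,500-$15,000 per response cycle.
    low, high = false_positive_tax(60, 0.30, 2_500, 15_000)
    print(f"Annual noise cost: ${low:,.0f} to ${high:,.0f}")
```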
Alert-tuning programs typically yield 20-40% reduction in P2-tier event volume within 3-6 months at modest engineering cost (one engineer for one quarter, plus monitoring-platform configuration time). The work is straightforward but politically hard at organisations where deletion of any alert is treated as risk-taking. Mature SRE practice frames alert deletion as the default and alert addition as the exception requiring justification.