Complete the incident timeline before writing any causal analysis
When drafting incident postmortems or failure analyses, complete the timeline of observable events (with timestamps and measurements) before writing any causal analysis, because mixing observation and explanation during collection produces defensive filtering.
Why This Is a Rule
When you mix "what happened" and "why it happened" during postmortem data collection, defensive filtering kicks in: people unconsciously edit the timeline to support or deflect causal narratives. The engineer who deployed the breaking change emphasizes the lack of a staging environment (causal: it's the process's fault, not mine). The SRE who was on call minimizes the response-time gap (observation: the 45-minute lag becomes "we responded quickly"). Each person's account is subtly distorted by the causal story they have already formed.
Completing the timeline first (all observable events with timestamps and measurements, no explanations) creates a shared factual foundation that the group agrees on before anyone has a causal story to defend, so it is much harder to revise retroactively once causal analysis begins. The timeline is the ground truth; causal analysis is the interpretation built on top of it. Mixing them contaminates the ground truth with motivated reasoning.
When This Fires
- Drafting incident postmortems after production outages
- Writing failure analyses for project delays or missed objectives
- Conducting any blameless retrospective where facts need to be separated from narratives
- Investigating any situation where multiple people have different accounts of "what happened"
Common Failure Mode
Starting the postmortem with "what went wrong" or "root cause" sections. These frames invite causal analysis before the timeline is established. People jump to explaining before the group has agreed on what actually happened. Disagreements about causes get confused with disagreements about facts. The resulting document is a negotiated narrative, not an investigation.
The Protocol
(1) First pass, timeline only: gather every observable event with timestamps, measurements, and log evidence. No "because" statements. No explanations. Just: "14:23 deploy initiated. 14:31 error rate exceeded the 5% threshold. 14:38 first alert fired. 14:52 engineer acknowledged."
(2) Review the timeline as a team. Fill gaps, correct timestamps, and add missing events.
(3) Only after the timeline is complete and agreed upon, begin causal analysis. The question shifts from "what happened?" (settled) to "why did this sequence of events occur?" (open for investigation).
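If the first pass is collected in a shared tool rather than a free-form document, the "no because statements" rule can be partly enforced mechanically. The sketch below is a hypothetical illustration, not part of the protocol itself: the `Event` and `Timeline` names, the `CAUSAL_MARKERS` phrase list, and the optional `evidence` field are all assumptions. A keyword filter is a crude heuristic that will miss many causal claims, so it supports the team review in step (2) rather than replacing it. The `gaps()` helper lists elapsed minutes between consecutive events, which can help reviewers notice missing entries or long response lags.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, deliberately small list of phrases that signal explanation
# rather than observation. It will not catch every causal claim.
CAUSAL_MARKERS = re.compile(
    r"\b(because|due to|caused by|led to|as a result)\b", re.IGNORECASE
)


@dataclass(frozen=True)
class Event:
    at: datetime
    description: str
    evidence: str = ""  # hypothetical field: log line, metric reading, etc.


class Timeline:
    """First-pass timeline: accepts observations only, kept in time order."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def add(self, event: Event) -> None:
        # Reject entries that smuggle in an explanation.
        if CAUSAL_MARKERS.search(event.description):
            raise ValueError(
                f"causal language in timeline entry: {event.description!r}"
            )
        self._events.append(event)
        self._events.sort(key=lambda e: e.at)

    def events(self) -> list[Event]:
        return list(self._events)

    def gaps(self) -> list[tuple[Event, Event, float]]:
        """Minutes elapsed between each pair of consecutive events."""
        return [
            (a, b, (b.at - a.at).total_seconds() / 60)
            for a, b in zip(self._events, self._events[1:])
        ]


def t(hhmm: str) -> datetime:
    # Only clock time matters here; strptime fills in a default date.
    return datetime.strptime(hhmm, "%H:%M")


# Usage with the example events from step (1), added out of order.
timeline = Timeline()
timeline.add(Event(t("14:31"), "error rate exceeded 5% threshold"))
timeline.add(Event(t("14:23"), "deploy initiated"))
timeline.add(Event(t("14:52"), "engineer acknowledged"))
timeline.add(Event(t("14:38"), "first alert fired"))

for a, b, minutes in timeline.gaps():
    print(f"{a.at:%H:%M} -> {b.at:%H:%M}: {minutes:.0f} min")
# 14:23 -> 14:31: 8 min
# 14:31 -> 14:38: 7 min
# 14:38 -> 14:52: 14 min

try:
    timeline.add(Event(t("14:40"), "alert was late because the threshold was too high"))
except ValueError as err:
    print(err)  # the entry is rejected and not added to the timeline
```

Rejecting the entry, rather than silently stripping the causal phrase, pushes the author to restate it as an observation (for example, the alert threshold value and the time it was crossed) and to defer the "because" to the causal-analysis phase.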