Set RTO and RPO before failure, not during crisis — define maximum acceptable downtime and data loss while thinking clearly
Before a failure occurs, not during a crisis, set a Recovery Time Objective (RTO) that defines maximum acceptable downtime and a Recovery Point Objective (RPO) that defines maximum acceptable data loss.
Why This Is a Rule
Disaster recovery engineering uses two fundamental parameters: Recovery Time Objective (RTO) — the maximum acceptable time between failure and restored operation — and Recovery Point Objective (RPO) — the maximum acceptable loss of data or progress when recovery occurs. These parameters, defined for servers and databases, apply equally to personal and organizational systems.
RTO for a writing practice might be "2 weeks maximum between the habit breaking and regular sessions resuming." RPO might be "I can accept losing up to 1 week of accumulated progress/momentum." RTO for a team process might be "the process must be restored within 1 sprint." RPO might be "we can accept repeating the last sprint's decisions but not losing the quarter's strategic context."
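One way to keep such objectives concrete is to record them as data rather than prose. The sketch below is only an illustration: the `RecoveryObjectives` class, its field names, and the two-week sprint length are assumptions introduced here, not part of the rule. The team RPO is also simplified to "one sprint of rework", which loses the qualitative part about strategic context. Using `timedelta` forces a specific duration instead of a vague word like "soon".

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Calm-state recovery targets for one process (hypothetical structure)."""
    process: str
    rto: timedelta  # maximum acceptable time between failure and restored operation
    rpo: timedelta  # maximum acceptable amount of lost progress

    def __post_init__(self):
        # A zero or negative deadline is not a usable target.
        if self.rto <= timedelta(0):
            raise ValueError("RTO must be a positive duration")
        if self.rpo < timedelta(0):
            raise ValueError("RPO cannot be negative")


# Assumption: a sprint lasts two weeks; adjust to your team's cadence.
SPRINT = timedelta(weeks=2)

writing = RecoveryObjectives("writing practice", rto=timedelta(weeks=2), rpo=timedelta(weeks=1))
team = RecoveryObjectives("team process", rto=SPRINT, rpo=SPRINT)
```

Freezing the dataclass is a deliberate choice in this sketch: the targets are meant to be set while calm and not edited in the middle of a crisis.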
The "before failure" timing is critical for the same reason as two related rules: Design pre-commitments when calm to constrain behavior when stressed — never make rules in hot states (cold-state pre-commitment), and Build measurement systems when you design the strategy, not after problems appear — crisis instrumentation measures symptoms, not causes (concurrent instrumentation). During a crisis, your assessment of acceptable downtime is distorted in one of two directions: the urgency of the moment pushes you to compress the deadline too aggressively, or the overwhelm of the situation pushes you to accept too much downtime. Pre-defined RTO and RPO provide calm-state standards that your crisis-state self can execute against.
When This Fires
- When designing recovery plans (Document five recovery components for every important process: failure mode, detection trigger, recovery steps, time target, verification) — RTO and RPO are inputs to recovery step design
- When a process has failed and you need to decide how urgently to restore it — check the pre-defined RTO
- When determining how much progress loss is acceptable after a disruption — check RPO
- During process architecture when deciding how much resilience investment each process warrants
Common Failure Mode
Setting recovery expectations during the crisis, or never setting them at all: "My meditation practice broke three months ago, but it's fine, I'll start again someday." Without a pre-defined RTO ("restart within 2 weeks"), "someday" extends indefinitely. The absence of a recovery deadline normalizes the failure. With a 2-week RTO, the missed deadline would have triggered an alarm as soon as week 2 ended, instead of passive acceptance at month 3.
The Protocol
(1) For each important process, define before any failure occurs:
- RTO: "If this process fails, I will restore operation within [timeframe]." Be specific. "2 weeks" not "soon."
- RPO: "When restoring, I accept losing up to [amount] of accumulated progress." This sets expectations for whether you restart from scratch or rebuild from a recent checkpoint.
(2) Write both down alongside the recovery plan (Document five recovery components for every important process: failure mode, detection trigger, recovery steps, time target, verification).
(3) When failure occurs: compare elapsed downtime against RTO.
- If approaching RTO → escalate recovery priority.
- If exceeding RTO → this is now an emergency; the pre-defined deadline has passed.
(4) Review RTO and RPO annually: as processes mature and your life context changes, acceptable downtime and progress loss may shift.
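The comparison in step (3) can be sketched as a small function. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `recovery_status`, the returned labels, and especially `warn_fraction` (the share of the RTO at which downtime counts as "approaching") are hypothetical, since the protocol does not fix a threshold.

```python
from datetime import datetime, timedelta


def recovery_status(failed_at, now, rto, warn_fraction=0.75):
    """Classify elapsed downtime against a pre-defined RTO.

    warn_fraction is an assumed cutoff for "approaching RTO";
    the protocol itself does not specify a number.
    """
    elapsed = now - failed_at
    if elapsed > rto:
        # The pre-defined deadline has passed.
        return "emergency"
    if elapsed >= rto * warn_fraction:
        # Close to the deadline: raise recovery priority.
        return "escalate"
    return "on_track"


# Example: a habit broke on 1 January with a 2-week RTO.
failed = datetime(2024, 1, 1)
rto = timedelta(weeks=2)

day_4 = recovery_status(failed, datetime(2024, 1, 5), rto)    # "on_track"
day_11 = recovery_status(failed, datetime(2024, 1, 12), rto)  # "escalate"
day_21 = recovery_status(failed, datetime(2024, 1, 22), rto)  # "emergency"
```

Running a check like this on a schedule, for example in a weekly review, is what turns the RTO from a written intention into an alarm.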