Define your error budget in writing: ideal behavior, minimum acceptable, deviation threshold, and investigation trigger window
For any system you operate, define four components in writing: (1) ideal behavior, (2) minimum acceptable behavior, (3) numeric deviation threshold, and (4) time window before triggering investigation.
Why This Is a Rule
Google's Site Reliability Engineering (SRE) introduced the concept of error budgets to production systems: a quantified tolerance for failure that separates acceptable variance from actionable degradation. The same concept applies to any system you operate — personal habits, team processes, creative routines, health practices.
Without explicit error budgets, you're either investigating every deviation (exhausting and unsustainable) or ignoring all deviations until crisis (dangerous). The four components create the missing middle ground: deviations within the budget are accepted as normal variance; deviations beyond the budget trigger investigation.
The four components:
- Ideal behavior: what the system looks like when running optimally.
- Minimum acceptable behavior: the threshold below which the system is degraded and needs attention.
- Numeric deviation threshold: how far below ideal the system can fall before you investigate. This is the error budget.
- Time window: how long a deviation must persist before triggering action. This prevents overreacting to transient noise.
Together, these create a precise contract between "everything's fine" and "something needs fixing."
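As a minimal sketch, the four components can be held in a small record, with the budget derived as the gap between ideal and minimum. The class name `ErrorBudget`, its fields, and the `classify` helper are hypothetical names chosen for illustration, and the numbers assume a habit counted in sessions per week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical record of the four written components."""
    ideal: int    # sessions per period when the system runs optimally
    minimum: int  # lowest acceptable sessions per period
    window: int   # consecutive periods below minimum before investigating

    @property
    def budget(self) -> int:
        # Numeric deviation threshold: the gap between ideal and minimum.
        return self.ideal - self.minimum

    def classify(self, observed: int) -> str:
        # Place a single period's count relative to the two thresholds.
        if observed >= self.ideal:
            return "ideal"
        if observed >= self.minimum:
            return "within budget"
        return "below minimum"


# Assumed example values: ideal 5 sessions a week, minimum 3, two-week window.
writing = ErrorBudget(ideal=5, minimum=3, window=2)
```

Under these assumed values, `writing.budget` is 2 missed sessions per week, and a week with 4 sessions classifies as "within budget". The time window is stored but not evaluated here, since it only matters across several periods.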
When This Fires
- When setting up monitoring for any recurring system (habits, processes, workflows)
- When you find yourself constantly anxious about whether a system is "working" — an error budget provides clarity
- When deviation investigations happen either too frequently (reacting to noise) or too rarely (ignoring real problems)
- Complements the feedback loop components rule (a complete feedback loop needs three elements: measured output, comparison standard, and adjustment rule) by supplying the monitoring and tolerance specification
Common Failure Mode
Vague standards without numeric thresholds: "My writing practice should be consistent." What's consistent? 5 days a week? 4? 3? Without a numeric threshold, every minor deviation triggers self-judgment, and every major deviation gets rationalized. "I write 4 out of 5 weekdays, and 3 is my minimum — below 3 for two consecutive weeks triggers investigation" is an error budget. "I should write consistently" is not.
The Protocol
(1) For each system you want to monitor, write four components:
- Ideal: "This system runs optimally when [specific, measurable description]." Example: "When I write for 30+ minutes on 5 out of 5 weekdays."
- Minimum acceptable: "The minimum acceptable performance is [specific, measurable]." Example: "Writing for at least 15 minutes on at least 3 weekdays."
- Deviation threshold: the numeric gap between ideal and minimum. This is your error budget. Example: "Up to 2 missed days per week is within budget."
- Investigation trigger: "If performance falls below minimum for [time window], investigate." Example: "Two consecutive weeks below 3 sessions triggers root cause analysis."
(2) Write these down. Unwritten error budgets don't survive memory distortion.
(3) Monitor against the written budget. Within budget → accept and continue. Budget exhausted → investigate, following the related rule: when the error budget is exhausted, analyze the pattern rather than individual incidents, because budget exhaustion signals structural problems.
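The monitoring step can be sketched as a check over a log of weekly session counts. This is an illustration under stated assumptions, not a prescribed tool: the function name `should_investigate` and its parameters are hypothetical, the defaults mirror the writing example (minimum of 3 sessions, two-week window), and the trigger is assumed to look only at the most recent weeks.

```python
def should_investigate(weekly_sessions, minimum=3, window=2):
    """Return True when each of the last `window` weeks is below `minimum`.

    `weekly_sessions` is a list of session counts, oldest first.
    """
    if len(weekly_sessions) < window:
        # Not enough history yet to fill the time window.
        return False
    recent = weekly_sessions[-window:]
    return all(count < minimum for count in recent)


# One low week followed by a recovery stays within budget.
print(should_investigate([5, 2, 4]))
# Two consecutive low weeks meet the investigation trigger.
print(should_investigate([5, 4, 2, 2]))
```

Requiring every week in the window to fall below the minimum is what filters out transient noise: a single bad week never triggers an investigation on its own.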