Define agent success as an 80%+ firing rate, not subjective satisfaction: felt reliability systematically overstates actual performance
Define agent success as a measurable outcome with a minimum acceptable firing-rate threshold (typically 80% over one week for new agents) rather than as subjective satisfaction, because subjective assessment systematically inflates perceived reliability.
Why This Is a Rule
Subjective assessment of behavioral reliability is systematically inflated. When asked "how consistently are you doing your morning writing habit?", people typically estimate 80-90% when actual tracked data shows 40-60%. This isn't dishonesty — it's a predictable cognitive bias. Memory overweights recent successes (recency bias), successful instances are more salient than forgotten ones (availability bias), and identity-consistent behaviors feel more frequent than they are (confirmation bias).
The 80% threshold over one week is calibrated to the early stage of behavioral agent installation. One week provides enough trigger instances to distinguish a genuine pattern from a lucky streak. 80% accounts for legitimate exceptions (illness, travel, genuine emergencies) while still requiring that the agent fires in the vast majority of valid opportunities. Below 80%, the agent is underperforming and needs structural adjustment (Diagnose failing behavioral agents by component — trigger salience, condition scope, or action effort each require different fixes); at 80% or above, it's functioning as designed.
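The one-week calibration can be sanity-checked with a binomial tail. The sketch below (a hypothetical helper, not part of any tracking tool) estimates how often an agent with a given true firing rate would clear the 80% bar across seven trigger instances, which shows why a week of data separates a coin-flip agent from a genuinely reliable one.

```python
import math

def pass_probability(p: float, n: int = 7, threshold: float = 0.8) -> float:
    """Probability that an agent with true firing rate p meets the
    threshold over n valid trigger instances (binomial tail)."""
    k_min = math.ceil(threshold * n)  # minimum firings needed: ceil(0.8 * 7) = 6
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

# A 50%-reliable agent passes the week only ~6% of the time,
# while a 90%-reliable agent passes ~85% of the time.
print(pass_probability(0.5))  # 0.0625
print(pass_probability(0.9))  # ≈ 0.85
```

So one week is short enough to review quickly but long enough that an unreliable agent rarely passes by luck.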
Measurable outcomes replace felt reliability with observed reliability. "I think my meditation agent is working" becomes "My meditation agent fired 6 out of 7 mornings this week (86%)." The number can't be inflated by identity or recency effects — it's a count.
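The meditation example above reduces to a count over a log. A minimal sketch, assuming a simple daily log of booleans for whether the agent fired on each valid trigger instance:

```python
# Hypothetical week of tracked data: did the meditation agent fire each morning?
week_log = [True, True, False, True, True, True, True]

fired = sum(week_log)              # count of firing instances
rate = fired / len(week_log)       # observed (not felt) reliability
print(f"fired {fired} of {len(week_log)} mornings ({rate:.0%})")
# fired 6 of 7 mornings (86%)
```

The point is that the output is a count of recorded instances, so recency and identity effects have nothing to inflate.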
When This Fires
- When evaluating whether a behavioral agent is "working"
- During weekly reviews (Review new agents weekly, established ones monthly, and all agents after major context changes) when assessing new agent performance
- When you feel confident about a habit but haven't tracked actual data
- When comparing self-reported behavior change success to observed outcomes
Common Failure Mode
Using subjective satisfaction as the success criterion: "It feels like the habit is sticking." This produces premature confidence that causes you to stop monitoring and adjusting. The agent may be firing at 50% — enough to create a subjective sense of consistency but not enough to produce reliable behavior change. Without measurement, you won't discover the gap until the agent silently dies.
The Protocol
(1) Before installing any behavioral agent, define what measurable success looks like: "This agent succeeds when it fires in X% of valid trigger instances over Y time period."
(2) For new agents, the default threshold is 80% over one week.
(3) Track actual performance: count trigger instances and firing instances, then calculate the rate.
(4) Compare to the threshold: ≥80% → the agent is functioning; continue and extend the review interval. <80% → the agent needs diagnosis (Diagnose failing behavioral agents by component — trigger salience, condition scope, or action effort each require different fixes) and structural adjustment.
(5) Never substitute felt reliability for tracked data. "It feels like I'm doing well" is not a data point — count the instances.