Your best system is the one that actually runs
You have probably designed impressive cognitive infrastructure that doesn't work. A weekly review protocol with twelve steps. A decision framework that requires pulling data from three sources and scoring options on a weighted matrix. A morning routine so elaborate it needs its own project plan. Each of these represents a sophisticated agent — and each of them has the same problem: it doesn't fire.
Meanwhile, the most productive people you know often rely on embarrassingly simple rules. Write one sentence every morning. Ask "what's the most important thing?" before starting work. Sleep on every major decision. These aren't sophisticated. They're not impressive on a whiteboard. But they fire every single day, which means they compound — and a compounding simple agent will outperform a sporadic complex agent every time.
This isn't just an opinion about productivity. It's a principle with deep roots in cognitive science, decision theory, and systems engineering. The evidence is consistent: when reliability and sophistication compete, reliability wins.
Gigerenzer's fast-and-frugal heuristics: less information, better decisions
Gerd Gigerenzer spent decades at the Max Planck Institute for Human Development demonstrating something that offends our intuitions about intelligence: simple decision rules routinely outperform complex analytical models — not despite using less information, but because of it.
In his research program on fast-and-frugal heuristics, Gigerenzer and colleagues showed that a simple rule like "take the best" — check cues in order of validity, decide on the first cue that discriminates between options, and ignore the rest — predicted outcomes as accurately as or better than multiple regression models that used all available data. In one study, the 1/N portfolio allocation rule (divide your money equally among N options) outperformed the Markowitz mean-variance optimization — the model that won Harry Markowitz a Nobel Prize — in six out of seven test environments (DeMiguel, Garlappi, & Uppal, 2009).
How is this possible? Gigerenzer's explanation draws on the bias-variance tradeoff from statistical learning theory. Complex models fit the training data beautifully — they capture every pattern, every correlation, every nuance. But they also fit the noise. When you deploy them in a new environment with new data, that noise-fitting becomes a liability. They overfit. Simple heuristics, by contrast, ignore most of the data. They have higher bias — they miss real patterns. But they have near-zero variance — they don't chase noise. In uncertain, changing environments with limited data, that tradeoff favors simplicity.
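The bias-variance logic is easy to demonstrate in a toy simulation (pure standard library; the grouped-data setup, the seed, and every parameter here are invented for illustration, not drawn from Gigerenzer's studies). A "complex" model fits a separate mean for each group from two noisy training samples; a "simple" model ignores group structure and uses one pooled mean. Because the real group differences are small relative to the noise, the complex model's noise-chasing costs it more on fresh data than the simple model's bias does:

```python
import random

random.seed(42)

NOISE_SD = 3.0        # noise is large relative to real group differences
N_GROUPS = 200
TRAIN_PER_GROUP = 2   # tiny training sample per group
TEST_PER_GROUP = 50

# True group means differ only slightly (a weak real signal).
true_means = [random.uniform(-0.5, 0.5) for _ in range(N_GROUPS)]

def draw(mean, n):
    return [random.gauss(mean, NOISE_SD) for _ in range(n)]

train = [draw(m, TRAIN_PER_GROUP) for m in true_means]
test = [draw(m, TEST_PER_GROUP) for m in true_means]

# Complex model: a separate fitted mean per group (fits the noise in 2 samples).
per_group_fit = [sum(xs) / len(xs) for xs in train]

# Simple model: one pooled mean, ignoring group structure entirely.
pooled_fit = sum(x for xs in train for x in xs) / (N_GROUPS * TRAIN_PER_GROUP)

def mse(predict):
    errs = [(x - predict(g)) ** 2 for g, xs in enumerate(test) for x in xs]
    return sum(errs) / len(errs)

complex_mse = mse(lambda g: per_group_fit[g])
simple_mse = mse(lambda g: pooled_fit)

print(f"complex (per-group) test MSE: {complex_mse:.2f}")
print(f"simple (pooled)     test MSE: {simple_mse:.2f}")
```

Flip the parameters — large group differences, generous training samples — and the complex model wins. That is the point: less-is-more holds under specific conditions (sparse data, high noise, changing environments), and those happen to be the default conditions of everyday judgment.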
Gigerenzer called this the less-is-more effect: there are definable conditions under which using less information, less computation, and less time produces more accurate judgments than exhaustive analysis. The conditions aren't exotic. They're the default conditions of real life — incomplete information, time pressure, changing circumstances. In other words, exactly the conditions under which your cognitive agents operate every day.
The implication for your epistemic infrastructure is direct. A cognitive agent that uses one cue and fires every time will, over time, outperform a cognitive agent that uses twelve cues and fires when conditions are perfect. Not because the simple agent is smarter. Because it's there.
The habit research: consistency builds automaticity, not complexity
Phillippa Lally and colleagues at University College London published a study in the European Journal of Social Psychology (2010) that tracked 96 participants as they tried to build new daily habits for up to 84 days. The key finding: the median time for a behavior to reach 95% of its peak automaticity was 66 days — but with enormous individual variation, ranging from 18 to 254 days depending on the person and the behavior's complexity.
Here's the part that matters for agent design: performing the behavior more consistently was associated with stronger automaticity development. But missing a single day did not materially affect the long-term trajectory. What killed habit formation wasn't the occasional miss — it was abandoning the behavior entirely after a miss, which happened far more often with complex behaviors than simple ones.
Benjamin Gardner, also at UCL, pushed this further by arguing that automaticity — not frequency — is the "active ingredient" in habit formation. A behavior becomes habitual when it fires automatically in response to a contextual cue, without requiring deliberation. The simpler the behavior, the faster it reaches automaticity. The faster it reaches automaticity, the less it depends on motivation, willpower, or favorable conditions. It just runs.
This maps precisely onto cognitive agent design. An agent that requires you to be well-rested, undistracted, and motivated is an agent that depends on conditions you don't control. An agent simple enough to fire regardless of your state — that's an agent that reaches automaticity. And an automatic agent is a reliable agent.
Gall's Law: complex systems that work evolved from simple ones
John Gall, writing in Systemantics (1975), articulated a principle that systems engineers have rediscovered in every decade since: "A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."
This is Gall's Law, and it applies to cognitive infrastructure as directly as it applies to software architecture. You cannot design a twelve-step weekly review from scratch and expect it to work. You can design a one-step weekly review — ask yourself one question every Sunday — get it running reliably, then add a second step once the first is automatic, then a third. The complex system emerges from the simple system through incremental extension, not top-down design.
The KISS principle in software engineering reflects the same insight. Kelly Johnson, the lead engineer at Lockheed's Skunk Works who built the U-2 and SR-71, required that every system his team designed be repairable in the field by a mechanic with basic tools. Not because simplicity was aesthetically pleasing — because simplicity was the only path to reliability under real operating conditions. An aircraft that needs a specialized lab to fix is, functionally, an aircraft that can't be fixed.
Your cognitive agents operate under field conditions, not lab conditions. You deploy them when you're tired, rushed, distracted, or stressed. A complex agent that works perfectly in a calm Saturday morning planning session is useless if it can't fire on a chaotic Tuesday afternoon. The field-repairable version — the version simple enough to run under any conditions — is the one that matters.
The AI parallel: small reliable models vs. large unreliable ones
The same pattern appears in production AI systems. In 2024 and 2025, as organizations deployed large language models at scale, a consistent finding emerged: hybrid architectures using simple rule-based systems as guardrails around complex models outperformed either approach alone. The complex model handled nuance and generation. The simple rules ensured compliance, consistency, and predictable behavior.
More telling: in high-stakes production environments — healthcare, finance, nuclear safety — rule-based systems remain the standard precisely because their behavior is deterministic and auditable. They don't hallucinate. They don't drift. They don't produce brilliant output 90% of the time and catastrophic output the other 10%. They produce adequate output 100% of the time.
This is the reliability-sophistication tradeoff made concrete. A smaller model that returns a correct answer every time is more valuable in production than a larger model that returns a brilliant answer most of the time and a dangerous answer occasionally. Google's engineering teams learned this in search ranking; Stripe learned it in fraud detection; every organization that has tried to replace simple business rules with complex ML pipelines has learned it: you cannot deploy what you cannot trust to fire correctly every time.
Your cognitive agents face the same deployment constraint. You need agents that produce useful output every time they fire, not agents that produce profound output on their best days and nothing on their worst.
The compounding math: why firing rate dominates
Consider two agents over 90 days:
Agent A is sophisticated. When it fires, it produces high-quality output — call it 10 units of value per firing. But it only fires 30% of the time because it requires conditions that aren't always present. Over 90 days: 27 firings, 270 total units of value.
Agent B is simple. Each firing produces modest output — 4 units of value. But it fires 95% of the time because it's simple enough to run under any conditions. Over 90 days: 85 firings, 340 total units of value.
Agent B wins, and the gap widens over time because of compounding effects. Each firing of a cognitive agent doesn't just produce isolated output — it reinforces the agent itself, strengthens the contextual trigger, and builds automaticity. Agent B's 85 firings make it increasingly automatic and increasingly likely to fire tomorrow. Agent A's 27 firings, punctuated by 63 missed days, produce decay between firings — each restart costs friction, and the agent never reaches automaticity.
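The arithmetic above can be checked in a few lines (a sketch; the firing rates and value-per-firing figures are the illustrative numbers from the text, and flooring the expected firing count is an assumption about how the text rounds):

```python
DAYS = 90

def total_value(firing_rate, value_per_firing, days=DAYS):
    # Expected firings, floored as in the text (0.95 * 90 = 85.5 -> 85).
    # The tiny epsilon guards against float rounding just below an integer.
    firings = int(firing_rate * days + 1e-9)
    return firings, firings * value_per_firing

a_firings, a_value = total_value(0.30, 10)  # Agent A: sophisticated, conditional
b_firings, b_value = total_value(0.95, 4)   # Agent B: simple, fires almost daily

print(f"Agent A: {a_firings} firings, {a_value} units of value")
print(f"Agent B: {b_firings} firings, {b_value} units of value")
```

Agent B's lead here comes purely from firing rate — the model doesn't even include the reinforcement effect, which only widens the gap.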
This is before accounting for the second-order effect: a reliable simple agent creates a stable foundation on which you can layer additional complexity. Agent B, once automatic, can be extended — first to 5 units of value per firing, then 6, then 7 — without sacrificing its firing rate. Agent A, never having achieved reliability, cannot be extended because there's nothing stable to extend.
What this means for your infrastructure
After the agent audit in L-0408, you have an inventory of agents — some designed, some default. Now apply the reliability filter:
For each agent, ask two questions. First: what is this agent's firing rate over the last 30 days? If you don't know, that's your answer — it's low. Second: what is the simplest version of this agent that could achieve a firing rate above 90%?
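The first question can be answered mechanically if you keep even a crude log of when an agent fires. A minimal sketch (the `firing_rate` helper and the example log are hypothetical, not part of any audit protocol from the text):

```python
from datetime import date, timedelta

def firing_rate(fired_dates, window_days=30, today=None):
    # Fraction of the last `window_days` days on which the agent fired.
    today = today or date.today()
    window = {today - timedelta(days=i) for i in range(window_days)}
    return len(window & set(fired_dates)) / window_days

# Hypothetical log: an agent that fired every third day for the past month.
today = date(2025, 6, 30)
log = [today - timedelta(days=i) for i in range(0, 27, 3)]

print(f"firing rate over last 30 days: {firing_rate(log, today=today):.0%}")
```

Nine firings in thirty days is a 30% rate — and per the audit, a number that low is itself the answer to the first question.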
The simplest version might feel embarrassingly minimal. A "daily reflection" agent might reduce to writing one sentence. A "decision protocol" might reduce to pausing for five seconds before responding. A "weekly review" might reduce to reading your task list once on Sunday. That's fine. The minimal version that fires is infinitely more valuable than the maximal version that doesn't.
Resist the urge to add complexity before the base is stable. You will feel the pull. The simple version feels too easy, too shallow, too unsophisticated for someone who understands cognitive infrastructure at the level you're building toward. But Gigerenzer's research, Lally's habit data, and Gall's Law all converge on the same point: sophistication that doesn't fire is worth zero, and simplicity that fires every day compounds into something more powerful than sophistication ever could.
The order of operations is non-negotiable. First, build agents that fire. Then, and only then, make them fire better. Reliability is not the ceiling — it's the floor. But without the floor, there is no building.