Fast and wrong is still wrong
You optimized for speed. Your morning routine fires in ninety seconds. Your email triage empties the inbox in twelve minutes. Your decision heuristic picks a direction before the meeting ends. Everything moves fast. And half of it is aimed at the wrong target.
Speed optimization, which the previous lesson addressed, solves latency — how quickly an agent responds to its trigger. But speed says nothing about whether the agent's output is correct. A sniper who pulls the trigger faster is only more dangerous if they hit what they are aiming at. Otherwise, they are just making noise sooner.
Accuracy optimization is the discipline of making your agents produce the right outcome more often. Not faster outcomes. Not more outcomes. Right outcomes. And this distinction matters more than most people realize, because the cost of a wrong action almost always exceeds the cost of a slow one.
The speed-accuracy tradeoff is real and non-negotiable
Experimental psychology has studied this for over a century. Wayne Wickelgren's 1977 research on speed-accuracy tradeoff functions established the foundational finding: as response speed increases, accuracy decreases along a predictable curve. This is not a cultural preference or a personality quirk. It is a structural feature of information processing in biological systems.
The mechanism is straightforward. Accurate responses require evidence accumulation — your brain (or your process, or your habit) needs time to gather enough signal to distinguish the correct response from plausible alternatives. When you compress response time, you cut off evidence accumulation before the signal clears the noise. You act on partial information. Sometimes you get lucky. Often you do not.
Fitts' Law, formalized in 1954, quantified this for motor tasks: the time required to hit a target is a function of the distance to and the size of that target. Smaller targets (higher accuracy requirements) demand more time. This law has been replicated across hundreds of studies in human-computer interaction, sports science, and ergonomics. The tradeoff is not optional. You can choose where to sit on the curve, but you cannot escape the curve.
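The standard formulation of Fitts' Law can be sketched in a few lines. Movement time is `MT = a + b·log2(2D/W)`, where `D` is the distance to the target, `W` is its width, and `a` and `b` are device-specific constants — the values below are purely illustrative:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Fitts' Law: predicted time (seconds) to hit a target.

    distance: distance to the target; width: target size (same units).
    a, b are empirically fitted constants (illustrative values here).
    """
    index_of_difficulty = math.log2(2 * distance / width)
    return a + b * index_of_difficulty

# Halving the target width adds exactly one bit of difficulty,
# which adds a fixed increment (b seconds) to the predicted time.
wide = fitts_movement_time(distance=200, width=40)
narrow = fitts_movement_time(distance=200, width=20)
```

The logarithm is what makes the tradeoff unavoidable: each doubling of the accuracy requirement costs the same fixed slice of extra time, no matter how skilled the operator.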
The practical implication: every time you optimize an agent for speed without measuring accuracy, you are likely sliding down the accuracy curve without realizing it. You feel faster. You are also wronger.
What accuracy actually means: bias, noise, and calibration
Accuracy is not a single thing. In measurement science, accuracy decomposes into two independent components: bias (systematic error) and noise (random error). An accurate system has low bias and low noise. Understanding this decomposition is essential for knowing what to fix.
Bias is consistent directional error. Your morning agent always picks the urgent task over the important one — not sometimes, but systematically. Your hiring process consistently favors candidates who remind you of yourself. Your financial estimates always run 20% optimistic. Bias means you are reliably off-target in a predictable direction.
Noise is inconsistent variability. You pick the right priority on Monday, the wrong one on Wednesday, and the right one again on Friday — for no discernible reason. Daniel Kahneman, Olivier Sibony, and Cass Sunstein documented this extensively in Noise: A Flaw in Human Judgment (2021). Their central finding: wherever there is human judgment, there is more noise than you think. The same doctor gives different diagnoses depending on the time of day. The same judge gives different sentences depending on whether the local football team won. The same you makes different decisions about the same situation depending on how much sleep you got.
The key insight from Kahneman's work is that noise and bias contribute roughly equally to total error in most judgment tasks, but organizations (and individuals) almost exclusively focus on bias. They build systems to correct systematic mistakes while ignoring the variability that silently degrades accuracy from one instance to the next.
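The bias/noise decomposition is easy to compute for any agent whose outputs you can score numerically. In Kahneman's framing, total error (mean squared error) splits cleanly into bias squared plus noise squared. A minimal sketch, using made-up task-duration data:

```python
from statistics import mean, pstdev

def decompose_error(estimates, actuals):
    """Split total error (MSE) into bias and noise components.

    bias  = mean of the signed errors (systematic offset)
    noise = standard deviation of the errors (inconsistency)
    MSE   = bias**2 + noise**2
    """
    errors = [e - a for e, a in zip(estimates, actuals)]
    bias = mean(errors)
    noise = pstdev(errors)  # population std dev of the errors
    mse = mean(err ** 2 for err in errors)
    return bias, noise, mse

# Illustrative data: estimated vs. actual task durations in hours.
est = [2.0, 1.5, 3.0, 1.0, 2.5, 4.0, 1.5, 2.0, 3.5, 1.0]
act = [2.5, 2.0, 3.5, 1.5, 3.0, 4.5, 2.0, 2.5, 4.0, 1.5]
bias, noise, mse = decompose_error(est, act)
# bias = -0.5 (consistently optimistic), noise = 0.0 (perfectly consistent)
```

This example agent is pure bias — every estimate runs exactly half an hour short — so a single correction factor fixes it. A noisy agent would show the opposite profile: bias near zero, noise large, and no single multiplier would help.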
Calibration bridges both. Philip Tetlock's research on forecasting — culminating in the Good Judgment Project that tracked over 5,000 forecasters across four years — showed that superforecasters (the top 2% for accuracy) were distinguished primarily by their calibration. When they said something had a 70% chance of happening, it happened about 70% of the time. Their accuracy came not from secret knowledge but from honest self-assessment of their own certainty. Poorly calibrated forecasters were confident when they should have been uncertain, and uncertain when they actually had good information. Calibration is knowing how accurate you are, which turns out to be a prerequisite for becoming more accurate.
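Calibration can be measured the same way Tetlock's tournaments scored it: bucket your forecasts by stated probability and compare each bucket to the observed hit rate. A sketch, with an invented forecast history:

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Given (stated_probability, outcome) pairs, return the observed
    hit rate for each stated probability level."""
    buckets = defaultdict(list)
    for stated, happened in forecasts:
        buckets[stated].append(1 if happened else 0)
    return {stated: sum(hits) / len(hits) for stated, hits in buckets.items()}

# A well-calibrated forecaster: "70%" events happen about 70% of the time.
history = [(0.7, True)] * 7 + [(0.7, False)] * 3 + \
          [(0.9, True)] * 9 + [(0.9, False)]
table = calibration_table(history)  # {0.7: 0.7, 0.9: 0.9}
```

A poorly calibrated forecaster shows a gap in this table — say, "90%" events that happen only 60% of the time — and that gap, not raw knowledge, is what the accuracy protocol below targets first.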
The cost asymmetry: why accuracy usually matters more than speed
W. Edwards Deming built the modern quality movement on a counterintuitive insight: investing in accuracy reduces total cost. The Cost of Quality framework that grew out of this movement shows that the cost of poor quality (rework, returns, wasted effort, cascading failures) typically consumes 15% to 35% of an organization's operating budget. Fixing an error downstream costs orders of magnitude more than preventing it upstream.
This principle transfers directly to personal agents. Consider the cost structure of a wrong decision:
- Direct cost: The time and energy spent executing the wrong action.
- Rework cost: The time and energy spent undoing the wrong action and doing the right one instead.
- Opportunity cost: Whatever you would have accomplished if you had done the right thing first.
- Cascade cost: Every downstream decision that was contaminated by the wrong starting point.
A five-minute accuracy check that prevents a two-hour wrong-direction sprint has a 24:1 return on investment. A thirty-second pause before sending an email that prevents a week-long conflict has an incalculable return. Speed optimization saves minutes. Accuracy optimization saves days.
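The arithmetic behind that 24:1 figure generalizes to an expected-value rule: a checkpoint pays for itself whenever the error rate times the rework cost it prevents exceeds the checkpoint's own cost. A sketch, with the function name and parameters invented for illustration:

```python
def checkpoint_roi(check_minutes, error_minutes, error_rate, catch_rate=1.0):
    """Expected minutes saved per run by a checkpoint, per minute spent on it.

    error_rate: how often the agent fires wrong without the check.
    catch_rate: fraction of those errors the checkpoint actually catches.
    """
    expected_saved = error_rate * catch_rate * error_minutes
    return expected_saved / check_minutes

# The example from the text: a 5-minute check vs. a 2-hour (120-minute)
# wrong-direction sprint.
roi_if_always_wrong = checkpoint_roi(5, 120, error_rate=1.0)     # 24.0
roi_if_sometimes_wrong = checkpoint_roi(5, 120, error_rate=0.2)  # 4.8
```

Even at a modest 20% error rate the check returns nearly 5x its cost, which is why accuracy checkpoints survive cost-benefit scrutiny far more often than intuition suggests.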
This is why Deming's third point — "cease dependence on inspection to achieve quality" — is so radical. He argued that catching errors after they happen (inspection) is structurally inferior to preventing errors from happening (building accuracy into the process). Applied to your agents: do not run the agent fast and then check if it was right. Build the accuracy check into the agent so it cannot easily fire wrong.
How to optimize agent accuracy: the error reduction protocol
Machine learning provides a useful framework here. In ML evaluation, you measure accuracy through precision (of the actions you took, how many were correct?) and recall (of the correct actions available, how many did you take?). Improving accuracy means improving at least one of these without destroying the other.
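Precision and recall are simple to compute once you log what an agent did against what it should have done. A minimal sketch, with an invented week of planning-agent data:

```python
def precision_recall(actions_taken, correct_actions):
    """Precision: of the actions taken, how many were correct?
    Recall: of the correct actions available, how many were taken?"""
    taken = set(actions_taken)
    correct = set(correct_actions)
    hits = taken & correct
    precision = len(hits) / len(taken) if taken else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    return precision, recall

# One week of a planning agent: tasks it picked vs. tasks that
# turned out to matter (illustrative labels).
picked = {"email", "report", "slides", "errand"}
mattered = {"report", "slides", "budget", "call"}
p, r = precision_recall(picked, mattered)  # p = 0.5, r = 0.5
```

The two numbers fail in different ways: low precision means wasted effort on wrong actions, low recall means right actions left undone — and a fix for one (picking fewer tasks, say) often degrades the other, which is why the protocol below measures both.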
For personal agents, the protocol has four steps:
Step 1: Measure your current hit rate. You cannot optimize what you do not measure. For one week, score each firing of a target agent: did it produce the intended outcome? Not "did it feel productive" — did it actually hit the target? Most people discover their agents are less accurate than they assumed. A planning agent that picks priorities might have a 50% hit rate once you track what actually mattered by end of week. A communication agent that drafts responses might produce rework-requiring outputs 30% of the time. Get the number.
Step 2: Classify your errors. Not all misses are the same. Are your errors biased (consistently wrong in one direction) or noisy (randomly wrong in different directions)? Biased errors need a correction factor — a rule that counteracts the systematic pull. If you always underestimate task duration, add a 1.5x multiplier. Noisy errors need a different intervention — standardization, checklists, or environmental controls that reduce variability.
Step 3: Install a targeted checkpoint. Based on your error classification, add a single verification step at the point where errors enter the process. This is not a general "be more careful" instruction — that produces no measurable improvement. It is a specific check: "Before starting the top task, confirm it appears on this week's goal list." "Before sending this message, re-read it from the recipient's perspective." "Before committing to this estimate, check it against the last three actuals." One checkpoint, targeted at the dominant error type.
Step 4: Measure again and iterate. Run the modified agent for a comparable period. Did the hit rate improve? Did the checkpoint add unacceptable latency? If accuracy improved and the time cost is tolerable, keep it. If accuracy did not improve, you misdiagnosed the error — go back to step 2. If accuracy improved but the time cost is too high, look for a faster checkpoint that catches the same errors.
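The four steps above reduce to a small decision loop. This sketch assumes weekly hit/miss logs and an invented latency budget; the function names are illustrative, not a prescribed tool:

```python
from statistics import mean

def hit_rate(outcomes):
    """Step 1: score each firing of the agent as a hit (True) or miss (False)."""
    return mean(1 if hit else 0 for hit in outcomes)

def protocol_decision(before, after, added_latency_min, latency_budget_min):
    """Step 4: keep the checkpoint only if accuracy improved at tolerable cost."""
    if hit_rate(after) <= hit_rate(before):
        return "misdiagnosed: back to step 2"
    if added_latency_min > latency_budget_min:
        return "find a faster checkpoint"
    return "keep the checkpoint"

# Illustrative data: one week before the checkpoint, one week after.
week_before = [True, False, True, False, False, True, False]  # ~43% hit rate
week_after = [True, True, True, False, True, True, False]     # ~71% hit rate
decision = protocol_decision(week_before, week_after,
                             added_latency_min=2, latency_budget_min=5)
```

The loop structure is the point: every branch sends you back to a specific step rather than to a vague resolution to "be more careful."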
Deliberate practice: the meta-skill of accuracy improvement
K. Anders Ericsson's research on deliberate practice, spanning decades of study across domains from surgery to chess to music, identified a consistent pattern in how experts achieve and maintain high accuracy. The defining characteristic of deliberate practice is not repetition — it is error-focused repetition with immediate feedback.
Ericsson studied surgeons whose error rates dropped dramatically over 12-year periods — but only for error types where the surgeon could identify the mistake, trace its cause, and adjust their technique for the next instance. Error types without clear feedback signals showed no improvement regardless of how many repetitions accumulated. The volume of practice was irrelevant. The accuracy of the feedback loop was everything.
This maps directly onto agent optimization. An agent you run without tracking outcomes is rote repetition — it does not improve. An agent you run while scoring accuracy, classifying errors, and adjusting the process is deliberate practice applied to your own cognitive infrastructure. The agent improves because you built feedback into it.
Ericsson's framework also explains why most people plateau. They develop an agent, run it until it becomes automatic, and then stop paying attention to accuracy. The agent ossifies. Errors become invisible because they are consistent. Accuracy degrades slowly, masked by the comfort of routine. Optimization requires periodically pulling an automated agent back into conscious attention, measuring its current accuracy, and asking whether it can be improved.
The accuracy-reliability bridge
Accuracy and reliability are related but distinct. Accuracy asks: when this agent fires, does it hit the right target? Reliability, which the next lesson addresses, asks: does this agent fire consistently when it should?
You can have a highly accurate agent that is unreliable — it produces excellent results when it fires, but it only fires 40% of the time it should. You can have a highly reliable agent that is inaccurate — it fires every single time on cue, but it hits the wrong target half the time. The optimization sequence matters: accuracy before reliability, because there is no value in making an inaccurate agent fire more consistently. You would just be scaling your error rate.
Once your agent's accuracy is at an acceptable level — you have measured it, reduced the dominant error types, and installed the checkpoints that keep it on target — then reliability optimization becomes the next leverage point. A well-aimed agent that fires every time is the compound outcome of these two lessons working in sequence.
The uncomfortable truth about accuracy
Most agents in your current system have never been measured for accuracy. You assume they work because they fire. You assume they hit the target because you feel productive after they run. But feeling productive and being accurate are different things, in the same way that feeling confident and being calibrated are different things.
Tetlock's forecasting research found that the least accurate forecasters were often the most confident. They had strong opinions, moved fast, and committed fully — all traits that feel like effectiveness. The superforecasters were slower, more uncertain, more willing to update. They felt less decisive. They were dramatically more accurate.
Your agents deserve the same honest scrutiny. Measure them. Score the outcomes. Classify the errors. Install the fixes. Do it for the agents that matter most — the ones governing how you spend your time, how you make decisions, how you communicate with people whose opinions shape your trajectory.
An agent that acts fast but wrong is worse than one that acts slowly but right. And an agent that acts at the right speed with verified accuracy is better than both.