The speed of an agent determines whether it fires at all
In L-0568, you learned the distinction between optimization and innovation — when to improve an existing agent and when to replace it entirely. You decided to optimize. Now you face the first and most consequential optimization axis: speed.
Speed seems like a straightforward metric. Faster is better. But in the context of personal agents — the habits, routines, and cognitive processes that constitute your epistemic infrastructure — speed operates through a mechanism most people miss. The primary value of a faster agent is not that it saves you time. It is that it reduces the friction that prevents the agent from firing in the first place. A morning review that takes four minutes will happen almost every day. The same review at fourteen minutes gets skipped whenever you're running late, feeling tired, or slightly pressed. The output of both versions might be identical. But the four-minute version produces that output five times a week, and the fourteen-minute version produces it three times a week. Over a year, the faster agent delivers roughly 67% more total value — not because each execution is better, but because there are more executions.
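The frequency arithmetic can be checked directly. The sketch below uses the weekly counts from the example above (five runs versus three); the constant names are arbitrary:

```python
# Executions per year for a fast vs. slow agent, using the
# weekly firing rates from the text (5/week vs. 3/week).
WEEKS_PER_YEAR = 52

fast_runs = 5 * WEEKS_PER_YEAR    # 4-minute version
slow_runs = 3 * WEEKS_PER_YEAR    # 14-minute version

extra = (fast_runs - slow_runs) / slow_runs
print(f"{fast_runs} vs {slow_runs} runs: {extra:.0%} more executions")
# prints: 260 vs 156 runs: 67% more executions
```

With identical per-run output, total annual value scales directly with run count.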
This is why speed optimization is the first optimization axis you should address, before accuracy, before reliability, before scope. A slow agent that runs rarely is worse than a slightly less accurate agent that runs consistently. Speed is the foundation on which all other optimization axes depend.
Three response time thresholds that govern human interaction
Jakob Nielsen's research on human-computer interaction, published through the Nielsen Norman Group across decades of empirical study, identified three response time thresholds that fundamentally shape the user's experience. These thresholds are not arbitrary design guidelines. They are rooted in the limits of human perception and attention.
0.1 seconds (100 milliseconds). Below this threshold, the response feels instantaneous — the user perceives the outcome as caused by their action, not by the system. This is the threshold of direct manipulation. When you flip a light switch and the light appears within 100ms, you experience yourself as causing the light. Add a 500ms delay and the experience shifts: the switch caused the system to turn on the light. The sense of agency diminishes.
1 second. Below this threshold, the user maintains uninterrupted flow of thought. They notice the delay but do not lose their place in the task. Their working memory holds the context. Above one second, the user begins to feel that they are waiting, and their cognitive thread frays.
10 seconds. Beyond this threshold, the user's attention detaches from the task entirely. They begin thinking about other things, checking other applications, switching to other work. The interaction is effectively broken. Resuming it after a 10+ second delay requires the user to rebuild their mental context — a costly cognitive operation that many users simply decline to perform. They abandon the task instead.
These thresholds were established for digital interfaces, but the perceptual limits they reflect are universal. They apply to any interaction between an agent (a person, a routine, a system) and the person it serves. A morning routine that requires a 30-second load time (opening apps, finding files, waiting for pages to render) operates above the 10-second threshold — the user's attention has already wandered before the routine begins. The cognitive context that makes the routine effective has dissipated.
Amazon's internal research quantified this effect in commercial terms: every 100 milliseconds of additional page load latency cost them 1% of sales. Akamai found that a 100-millisecond delay in page load time reduced conversion rates by 7%. These are not small effects produced by large delays. These are large effects produced by delays most people cannot consciously detect. Speed operates below the threshold of awareness, which is precisely why it is so powerful — and so consistently underestimated.
Frederick Taylor's insight: decompose, measure, eliminate
The systematic study of speed optimization begins with Frederick Winslow Taylor's time-motion studies, published in The Principles of Scientific Management in 1911. Taylor's method was simple and, for its era, revolutionary: observe a worker performing a task, decompose the task into its individual motions, time each motion with a stopwatch, identify which motions are productive and which are waste, then redesign the task to eliminate the waste.
In his famous study of shoveling at Bethlehem Steel, Taylor found that workers using their own shovels — of various sizes and conditions — moved material at widely different rates. By studying the relationship between shovel load weight and worker output, he determined that the optimal load was 21 pounds. He then provided shovels sized to hold exactly 21 pounds of the specific material being moved. Productivity increased by a factor of three to four.
Taylor's critics — and there are legitimate ones — focused on the dehumanizing potential of reducing workers to optimized motion sequences. That critique has force in industrial labor contexts. But Taylor's analytical method is sound and transfers directly to personal agent optimization: decompose the agent into steps, time each step, identify which steps are execution (contributing to the output) and which are overhead (contributing only to the process), then redesign to minimize overhead.
The insight most people miss in Taylor's work is that the largest gains came not from making productive motions faster, but from eliminating unproductive motions entirely. The shovelers didn't learn to move their arms faster. They got shovels that loaded the right amount in one motion instead of requiring two adjustments. The optimization was structural, not muscular. The same principle applies to your agents: the biggest speed gains come from removing unnecessary steps, not from performing necessary steps faster.
Amdahl's Law: optimize the bottleneck, not the component you like
Gene Amdahl formalized the mathematics of speed optimization in 1967 with what is now known as Amdahl's Law. The law states that the overall speedup of a system is limited by the fraction of execution time the improvement does not touch. More precisely: if a component that takes 5% of total execution time is made infinitely fast, total time shrinks by at most 5%. If a component that takes 80% of total execution time has its duration cut in half, total time shrinks by 40%.
The implications are severe and consistently ignored.
People optimize the components they understand, the components they enjoy optimizing, or the components that are most visible — not the components that are the actual bottleneck. A person who spends 90 minutes on their weekly review — 20 minutes gathering information, 50 minutes processing it, 20 minutes writing action items — might spend hours building a beautiful template for the action items. The template cuts the writing step from 20 minutes to 10 minutes. Total time: 80 minutes. A 10-minute improvement from substantial effort. If instead they had automated the information gathering — pulling data from their tools into a single view — they might have cut the gathering step from 20 minutes to 3 minutes. Total time: 73 minutes. A 17-minute improvement, nearly double the gain, and likely less effort to implement.
Amdahl's Law is a focusing device. Before optimizing any component, ask: what fraction of total execution time does this component consume? If the answer is less than 20%, optimizing it cannot produce a meaningful system-level improvement. Find the step that consumes the most time and start there — even if that step is harder to optimize, even if it is less interesting to work on, even if you don't immediately see how to improve it.
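The weekly-review arithmetic above can be verified in a few lines. The helper below is a throwaway sketch, not a library function:

```python
def time_saved(component, factor):
    """Minutes saved when a step taking `component` minutes is sped
    up by `factor` (factor=2 means the step takes half as long)."""
    return component - component / factor

# Weekly review from the text: 20 min gathering, 50 processing,
# 20 writing action items, 90 min total.
template_gain = time_saved(20, 2)       # writing step: 20 -> 10 min
gather_gain = time_saved(20, 20 / 3)    # gathering step: 20 -> 3 min

print(f"template: saves {template_gain:.0f} min, total {90 - template_gain:.0f} min")
print(f"gathering: saves {gather_gain:.0f} min, total {90 - gather_gain:.0f} min")
```

The gathering automation saves 17 minutes against the template's 10, matching the text: the step that consumes the most time is where the leverage is.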
BJ Fogg's Behavior Model: speed as the ability variable
BJ Fogg's research at the Stanford Behavior Design Lab produced a model of behavior that makes speed optimization's importance mathematically clear. The Fogg Behavior Model states that a behavior occurs when three elements converge at the same moment: Motivation, Ability, and a Prompt. If any element is insufficient, the behavior does not occur.
Ability, in Fogg's framework, is defined as simplicity — how easy or frictionless the behavior is to perform. Fogg identified six components of simplicity: time, money, physical effort, cognitive effort, social deviance, and non-routine. Time is first on the list, and in most personal agent contexts, it is the dominant factor.
Here is the critical insight: when you make an agent faster, you are directly increasing the Ability component of the Fogg model. You are making the behavior easier to perform. And because Behavior = Motivation x Ability x Prompt, increasing Ability means the behavior can occur at lower levels of Motivation. A 14-minute morning review requires substantial motivation to initiate — you need to really want to do it. A 4-minute version can be initiated at much lower motivation levels — mild intention is sufficient.
This is why speed optimization affects execution frequency, not just execution duration. You're not just saving time per execution. You're lowering the motivation threshold required for each execution. On mornings when your motivation is high, both the 14-minute and 4-minute versions run. On mornings when your motivation is moderate, only the 4-minute version runs. On mornings when your motivation is low, neither runs. The faster agent captures the moderate-motivation days that the slower agent misses. Over time, those captured days compound into a fundamentally different outcome trajectory.
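The motivation-threshold argument can be made concrete with a toy model. All numbers below are illustrative, not Fogg's empirical values; the point is only that a shorter duration lets the behavior fire at lower motivation:

```python
# Toy model of the Fogg action line: a behavior fires when
# motivation * ability clears a fixed activation threshold.
# Units and threshold are arbitrary, chosen for illustration.

def fires(motivation, duration_min, threshold=1.0):
    ability = 1 / duration_min          # shorter agent -> higher ability
    return motivation * ability >= threshold

for motivation in (14, 8, 2):           # high / moderate / low mornings
    print(motivation,
          "14-min:", fires(motivation, 14),
          "4-min:", fires(motivation, 4))
```

In this sketch both versions fire on high-motivation mornings, neither fires on low-motivation mornings, and only the 4-minute version fires on moderate ones: exactly the captured days the text describes.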
Fogg's research also identified a complementary principle: friction works in both directions. To stop an unwanted behavior, add friction. To sustain a desired behavior, remove it. Every second of overhead in your agent is friction opposing the behavior you want. Removing that friction doesn't feel like optimization. It feels like nothing happened. That's the point — the best speed optimization is the one you don't notice because the agent fires so smoothly that execution feels automatic.
Habit automaticity: speed is what makes agents self-sustaining
Phillippa Lally's 2010 study at University College London, published in the European Journal of Social Psychology, tracked 96 participants as they attempted to form new habits. Participants chose an eating, drinking, or activity behavior to perform daily in the same context for 12 weeks. Each day, they rated the automaticity of the behavior — how much it felt effortless, unconscious, and self-initiating rather than deliberate and forced.
The results showed that automaticity followed an asymptotic curve: rapid initial gains that gradually plateaued. The average time to reach 95% of peak automaticity was 66 days, but with enormous individual variation — from 18 to 254 days. More relevant for speed optimization: simpler behaviors reached automaticity faster. Drinking a glass of water after breakfast became automatic far sooner than doing 50 sit-ups after breakfast. Complexity and duration directly slowed the path to automaticity.
This finding has a direct implication for agent design. Every agent you build exists on a spectrum from fully deliberate (requires conscious effort and attention each time) to fully automatic (fires with minimal conscious involvement, triggered by context cues). You want your agents to move toward the automatic end of that spectrum, because automatic agents are reliable — they fire consistently without depending on your willpower, motivation, or memory.
Speed is the primary lever for moving along that spectrum. A shorter, faster agent provides more repetitions in the same calendar period, and each repetition deepens the associative link between context cue and behavior. A 4-minute agent executed daily for 66 days delivers 66 repetitions. A 14-minute agent that gets skipped twice a week delivers only 47 repetitions in the same period. The faster agent reaches automaticity sooner because it accumulates the repetitions that drive habit formation more efficiently.
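The repetition counts above follow from simple arithmetic, assuming the text's skip rate of two per week:

```python
# Repetitions accumulated over a 66-day habit-formation window.
DAYS = 66

fast_reps = DAYS                          # daily: 66 repetitions
slow_reps = DAYS - round(DAYS / 7 * 2)    # two skips/week: 47 repetitions

print(fast_reps, slow_reps)               # prints: 66 47
```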
Wendy Wood's research at the University of Southern California extends this finding. Wood's work, including her 2016 review in the Annual Review of Psychology and her subsequent research through 2024, establishes that approximately 40% of people's everyday actions are habitual — triggered automatically by contextual cues rather than by deliberate intention. The mechanism is associative learning: repeated performance of an action in a consistent context creates a mental shortcut where perception of the context directly activates the behavior.
The speed of an agent determines how quickly this associative link forms. Faster agents get repeated more often. More repetitions produce stronger associations. Stronger associations produce more reliable automatic activation. Speed optimization is not just about efficiency — it is about building the neural infrastructure that makes your agents self-sustaining.
Kahneman's dual systems: speed shifts agents from System 2 to System 1
Daniel Kahneman's dual-process framework, synthesized in Thinking, Fast and Slow (2011), distinguishes between two modes of cognitive operation. System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control. System 2 allocates attention to effortful mental activities that demand it, including complex computations and deliberate choice.
Every new agent starts in System 2. It requires conscious attention, deliberate initiation, and effortful execution. This is costly — System 2 has limited capacity, and every agent that requires its involvement competes with every other demand on your deliberate attention. An infrastructure built entirely on System 2 agents cannot scale, because there are only so many things you can pay conscious attention to in a day.
The goal of agent optimization is to migrate agents from System 2 to System 1 — from effortful and deliberate to automatic and effortless. Speed is the primary mechanism for this migration. Faster agents require less System 2 engagement per execution. Less engagement per execution means less cognitive resistance to initiating the agent. Less resistance means more repetitions. More repetitions build the automaticity (per Lally's research) that eventually allows the agent to operate in System 1 entirely.
Consider the difference between a morning journaling practice that takes 20 minutes (requiring you to actively decide to sit down, open the journal, think about what to write, and sustain attention for the full duration) versus a 3-minute version (three sentences in a pre-opened document about one question). The 20-minute version lives permanently in System 2 — it will always require a deliberate decision to begin. The 3-minute version can migrate to System 1 — after enough repetitions, you find yourself writing the three sentences without having consciously decided to start. The shorter duration enabled the transition that makes the practice self-sustaining.
The speed optimization protocol
Applying these principles to your own agents requires a systematic approach. Here is the protocol.
Step 1: Measure total execution time. Time the agent from trigger to completion. Use a stopwatch or timer, not estimation — people consistently underestimate how long their routines take. Include setup time, transition time, and teardown time, not just the "productive" portion.
Step 2: Decompose into steps. List every discrete action within the agent. Be granular: "open laptop," "navigate to task manager," "wait for page to load," "scan task list," "identify priorities" — each is a separate step. This is Taylor's method applied to your personal infrastructure.
Step 3: Classify each step. Mark each step as either execution (directly contributes to the agent's output) or overhead (necessary for the process but does not contribute to output — setup, navigation, context-switching, waiting, searching). Calculate the overhead percentage.
Step 4: Apply Amdahl's Law. Identify the single step that consumes the most time. This is your bottleneck. Calculate how much total time you'd save by cutting that step by 50%. Compare with the savings from cutting any other step by 50%. Optimize the bottleneck first.
Step 5: Eliminate before you optimize. For each overhead step, ask: can this step be eliminated entirely rather than made faster? Pre-staging materials, using templates, batching similar operations, reducing tool switches — these structural changes often produce larger gains than incremental speed improvements to existing steps.
Step 6: Redesign and measure. Implement the changes. Time the new version. Compare not just the total duration but, over the following week, the execution frequency. A faster agent that runs more often is the target outcome. If the agent is faster but doesn't run more often, the bottleneck wasn't speed — revisit your analysis.
Step 7: Protect the output. After every speed optimization, verify that the agent's output quality has not degraded. Run the optimized agent for a full week and compare outputs to the pre-optimization baseline. Speed that comes at the cost of accuracy is not optimization — it is sabotage. This check is your bridge to L-0570.
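Steps 2 through 5 of the protocol can be sketched as a small script. The step names and timings below are hypothetical examples, not measurements:

```python
# Decompose an agent into timed steps, classify each as execution
# or overhead, compute the overhead share, and find the bottleneck.
# (step name, seconds, classification) -- all values hypothetical.
steps = [
    ("open laptop",               25, "overhead"),
    ("navigate to task manager",  15, "overhead"),
    ("wait for page load",        10, "overhead"),
    ("scan task list",            60, "execution"),
    ("identify priorities",      120, "execution"),
]

total = sum(t for _, t, _ in steps)
overhead = sum(t for _, t, kind in steps if kind == "overhead")
name, t, _ = max(steps, key=lambda s: s[1])   # Amdahl: largest step first

print(f"total: {total}s, overhead share: {overhead / total:.0%}")
print(f"bottleneck: {name} ({t}s); halving it saves {t / 2:.0f}s")
```

Even this crude tally makes the Amdahl comparison of Step 4 mechanical: halving the 120-second bottleneck saves more than eliminating all three overhead steps combined.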
Speed optimization in AI agent contexts
Everything in this lesson applies with amplified force to AI-mediated agents — the prompts, workflows, and automated processes you build with language models and other AI tools.
AI inference speed has improved dramatically: NVIDIA's Blackwell architecture delivers over 1,000 tokens per second per user, a 15x improvement over previous generation hardware, and inference costs have fallen 280-fold since late 2022. Techniques like speculative decoding, KV caching, and FlashAttention have produced 5-10x performance improvements for production deployments.
But the speed optimization that matters most for your personal AI agents is not hardware-level inference speed. It is workflow-level friction reduction. The latency that determines whether you use an AI agent is not the model's response time — it is the total time from "I have a need" to "I have a result." That includes: formulating the prompt, switching to the right tool, providing context, waiting for output, evaluating the result, and iterating if needed.
Apply the same protocol. Decompose the total workflow. Identify overhead versus execution. Cut the overhead. Pre-stage context. Use templates for recurring prompt patterns. Reduce the number of tool switches. Batch similar requests. The AI model's speed is already fast. Your surrounding workflow is almost certainly the bottleneck.
The Fogg model applies identically. Every second of friction between "I have a question" and "I'm asking the AI" is a barrier that reduces execution frequency. People who use AI tools productively have not developed more motivation. They have reduced the friction to the point where using the tool requires almost no motivational activation. The tool is open, the context is loaded, the prompt pattern is familiar. The behavior fires at System 1 speed.
From speed to accuracy
Speed optimization makes your agents fire more often. But firing frequently is only valuable if the output is worth having. A morning review that runs in four minutes every day but produces wrong priorities is worse than one that runs in fourteen minutes three times a week but produces right priorities — because the fast, wrong version sends you confidently in the wrong direction with high consistency.
The next lesson — L-0570, Accuracy optimization — addresses this complementary axis. Where speed optimization asks "how do I make this agent fire more often?", accuracy optimization asks "how do I make this agent produce better outcomes when it fires?" The two are not independent. Speed changes create accuracy risks (cutting steps may cut necessary verification). Accuracy requirements create speed constraints (thorough checking takes time). Optimizing either axis in isolation produces a broken agent. The skill is optimizing both simultaneously, which requires understanding each on its own terms before integrating them.
You now know how to make your agents faster. Next, you learn how to make them more precise.
Sources:
- Nielsen, J. (1993). "Response Times: The 3 Important Limits." Nielsen Norman Group. Updated and expanded through 2024.
- Taylor, F. W. (1911). The Principles of Scientific Management. Harper & Brothers.
- Amdahl, G. M. (1967). "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities." AFIPS Conference Proceedings, 30, 483-485.
- Fogg, B. J. (2020). Tiny Habits: The Small Changes That Change Everything. Houghton Mifflin Harcourt. Behavior Model: behaviormodel.org.
- Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J. (2010). "How are habits formed: Modelling habit formation in the real world." European Journal of Social Psychology, 40(6), 998-1009.
- Wood, W., & Rünger, D. (2016). "Psychology of Habit." Annual Review of Psychology, 67, 289-314.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Amazon internal latency study (2006). Referenced in Greg Linden's presentation at Stanford, later confirmed by multiple Amazon engineers. Also: Akamai (2017), "Akamai Online Retail Performance Report."