You don't know what's running
You think you know your own behavior. You have a self-image: the kind of person who makes deliberate choices, who acts on values, who is mostly in control. That image is wrong — not because you're bad at self-awareness, but because the architecture of human cognition makes accurate self-inventory nearly impossible without external tools.
Wendy Wood's landmark experience-sampling research at Duke found that approximately 43 percent of daily actions are performed habitually — repeated in the same context, usually while thinking about something else entirely (Wood, Quinn, and Kashy, 2002). Not 43 percent of trivial actions. Forty-three percent of all actions. Participants in the study reported that during habitual behavior, their thoughts were frequently unrelated to what they were doing. They were executing behavioral programs without conscious oversight.
This means that nearly half of what you do each day is running on code you didn't write — or at least didn't write recently. Before you design new agents, you need to audit the ones already in production. That's the agent audit: a systematic inventory of every behavioral agent currently operating in your life, sorted by whether you designed it or whether it installed itself.
What an agent audit actually is
In the previous lessons, you learned that an agent is a system that acts on your behalf, and that agents can be internal (running in your mind) or external (embedded in tools and environments). The agent audit takes that framework and applies it as a diagnostic tool.
The audit has three steps:
- Observe — Record what you actually do, in real time, without editing or interpreting.
- Classify — Sort each observed behavior into designed (you intentionally installed it) or default (it arrived through habit, culture, social pressure, or environment).
- Map — For each agent, identify its trigger, its action, and its typical outcome.
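The three steps map naturally onto a small record structure. Here's a minimal sketch in Python; the class and field names are illustrative, not part of any standard protocol:

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    """One behavioral agent surfaced by the audit (field names are illustrative)."""
    behavior: str  # Observe: what was actually happening
    origin: str    # Classify: "designed" or "default"
    trigger: str   # Map: the cue or context that fired it
    outcome: str   # Map: the typical result

# A single audit entry, following observe -> classify -> map:
entry = AgentRecord(
    behavior="check phone on waking",
    origin="default",             # no deliberate install decision can be identified
    trigger="eyes open",
    outcome="anxiety, lost time",
)
```

The point of the structure is the discipline it enforces: an entry isn't complete until all four fields are filled, which is exactly the difference between introspection and evidence.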
This isn't introspection. Introspection asks "why do I do this?" The audit asks "what is actually happening?" The distinction matters because the first question invites rationalization while the second demands evidence.
The research case for self-monitoring
Self-monitoring — the systematic observation and recording of one's own behavior — is one of the most studied interventions in behavioral psychology, and the findings are consistent: the act of monitoring a behavior changes the behavior.
Frederick Kanfer's foundational work on self-regulation established that self-monitoring triggers a three-step chain: observation leads to self-evaluation, which leads to self-administered consequences that alter behavior frequency (Kanfer, 1970, 1975). When you record a behavior, you can't help but evaluate it against your standards. That evaluation creates pressure — usually in the direction of social norms and personal values. Positively valued behaviors tend to increase when monitored; negatively valued behaviors tend to decrease.
This is called reactivity — the phenomenon where observation itself produces change. Nelson and Hayes (1981) identified multiple mechanisms behind this reactivity, including increased cue salience (you notice triggers you previously missed), enhanced self-evaluation (you compare your behavior to your goals), and motivational shifts (recording creates accountability even when no one else sees the record).
The practical implication is powerful: you don't need to change your behavior first. You just need to watch it. The watching initiates the change.
Modern research on Ecological Momentary Assessment (EMA) extends this principle with real-time, context-sensitive measurement. Instead of asking people to recall their behavior at the end of a day or week — which introduces massive retrospective bias — EMA prompts participants to report what they're doing in the moment it's happening. A 2024 study in the Journal of Medical Internet Research demonstrated that EMA captures dynamics of change that weekly questionnaires systematically miss, including micro-patterns and fluctuations invisible to retrospective recall (Tamm et al., 2024). The hourly-timer technique in this lesson's exercise borrows directly from EMA methodology. You're running your own momentary assessment protocol.
CBT thought records as audit technology
Cognitive Behavioral Therapy offers a precise tool for auditing the agents running in your mind: the thought record.
A standard CBT thought record captures five elements: the situation (trigger), the automatic thought (what your mind generated), the emotion produced, the evidence for and against the thought, and a more balanced alternative thought. This structure was developed by Aaron Beck and refined over decades of clinical research. A 2017 study published in Biological Psychology found that completing a single thought record reduced cortisol reactivity to stress, demonstrating that the act of externalizing and examining automatic thoughts produces measurable physiological change, not just subjective improvement (Smyth et al., 2017).
What makes thought records relevant to the agent audit is the concept of automatic thoughts — cognitions that arise spontaneously, without deliberation, in emotionally relevant situations. These are mental agents. They have triggers (specific situations), they execute a fixed pattern (the thought), and they produce outcomes (emotions, behaviors, further thoughts). You didn't design most of them. They were installed by childhood experiences, cultural conditioning, repeated emotional patterns, and social reinforcement.
The agent audit borrows the thought record's core insight: if you don't externalize what's running, you can't evaluate it. A thought that stays inside your head feels like truth. A thought written on paper, with evidence for and against it listed alongside, becomes a hypothesis you can test.
You don't need to be in therapy to use this technique. You need a piece of paper with columns: Situation, What I did or thought, Was this designed or default?, What triggered it?, What was the outcome? That's a thought record repurposed as an agent inventory.
The AI parallel: model cards and system observability
The same audit principle operates in artificial intelligence — and the AI version illustrates why the human version matters.
In 2019, Margaret Mitchell and Timnit Gebru (then at Google) introduced the concept of model cards: standardized documentation that accompanies every machine learning model, describing what the model does, what data it was trained on, what its intended uses are, what its limitations are, and where it's likely to fail (Mitchell et al., 2019). Model cards exist because the AI field learned a painful lesson: models deployed without documentation produce unpredictable failures. You can't govern what you haven't inventoried.
Red Hat's 2025 extension of this concept — AI system cards — pushes the principle further. System cards document not just individual models but entire AI systems: their components, data flows, failure modes, security controls, and attack surfaces. The argument is that auditing a single model in isolation misses the interactions between models, databases, interfaces, and human operators that produce real-world outcomes.
The parallel to your own cognitive system is direct. You are not a single agent. You are a system of agents — some deliberately designed, many inherited, all interacting. Auditing one habit in isolation misses the interactions. Your "check email first thing in the morning" agent interacts with your "respond to urgent requests immediately" agent, which interacts with your "feel anxious when inbox count is high" agent, which interacts with your "skip the planned deep work block" agent. The system produces an outcome that no single agent intended.
Production AI systems use observability — real-time logging of what the system is actually doing, as distinct from what it was designed to do. Dashboards track model drift, anomalies, and unexpected outputs. The whole point is that the system's behavior in production diverges from the system's behavior in testing, and you need instruments to detect the gap.
Your agent audit is your observability layer. It's the monitoring infrastructure that shows you the gap between who you think you are and what you actually do.
How to run the audit
The exercise for this lesson gives you the full protocol, but here's the logic behind it:
Duration: 48 hours minimum. Shorter windows miss too many contexts. You need at least two full days to capture weekday patterns, transitions between work and rest, social versus solitary behavior, and the difference between morning and evening operation.
Method: Hourly sampling. Set a timer. When it fires, record three things: (1) what you were doing, (2) whether it was deliberate or automatic, and (3) what triggered it. This is your personal EMA protocol. Don't try to record everything — that's unsustainable. Sample hourly and you'll capture enough signal to map the system.
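The hourly log can be as simple as an append-only CSV. A sketch, assuming a helper you'd call each time the timer fires (the function name, file path, and column layout are all illustrative):

```python
import csv
import datetime

def log_sample(path, doing, deliberate, trigger):
    """Append one hourly EMA-style sample: timestamp, activity, mode, trigger."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now().isoformat(timespec="minutes"),
            doing,
            "deliberate" if deliberate else "automatic",
            trigger,
        ])

# When the timer fires, record the three items:
log_sample("audit_log.csv", "answering email", False, "notification badge")
```

Recording in the moment, rather than reconstructing at day's end, is the whole EMA trick: the timestamp is captured when the behavior happens, not when memory edits it.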
Classification: Designed versus default. A designed agent is one you can point to a specific decision for: "I decided to go for a run every morning after my coffee." A default agent is one where you can't identify that decision: "I just... always check my phone when I sit down on the couch." The classification doesn't need to be perfect. The act of asking the question surfaces awareness that wasn't there before.
Output: An agent inventory. After 48 hours, you should have a list that looks something like this:
| Agent | Type | Trigger | Action | Outcome |
| --- | --- | --- | --- | --- |
| Morning run | Designed | Coffee finished | Run 3 miles | Energy, clarity |
| Phone check on wake | Default | Eyes open | Scroll notifications | Anxiety, lost time |
| Agreement in meetings | Default | Someone proposes idea | Nod, say "sounds good" | Avoid conflict, miss objections |
| Weekly review | Designed | Sunday 9am calendar block | Review goals and tasks | Alignment, prioritization |
| Stress eating at 3pm | Default | Energy dip + work frustration | Sugar/carbs | Brief relief, crash at 4pm |
Your table will be longer. Most people discover between 20 and 40 distinct behavioral agents in a 48-hour window. The ratio of designed to default is usually shocking — not because you're undisciplined, but because the system was never designed to be fully conscious. Default agents are how humans operate efficiently. They're not failures. But you can't improve what you haven't inventoried.
What the audit reveals
Three patterns consistently emerge from agent audits:
Default agents cluster around transitions. The moment you wake up, the moment you arrive at work, the moment you finish a meeting, the moment you get home — these transition points are where default agents fire most reliably. Transitions create brief moments of ambiguity (what should I do next?), and default agents rush in to fill the gap.
Designed agents are often weaker than you think. You may have "designed" a morning meditation practice, but if your audit shows you actually meditate three days out of seven, that agent has a roughly 43 percent reliability rate. This isn't a moral failing. It's an engineering observation. The agent's trigger-action coupling is too weak to fire consistently — which becomes the focus of the next lesson on agent reliability.
The highest-impact agents are often ones you didn't design. Your automatic response to criticism (defend, deflect, or withdraw), your default behavior when alone with unstructured time (scroll, snack, or work), your habitual communication pattern in conflict (pursue, avoid, or freeze) — these shape your life more than any deliberate system you've built. They're the load-bearing walls of your behavioral architecture, and most people have never inspected them.
The audit is not a judgment
One final point, because this is where people get stuck. The agent audit is a map, not a verdict. You're not cataloging sins. You're reverse-engineering a system.
When an engineering team audits a production system, they don't shame the system for having bugs. They document what's running, identify what's working, flag what's failing, and prioritize what to change. The same stance applies here.
Some of your default agents are excellent. The one that makes you hold the door for the person behind you — that's a default agent doing its job. The one that makes you check on a friend when they've been quiet — also a default agent, also good. The audit will surface agents you want to keep, agents you want to modify, and agents you want to replace. All three categories matter.
But you can't make those decisions until you know what's running. And you can't know what's running until you watch, record, and classify — in writing, with evidence, over time.
That's the audit. Run it before you build anything new.