Agent ecosystem health

Your agents are not a list. They are an ecology.

The previous lesson showed you that agents interacting in combination produce emergent behavior — outcomes none of the individual agents intended. That insight was about recognizing emergence when it happens. This lesson is about what to do with that recognition: you need to treat your collection of agents not as a checklist of independent routines, but as a living ecosystem that requires periodic assessment.

An ecosystem is not a collection of organisms. It is a web of relationships between organisms, and the health of that web determines whether the whole system thrives, degrades, or collapses. The same principle applies to every set of cognitive agents you operate — your habits, routines, commitments, rules, and automated behaviors. Each agent may function perfectly in isolation. But the moment you run multiple agents simultaneously, the interactions between them become the dominant factor in whether your system works.

You already know this intuitively. You have felt the friction of two good habits competing for the same time slot. You have watched a productive routine slowly degrade because another routine upstream changed its output without anyone noticing. What you may not have is a systematic way to assess these dynamics. That is what ecosystem health gives you.

Costanza's framework: vigor, organization, resilience

One of the most widely used frameworks for assessing ecosystem health comes from ecological economist Robert Costanza, who proposed in 1992 that a healthy ecosystem can be characterized along three dimensions: vigor, organization, and resilience. This framework, known as the VOR model, has been applied to everything from aquatic ecosystems to alpine meadows to urban landscapes. It works because it captures three distinct axes along which any system of interacting agents can fail.

Vigor measures the system's activity and productivity — its metabolic throughput. In an ecological system, this is primary productivity: how much energy the system captures and converts. In your agent ecosystem, vigor is the total useful output your agents produce. Are they generating the decisions, artifacts, behaviors, and outcomes they were designed to produce? An agent ecosystem with low vigor looks like a set of routines that technically execute but produce nothing meaningful — motion without progress.

Organization measures the structural coherence of the system — how well its components are connected to each other and how efficiently information and resources flow between them. A highly organized ecosystem has clear pathways, minimal redundancy in critical functions, and well-defined relationships between components. In your agent ecosystem, organization means your agents hand off outputs cleanly, do not duplicate each other's work destructively, and maintain coherent relationships. Low organization looks like agents operating in silos, producing outputs that nothing downstream consumes, or creating contradictions that you resolve manually.

Resilience measures the system's capacity to absorb disturbance and maintain function. A resilient ecosystem recovers from shocks — a drought, a disease, a sudden change in conditions — without losing its essential structure. In your agent ecosystem, resilience means that when one agent gets disrupted (you travel, you get sick, your schedule changes), the other agents compensate or gracefully degrade rather than cascading into total system failure. Low resilience looks like a single missed morning routine that derails your entire day.

Costanza's key insight was that you need all three. A vigorous but disorganized system wastes energy. An organized but fragile system collapses at the first perturbation. A resilient but low-vigor system persists indefinitely while producing nothing of value. Ecosystem health is the product of all three dimensions, not the maximum of any one.
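
The difference between "product" and "maximum" is easiest to see with numbers. The sketch below is only an illustration: the 1-5 scale, the function name `health_index`, and the two example systems are assumptions made for this example, not part of Costanza's text.

```python
def health_index(vigor: int, organization: int, resilience: int) -> int:
    """Combine the three dimensions by multiplication (hypothetical 1-5 scale)."""
    return vigor * organization * resilience


# Two hypothetical systems: one lopsided, one balanced.
lopsided = (5, 5, 1)  # vigorous and organized, but fragile
balanced = (3, 3, 3)  # middling on every axis

# The maximum rewards the lopsided system; the product does not.
print(health_index(*lopsided), max(lopsided))  # 25 5
print(health_index(*balanced), max(balanced))  # 27 3
```

Because the scores are multiplied, a single weak dimension drags the whole index down, which is the behavior the three failure cases above describe.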

The McKinsey parallel: organizational health predicts performance

Costanza developed VOR for ecological systems, but a similar triadic pattern appears in organizational science. McKinsey's Organizational Health Index, developed over two decades of research across more than 2,600 organizations, measures nine dimensions, several of which map naturally onto vigor, organization, and resilience: motivation and capabilities (vigor), direction and leadership (organization), and innovation, learning, and external orientation (resilience).

McKinsey reports that organizations in the top quartile of health deliver three times the total shareholder returns of unhealthy organizations, regardless of industry. More telling, the organizations with six times fewer safety incidents are distinguished not by better safety protocols but by higher overall organizational health scores. Health is not a function of any single process working well. It is a property of the system as a whole.

This transfers directly to your agent ecosystem. The question is not whether your exercise habit is effective, or whether your planning routine is sound, or whether your reading practice is productive. The question is whether the system composed of all of them is healthy — whether it has sufficient vigor (total output), organization (clean interactions), and resilience (capacity to absorb disruption).

Microservices and the health check pattern

Software engineering arrived at the same conclusion independently. In a microservices architecture — where an application is decomposed into many small, independently deployable services — system health cannot be assessed by checking each service individually. A service can return HTTP 200 ("I'm healthy") on its own health check endpoint while the system it participates in is failing catastrophically because of broken connections, mismatched throughput, or cascading timeouts between services.

This is why modern distributed systems implement three kinds of health check (Kubernetes, for example, exposes them as liveness, readiness, and startup probes). Liveness checks ask: is this service running at all? Readiness checks ask: is this service ready to accept traffic and connected to its dependencies? Startup checks ask: has this service completed initialization? The critical insight is the readiness check, because it assesses not the service in isolation, but the service in relationship to the system it participates in. A data-access service that is running but cannot connect to the database it wraps is live but not ready. The process is alive. The system is not healthy.
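
The gap between "live" and "ready" can be sketched in a few lines. This is a toy model, not any framework's real API: the `Service` class, its field names, and the `db-api` example are all hypothetical, and each dependency is represented as a plain callable that returns True when it is reachable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Service:
    name: str
    running: bool = True
    # Hypothetical: each dependency is a zero-argument reachability check.
    dependencies: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def liveness(self) -> bool:
        """Is the process up at all? Ignores dependencies."""
        return self.running

    def readiness(self) -> bool:
        """Is the process up AND can it reach everything it depends on?"""
        return self.running and all(check() for check in self.dependencies.values())


# A wrapper service whose backing database is unreachable.
db_api = Service("db-api", dependencies={"database": lambda: False})
print(db_api.liveness())   # True
print(db_api.readiness())  # False
```

The same service answers "yes" to one question and "no" to the other, which is why a check that only looks inward can report health while the surrounding system fails.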

The three pillars of observability in distributed systems — metrics, traces, and logs — mirror this same structure. Metrics tell you about aggregate vigor (throughput, latency, error rates). Traces tell you about organization (how requests flow between services, where bottlenecks form). Logs tell you about resilience (what happened during failures, how the system responded). No single pillar is sufficient. You need all three to assess the health of a system whose components interact.

Your cognitive agent ecosystem has the same architecture. You need to check not just whether each agent is "running" (liveness), but whether each agent is connected to the agents it depends on and the agents that depend on it (readiness). And you need to observe the system across all three dimensions — throughput, flow, and recovery — not just any one.

The AI parallel: monitoring multi-agent systems

In AI engineering, the challenge of multi-agent health monitoring is one of the field's most active frontiers. When you deploy a single AI model, monitoring is straightforward: you track accuracy, latency, and drift. But when you deploy a system of multiple AI agents that coordinate to accomplish tasks — a planning agent, an execution agent, a verification agent, a resource-allocation agent — the monitoring problem explodes in complexity.

Tsinghua University's "Agent Hospital" simulation illustrates the point: a virtual hospital where all roles are played by autonomous LLM-powered agents requires an overarching health system that monitors not just individual agent performance, but patient flow across facilities, coordination latency between agents, and the emergent behavior of the system as a whole. Individual agent accuracy can be high while system-level outcomes are poor, because the coordination layer between agents introduces failure modes that no individual agent can detect.

The practical lesson from AI multi-agent systems is that monitoring must happen at three levels simultaneously: the individual agent level (is this agent performing its function?), the interaction level (are agents communicating effectively and resolving conflicts?), and the system level (is the overall system producing the intended outcomes?). Skip any level and you will miss the category of failure that level is designed to detect.

Your cognitive agent ecosystem has the same three-level structure. You need agent-level assessment (is my exercise routine producing fitness gains?), interaction-level assessment (is my exercise routine conflicting with my deep work schedule?), and system-level assessment (is my total set of routines producing the life outcomes I designed them to produce?). Most people only do the first level. The second and third are where ecosystem health lives.

The assessment protocol: running your own health check

Here is the practical protocol for assessing your agent ecosystem health. Run it weekly if you operate more than five active agents, monthly if you operate five or fewer.

Step 1: Inventory. List every active agent — every recurring commitment, routine, rule, habit, or automated behavior that executes with regularity. Do not filter. Include the small ones. The agent you forget to list is often the one causing the most interference.

Step 2: Individual VOR scoring. For each agent, rate vigor (1-5: is it producing meaningful output?), organization (1-5: does it connect cleanly to other agents?), and resilience (1-5: does it recover when disrupted?). Multiply the three scores. The maximum is 125. Anything below 27 (the score for all threes) needs investigation. Anything below 8 (the score for all twos, so at least one dimension is rated 1) needs immediate triage: either repair it or remove it.
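
The scoring rule in Step 2 can be sketched as a small data structure. The thresholds (27 and 8) and the 1-5 range come from the step itself; the class name `AgentScore`, the status labels, and the example agents are hypothetical choices for illustration.

```python
from dataclasses import dataclass

INVESTIGATE_BELOW = 27  # 3 * 3 * 3, from Step 2
TRIAGE_BELOW = 8        # 2 * 2 * 2, from Step 2


@dataclass(frozen=True)
class AgentScore:
    name: str
    vigor: int
    organization: int
    resilience: int

    def __post_init__(self):
        # Reject ratings outside the 1-5 scale used in Step 2.
        for label in ("vigor", "organization", "resilience"):
            value = getattr(self, label)
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be 1-5, got {value}")

    @property
    def vor(self) -> int:
        """Product of the three ratings (maximum 125)."""
        return self.vigor * self.organization * self.resilience

    @property
    def status(self) -> str:
        if self.vor < TRIAGE_BELOW:
            return "triage"
        if self.vor < INVESTIGATE_BELOW:
            return "investigate"
        return "ok"


# Hypothetical agents and ratings.
agents = [
    AgentScore("morning run", 4, 3, 3),
    AgentScore("weekly review", 3, 2, 3),
    AgentScore("inbox rule", 2, 1, 2),
]
for agent in agents:
    print(agent.name, agent.vor, agent.status)
# morning run 36 ok
# weekly review 18 investigate
# inbox rule 4 triage
```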

Step 3: Interaction scan. Look at each pair of agents and ask three questions. Are they producing conflicting outputs? Is one producing output faster than the other can consume it? Are they competing for the same limited resource (time, attention, energy)? Any "yes" answer identifies an interaction-level health issue that individual agent assessment would never reveal.
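
One way to make the interaction scan systematic is to enumerate the pairs explicitly. The sketch below assumes you record your own yes/no answers by hand; the function names, the short question labels, and the example agents are hypothetical. Note that n agents yield n(n-1)/2 pairs, so the scan grows quickly as the inventory grows.

```python
from itertools import combinations

# Short labels for the three questions in Step 3 (labels are illustrative).
QUESTIONS = ("conflicting outputs", "throughput mismatch", "resource competition")


def pairs_to_review(agents):
    """Return every unordered pair of agents exactly once."""
    return list(combinations(agents, 2))


def interaction_issues(answers):
    """Collect (pair, question) for every question answered 'yes'.

    answers maps a pair of agent names to a dict of question -> bool.
    Questions missing from the dict are treated as 'no'.
    """
    issues = []
    for pair, flags in answers.items():
        for question in QUESTIONS:
            if flags.get(question, False):
                issues.append((pair, question))
    return issues


agents = ["exercise", "deep work", "reading", "planning"]
print(len(pairs_to_review(agents)))  # 6

# Hypothetical answers for one pair.
answers = {("exercise", "deep work"): {"resource competition": True}}
print(interaction_issues(answers))
# [(('exercise', 'deep work'), 'resource competition')]
```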

Step 4: System-level check. Step back from the individual agents and ask: Is the total coordination overhead — the effort spent managing interactions between agents — growing, shrinking, or stable? If it is growing, your ecosystem is becoming less organized over time, regardless of how well individual agents perform. This is the earliest warning sign of ecosystem degradation.

Step 5: Record and compare. Write down your findings. The value of ecosystem health assessment is not in any single snapshot — it is in the trend over time. A system that is gradually losing organization or resilience will not feel broken today. It will feel broken six months from now, when the accumulated degradation finally crosses a threshold. The only way to catch gradual decline is longitudinal measurement.
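
A recorded series makes the "growing, shrinking, or stable" question from Step 4 answerable with arithmetic rather than impression. The sketch below fits a least-squares line to successive assessments and reads off the slope. The function names, the `tolerance` cutoff, and the sample numbers (minutes of coordination overhead per week) are assumptions for illustration; any consistent measure you record would work the same way.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two assessments to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify(values, tolerance=0.1):
    """Label a series as growing, shrinking, or stable.

    tolerance is a hypothetical cutoff: slopes within +/- tolerance
    per assessment are treated as noise.
    """
    slope = trend_slope(values)
    if slope > tolerance:
        return "growing"
    if slope < -tolerance:
        return "shrinking"
    return "stable"


# Hypothetical record: minutes per week spent managing agent interactions.
overhead = [30, 35, 33, 40, 45, 50]
print(classify(overhead))  # growing
```

No single entry in that series looks alarming on its own; only the slope across entries shows the drift.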

Why individual assessment is not enough

The deepest failure in agent management is assessing agents in isolation. This is equivalent to a doctor checking each organ individually — heart, lungs, liver, kidneys — finding each one within normal parameters, and declaring the patient healthy without ever checking whether the organs are working together. A patient can have individually healthy organs and still be systemically ill because the interactions between organs are failing.

Costanza's VOR framework, McKinsey's organizational health research, microservices observability, and AI multi-agent monitoring all converge on the same conclusion: the health of a system is not the sum of the health of its parts. It is a property that emerges from the interactions between parts, and it can only be assessed at the system level.

This is why you need an ecosystem health practice, not just an agent management practice. Managing agents one at a time is necessary but insufficient. The agents interact. The interactions produce emergent behavior. The emergent behavior can be healthy or pathological. And the only way to know which is to assess the ecosystem as a whole.

From assessment to careful modification

Now you can see your agent ecosystem clearly. You can measure its vigor, organization, and resilience. You can identify interaction-level failures that individual assessment misses. You can track system-level trends over time.

The temptation, once you have this visibility, is to immediately start fixing things — adding new agents to fill gaps, removing agents that score poorly, restructuring interactions that produce conflicts. That temptation is exactly what the next lesson addresses. Because every new agent you add interacts with every existing agent, and every agent you remove changes the dynamics of every remaining interaction. Modifying an ecosystem is not like editing a list. It is like performing surgery on a living system. The next lesson, L-0517, teaches you why adding agents must be done with the same care you would bring to introducing a new species into a functioning ecology — deliberately, with full awareness of the interaction effects.

Sources:

Costanza, R. (1992). "Toward an Operational Definition of Ecosystem Health." In Ecosystem Health: New Goals for Environmental Management. Island Press.
Costanza, R., & Mageau, M. T. (1999). "What is a Healthy Ecosystem?" Aquatic Ecology, 33(1), 105-115.
McKinsey & Company. (2024). "Organizational Health Is (Still) the Key to Long-Term Performance." McKinsey People & Organizational Performance Practice.
Meadows, D. H. (2008). Thinking in Systems: A Primer. Chelsea Green Publishing.
Li, J., et al. (2024). "Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents." Tsinghua University.
Richardson, C., et al. (2023). "Health Check API." Microservices.io: Microservice Architecture Patterns.
Hernandez-Blanco, M., et al. (2022). "Ecosystem Health, Ecosystem Services, and the Well-Being of Humans and the Rest of Nature." Global Change Biology.