Core Primitive
Track which outputs produce the most value to focus your production on high-impact types.
The blog post that fooled everyone
A consultant published a blog post that went viral. Forty thousand views in a week. Hundreds of shares. Comments piling up. She felt the dopamine hit of validation — the numbers said this was her best work. So she wrote more posts in the same style: provocative takes, contrarian framing, punchy titles designed to stop the scroll.
Six months later, she reviewed her business. The viral posts had generated exactly zero client inquiries. Zero. The post that had produced three of her five highest-value clients that year was a quiet, technical walkthrough with eight hundred views — a piece she had almost deleted from her drafts folder because the metrics looked embarrassingly modest compared to her "successful" work. The walkthrough attracted the right eight hundred people. The viral post attracted forty thousand of the wrong ones.
She had been measuring. But she had been measuring the wrong thing. And because she was measuring the wrong thing, her measurement system was actively steering her away from her most valuable output.
This is not an edge case. This is the default. Most people who track their output at all track what is easiest to count — views, likes, follower growth — and then optimize for those numbers with total sincerity and total misdirection. Output measurement is not about counting. It is about counting what matters.
The core principle: track which outputs produce the most value
Here is the primitive in its full form: track which outputs produce the most value to focus your production on high-impact types.
Two operations are embedded in that sentence. First, you measure — you attach numbers to outputs so you can compare them. Second, you redirect — you use those numbers to shift your production toward what works and away from what does not. Measurement without redirection is vanity tracking. Redirection without measurement is guessing. You need both.
In the previous lessons, you built an output system — you defined your output types, established quality standards, created templates, set frequency, built a pipeline, versioned, distributed, and repurposed. That is a production engine. But a production engine without measurement is a factory running blind. It produces whatever you tell it to produce, at whatever pace you set, in whatever formats you choose — with no feedback on whether any of it is actually working. You could be producing at peak efficiency and generating zero value. Efficiency without effectiveness is the most dangerous kind of waste because it feels productive.
Measurement closes the loop. It converts your output system from an open-loop machine — produce, ship, hope — into a closed-loop system — produce, ship, measure, learn, adjust, produce better. That feedback loop is the difference between output that improves over time and output that repeats the same patterns indefinitely.
What Peter Drucker actually meant
"What gets measured gets managed." You have heard this attributed to Peter Drucker so many times that it has become a reflex rather than a thought. But the full implication is sharper than the sound bite suggests.
Drucker's insight was not that measurement is good. It was that measurement is steering. Whatever you measure, you will unconsciously — and then consciously — optimize for. Your attention follows your metrics. Your effort follows your attention. Your output follows your effort. The metric becomes the target, and the target shapes everything downstream.
This is powerful when the metric is well-chosen. A writer who measures "pieces that generated inbound conversations" will naturally produce work that invites conversation — substantive, nuanced, question-raising. A consultant who measures "outputs that led to client engagements" will naturally produce work that demonstrates competence to the right audience. The measurement steers the production, and good measurement steers it well.
But the same mechanism is catastrophic when the metric is poorly chosen. This is where Charles Goodhart enters the picture.
Goodhart's Law: the metric trap
In 1975, British economist Charles Goodhart observed a pattern in monetary policy: whenever a statistical regularity was adopted as a target for policy purposes, it immediately ceased to be a reliable indicator. The Bank of England would identify a metric that correlated with economic health, set a target for it, and then watch helplessly as economic actors optimized for the metric itself rather than the underlying health it was supposed to represent. The metric was fine as an observation. It collapsed as a target.
Marilyn Strathern later generalized this into what we now call Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
This law operates in your personal output system with brutal efficiency. You start measuring newsletter subscribers, so you begin writing clickbait subject lines that inflate your list with people who will never read a word you write. You start measuring article views, so you begin optimizing for shareability over substance. You start measuring output frequency, so you begin shipping half-finished work to hit your daily quota. In every case, the metric was initially a reasonable proxy for value. In every case, optimizing directly for the metric degraded the value it was supposed to represent.
The defense against Goodhart's Law is not to stop measuring. It is to measure multiple things, to rotate your primary metric periodically, and to always maintain at least one qualitative assessment alongside your quantitative metrics. A number tells you what happened. A qualitative review tells you whether what happened was good.
Vanity metrics versus actionable metrics
Eric Ries, in "The Lean Startup," drew a distinction that every output producer needs to internalize: the difference between vanity metrics and actionable metrics.
Vanity metrics are numbers that go up and make you feel good but do not inform any decision. Total page views. Cumulative follower count. Lifetime downloads. These metrics have two properties that make them dangerous: they almost always increase over time (which feels like progress), and they tell you nothing about whether your most recent actions were effective (which means they cannot guide your next action).
Actionable metrics, by contrast, answer a specific question and lead to a specific decision. Not "how many people saw my output?" but "what percentage of people who saw my output took the action I intended?" Not "how many followers did I gain this month?" but "which specific output drove the most new followers, and what was different about it?" Not "how many outputs did I ship?" but "which shipped outputs generated the most downstream value, and what do they have in common?"
The difference is causality. Vanity metrics describe outcomes without explaining causes. Actionable metrics connect specific inputs to specific outputs so you can reproduce what works and stop what does not. Your measurement system should be built entirely from actionable metrics. Vanity metrics should be visible only as background context, never as primary targets.
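The consultant's story from the opening can be restated as arithmetic. Here is a minimal sketch, with hypothetical numbers, of why a per-output action rate is actionable while a cumulative view total is not:

```python
# Minimal sketch: a vanity total vs. an actionable per-output rate.
# All outputs and numbers here are hypothetical illustrations.
outputs = [
    {"title": "Viral hot take",        "views": 40_000, "inquiries": 0},
    {"title": "Technical walkthrough", "views": 800,    "inquiries": 3},
]

# Vanity metric: cumulative views. It only ever goes up, and it cannot
# tell you which output to produce more of.
total_views = sum(o["views"] for o in outputs)

# Actionable metric: what fraction of viewers took the intended action?
for o in outputs:
    o["action_rate"] = o["inquiries"] / o["views"]

best = max(outputs, key=lambda o: o["action_rate"])
print(total_views)    # 40800
print(best["title"])  # the walkthrough, despite 50x fewer views
```

The vanity total aggregates everything into one number you cannot act on; the rate isolates which specific output converted attention into action.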
Ben Thompson, who runs the strategy analysis publication Stratechery, has written extensively about the difference between measuring engagement and measuring reach. Reach tells you how many people saw something. Engagement tells you how deeply they interacted with it. A piece with enormous reach and zero engagement is noise — it passed through people's visual fields without leaving a trace. A piece with modest reach and deep engagement found its audience and served them. Thompson argues that engagement is almost always the better metric for individual creators because it measures the thing that actually compounds: the relationship between you and the people who care about your work.
Leading versus lagging indicators
There is a temporal dimension to measurement that most people ignore. Some metrics tell you what already happened. Others tell you what is about to happen. The distinction matters enormously for how you use them.
Lagging indicators are outcomes. Revenue generated. Clients acquired. Opportunities created. These are the metrics that ultimately matter — they represent the actual value your output produced. But they arrive late. By the time you see a lagging indicator, the output that caused it was produced weeks or months ago. You cannot steer by lagging indicators alone because by the time you see the result, the moment for adjustment has long passed.
Leading indicators are signals that predict future outcomes. Email replies to a newsletter. Direct messages after a post. Questions from people who engage with your work. Saves and bookmarks rather than likes and shares. Repeat visitors rather than first-time viewers. These metrics arrive earlier in the causal chain, which means they give you faster feedback — you can see within days whether a piece is resonating, rather than waiting months to see whether it generated downstream results.
Andy Grove, the former CEO of Intel who created the OKR framework, insisted on measuring outcomes rather than outputs — "Did the customer's problem get solved?" rather than "Did we ship the feature?" John Doerr, who brought OKRs to Google, built on this by pairing lagging objectives ("achieve X") with leading key results ("do Y, which we believe drives X"). The pairing is the insight. Lagging indicators tell you whether you are succeeding. Leading indicators tell you what to do today to succeed tomorrow.
Your personal output measurement system needs both. You need lagging indicators to know what is actually working. And you need leading indicators to make adjustments in real time rather than in retrospect.
The Balanced Scorecard for personal output
Robert Kaplan and David Norton introduced the Balanced Scorecard in 1992 to solve a problem in corporate measurement: companies were measuring financial performance and nothing else, which was like driving by looking only at the rearview mirror. Their solution was multi-dimensional measurement — tracking financial results alongside customer satisfaction, internal processes, and learning and growth.
The principle translates directly to personal output measurement. If you measure only one dimension of your output, you will overfit to that dimension at the expense of everything else. You need a scorecard that captures multiple dimensions of value.
Here is a personal output scorecard with four dimensions:
Reach: How many people encountered this output? This is the most visible and least important dimension, but it is not irrelevant. An output that reaches zero people produces zero value regardless of its quality. Reach is a necessary condition, not a sufficient one.
Resonance: How deeply did the audience engage? Did they reply, comment, share with personal annotation, save for later, or forward to a specific person? Resonance measures whether the output actually landed — whether it produced a cognitive or emotional response rather than just passing through someone's attention.
Downstream action: Did the output produce any tangible result beyond the moment of consumption? An inbound email. A collaboration request. A client inquiry. An invitation to speak. A citation in someone else's work. Downstream action is where output converts into value — where the intangible work of producing something becomes the tangible outcome of something happening because you produced it.
Personal growth: Did producing this output make you better at something? Did the research deepen your understanding? Did the writing clarify your thinking? Did the feedback reveal a gap you did not know existed? This dimension is invisible to the outside world but critical to long-term value. An output that teaches you something is valuable even if no one reads it, because the learning compounds into future output that is better, sharper, and more insightful.
No single output needs to score well on all four dimensions. A private working document might score zero on reach and resonance but high on personal growth. A social post might score high on reach and resonance but zero on downstream action. The scorecard's value is not in optimizing every output across every dimension — it is in ensuring you are aware of all four dimensions so you do not accidentally abandon three of them while chasing one.
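One way to keep all four dimensions visible is to record each output as a four-field score. A minimal sketch follows; the field names and the 0-5 scales are assumptions for illustration, not a prescribed format:

```python
from dataclasses import dataclass

# Minimal sketch of the four-dimension scorecard as a record type.
# The 0-5 scales and field names are illustrative assumptions.
@dataclass
class OutputScore:
    title: str
    reach: int              # people who encountered the output
    resonance: int          # 0-5: replies, saves, forwards with annotation
    downstream_action: int  # 0-5: inquiries, invitations, citations
    personal_growth: int    # 0-5: what producing it taught you

scores = [
    OutputScore("Strategy memo (unpublished)", 0, 0, 0, 5),
    OutputScore("Hot take thread", 40_000, 2, 0, 1),
]

# Different dimensions surface different "winners" from the same data.
by_reach = max(scores, key=lambda s: s.reach)
by_growth = max(scores, key=lambda s: s.personal_growth)
print(by_reach.title, "|", by_growth.title)
```

Sorting the same records by different fields makes the scorecard's point concrete: the unpublished memo wins on growth while scoring zero everywhere the outside world can see.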
Building your personal output measurement system
Theory without practice is inert. Here is how to build a measurement system that actually works.
Step 1: Define your value metric. Before you measure anything, answer this question: what does "value" mean for your specific output? If you are building a professional reputation, value might mean inbound opportunities from people in your target audience. If you are building a knowledge practice, value might mean insights generated through the act of production. If you are building a community, value might mean conversations initiated. Your value metric is the lagging indicator that ultimately matters. Everything else is a proxy.
Step 2: Identify your leading indicators. What observable signals, arriving within days of publication, predict your value metric? For most personal output systems, the leading indicators include: reply rate (people who respond directly), save/bookmark rate (people who intend to return), forwarding rate (people who share with a specific recommendation), and question rate (people who engage deeply enough to want more). These are not the same as likes and retweets, which are low-cost social signals that predict almost nothing about downstream value.
Step 3: Build a simple tracking system. A spreadsheet is sufficient. For each output, record: the date, the type (article, post, video, document), the format, the channel, the reach number, the resonance signals, and any downstream actions you can attribute to it. Do not build an elaborate dashboard. Build a row in a spreadsheet. The overhead of measurement must be low enough that you actually do it — a principle that recurs throughout this curriculum at every operational level.
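The Step 3 spreadsheet can be sketched as a CSV schema. Everything below, including the column names and the sample entry, is an illustration rather than a required layout:

```python
import csv
import io

# Minimal sketch of the Step 3 spreadsheet: one row per output.
# Column names follow the step above; the sample row is hypothetical.
COLUMNS = ["date", "type", "format", "channel",
           "reach", "resonance_signals", "downstream_actions"]

def log_output(writer, **fields):
    """Append one output as a row; any missing field is left blank."""
    writer.writerow({col: fields.get(col, "") for col in COLUMNS})

buf = io.StringIO()  # stand-in for a tracker.csv file on disk
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
log_output(writer,
           date="2024-03-01", type="article", format="walkthrough",
           channel="blog", reach=800,
           resonance_signals="12 replies, 40 saves",
           downstream_actions="1 client inquiry")

print(buf.getvalue())
```

One function call per shipped output is the entire overhead, which is the point: if logging takes more than a minute, it will not happen.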
Step 4: Review monthly. Once a month, sort your outputs by the dimension that matters most to you. Which outputs produced the most downstream action? Which resonated most deeply? What do the top performers have in common — in topic, in format, in channel, in timing, in structure? What do the underperformers share? The patterns that emerge from a month of data are usually more instructive than your intuition about what works, because your intuition is biased by the output you most enjoyed producing rather than the output that actually produced value.
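Once the tracking rows exist, the monthly sort-and-compare in Step 4 is only a few lines of code. A sketch with hypothetical data:

```python
from collections import defaultdict

# Minimal sketch of the Step 4 monthly review. The rows are hypothetical;
# in practice they would come from the tracking spreadsheet.
month = [
    {"title": "Case study A",  "format": "case_study", "actions": 4},
    {"title": "Opinion piece", "format": "opinion",    "actions": 0},
    {"title": "Case study B",  "format": "case_study", "actions": 2},
    {"title": "Hot take",      "format": "opinion",    "actions": 0},
]

# Which outputs produced the most downstream action?
ranked = sorted(month, key=lambda o: o["actions"], reverse=True)

# What do the top performers have in common? Here: group by format.
actions_by_format = defaultdict(int)
for o in month:
    actions_by_format[o["format"]] += o["actions"]

best_format = max(actions_by_format, key=actions_by_format.get)
print(ranked[0]["title"], "|", best_format)
```

Grouping by other columns (topic, channel, publishing day) follows the same pattern and is how the shared traits of top performers surface.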
Step 5: Adjust your production. This is where measurement converts into improvement. If your data shows that detailed technical breakdowns consistently outperform opinion pieces on every dimension, produce more breakdowns and fewer opinion pieces. If one channel generates ten times the resonance of another, concentrate your distribution there. If a specific format — say, numbered lists or case studies — reliably produces downstream action, make that format a larger share of your production. You are not abandoning creative freedom. You are informing it with evidence.
Step 6: Rotate your primary metric. Every quarter, review which metric you have been primarily optimizing for and consider whether it is still the right one. Goodhart's Law guarantees that any metric you optimize for long enough will eventually become gamed — even by yourself. Rotating your primary focus between reach, resonance, downstream action, and personal growth prevents you from overindexing on any single dimension.
The measurement paradox
There is a tension at the heart of output measurement that you must hold rather than resolve.
Measurement is essential because without it you are guessing. You are producing based on intuition, hope, and the feedback of whatever happens to be most visible — which is almost never the feedback that matters most. People who do not measure their output systematically almost always overestimate the value of their most popular work and underestimate the value of their quietest work. Measurement corrects this by replacing impressions with data.
But measurement is also dangerous because it can reduce everything to what is countable. The most valuable outputs are often the hardest to measure. A piece of writing that changes how someone thinks about their career does not generate a measurable signal at the moment of reading — it generates a decision six months later that you will never trace back to your article. A document that clarifies your own thinking produces zero external metrics but might redirect years of your effort. A conversation started by something you wrote might compound into a relationship that transforms your professional life, and you will attribute it to luck rather than to the output that initiated it.
The resolution is not to choose between measuring and not measuring. It is to measure rigorously while maintaining the humility to know that your measurements are capturing a fraction of the actual value your output produces. Use measurement to guide your production. Do not use measurement to define your worth. The numbers are a flashlight, not the sun — they illuminate part of the landscape, but the territory extends far beyond the beam.
The Third Brain: AI as measurement analyst
AI transforms output measurement from a manual chore into a semi-automated intelligence system.
Pattern detection across outputs. Feed your output measurement data to an AI assistant and ask it to identify patterns you might miss. "Looking at the last 90 days of output data, which variables most strongly correlate with downstream action? Is there a topic pattern? A length pattern? A publishing day or time pattern? A structural pattern?" AI can process the full dataset simultaneously and surface correlations that you would never notice reviewing row by row.
Qualitative analysis at scale. Ask an AI to analyze the comments, replies, and messages you received in response to your outputs. "Categorize the responses to my last 20 outputs by type: question, agreement, disagreement, personal application, sharing intent. Which outputs generated the highest proportion of personal application responses?" This converts qualitative feedback into semi-quantitative patterns without losing the nuance that pure numbers miss.
Measurement system design. Describe your output goals to an AI and ask it to recommend a measurement framework. "I produce weekly articles and daily social posts aimed at building expertise reputation in my field. What leading indicators should I track? What is the minimum viable measurement system that gives me actionable data without consuming more than fifteen minutes per week?" AI can synthesize measurement best practices from across domains and tailor them to your specific situation.
Predictive analysis. After a few months of data, ask AI to predict which upcoming outputs are likely to perform well based on historical patterns. "Given that my top-performing outputs share these characteristics, does my draft for next week match the pattern or diverge from it? If it diverges, is that a creative risk worth taking or an unforced error?" This is not about letting AI dictate your creative choices. It is about making your creative choices with full awareness of what the data suggests.
Anomaly detection. AI can flag outputs that performed significantly above or below prediction. "This post reached half the audience of a typical post but generated five times the downstream action. What was different about it?" Anomalies are where the deepest learning hides, because they reveal dynamics that your existing model does not capture.
The key principle: AI handles the data processing so you can focus on the judgment. What do these patterns mean? Which metrics actually matter? When should you follow the data and when should you override it? The measurement is mechanical. The interpretation is human. Let each do what it does best.
The bridge to the output review
You now have a measurement system — a way to track which outputs produce value and which do not. But measurement data is inert until it is acted upon. A spreadsheet full of numbers does not improve your output. A human being reviewing those numbers, identifying patterns, and making deliberate adjustments does.
That is the work of the next lesson: the output review. A periodic, structured review of your output system that uses measurement data to answer the critical questions: What is working? What is not? What should I produce more of? What should I stop producing? What experiments should I run next?
Measurement without review is data accumulation. Review without measurement is opinion. Together, they form the feedback loop that makes your output system self-improving — a production engine that gets better at producing value with every cycle of measure, review, adjust, produce.
You have built the measurement instrument. Next, you learn to read it.
Sources:
- Drucker, P. F. (1954). The Practice of Management. Harper & Brothers.
- Goodhart, C. A. E. (1975). "Problems of Monetary Management: The U.K. Experience." Papers in Monetary Economics, Reserve Bank of Australia.
- Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305-321.
- Ries, E. (2011). The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business.
- Grove, A. S. (1983). High Output Management. Random House.
- Doerr, J. (2018). Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs. Portfolio/Penguin.
- Kaplan, R. S., & Norton, D. P. (1992). "The Balanced Scorecard — Measures That Drive Performance." Harvard Business Review, 70(1), 71-79.
- Thompson, B. (2015). "Aggregation Theory." Stratechery.
Frequently Asked Questions