Check subgroup relationships before aggregating — aggregate patterns can reverse at the disaggregated level (Simpson's paradox)
Before aggregating data across subgroups, check whether the relationship holds within each subgroup independently, as aggregate patterns can reverse at the disaggregated level (Simpson's paradox).
Why This Is a Rule
Simpson's paradox is one of the most counterintuitive and dangerous statistical phenomena: a trend that appears in aggregated data can disappear or reverse within every subgroup. A classic illustration: Hospital A has a higher overall survival rate than Hospital B, yet for every individual condition (heart disease, cancer, trauma), Hospital B has the higher survival rate. The reversal happens because Hospital B treats a larger share of severe cases, which have lower survival at any hospital. Its overall rate is pulled down by its case mix, not by worse care.
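The hospital comparison can be reproduced with a small calculation. The counts below are hypothetical, invented for illustration, and split patients by severity rather than by condition. The names `HOSPITALS` and `survival_rates` are likewise illustrative.

```python
# Hypothetical counts, invented for illustration: (survived, treated) per severity level.
HOSPITALS = {
    "Hospital A": {"mild": (810, 900), "severe": (30, 100)},
    "Hospital B": {"mild": (95, 100), "severe": (360, 900)},
}

def survival_rates(groups):
    """Return (pooled overall rate, per-subgroup rates) for one hospital."""
    survived = sum(s for s, _ in groups.values())
    treated = sum(n for _, n in groups.values())
    by_group = {g: s / n for g, (s, n) in groups.items()}
    return survived / treated, by_group

for name, groups in HOSPITALS.items():
    overall, by_group = survival_rates(groups)
    details = ", ".join(f"{g} {r:.1%}" for g, r in by_group.items())
    print(f"{name}: overall {overall:.1%}, {details}")
```

With these counts Hospital A comes out ahead overall (84.0% vs 45.5%), yet Hospital B is ahead at both severity levels (95% vs 90% for mild, 40% vs 30% for severe). The difference is case mix: 90% of B's patients are severe cases, against 10% of A's.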
This isn't a rare edge case. It has appeared in university admission rates, medical treatment comparisons, and batting averages, and it can arise in any domain where subgroups differ in size or baseline characteristics. The danger is that aggregated data feels more reliable ("bigger sample!") while it actually conceals the pattern inside each subgroup. Decision-makers who rely on aggregated results without checking subgroups can confidently make exactly the wrong choice.
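The mechanism is easiest to see by writing an overall rate as a weighted average of subgroup rates. The notation is introduced here for illustration: entity $X$ (a hospital, team, or department) has $s_{X,k}$ successes out of $n_{X,k}$ cases in subgroup $k$.

```latex
R_X \;=\; \frac{\sum_k s_{X,k}}{\sum_k n_{X,k}} \;=\; \sum_k w_{X,k}\, r_{X,k},
\qquad
r_{X,k} = \frac{s_{X,k}}{n_{X,k}},
\qquad
w_{X,k} = \frac{n_{X,k}}{\sum_j n_{X,j}}
```

Having $r_{B,k} > r_{A,k}$ in every subgroup $k$ does not force $R_B > R_A$, because the two entities average their subgroup rates with different weights. If $B$'s weights $w_{B,k}$ are concentrated on subgroups whose rates are low for everyone, $R_B$ can fall below $R_A$.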
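The procedural defense is simple: before drawing conclusions from aggregated data, disaggregate by relevant subgroups and verify that the relationship holds within each. If it reverses, the aggregate is misleading, and the subgroup-level analysis is what you should act on, provided the subgrouping variable is a confounder (like case severity, which influences both which group a case lands in and its outcome) rather than a consequence of the thing being compared.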
When This Fires
- Before making any decision based on aggregated performance data across teams, departments, or categories
- When comparing overall rates between two entities that differ in their subgroup composition
- During data analysis when combining results across demographic, geographic, or categorical subgroups
- When someone presents a compelling aggregate statistic that will drive a significant decision
Common Failure Mode
Trusting aggregate statistics because "the sample is large enough." Sample size does not protect against Simpson's paradox: the aggregate trend and the opposite within-subgroup trends can all be statistically significant at the same time. The issue isn't noise; it's confounding. A larger sample cannot correct for a lurking variable that makes the compared groups differ in their subgroup composition.
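A quick check of the sample-size point, using hypothetical hospital-style counts (invented for illustration) and a standard two-proportion z-test with pooled variance; the helper names are illustrative. Multiplying every count by the same factor leaves every rate unchanged, so the reversal survives, while each z statistic grows by the square root of the factor.

```python
from math import sqrt

# Hypothetical counts, invented for illustration: (survived, treated) per subgroup.
BASE = {
    "A": {"mild": (810, 900), "severe": (30, 100)},
    "B": {"mild": (95, 100), "severe": (360, 900)},
}

def scale(data, factor):
    """Same proportions, `factor` times as many patients."""
    return {h: {g: (s * factor, n * factor) for g, (s, n) in groups.items()}
            for h, groups in data.items()}

def pooled(groups):
    """Collapse all subgroups into one (survived, treated) pair."""
    return (sum(s for s, _ in groups.values()), sum(n for _, n in groups.values()))

def z_two_proportions(s1, n1, s2, n2):
    """Two-proportion z statistic with pooled variance; positive means group 1's rate is higher."""
    p1, p2 = s1 / n1, s2 / n2
    p = (s1 + s2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def z_scores(data):
    """z for A vs B overall and within each subgroup."""
    a, b = data["A"], data["B"]
    scores = {"overall": z_two_proportions(*pooled(a), *pooled(b))}
    for g in a:
        scores[g] = z_two_proportions(*a[g], *b[g])
    return scores

for factor in (1, 10, 100):
    print(factor, {k: round(v, 1) for k, v in z_scores(scale(BASE, factor)).items()})
```

From a factor of 10 upward, the overall difference favors A and both subgroup differences favor B, each with |z| > 1.96 (two-sided p < 0.05). More data makes both contradictory signals more significant, not less.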
The Protocol
(1) Before acting on any aggregated comparison, ask: "Are there relevant subgroups that differ in size or baseline characteristics?"
(2) If yes, disaggregate: compute the relationship within each subgroup independently.
(3) Compare: does the relationship point in the same direction within each subgroup as in the aggregate?
(4) If the direction reverses in any subgroup → the aggregate is misleading for this comparison. Report and act on the subgroup-level analysis.
(5) If the direction holds consistently across subgroups → the aggregate is consistent with the subgroups you checked, though a variable you did not split on could still confound it.
(6) When in doubt about which subgroups are relevant, start with the obvious ones: size categories, severity levels, time periods, demographic groups.
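Steps (2) to (5) can be sketched as a small function, assuming success/total counts per entity and subgroup. The input format, the function name `compare_by_subgroup`, and the team data are hypothetical. Step (1) stays a human judgment, and step (6) amounts to calling the function once per candidate subgrouping.

```python
def compare_by_subgroup(x, y, counts):
    """Steps (2)-(5) of the protocol for one candidate subgrouping.

    counts[entity][subgroup] = (successes, total); both entities are assumed
    to share the same subgroup labels (a hypothetical input format).
    Returns the aggregate leader, the leader within each subgroup, and the
    subgroups whose leader differs from the aggregate leader (ties included).
    """
    def rate(pairs):
        pairs = list(pairs)
        return sum(s for s, _ in pairs) / sum(n for _, n in pairs)

    def lead(rate_x, rate_y):
        if rate_x > rate_y:
            return x
        if rate_y > rate_x:
            return y
        return "tie"

    aggregate = lead(rate(counts[x].values()), rate(counts[y].values()))
    by_group = {g: lead(rate([counts[x][g]]), rate([counts[y][g]])) for g in counts[x]}
    disagree = [g for g, who in by_group.items() if who != aggregate]
    return aggregate, by_group, disagree

# Hypothetical data: deal conversions for two teams, split by deal size.
counts = {
    "X": {"small": (180, 200), "large": (20, 100)},
    "Y": {"small": (95, 100), "large": (75, 300)},
}
aggregate, by_group, disagree = compare_by_subgroup("X", "Y", counts)
if disagree:
    print(f"Aggregate favors {aggregate}, but not in: {disagree}; act on the subgroup-level analysis.")
else:
    print(f"{aggregate} leads in every subgroup checked; the aggregate holds for this subgrouping.")
```

Here team X leads overall (200/300 vs 170/400) while team Y leads for both small and large deals, so both subgroups are flagged. Treating ties as disagreement is a deliberately conservative choice: a tie means the aggregate's direction is not confirmed in that subgroup.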