Experts vs. Simple Heuristics
Can simple rules beat the NCAA selection committee?
193 computer ranking systems • 40 seasons
The Contestants
Here's what we tested, ranging from "no information at all" to "team of experts with months of analysis":
| Model | What It Uses | Expert Knowledge? | Accuracy |
|---|---|---|---|
| Coin Flip | Nothing — pure random | None | 50.0% |
| Win-Loss Record | Just regular season wins | None — a child could do this | 64.7% |
| Point Differential | Average scoring margin per game | Minimal — basic arithmetic | 65.2% |
| Wins vs Tourney Teams | Wins against teams that made the tournament | Some — requires knowing the field | 69.1% |
| KenPom (best algorithm) | Adjusted efficiency ratings, tempo, SOS | Sophisticated math, no human bias | 70.9% |
| Selection Committee | Everything — stats, film, eye test, debate | Maximum — panel of experts | 71.4% |
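To make the baselines concrete, here's a minimal sketch of how the simplest models pick a winner for a single game. The field names (`wins`, `point_diff`, `seed`) and the sample records are illustrative, not the study's actual data.

```python
# Each simple baseline reduces a team to one number and picks the team with
# the larger value. Field names and sample records are illustrative only.
import random

def pick_winner(team_a, team_b, metric):
    """Pick the team with the higher metric value; coin-flip on ties."""
    a, b = metric(team_a), metric(team_b)
    if a == b:
        return random.choice([team_a, team_b])
    return team_a if a > b else team_b

# Illustrative team records (not real study data).
team_a = {"name": "Team A", "wins": 28, "point_diff": 14.2, "seed": 1}
team_b = {"name": "Team B", "wins": 23, "point_diff": 6.1, "seed": 13}

baselines = {
    "Coin Flip": lambda t: random.random(),           # ignores the team entirely
    "Win-Loss Record": lambda t: t["wins"],           # regular-season wins
    "Point Differential": lambda t: t["point_diff"],  # average scoring margin
    "Committee Seed": lambda t: -t["seed"],           # lower seed number is better
}

for name, metric in baselines.items():
    print(f"{name:20s} -> {pick_winner(team_a, team_b, metric)['name']}")
```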
Finding 1: The Committee Beats Every Simple Rule — But Not by Much
The committee is the best predictor — but the margins are thin. A simple win-loss count, something literally anyone can look up in 30 seconds, gets you to 64.7%. The committee's months of film study, debate, data analysis, and deliberation add just 6.7 points on top of that.
Or to put it differently: about 90% of the committee's predictive power can be captured by counting wins (64.7 / 71.4 ≈ 0.91). The remaining 10% is where their expertise lives.
Finding 2: The Best Algorithm Essentially Ties the Committee
We tested all 193 computer ranking systems in the Massey Ordinals database against the committee's seedings; 52 of them had enough seasons of data for a fair comparison. The results for those 52:
| Category | Count | % of Systems |
|---|---|---|
| Systems that beat the committee | 7 | 13% |
| Systems within 1% of the committee | 9 | 17% |
| Systems that trail by more than 1% | 36 | 70% |
Only 7 out of 52 systems with sufficient data beat the committee — which means the committee adds value beyond almost any single algorithm. But the best-known algorithms come remarkably close:
| System | What It Is | Accuracy | vs Committee |
|---|---|---|---|
| KenPom (POM) | Adjusted efficiency — the gold standard of analytics | 70.9% | -0.5% |
| Massey (MOR) | Massey Ratings — pure mathematical ranking | 70.6% | -0.8% |
| Sagarin (SAG) | Jeff Sagarin's computer rankings | 70.5% | -0.9% |
| RPI | Rating Percentage Index — simple formula | 69.0% | -2.4% |
| Colley (COL) | Colley Matrix — linear algebra approach | 69.0% | -2.4% |
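As a rough sketch of how the bucket counts in the first table above can be tallied, here's the comparison logic run over the five named systems as placeholder input (the real analysis covers all 52 systems with sufficient data):

```python
# Sort each ranking system into one of three buckets by comparing its
# game-prediction accuracy with the committee's. The five systems below are
# placeholder input; the study evaluates 52 systems with sufficient data.
COMMITTEE_ACC = 0.714

system_acc = {"POM": 0.709, "MOR": 0.706, "SAG": 0.705, "RPI": 0.690, "COL": 0.690}

buckets = {"beats committee": 0, "within 1%": 0, "trails by >1%": 0}
for name, acc in system_acc.items():
    if acc > COMMITTEE_ACC:
        buckets["beats committee"] += 1
    elif COMMITTEE_ACC - acc <= 0.01:
        buckets["within 1%"] += 1
    else:
        buckets["trails by >1%"] += 1

for label, count in buckets.items():
    print(f"{label:16s}: {count} ({count / len(system_acc):.0%} of systems)")
```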
Finding 3: Expertise Has Diminishing Returns
Think of prediction accuracy as a staircase, where each step adds more information or sophistication. The biggest jump is from knowing nothing to counting wins: +14.7 points. After that, each additional layer of sophistication adds less. The jump from the best algorithm to the expert committee is just 0.5 points, tied with the point-differential step for the smallest increment on the entire staircase.
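A minimal sketch that recomputes the staircase increments; the accuracies are copied from the contestants table above, nothing here is new data.

```python
# Marginal accuracy gain at each added layer of sophistication,
# using the accuracies from the contestants table above.
ladder = [
    ("Coin Flip",             0.500),
    ("Win-Loss Record",       0.647),
    ("Point Differential",    0.652),
    ("Wins vs Tourney Teams", 0.691),
    ("KenPom",                0.709),
    ("Selection Committee",   0.714),
]

for (prev_name, prev_acc), (name, acc) in zip(ladder, ladder[1:]):
    print(f"{prev_name} -> {name}: +{(acc - prev_acc) * 100:.1f} points")
```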
Finding 4: The Committee's Edge Is Biggest in the First Round and Elite 8
| Round | Committee | Win-Loss | Point Diff | Committee Edge over Pt Diff |
|---|---|---|---|---|
| Round of 64 | 72.9% | 64.7% | 64.9% | +8.0% |
| Round of 32 | 69.3% | 65.5% | 65.2% | +4.1% |
| Sweet 16 | 68.9% | 61.5% | 68.0% | +0.9% |
| Elite 8 | 71.1% | 61.7% | 59.2% | +11.9% |
| Final Four | 71.4% | 66.7% | 90.0% | -18.6% |
In the first round, where 1-seeds face 16-seeds and the talent disparity is obvious, the committee's seedings are 8 points better than point differential alone. The committee is good at identifying mismatches.
But in the Final Four — where we'd expect expert judgment to matter most — point differential actually flips and beats the committee. Small sample size caveat applies (only 10 games in the dataset with clear seed differences), but the direction is interesting: when the remaining teams are all elite, the experts' subjective rankings may be less reliable than raw performance data.
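To put the small-sample caveat in numbers: treating each game as an independent coin flip (a simplification assumed here, not part of the original analysis), the uncertainty on a 10-game accuracy estimate dwarfs the 18.6-point gap in the table.

```python
# Rough binomial uncertainty for an accuracy estimated from only n = 10 games
# (the Final Four subsample noted above). Assumes independent games, which is
# a simplification, not something from the study itself.
import math

def ci_half_width(p, n, z=1.96):
    """Approximate 95% confidence half-width for a proportion p from n games."""
    return z * math.sqrt(p * (1 - p) / n)

n = 10
for label, p in [("Committee", 0.714), ("Point differential", 0.90)]:
    print(f"{label:20s}: {p:.0%} +/- {ci_half_width(p, n):.0%}")
```

With error bars roughly 20 to 30 points wide, the Final Four reversal is suggestive at best.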
Finding 5: When Simple Disagrees with Expert, the Expert Usually Wins
There are 495 tournament games where the win-loss record would have picked a different team than the committee's seeding. In those contested cases, the committee wins the tiebreaker about two-thirds of the time. This is where their expertise genuinely earns its keep: they're seeing something (strength of schedule, injuries, conference quality, late-season trends) that a raw win count misses. But a third of the time, the simple metric had it right and the experts were wrong. That's a meaningful error rate for "the best judgment available."
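Here's a sketch of how the disagreement analysis can be run, assuming a list of game records that carries each team's regular-season win total, its seed, and the actual winner (the field layout and sample rows are illustrative):

```python
# Finding 5 sketch: restrict to games where the win-loss record and the
# committee's seeding disagree about the favorite, then score each picker.
# Game records and field layout are illustrative, not the study dataset.
games = [
    # (team_a_wins, team_a_seed, team_b_wins, team_b_seed, winner "A" or "B")
    (29, 5, 25, 4, "B"),
    (27, 7, 24, 10, "A"),
    (31, 3, 26, 2, "A"),
]

contested = committee_right = simple_right = 0
for a_wins, a_seed, b_wins, b_seed, winner in games:
    win_pick = "A" if a_wins > b_wins else "B"    # more regular-season wins
    seed_pick = "A" if a_seed < b_seed else "B"   # better (lower) seed
    if win_pick == seed_pick:
        continue                                  # keep only disagreements
    contested += 1
    committee_right += (seed_pick == winner)
    simple_right += (win_pick == winner)

print(f"Contested games: {contested}")
print(f"Committee's pick won: {committee_right}, win-count's pick won: {simple_right}")
```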
Finding 6: Among 193 Computer Systems, Only 7 Beat the Committee
The committee outperforms 87% of the computer ranking systems. That's a legitimate feather in their cap. But consider the economics: that edge takes a panel of experts months of film study and deliberation to produce, while the algorithms that nearly match it run automatically on public data at essentially no cost.
The 7 systems that beat the committee are interesting precisely because they suggest the committee's errors aren't random — they're systematic. A purely mathematical model that ignores brand names, conference prestige, and "eye test" narratives can match or exceed the experts. The committee's biases (which we documented in our seeding analysis — conference favoritism, the 11-seed anomaly, mid-seed noise) are exactly the kind of errors an algorithm wouldn't make.
What Does This Mean?
The NCAA Selection Committee is a real-world laboratory for studying expert judgment. After 40 years and 2,518 games, the data tells a clear story:
1. Expertise is real but its marginal value is small. The committee beats every simple rule and most computer systems. But the gap between "count the wins" (64.7%) and "expert committee" (71.4%) is just 6.7 points. Most of the predictive signal is in the basic data, not the expert analysis.
2. Algorithms match experts at a fraction of the cost. KenPom hits 70.9% with two numbers and no human input. The committee's remaining 0.5% edge comes with enormous cost, complexity, and the introduction of human biases.
3. The "casual fan" is closer to the expert than you'd think. If your friend picks brackets by "going with the team that won more games," they'll get about 65% right. The expert gets 71%. The gap is real but narrow — which is why office bracket pools are competitive and why your coworker who "doesn't even watch basketball" occasionally wins.
4. This mirrors findings across expert domains. From financial analysts to political forecasters to medical diagnosticians, the pattern repeats: experts outperform simple rules by small margins, algorithms match experts closely, and more information doesn't proportionally improve accuracy. The NCAA tournament is just a particularly clean dataset to prove it.