You’ve never replicated a physics experiment. You trust results from people you’ve never met, using instruments you’ve never inspected, following procedures you’ve never checked. No scientist alive has personally verified more than a sliver of what they build on. The body of existing knowledge is too vast for any one person.
Science requires trust.
That doesn’t mean it’s “just” trust. Science has built-in mechanisms for catching when trust is misplaced: replication, peer criticism, new experiments that contradict old conclusions. These mechanisms are imperfect. They’re slow. But they exist. Ordinary trust has no built-in correction. Science does.
We still depend on trust more than we usually acknowledge. Which leads somewhere strange: why does this whole system produce knowledge that actually works?
The universe cooperates
Start with the part nobody has explained.
Short descriptions predict what happens next. We write a few symbols on paper and they tell us where a planet will be in a thousand years. Wigner called this “the unreasonable effectiveness of mathematics” and admitted he had no explanation. The universe has patterns. Patterns that repeat across scales. Why it has them is an open question.
It gets stranger. We exist because of a tiny imbalance. In the first moments after the Big Bang, for every billion pairs of matter and antimatter that annihilated each other, roughly one extra particle of matter survived. That one-in-a-billion surplus is everything. Every star, every planet, every person. If the balance had been perfect, the universe would be nothing but cooling radiation. Sakharov identified the conditions needed to produce the imbalance in 1967. Our current theories still can’t fully account for it.
The list goes on. Nuclear force strength: change it slightly, no stable atoms. Expansion rate: much faster, no galaxies. A specific energy level in carbon that allows stars to forge the elements life needs, predicted by Hoyle in 1954 because it had to exist, then confirmed by experiment.
Are these values “fine-tuned”? We can’t say. We don’t know what other values were possible. We can say: the values in our universe permit the complexity that makes science, and us, possible. Small changes to many of them would not.
Smooth enough for trial and error
Separate from having patterns at all: things change gradually. Small causes produce small effects. Push something a little, it moves a little.
This sounds obvious, but nothing about the universe requires it.
If small changes produced wildly unpredictable results, trial and error would teach you nothing. Evolution wouldn’t work, because a small genetic change could produce anything. Science wouldn’t work, because you couldn’t extrapolate from one experiment to a slightly different one.
The universe is smooth enough, at the scales we operate at, that nearby attempts give nearby results. Tinkering produces improvements. Yesterday’s solution is a good starting point for today’s slightly different problem.
Not always. Weather is smooth moment to moment but unpredictable weeks out. Lorenz showed in 1963 that small errors compound exponentially. Phase transitions are abrupt jumps from smooth changes. And we don’t know whether smoothness is fundamental or just a feature of the scales we happen to inhabit.
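Lorenz's point about compounding error is easy to see numerically. A minimal sketch, using crude forward Euler integration of the Lorenz system with its standard parameters (step size and horizon chosen only for illustration):

```python
# Two trajectories of the Lorenz system, started one part in a billion
# apart. Each individual step is a small, smooth change, yet the gap
# between the trajectories grows by many orders of magnitude before
# saturating at the size of the attractor itself.
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    )

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-9, 1.0, 1.0)  # a one-in-a-billion nudge

gap_start = 1e-9
for _ in range(3000):  # roughly 30 time units
    a, b = lorenz_step(a), lorenz_step(b)
gap_end = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

print(f"initial gap: {gap_start:.1e}, final gap: {gap_end:.1e}")
```

Smooth moment to moment, unpredictable in aggregate: every step obeys "push a little, move a little," and the two histories still end up uncorrelated.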
Why rough approximations work
Given patterns and smoothness, rough approximations can capture enough of reality’s structure to be useful. Rules of thumb, simplified models, educated guesses. They work not because they’re exact, but because the things they ignore tend to matter less than the things they capture.
Simon identified why in 1962: the universe organizes itself in layers that are mostly independent. You can understand cells without knowing about galaxies. You can predict a bridge’s load without modeling individual atoms. Not because the layers are perfectly decoupled, but because the interactions between them are usually weak enough to ignore.
When that assumption fails, the approximations break. Near phase transitions. In quantum entanglement. In tightly coupled systems where everything affects everything. But they hold often enough for science and engineering to function.
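Simon's layering argument can be made with a toy model. The numbers below are arbitrary, chosen only to contrast weak with strong cross-layer coupling: two "layers," each with its own simple dynamics, interacting with strength eps.

```python
# Two layers, each decaying on its own, coupled with strength eps.
# When eps is small, a model of each layer in isolation (eps = 0)
# stays approximately right. When eps is large, it fails completely.
def evolve(x, y, eps, steps=50):
    for _ in range(steps):
        x, y = 0.9 * x + eps * y, 0.8 * y + eps * x
    return x, y

x_weak, _ = evolve(1.0, 1.0, eps=0.001)  # weak cross-layer coupling
x_iso, _ = evolve(1.0, 1.0, eps=0.0)     # each layer modeled alone
x_strong, _ = evolve(1.0, 1.0, eps=0.2)  # tightly coupled system

print(f"weak coupling:   prediction error {abs(x_weak - x_iso):.2e}")
print(f"strong coupling: prediction error {abs(x_strong - x_iso):.2e}")
```

The isolated model is a good approximation exactly as long as the interaction it ignores is weak; crank the coupling up and the strongly coupled system doesn't just drift from the prediction, it behaves qualitatively differently.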
The textbook story is too clean
The standard account: science makes predictions, tests them, keeps what survives, discards what fails. Popper formalized this as falsification. It captures something real. It also breaks in important ways.
You can’t observe without assumptions. What counts as a measurement, what you call noise versus signal, what you decide to test, all depend on what you already believe. Hanson called this the theory-ladenness of observation, and Kuhn made it central to his account of science: your framework shapes what you notice. Two scientists with different commitments look at the same data and attend to different things.

Multiple explanations always fit the same evidence. This is the Quine-Duhem problem: any surprising result can be absorbed by adjusting auxiliary assumptions instead of abandoning the core theory. When Uranus’s orbit didn’t match predictions, astronomers postulated an unseen planet. Neptune. They were right. When Mercury’s orbit didn’t match, they postulated another unseen planet. Vulcan. They were wrong. Same logical move, opposite outcomes.
There’s no algorithm for knowing which response is correct.
You can’t tell in the moment, either. The faster-than-light neutrinos at OPERA in 2011? A loose cable. Cold fusion in 1989? Measurement error. Both took months or years for the scientific community to sort out. Not through logic alone. Through replication attempts, theoretical analysis, persuasion, career incentives.
What actually makes it work
There’s no single mechanism that makes science self-correcting. What you get instead is several imperfect mechanisms piled on top of each other, each covering gaps the others miss.
Theories must contact reality. They must make predictions you can compare against what actually happens. This doesn’t guarantee self-correction, but without it, self-correction is impossible. A theory that never touches data can’t be shown wrong. Which means it can never be fixed.
Scientists are rewarded for finding mistakes in each other’s work; this keeps bad answers from becoming permanent. But scientists are also rewarded for novelty and positive results, and those incentives distort the correction process.
Independent replication is the strongest single check. When different groups, using different equipment and different approaches, get the same result, that carries real weight. The problem is that replication is systematically under-rewarded. Journals want novelty. Careers are built on discoveries. Nobody gets tenure for confirming someone else’s finding.
The strongest signal science produces is convergence. When theory and experiment both point the same direction, each arrived at independently, the chance of both being wrong in the same way drops fast.
Where it breaks
Calling science “self-correcting” without examining its failures is like calling markets “self-regulating” without mentioning 2008.
Ioannidis argued in 2005, on statistical grounds, that most published research findings are likely false. Studies that find the expected result get published. Studies that find nothing get filed away. The literature overstates effect sizes and hides null results. This isn’t a few bad actors. It’s what the publication incentive structure produces.
The system rewards publications and citations. Not truth. When those incentives align with truth-seeking, the system works. When they diverge, you get impressive-looking results that don’t replicate.
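The distortion falls out of the publication filter itself, with no dishonesty required. A toy simulation (the effect size, sample size, and crude p < 0.05 threshold are illustrative assumptions, not a model of any real field):

```python
# Many honest studies of the same small true effect. "Publish" only
# the ones that reach significance, then compare the published
# literature's average effect to the truth.
import random
import statistics

random.seed(0)
true_effect = 0.1  # small real effect, in standardized units
n_per_study = 30

def run_study():
    # Each study: n noisy observations centered on the true effect.
    sample = [random.gauss(true_effect, 1.0) for _ in range(n_per_study)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n_per_study ** 0.5
    return mean, mean / se > 1.96  # crude one-sided significance filter

results = [run_study() for _ in range(2000)]
all_means = [m for m, _ in results]
published = [m for m, sig in results if sig]

print(f"true effect:       {true_effect}")
print(f"mean of all runs:  {statistics.fmean(all_means):.3f}")
print(f"mean of published: {statistics.fmean(published):.3f}")
```

Every simulated study is honest, yet the "published" subset overstates the effect several-fold, because with small effects and modest samples, only the studies that overshoot clear the significance bar.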
Kuhn identified a deeper problem: scientists working within a paradigm don’t question its foundations. They solve puzzles the paradigm defines. Anomalies accumulate; they get treated as unsolved puzzles, not as signs the framework is wrong. Self-correction switches off during these periods. It resumes only when the anomalies become too numerous or too embarrassing to ignore.
The replication crisis made the structural nature of the problem visible. The Open Science Collaboration tried to reproduce 100 published psychology experiments. Between 36% and 47% replicated, depending on the criterion used. Not a few flawed studies. A systemic failure in how the field generates and validates knowledge.
In some fields, mathematical sophistication substitutes for contact with reality. Paul Romer called this “mathiness”: models that are internally consistent and elegant and wrong. A proof that the model follows from its axioms tells you nothing about whether the axioms match the world.
Not all science self-corrects equally
Science works to the extent that reality pushes back hard enough to power the correction.
Engineering has strong feedback. Your bridge falls. Your rocket explodes. Reality responds fast, and the response is impossible to ignore.
Nutrition science and macroeconomics have weak feedback. Effects are small, confounded, slow to appear, impossible to study in clean experiments. The US dietary guidelines told Americans to avoid fat for 30 years. Consensus persisted because reality wasn’t pushing back hard enough to overwhelm career incentives and publication bias.
Physics sits on both sides. Precision measurement and reproducible experiments get strong feedback. Cosmology and string theory get much weaker feedback.
Self-correction tracks the strength and speed of reality’s pushback.
Can we trust thinking about thinking?
Everything above has a problem.
We’re using reasoning to ask why reasoning works. Our brains were shaped by evolution, a process that depends on the same cooperative structure of the universe we’re trying to explain. We’re not outside observers. We’re outputs of the process, examining the process, with tools the process built.
If the universe had no patterns, no smoothness, no conditions permitting complexity, no brains would have evolved. No science. Nobody to ask the question. The question can only be asked in universes where the answer is favorable. We’re selecting for the answer by being alive to ask.
This is deeper than “observers exist in observable universes.” Our capacity to reason about reasoning is contingent on the universe being the kind of place where approximate reasoning works. If it weren’t, we wouldn’t have this capacity, and the question wouldn’t exist. We can’t tell whether we’re discovering something true or exercising a capability that only functions in realities where it happens to be productive.
The philosopher Otto Neurath had a metaphor for this: we are like sailors who must rebuild their ship on the open sea, plank by plank, never able to put it in dry dock and start from scratch. We can improve our reasoning, but only by using the reasoning we already have. There is no dry dock. No neutral ground to stand on while we evaluate the tool.
What we’re left with
Every description is approximate. Every verification rests on trust. The analysis is circular.
Is anything solid?
Probably not in the way we’d like. But the universe’s consistency offers something almost as good: convergent approximation. Different people, different methods, different assumptions, different instruments, all landing on the same answer. The electron’s magnetic moment, predicted by quantum theory and measured in the lab, agrees to ten significant figures. Separate research groups find the same gene. Physics derived from theory matches physics derived from experiment.
This convergence isn’t universal. The interpretation of quantum mechanics, the nature of consciousness, the causes of major economic events, whether dark matter exists or gravity needs modifying. These haven’t converged. We tend to celebrate convergence and forget the open questions where methods still disagree.
We can’t prove our reasoning works. We can’t prove the universe is structured the way we think. We can’t escape using the tool to evaluate the tool. But within that circle, the evidence is remarkably consistent. Planes fly. Vaccines prevent disease. Eclipse predictions are accurate to the second. That track record is hard to dismiss even from deep skepticism.
The universe isn’t helping us on purpose. But its structure makes being approximately right possible — and that is either the most remarkable thing about it, or a thing only remarkable to beings who could only exist in a universe where it’s true.
We can’t tell which. That might be the most honest answer available.
Key references
- Wigner, “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” (1960)
- Sakharov, “Violation of CP Invariance, C Asymmetry, and Baryon Asymmetry of the Universe” (1967)
- Hoyle, “On Nuclear Reactions Occurring in Very Hot Stars” (1954)
- Simon, “The Architecture of Complexity” (1962)
- Popper, The Logic of Scientific Discovery (1959)
- Kuhn, The Structure of Scientific Revolutions (1962)
- Quine, “Two Dogmas of Empiricism” (1951)
- Lorenz, “Deterministic Nonperiodic Flow” (1963)
- Ioannidis, “Why Most Published Research Findings Are False” (2005)
- Open Science Collaboration, “Estimating the Reproducibility of Psychological Science” (2015)
- Romer, “Mathiness in the Theory of Economic Growth” (2015)
- Neurath, Anti-Spengler (1921) / “Protocol Statements” (1932). The ship-rebuilding-at-sea metaphor for reasoning within reasoning.