One of the main messages of most psychological research into bias, and the hundreds of popular books that have followed, is that most people just aren’t intuitively good at thinking statistically. We make mistakes about the influence of sample size, and about representativeness and likelihood. We fail to understand regression to the mean, and often make mistakes about causation.
However, suggesting statistical competence as a universal cure can lead to new sets of problems. An emphasis on statistical knowledge and tests can introduce its own blind spots, some of which can be devastating.
The discipline of psychology is itself going through a “crisis” over reproducibility of results, as this Bloomberg View article from the other week discusses. One recent paper found that only 39 out of a sample of 100 psychological experiments could be replicated. That would be disastrous for the position of psychology as a science: if results cannot be replicated by other teams, their validity must be in doubt. The p-value test of statistical significance is overused as a marketing tool or as a way to get published. Naturally, there are some vigorous rebuttals in process.
It is, however, a problem for other disciplines as well, which suggests the issues are genuine, deeper and more pervasive. John Ioannidis has been arguing the same about medical research for some time.
He’s what’s known as a meta-researcher, and he’s become one of the world’s foremost experts on the credibility of medical research. He and his team have shown, again and again, and in many different ways, that much of what biomedical researchers conclude in published studies—conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain—is misleading, exaggerated, and often flat-out wrong. He charges that as much as 90 percent of the published medical information that doctors rely on is flawed.
The same applies to economics, where many of the most prominent academics apparently do not understand some of the statistical measures they use. A paper (admittedly from the 1990s) found that 70% of the empirical papers in the American Economic Review, the most prestigious journal in the field, “did not distinguish statistical significance from economic, policy, or scientific significance.” The conclusion:
We would not assert that every economist misunderstands statistical significance, only that most do, and these some of the best economic scientists.
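To make the distinction concrete, here is a minimal sketch in Python. The numbers are hypothetical and not taken from the paper: it assumes two equal-sized groups, a known standard deviation of 1, and a true difference of just 0.01 standard deviations, an effect too small to matter for most practical decisions. The helper `two_sample_z` is an illustrative name, and a simple z-test stands in for whatever tests the surveyed papers actually used.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Return (z, two-sided p-value) for a difference in means
    between two equal-sized groups with a known, shared sd."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical effect: 0.01 standard deviations, held fixed throughout.
tiny_effect = 0.01

for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z(tiny_effect, 1.0, n)
    print(f"n={n:>9,}  z={z:6.2f}  p={p:.3g}")
```

The effect is the same size in every row; only the sample size changes. With a million observations per group the p-value drops far below any conventional threshold, yet the result is no more important economically than it was at a hundred. That is the gap between statistical significance and economic, policy, or scientific significance.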
Of course, the problems and flaws in statistical models in the lead-up to the great crash of 2008 are also many and by now famous. If bank management and traders do not understand the “black box” models they are using, and their limits, tears and pain are the usual result.
The takeaway is not to impugn statistics. It is that people who tidy up one aspect of their approach are still very good at making a whole set of different mistakes. More statistical rigor can also mean more blind spots to other issues or considerations, and the use of technique in isolation from common sense.
The more technically proficient and rigorous you believe you are, often the more vulnerable you become to wishful thinking or blind spots. Technicians often have a remarkable ability to miss the forest for the trees, or twigs on the trees.
It also means there are (even more) grounds for mild skepticism about the value of many academic studies to practitioners.