Distrust: Big Data, Data-Torturing, and the Assault on Science. Gary Smith. Oxford Univ. Press (2023)
Science is under attack. Ironically, the weapons are products of science itself: the propagation of misleading information, the torturing of data to ‘prove’ claims about anything, the mining of data untroubled by any hypothesis about what you might find. As Gary Smith writes in Distrust, “Disinformation is spread by the Internet that scientists created. Data torturing is driven by scientists’ insistence on empirical evidence. Data mining is fuelled by the big data and powerful computers that scientists created.”
Smith, an economist at Pomona College in Claremont, California, has form in this sort of critique: he wrote the 2018 book The AI Delusion and, together with mathematician Jay Cordes, the 2019 book The 9 Pitfalls of Data Science. Throughout Distrust, he underscores his claims with compelling examples. Take cryptocurrencies, one of his pet peeves. Disinformation and fake trades manipulate their value; data torturing underpins models that supposedly predict their prices; and data mining creates them in the first place.
He discusses in detail other examples of science being under attack. He sets out how, for example, food-marketing researcher Brian Wansink’s claims about dieting — that people eat less if their food comes on a small plate or if their kitchen is painted in neutral earth tones, for instance — were featured in numerous peer-reviewed papers and led to two bestselling books. A classic case of data torturing and sloppy science, the saga known as pizzagate (supposedly the data were largely collected in an Italian diner) eventually led to 18 retractions and numerous expressions of concern about other papers.
Then there is the supercomputer IBM Watson, whose data-mining capabilities were supposed to revolutionize health care. IBM invested more than US$15 billion in a system that has not yet produced a single peer-reviewed paper but has instead, while employed at the University of Texas MD Anderson Cancer Center in Houston, produced “multiple examples of unsafe and incorrect treatment recommendations”. Don’t even mention former US president Donald Trump and COVID-19, the hydroxychloroquine hoax, conspiracy theories of varying stripes, the fake texts and images created by generative artificial intelligence, claims for the reality of extra-sensory perception, the effectiveness of power posing and so on.
Distrust is a veritable page-turner, and I finished it in a few sittings. On a higher level, it is a call for common sense, for scepticism, for methodological rigour and for epistemic modesty. I suspect most scientists will love it.
But in places it misses the mark. I found the lack of proper scientific referencing disappointing. I can hardly fault the author, as I have not included explicit references here either, for fear of not fitting the mould of Nature book reviews. But a book on disinformation ought to religiously cite its sources for any claim that it makes.
Other miscues are more notable. Every year, the British Medical Journal (BMJ) publishes articles in its notorious (and entertaining) Christmas issue that purposely take things to extremes and draw conclusions that are patently ridiculous. Smith seems to take these articles at least semi-seriously. After demonstrating that many of the articles result from cherry-picking and P-hacking — torturing out statistically significant effects from data — he discusses a paper in which remote prayer was shown to improve outcomes for hospitalizations that had occurred several years earlier (because the authors were unwilling to assume that “God is limited by a linear time”). At this point, Smith notes, “I read that sentence twice and realized that this was a prank paper.” But so are the other BMJ papers that Smith critiques.
Distrust also pays little attention to the methodological improvements that scientists have embraced over the past decade to right the ship, or at least to counter data torturing and data mining. Only in the final chapter (‘Restoring the Luster of Science’) does the author provide a short, superficial discussion of ways to counteract questionable research practices. To my mind, the question of what should be done has a simple answer: academic journals should adopt the practices set out in Level 2 of the Transparency and Openness Promotion (TOP) Guidelines established by the Center for Open Science in 2014. (Full disclosure: I was part of the committee of scientists that formulated the original guidelines.)
Distrust does mention pre-registration as a possible countermeasure: committing to a specific analysis plan in advance of data collection. But Smith argues that “relatively few journals currently require pre-registration — perhaps because it is so easy to game the system: collect the data, torture or mine the data to obtain interesting results, and then file a pre-plan that does not reveal that the study has already been completed.” However, many reputable medical journals effectively require pre-registration for publishing clinical trials. Although scientists can circumvent the rules, doing so would be outright fraud. Researchers who stoop that low might as well just make up the data from scratch.
The book joins a growing chorus saying that schools and universities ought to teach courses in quantitative literacy to counter the wider societal problem of scientific disinformation. French scholar Pierre-Simon de Laplace was already arguing for that in 1814; what Distrust often lacks is a prescription for what this might entail. One concrete recommendation is that “statistics courses in all disciplines should include substantial discussion of Bayesian methods”. I describe myself as a dedicated Bayesian and would argue that Bayesian methods of statistical inference are the bedrock of all rationality, so I fully support this idea; yet only two pages in the book are devoted to talking about this method.
The broader question is whether any educational initiative would do much good. As a species, humans have always been shockingly biased and gullible. Raw intelligence does not seem to provide much, if any, protection against misinformation. Alongside the entertaining examples of people believing weird things, I would have wanted a discussion of ‘why people believe weird things’. Michael Shermer’s 1997 book of that title could easily have been the main source for at least one extra chapter. The many examples of bad science in this highly readable, topical book are educational and distressing, but the focus is too much on the disease, and too little on the potential cures. Distrust lights a few candles, but mostly curses the darkness.
The author declares no competing interests.