Despite the pop title, “Calling Bullshit: The Art of Skepticism in a Data-Driven World” is actually a tough-minded guide to spotting, deconstructing, and refuting the misinformation, fake news, and false data that abound in our society. To me, the materials and ideas covered in the book can also serve as an educational resource for epidemiology students.
Coming from a non-English-speaking background, I have always thought of ‘bullshit’ as foul language, and I was somewhat shocked to see the word in the book’s title. It turns out, however, that this interesting word is well accepted in scientific and philosophical discourse. Harry G. Frankfurt is perhaps the most important philosopher to have laid the theoretical foundation for the study of bullshit. In his bestselling treatise, On Bullshit, Professor Frankfurt does not exactly define what bullshit is, but he considers it a by-product of public life where “people are frequently impelled — whether by their own propensities or by the demands of others — to speak extensively about matters of which they are to some degree ignorant.” Bullshitting is different from lying: liars know the truth but engage in a conscious act of deception, whereas bullshitters don’t care about the truth and don’t consciously deceive. Frankfurt observes that bullshit is “one of the most salient features of our culture”.
In Calling Bullshit, Bergstrom and West go further and provide an operational definition of bullshit as follows: “Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade or impress an audience by distracting, overwhelming, or intimidating them with a blatant disregard for truth, logical coherence, or what information is actually being conveyed.” (Page 40).
So, in this context, ‘bullshit’ seems to me close to ‘nonsense’ or ‘falsehood’.
In any case, based on that operational definition, the authors dive into a dense and thoughtful examination of the nature of bullshit (Chapter 3), causal inference (Chapter 4), and the fragility of science (Chapter 9). The authors draw background knowledge from many academic fields, such as the medical sciences, biology, statistics, social science, psychology, and linguistics, to cover issues relating to the interpretation of statistical data (Chapter 5), study selection bias (Chapter 6), data visualization (Chapter 7) and Big Data (Chapter 8). Each chapter is illustrated by a number of real-world cases that can enlighten both science and non-science readers.
The authors declare right at the beginning of the book that “the world is awash with bullshit, and we’re drowning in it.” We are all familiar with it: misinformation, fake news, startup hype, and even downright lies dressed up in rhetoric and fancy phrases. Most people can easily identify and debunk that old style of bullshit. However, there is a new kind of nonsense (i.e., bullshit) that is articulated in the language of science and statistics, and it appears in everything from scientific papers to “Big Data” and TED talks. Because this new style of bullshit carries an aura of scientific authority, most people don’t feel qualified to challenge it. That is where this book comes in: Bergstrom and West want to empower non-science readers with tools for deconstructing nonsense claims masquerading as scientific truth.
Nonsense in science
Sadly, many false claims (or bullshit?) are produced by scientific studies. In 2005, Professor John Ioannidis (Stanford University) shocked the scientific world with the proclamation that “most claimed research findings are false” (PLoS Med 2005). The editor of the Lancet, Richard Horton, agreed with Ioannidis, albeit in a milder tone: “Much of the scientific literature, perhaps half, may simply be untrue.” Subsequent replication studies have largely borne Ioannidis and Horton out: a disturbing share of published research findings do not hold up. And the hotter a scientific field, the more false claims it generates (the so-called Proteus Phenomenon). The reasons behind these false claims include, among others, wrong hypotheses, small sample sizes, design biases, small effect sizes, multiple hypothesis testing and P-hacking, and conflicts of interest.
One of my favorite stories in the book is the case of mortality among musicians (pages 126–129). In 2015, a high-profile study claimed that musicians in newer genres such as Metal, Rap and Hip Hop died at younger ages than those in older genres such as Blues, Country and Jazz. The finding was sensational and was covered by most international media outlets. It turned out to be wrong. The study sampled only musicians who had already died, not all musicians. Blues, Country and Jazz have been around for a long time, and many of their performers lived into their eighties. Metal and Rap, by contrast, are young genres: the musicians included in the study had indeed died prematurely, but the majority of musicians in these genres (not included in the study) were still alive. The Metal and Rap musicians who had died must have died at younger ages, because their genres have not been around long enough for it to be otherwise. This case study highlights the problem of right-censoring bias, which is not well appreciated even among epidemiologists and social scientists.
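The logic of this bias is easy to demonstrate with a minimal simulation (mine, not the book’s; the genre start years and lifespan distribution below are invented purely for illustration). Every simulated musician draws a lifespan from the same distribution, yet sampling only the deceased makes the younger genre look deadly:

```python
import random
random.seed(1)

def mean_age_at_death_among_deceased(genre_start, study_year=2015, n=100_000):
    """Average age at death, computed only among those already dead at the
    study date -- mimicking the right-censored sampling of the 2015 study."""
    ages = []
    for _ in range(n):
        birth = random.uniform(genre_start - 20, study_year - 20)  # debut at ~20
        lifespan = random.gauss(75, 10)          # identical for every genre
        if birth + lifespan <= study_year:       # only the deceased enter the sample
            ages.append(lifespan)
    return sum(ages) / len(ages)

# Hypothetical genre start years, for illustration only
print("Blues:", round(mean_age_at_death_among_deceased(1920), 1))
print("Rap:  ", round(mean_age_at_death_among_deceased(1985), 1))
```

Because a Rap musician born after 1965 can only appear in the deceased sample by dying before 2015, the sample mechanically excludes anyone who lived a normal lifespan, so the “Rap” average comes out decades lower despite identical underlying mortality.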
Such a design bias can also explain why Oscar winners tend to live longer than less successful film stars, why bishops tend to die at older ages than curates, or why patients on a certain medication (e.g., statins or beta-blockers) seem to live longer than those not on the medication. In fact, the observed longevity advantage is due to a phenomenon called immortal time bias. It is immortal time bias that accounts for the apparently better life expectancy of people who reached higher ranks: professors vs lecturers, judges vs barristers, and generals vs lieutenants. People in higher ranks do not necessarily live longer than those in lower ranks; the observed association is spurious.
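Immortal time bias can be sketched the same way (again my own illustrative simulation, with invented numbers): everyone draws a lifespan from the same distribution, but only those who survive to a randomly assigned promotion age are ever recorded as “bishops”, so bishops appear to outlive curates by construction:

```python
import random
random.seed(2)

def immortal_time_demo(n=100_000):
    """Identical lifespans for all; promotion to 'bishop' happens at a random
    age, and one must live that long to be promoted at all."""
    bishops, curates = [], []
    for _ in range(n):
        lifespan = random.gauss(70, 12)          # same distribution for everyone
        promotion_age = random.uniform(50, 70)   # must survive to this age
        (bishops if lifespan >= promotion_age else curates).append(lifespan)
    return sum(bishops) / len(bishops), sum(curates) / len(curates)

b, c = immortal_time_demo()
print(f"bishops live {b:.1f} years on average, curates {c:.1f}")
```

The gap arises purely because dying young makes promotion impossible, not because rank confers longevity.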
Nonsense in Big Data
Big Data, Machine Learning and Artificial Intelligence have produced their own kind of nonsense and hype (pages 201–203). A case in point is the claim, in a study published in Nature in 2009, that a Google algorithm could predict flu outbreaks simply from the keywords people typed into Google (e.g., “fever”, “headache”, “flu symptoms”, “pharmacies near me”). The claim generated huge excitement, and technological enthusiasts took the case as an example of how Big Data and Machine Learning would change our lives forever. Some went further and declared that there would be no need for the scientific method; the death of epidemiology was in sight. However, subsequent studies showed that the algorithm produced wrong predictions, a lot of them, and the problem worsened over time, so much so that Google decided to axe the algorithm. It was then realized that the initial ‘experiment’ had been badly designed, without any credible hypothesis or epidemiologic thinking. Moreover, the algorithm could identify which words people searched for, but it could not ascertain why they were searching for them. The epic failure is a powerful reminder that data cannot replace expert thinking, and that flawed data produce flawed outcomes (‘garbage in, garbage out’).
Another typical false claim produced by Machine Learning hype concerns the identification of criminality from facial images (pages 44–47). The story began with a paper, “Automated Inference on Criminality Using Face Images”, claiming that an algorithm based on an individual’s headshot could predict criminality with 90% accuracy. The key assumption behind the algorithm was that a machine is emotionless and unbiased, which sounds reasonable. However, the experiment was flawed from the start: the authors contrasted photos of convicted criminals with photos of ‘normal’ people taken from social media. In a simple classroom experiment, West and Bergstrom asked students to consider a sample of photos of convicted criminals, and concluded that “it seems less plausible to us that facial features are associated with criminal tendencies than it is that they are correlated with juries’ decisions to convict”. In other words, the algorithm was more likely picking up facial features that make a person convictable than any set of criminal inclinations.
Nonsense in statistics
Misuse of statistics, particularly the misinterpretation of P-values, has generated many false claims in the scientific literature and the popular media. The P-value is an invention of Sir Ronald Fisher, a genius who helped create modern statistical science. It was designed as an index for filtering signal (i.e., real effects) out of noise (random error). Fisher suggested using the threshold of 0.05 (i.e., P < 0.05) to distinguish an admissible finding from noise, and further recommended that all findings with P > 0.05 be ignored entirely. For almost 100 years, virtually all scientists have faithfully followed Fisher’s recommendation without questioning its wisdom. Yet it can be shown that, under plausible assumptions, roughly 30% of findings with P ≈ 0.05 are false positives.
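Why can so many “significant” findings be false? A minimal simulation of the argument (my own sketch, not the book’s; the prior probability of 10% true effects and 80% power are illustrative assumptions commonly used in this literature) makes the point:

```python
import random
from statistics import NormalDist

random.seed(3)
norm = NormalDist()

def false_discovery_rate(n_studies=20_000, prior_true=0.10, n=25, effect=0.56):
    """Simulate z-tests: 10% of studies probe a real effect (the effect size is
    chosen so that power is about 80%), the rest probe a true null. Count what
    fraction of P < 0.05 'discoveries' are actually false."""
    true_pos = false_pos = 0
    for _ in range(n_studies):
        real = random.random() < prior_true
        mu = effect if real else 0.0
        xbar = random.gauss(mu, 1 / n**0.5)      # sample mean of n obs, sd = 1
        p = 2 * (1 - norm.cdf(abs(xbar) * n**0.5))
        if p < 0.05:
            if real:
                true_pos += 1
            else:
                false_pos += 1
    return false_pos / (false_pos + true_pos)

print(f"share of significant results that are false: {false_discovery_rate():.0%}")
```

Under these assumptions, roughly a third of the P < 0.05 results are false alarms: the 5% error rate applies to each of the many true nulls, and their false positives pile up against the comparatively few real effects.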
Nevertheless, the threshold of 0.05 has become a sort of “passport for publication”. And because publication leads to grants, promotion, awards and prestige, researchers are indirectly incentivized to obtain the desired P-values (i.e., P < 0.05) by resorting to questionable practices such as data dredging and ‘P-hacking’ (a term coined by Simmons and colleagues in 2011). P-hacking is the practice of iteratively conducting unplanned analyses and selectively reporting those that yield the desired result. For every 100 truly negative results, P-hacking can turn as many as 60 into ostensibly positive ones. P-hacking has thus actively contributed to the production of many nonsense claims in the scientific literature.
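The “60 out of 100” figure is easy to reproduce in a stylized simulation (mine, for illustration; real P-hacking involves correlated analyses of the same data, which I simplify here as 20 independent looks at fresh null data):

```python
import random
from statistics import NormalDist

random.seed(4)
norm = NormalDist()

def p_hacked(looks=20, n=25):
    """One 'study' of a true null in which the researcher keeps trying
    analyses (different outcomes, subgroups, covariates) and reports only
    the smallest P-value obtained."""
    best_p = 1.0
    for _ in range(looks):
        xbar = random.gauss(0, 1 / n**0.5)       # the null is true every time
        p = 2 * (1 - norm.cdf(abs(xbar) * n**0.5))
        best_p = min(best_p, p)
    return best_p < 0.05

hits = sum(p_hacked() for _ in range(2_000))
print(f"'significant' findings from pure noise: {hits / 2_000:.0%}")
```

With 20 tries at a 5% error rate each, the chance of at least one “significant” result is 1 − 0.95²⁰ ≈ 64%, which is the arithmetic behind turning most true negatives into publishable positives.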
Detecting and debunking nonsense
Chapter 9, “The Susceptibility of Science”, includes a section on the structural issues in academia that indirectly produce nonsense claims. Academics are largely judged by the number of papers they publish, and as a result they direct all their efforts toward publication. That competition has effectively created a market for new players who are prepared to publish anything (page 235), even the most bizarre pieces. Then there are the so-called predatory publication outlets, which have been sucking hundreds of millions of dollars from the academic community by publishing nonsense papers. Researchers publishing in those predatory outlets do not just pollute the literature; they also spread false information to the public.
Toward the end of the book, Bergstrom and West offer some useful tips for detecting and refuting bullshit and bullshitters. One tip is to use the journalist’s approach to ascertaining the source of information: “Who is telling me this? How does he or she know it? What is this person trying to sell me?” (page 243). The norm is “if a claim seems too good — or too bad — to be true, it probably is” (page 249). Never trust anything that is too deterministic or too good to be true.
The authors remind us that it is much easier to generate bullshit than to refute it. They call our attention to the so-called ‘asymmetry principle’ (attributed to the Italian software engineer Alberto Brandolini), which states that “The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it” (page 11). Similarly, the blogger Uriel Fanelli remarks that “an idiot can create more bullshit than you could ever hope to refute”. The affair of the MMR vaccine and autism is a case in point. In 1998, the British doctor Andrew Wakefield and his colleagues published a paper in the Lancet claiming that the measles-mumps-rubella vaccine was causally linked to autism. It took the medical establishment 16 years to gather convincing data to refute the claim. The case underlines the asymmetry principle: producing a nonsense claim is far simpler and cheaper than disproving it.
Which parts of the book left me less impressed? To me, there is room for improvement in Chapter 7, which covers data visualization. The authors make some excellent points about bad graphical presentation, but their own suggestion (e.g., the figure on page 176) can be seen as exhibiting the very problem they criticize. The explanation of the P-value (pages 216–217) could also be improved. The authors describe the P-value as the probability of the sample data arising by chance, but this definition is incomplete; it should add “under the assumption that the null hypothesis is true.”
Calling Bullshit was born out of a course that Jevin West and Carl Bergstrom developed a few years ago in the University of Washington’s (UW) Department of Biology. The course, called ‘Calling Bullshit on Big Data’, has become very popular, not just at UW but at universities worldwide. I should add that West and Bergstrom also invented the “Eigenfactor”, a novel metric for evaluating the influence of scholarly publications.
This is not your average popular science book for entertainment or in-flight reading; it is a book for any occasion and for anyone, from citizen skeptics to scientists. I would add that it is also of teaching quality, highly accessible to non-mathematical clinical researchers. For epidemiology teachers who have a hard time finding good real-world examples to illustrate the impact of study design biases, the ideas and stories presented in the book can serve as teaching resources.
The book was published at a time when the Covid-19 pandemic was unfolding. The pandemic has presented a wonderful opportunity for unscrupulous individuals to spread misinformation, even disinformation. Surprisingly, such individuals are found even at the highest levels of the political hierarchy. Many doctors and scientists have also unintentionally become bullshitters by proclaiming, categorically and arrogantly, that they know what is right and what is wrong in the containment of Covid-19. Perhaps they are unaware of the adage that “Uncertainty is the only certainty there is”.
As a scientist, I am used to seeing nonsense and false claims (I am still hesitant to use the word ‘bullshit’) pop up in seminars, conferences and scientific journals. As a citizen, I see nonsense daily in the popular media and social media. Surviving in such a sea of misinformation requires data literacy. To that end, this book is a timely training course to equip scientists and citizens alike with statistical savvy, at a time when immunity to misinformation is a fundamental skill of successful citizenry. I highly recommend the book to you.
Conflict of interest: I have no personal or professional relationship with the authors.