# Review of Statistics Done Wrong: The Woefully Complete Guide

Statistics Done Wrong: The Woefully Complete Guide
by Alex Reinhart
Paperback: 176 pages
Publisher: No Starch Press; 1 edition (March 16, 2015)
Language: English
ISBN-10: 1593276206
ISBN-13: 978-1593276201
Product Dimensions: 8.1 x 5.9 x 0.5 inches

My Rating: 3/5

Introduction

Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart, a graduate student in statistics at Carnegie Mellon University, is a guide to common errors in the statistical analyses found in scientific research papers, with examples drawn mostly from the biology and medical research literature. There is also a Statistics Done Wrong web site. The book is primarily for graduate students, research scientists, and other professional data analysts with some background in probability and statistics, roughly equivalent to a good one-year first course in probability and statistics in college.

Statistics Done Wrong is not highly technical and contains few formulas or calculations. Even so, medical patients, policy makers, and others who need to make sense of the many statistical analyses now used to market drugs, medical procedures, policies, and many other goods and services will likely find the book slow going unless they already have some background in probability and statistics, whether through formal training or personal study. The book is weak in defining many of the technical terms, such as ANOVA (analysis of variance), that are introduced at various points. I found myself looking up definitions, or more precise definitions, on Wikipedia (not an ideal source) or in my collection of books and articles on probability and statistics.

Statistics Done Wrong paints a rather dismal picture of the quality of statistics in the scientific literature, especially in the fields of biology, medicine, and psychology, somewhat in the spirit of John Ioannidis’s claims. Ioannidis’s works such as “Why Most Published Research Findings Are False” are cited a number of times.

Overall I would recommend the book but with some important reservations. In my previous reviews of Joel Best’s books on the misuse of statistics, I concluded that Best gave little guidance for readers seeking to evaluate complex statistical claims as opposed to simple but misleading or false numbers such as “one million missing children” or “three million homeless people” frequently encountered in mass media coverage of social problems. These complex statistical claims assert an effect such as global warming that is comparable to or smaller than the normal variation in the measured quantity and derived from averaging over a large number of highly variable measurements and often fitting a mathematical model to the data or applying abstruse, advanced statistical methods. These statistical claims may be real but can also easily be produced by conscious, unconscious, or accidental biased sampling of the highly variable data or other subtle errors or manipulations. These statistical claims are also difficult or impossible to confirm or deny based on personal experience due to the high variability of the measured quantity compared to the size of the alleged effect. Statistics Done Wrong directly addresses these more complex and difficult cases.

These complex statistical claims include many contentious and emotional issues such as the effectiveness and safety of vaccines, global warming (climate change), the effectiveness of chemotherapy and other cancer treatments (for example the Whipple surgical procedure for pancreatic cancer), and laboratory parapsychology. It is common for advocates of these claims to discuss them as if they were not statistical in nature, but rather “hard” facts such as my near absolute certainty that I cannot walk through the walls of my apartment or that if I hold out a rock in my hand and let go, it will with great certainty fall to the ground. Skeptics are increasingly labeled as Statistical Claim Deniers or Statistical Claim Denialists, in analogy to Holocaust Deniers or Denialists, an ad hominem tactic that has little to do with rational analysis and is highly questionable at best.

On the other hand, skeptics always seem to be able to find substantive issues, such as those discussed in Statistics Done Wrong and Joel Best’s books, that call any purely statistical claim into serious question. What has been described as an “infinite regress” occurs: if a particular criticism is conclusively shown to be false (in itself a very difficult achievement), skeptics will simply find yet another potential problem with the statistical analysis. Skeptics, it seems, are rarely if ever able to replicate purely statistical claims, while advocates almost always can. Statistics Done Wrong is unlikely to fully fix this mushy quality of statistics in the real world.

Fraud is often implied, and a conflict of interest can usually be asserted to support suggestions of fraud or unconscious bias. The conflict may be actual (the research in question was funded by Colossal Pharmaceuticals; see these twelve billion dollar settlements with the Department of Justice for inaccurate marketing of failed wonder drugs X, Y, Z, etc. that nonetheless admit no fault) or potential (alternative medical “experts” always seem to have a book or video that you can buy, which might become a bestseller even if it is not one right now).

The Much Maligned P-Value

The book devotes a chapter and many sections to the many problems with the p-value, one of the most commonly cited statistics in scientific research. Loosely, the p-value is the probability of obtaining results at least as extreme as those observed if pure chance alone were at work, that is, if the null hypothesis were true. Scientists often say that the results of an experiment are statistically significant if the p-value is less than or equal to 0.05 (five percent), a value chosen more or less arbitrarily by pioneering statistician Ronald Fisher. This seemingly straightforward concept hides a plethora of difficulties that have become increasingly well known in recent years, leading some scientific journals to ban the p-value altogether. This may be a case of throwing the baby out with the bathwater.
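
To make the definition concrete, here is a minimal sketch in Python (my own, not taken from the book) of a permutation test, one simple way to compute a p-value for a difference in means. The function name `permutation_p_value`, the sample data, and the number of shuffles are all illustrative assumptions. The idea is to shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: under the null hypothesis the
    group labels are interchangeable, so we shuffle them and count how
    often the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if shuffled >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up measurements, purely for illustration.
treated = [5.9, 6.1, 6.4, 5.8, 6.3, 6.0, 6.2, 6.5]
control = [5.1, 5.0, 5.4, 4.9, 5.2, 5.3, 4.8, 5.0]
print(permutation_p_value(treated, control))  # small: the groups do not overlap
print(permutation_p_value(control, control))  # identical groups give 1.0
```

A small p-value here says only that shuffled labels rarely reproduce the observed difference. It says nothing about whether the samples were collected or measured without bias.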

In some respects, Statistics Done Wrong blames the empirically mushy quality of statistics in the real world on the limitations of the p-value. The book argues for the use of confidence intervals on the putative effect size as a solution. I agree with the author that quoting confidence intervals on effect sizes in addition to the p-value is an improvement in statistical practices in research, but confidence intervals in no way solve the “infinite regress” problem. Indeed, all a skeptic need do is ask whether the confidence interval is too narrow or the estimated effect systematically biased, so that the alleged effect is in fact consistent with no effect. With statistical claims where the alleged effect is comparable to or smaller than the typical variations in the measured quantity, there are many ways biased measurement, biased sampling, or other subtle issues can produce a small effect and an incorrect confidence interval.
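
The point about bias can be illustrated with a short Python sketch (mine, not the book's). It computes an approximate confidence interval for a difference in means using a normal critical value, which is only a rough approximation for samples this small; the function name `mean_difference_ci`, the data, and the size of the bias are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, confidence=0.95):
    """Approximate confidence interval for mean(a) - mean(b).

    Hypothetical helper: uses a normal critical value, which is only a
    reasonable approximation for fairly large samples.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(a) - mean(b)
    std_err = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - z * std_err, diff + z * std_err


# Made-up measurements with no real difference between the groups.
control = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.1]
treated = list(control)
print(mean_difference_ci(treated, control))  # interval straddles zero

# The same data with an assumed systematic measurement bias of +0.5.
biased = [x + 0.5 for x in treated]
print(mean_difference_ci(biased, control))   # interval now excludes zero
```

The interval is just as narrow in the biased case as in the unbiased one. Nothing in the calculation itself can tell a real effect from a systematic offset, which is why the confidence interval does not end the argument.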

Other Common Problems with Statistics in Scientific Research

Statistics Done Wrong has chapters and sections on a number of other common problems in statistics in scientific research, several of which overlap with the weaknesses of the p-value. One chapter covers statistical power, loosely the probability that a statistical test or experiment will correctly reject the null hypothesis (in statistical terminology, the hypothesis that the results are due to pure chance) when a real effect of a given size is present. For a given effect size, the statistical power of an experiment increases toward 1.0 with the sample size, the number of independent measurements in the experiment. Statistics Done Wrong argues that many scientific papers fail to compute the statistical power and have low statistical power: they have too few measurements to reach reliable conclusions. In most cases, the book is talking about the statistical power of a p-value test such as the standard p < 0.05 test. The book covers several other common problems including "pseudo-replication," "the base rate fallacy," and "torturing the data until it confesses."
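
As a rough illustration of how power grows with sample size, here is a Python sketch (mine, not the book's) for the simplest textbook case: a two-sided, two-sample z-test. It assumes normally distributed measurements with a known, common standard deviation, which real studies rarely have. The function name `z_test_power` and the example effect size are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test (hypothetical helper).

    Assumes normally distributed measurements with a known, common
    standard deviation `sigma` and `n_per_group` observations per group.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true difference is `effect`.
    shift = effect / (sigma * sqrt(2 / n_per_group))
    # Probability that the statistic lands in either rejection tail.
    return (1 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)


# Power for an assumed effect of half a standard deviation.
for n in (10, 20, 50, 100, 200):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n_per_group=n), 2))
```

Under these assumptions, an effect of half a standard deviation needs somewhat more than 60 measurements per group to reach the conventional 80 percent power, and an experiment with 10 per group will miss it most of the time.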

Conclusion

In conclusion, I recommend Statistics Done Wrong for those seeking to evaluate complex statistical claims as well as researchers trying to improve their research, which seems to be the target audience of the book. If the reader does not have some background in probability and statistics already, he or she will probably need to get up to speed by studying introductory probability and statistics at the college level. Even if the reader has a background in probability and statistics, the reader will likely need to look up some terms and jargon to understand some sections in the book.

Statistics Done Wrong is unlikely to fully fix the empirically mushy quality of the purely statistical claims in the real world. Even if researchers follow the suggestions in the book, the “infinite regress” problem is likely to continue for contentious statistical claims. Historically, purely statistical claims have mostly graduated to “hard” facts when it has become possible to isolate the causes and effects and demonstrate a strong unequivocal effect on demand. We don’t have heated emotional debates about whether we can walk through solid walls because the effect (“OW! THAT HURT!”) is strong, unequivocal, not statistical, and easily reproduced by most people. Statistics can mostly show us the way to find new “hard” facts but it cannot provide the “hard” facts. An experiment or machine that isolates the causes and effects and demonstrates a strong reproducible effect with negligible statistical variation is needed. Rarely, if ever, is a statistical “fact” (scare quotes on fact intentional) a “hard” fact.