Stat-Spotting: A Field Guide to Identifying Dubious Data
Joel Best
University of California Press
Berkeley and Los Angeles, California
2008, updated 2013
158 pages (Paperback)
My Rating: 4/5
Introduction
Stat-Spotting: A Field Guide to Identifying Dubious Data by Joel Best, author of Damned Lies and Statistics and More Damned Lies and Statistics, is a detailed, practical handbook of warning signs for false or misleading numbers and statistics and, in some cases, of methods for determining whether they really are false or misleading. A number or statistic is misleading if the meaning it implies in ordinary usage, in English or any other language, is false. The book is short and to the point. It partially addresses a weakness in Professor Best’s previous books, which provide general theory and several dramatic examples of false or misleading numbers and statistics but little or no practical advice on how to identify such numbers on your own.
Professor Best’s area of expertise is alarming claims about children — people under the age of eighteen in the United States. This area has a long history of widely repeated claims and frightening numbers in the mass media and politics that are remarkably inconsistent with the actual numbers: one million missing children each year (1980’s), a specific claim of 50,000 stranger abductions and murders of children each year (1980’s), an epidemic of school shootings (1990’s), sadistic poisoning of Halloween candy (at least since the 1960’s), and many other claims that Professor Best discusses in his research articles and books.
In Stat-Spotting Professor Best discusses the importance of knowing benchmark numbers such as the total population of the United States (about 300 million), the number of babies born each year (about 4 million), the total number of deaths per year (about 2.5 million), and the total number of homicides per year (about 18,000 in the US). Knowing these numbers, we can immediately see problems with the one million missing children and the 50,000 stranger abductions and murders: the latter figure alone is nearly three times the total number of homicides of victims of all ages. Likewise, it is not difficult to confirm that there are only a tiny handful of school shootings in the United States, which has a total student population of about 55 million. Remarkably, there is not a single documented case in the United States of Halloween candy poisoning by someone other than the parent of the trick-or-treater. Stat-Spotting gives good guidelines and methods for identifying and ruling out these sorts of claims, but little guidance on how to evaluate the many more complex, often statistical claims found in the mass media and often used to market medical procedures and drugs.
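The benchmark check described above is simple enough to write down. The following is a minimal sketch in Python that uses only the approximate figures quoted in this review. The estimate of the number of children (eighteen birth cohorts of about 4 million each) is my own rough assumption, not a figure from the book, and it ignores child mortality and migration.

```python
# Approximate US benchmark figures per year, as quoted in the review.
BIRTHS_PER_YEAR = 4_000_000
HOMICIDES_PER_YEAR = 18_000

# Claims from the 1980's that the benchmarks are meant to test.
CLAIMED_MISSING_CHILDREN = 1_000_000
CLAIMED_STRANGER_MURDERS = 50_000

# Rough assumption (not from the book): children are people under 18,
# so there are about 18 birth cohorts of roughly 4 million each.
children_estimate = 18 * BIRTHS_PER_YEAR

# How many times larger is the claimed number of child murders by
# strangers than the total number of homicides of all kinds?
murder_ratio = CLAIMED_STRANGER_MURDERS / HOMICIDES_PER_YEAR

# What share of all children would have to go missing each year?
missing_share = CLAIMED_MISSING_CHILDREN / children_estimate

print(f"Claimed child murders vs. all homicides: {murder_ratio:.1f}x")
print(f"Estimated number of children: {children_estimate:,}")
print(f"Share of children missing each year: {missing_share:.1%}")
```

A claim that exceeds the total it must be a part of fails immediately. A claim that implies more than one child in a hundred goes missing every year is not impossible on arithmetic grounds alone, but it is large enough to demand a careful look at how "missing" was defined.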
An Easy to Use Handbook
Stat-Spotting is organized like a field guide for spotting birds or a simple handbook for technicians or engineers. Each warning sign or method is given a simple name, a description, a single detailed example, and a binocular icon for quick location. A short checklist of all of the warning signs and methods appears at the end of the book for quick reference. Most of the names, descriptions, and illustrative examples are clear and easy to follow.
Some of the examples are debatable and illustrate the weaknesses of Professor Best’s methods for more complex and difficult cases where the claims are often statistical in nature. By statistical I mean that the claim is an effect (a trend like global warming, for example) that is comparable to or smaller in size than the natural variation in the measured quantity and is detectable only by averaging over many measurements or by fitting complex mathematical models to highly variable data. Such small but potentially real effects can easily be produced by conscious, unconscious, or accidental biased sampling of the data and by other subtle errors or manipulations. Such statistical effects are often difficult or impossible to confirm or deny based on personal experience because of the high variability of the measured quantity. Many of the dubious claims about threatened children that Professor Best has investigated do not fall into this difficult statistical category. At present it is difficult and time-consuming for most of us, even those of us with extensive background and training in advanced statistics, to evaluate these statistical claims.
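To make the point about biased sampling concrete, here is a small simulation in Python. It is an illustrative sketch of my own, not an example from the book: the measurements are pure Gaussian noise with no real effect at all, and the only "manipulation" is that readings below a hypothetical cutoff are quietly dropped, as might happen if low values were dismissed as instrument errors.

```python
import random
import statistics


def simulate(n=10_000, cutoff=-0.5, seed=0):
    """Average pure noise with and without a biased selection step.

    The true effect is zero: every reading is drawn from a Gaussian with
    mean 0 and standard deviation 1. The cutoff is a hypothetical rule
    that discards low readings before averaging.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    readings = [rng.gauss(0.0, 1.0) for _ in range(n)]
    kept = [x for x in readings if x > cutoff]
    return statistics.mean(readings), statistics.mean(kept), len(kept)


full_mean, biased_mean, n_kept = simulate()
print(f"Mean of all readings:      {full_mean:+.3f}")
print(f"Mean after dropping lows:  {biased_mean:+.3f}")
print(f"Readings kept: {n_kept} of 10000")
```

The average of all the readings stays close to zero, as it should. The average of the selected readings shows an apparent effect of roughly half a standard deviation, even though no individual reading was altered. Nothing in the selected data alone reveals the problem, which is why such claims are hard to check from the outside.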
For example, in his section on “epidemics,” Professor Best uses the alleged “autism epidemic” as an example of a spurious epidemic, supposedly caused by the redefinition of autism in the Diagnostic and Statistical Manual (DSM-IV) in 1994 as the broader “autism spectrum disorders.” When I began seriously researching autism, I started from a similar premise, but the reality is more confusing than Professor Best states in Stat-Spotting.
Professor Best starts by describing the autism epidemic as something reported in the media and on the Internet, perhaps suggesting some dubious unofficial and unscientific source. In fact, the source of the widely quoted figures on autism is the US Centers for Disease Control and Prevention (CDC), which has reported a dramatic and accelerating rise in the incidence of autism up to the latest numbers from 2010. Studies were showing an increase in autism prior to 1994. We would certainly expect an increase in cases of autism with a broader definition. However, we would also expect this increase to stop within a few years of the redefinition, once the new definition had come into widespread use. We would not expect it both to continue and to accelerate, at least through 2010 if not to the present.
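The expectation that a redefinition produces a rise that levels off can be expressed as a toy model. The Python sketch below is purely illustrative: the true rate, the broadening factor, and the five-year adoption period are hypothetical values chosen for the example, not estimates taken from the CDC or from the book. It assumes a constant underlying rate and a broader definition that is adopted gradually after 1994.

```python
def reported_rate(year, true_rate=1.0, broadening=3.0,
                  change_year=1994, adoption_years=5):
    """Reported rate in arbitrary units under a toy redefinition model.

    All parameter values are hypothetical. The underlying rate never
    changes; only the fraction of clinicians using the broader
    definition grows, linearly, until everyone has adopted it.
    """
    if year <= change_year:
        return true_rate
    adopted = min(1.0, (year - change_year) / adoption_years)
    return true_rate * (1.0 + (broadening - 1.0) * adopted)


years = list(range(1992, 2004))
rates = [reported_rate(y) for y in years]
# Year-over-year change in the reported rate.
increases = [later - earlier for earlier, later in zip(rates, rates[1:])]

for year, rate in zip(years, rates):
    print(year, f"{rate:.1f}")
```

In this model the reported rate climbs while the new definition spreads and then stays flat. A reported rate that keeps rising, and rises faster, long after adoption should be complete needs some additional explanation, whether a real increase, continuing changes in diagnostic practice, or something else.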
One problem with Professor Best’s discussion of epidemics is this: if a previously extremely rare phenomenon such as autism increases sharply in frequency (a genuine epidemic), this will lead to increased attention and scrutiny and to more precise, and frequently broader, definitions that recognize borderline cases that were previously ignored. (The incidence of autism in the USA in the 1970’s is thought to have been about four cases per 10,000 people, less than one-tenth of a percent, versus a purported one in 68 children today according to the CDC.) There is a chicken-and-egg problem. Did increased attention create a phony epidemic (the popular 1988 Tom Cruise movie Rain Man is sometimes credited with seeding the autism epidemic), did a real epidemic create the increased attention, or was it some mixture of the two?
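For a sense of scale, the two figures quoted above can be put on a common footing. This is a quick Python sketch using only the numbers in this review. Note that it compares a rate per 10,000 people with a rate among children, as the quoted figures do, so the ratio is a rough indication rather than a like-for-like comparison.

```python
from fractions import Fraction

# Figures as quoted in the review.
rate_1970s = Fraction(4, 10_000)  # about four cases per 10,000 people
rate_cdc = Fraction(1, 68)        # purported one in 68 children

# How many times larger is the recent figure than the 1970's figure?
ratio = rate_cdc / rate_1970s

print(f"1970's figure: {float(rate_1970s):.2%}")
print(f"CDC figure:    {float(rate_cdc):.2%}")
print(f"Ratio:         about {float(ratio):.1f}x")
```

The recent figure is roughly 37 times the earlier one. The question the review raises is how much of that factor a broader definition and increased attention can plausibly account for.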
Conclusion
Stat-Spotting is a good complement to Professor Best’s earlier bestseller Damned Lies and Statistics. It is a detailed, practical handbook for identifying numbers and statistics that may be false or misleading and, in some cases, for determining whether they are in fact false or misleading. It does not, however, provide good advice or tools for analyzing more complex and difficult cases where the claims are statistical: claims of an effect, such as global warming, that is comparable to or smaller than the normal variation in the measured quantity. Such effects are derived by averaging over a large number of highly variable measurements, often by fitting a mathematical model to the data or applying abstruse, advanced statistical methods. These statistical claims may be real, but they can also easily be produced by conscious, unconscious, or accidental biased sampling of the highly variable data or by other subtle errors or manipulations. They are also difficult or impossible to confirm or deny based on personal experience, because the variability of the measured quantity is high compared to the size of the alleged effect.
© 2015 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
I read Stat-Spotting when it first came out several years ago. It was both very interesting and very accessible, and it shows how people can knowingly or unknowingly use statistics to support less than valid conclusions. I think most do it by mistake (confirmation bias), but it simply goes to show that we should never take things as given and be expeditious in how we seek out information.