More Damned Lies and Statistics: How Numbers Confuse Public Issues
By Joel Best
University of California Press
Hardcover, 217 pages
My Rating: 3/5
In this new world of Big Data, Data Science, and the computing power of a 1980s Cray supercomputer in every pocket, we are inundated with more and more statistics and mathematical models based on more and more “data.” This torrent of real science and pseudoscience (sometimes officially sanctioned) includes the dubious financial models that contributed to the 2008 financial crash and “Great Recession”; global climate models used to urge us to abandon oil and natural gas in favor of austerity or vaguely defined alternative energy technologies; confusing statistics used to convince us to get flu vaccinations that frequently don’t work; statistical claims of small improvements in average life expectancy from chemotherapy that lead millions of terminal cancer patients to pay hundreds of thousands of dollars and suffer enormous pain from “drugs” that would be classified as lethal poisons in almost any other context; and many other examples. In all of these examples, government officials and agencies have given implicit or explicit support for the statistics and models: Alan Greenspan and the Federal Reserve, NASA and former Vice President Al Gore among other political leaders, the Centers for Disease Control and Prevention (CDC), and the National Institutes of Health (NIH).
How are we as citizens and consumers to make sense of these statistics and mathematical models? Ironically, the standard probability and statistics taught in colleges and in AP Statistics at the high school level in the United States is necessary but not sufficient for critically analyzing these often expensive and eye-popping claims. The reader can take the standard courses (I have), master the intricacies of Bayesian versus frequentist statistics, download and master the free R programming language used by professional statisticians, read and reread every Wikipedia entry on probability and statistics, and still be bamboozled. Why? Real statistics and mathematical models are created by real, fallible, greedy, ambitious, fearful, angry human beings, and these human failings appear most often in the definition of what is counted, how it is counted, the selection and sampling of the data, the interpretation and presentation of the results, and sometimes in outright cheating.
To understand and detect these problems with real statistics and real mathematical models, we need to look beyond the dry academic literature on probability and statistics to works like Darrell Huff’s classic How to Lie with Statistics.
More Damned Lies and Statistics is the 2004 sequel to Professor Best’s Damned Lies and Statistics, which I reviewed a few weeks ago. More Damned Lies and Statistics gives more examples of the misuse of statistics in the 1980s, 1990s, and early 2000s, including some dramatic examples of the strategic omission of statistics that might put a claim in a much different perspective.
Overall, More Damned Lies and Statistics is a good book, but not the great book that Damned Lies and Statistics is, repeating a number of points from the first book and also a number of points from How to Lie with Statistics. One of the things that I liked about the first book is that it complements How to Lie with Statistics with new and different insights.
Strong First Chapter
More Damned Lies and Statistics begins with a strong first chapter which is available for free at the University of California Press web site for the book (at the time of writing this review). Reading the free first chapter led me to buy the whole book. The later chapters are good, but not at the level of the first chapter or the first book.
More Damned Lies and Statistics starts with a bang: the “epidemic” of school shootings in the 1990s, culminating in the April 20, 1999, Columbine High School shooting in Littleton, Colorado. Professor Best proceeds to cite statistics showing that school shootings of all types are extremely rare and actually declined during the 1990s.
For example, there were a total of twenty-five (25) violent school deaths in 1999, including the Columbine deaths, down from forty-four (44) in 1993. The United States Census Bureau reports that about 51.2 million students were enrolled in schools (nursery school through 12th grade) in October 1999.
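To put these figures on a common scale, here is a minimal sketch of the arithmetic in Python. The variable and function names are my own, chosen for illustration. Dividing the death count by enrollment is only a rough approximation, since the count of violent school deaths and the enrollment estimate may not cover exactly the same population. No 1993 enrollment figure is quoted here, so the two years are compared by raw counts only.

```python
# Figures quoted in the review above.
deaths_1993 = 44
deaths_1999 = 25
enrolled_1999 = 51.2e6  # students, nursery school through 12th grade, October 1999


def rate_per_million(deaths, population):
    """Deaths per one million people in the population."""
    return deaths / population * 1_000_000


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a decline)."""
    return (new - old) / old * 100


rate_1999 = rate_per_million(deaths_1999, enrolled_1999)
students_per_death = enrolled_1999 / deaths_1999
change_1993_to_1999 = percent_change(deaths_1993, deaths_1999)

print(f"1999 rate: {rate_1999:.2f} deaths per million students")
print(f"About one death per {students_per_death:,.0f} students")
print(f"Change in deaths, 1993 to 1999: {change_1993_to_1999:.1f}%")
```

Under these assumptions the 1999 rate is a little under half a death per million enrolled students, or about one in two million, and the raw count fell by roughly 43 percent between 1993 and 1999.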
In his first book, Professor Best gives many examples where a big number, often one million or more, based on a guess or broad definition of a social problem was combined with rare, horrific examples of the problem to create a misleading if not false picture of the problem: for example, the claim of one million missing children paired with the horrific abduction and murder of six-year-old Adam Walsh in 1981. In the case of school shootings, no statistics were given, although terms like “epidemic” were used by commentators such as Dan Rather. Even a broad definition of school shootings gives a small number, much less than one million.
Extremely rare examples can be highly misleading and are common in the mass media, self-help books, speeches by politicians, and a number of other widely available sources of information.
The Mass Media as Funhouse Mirror
Historically, the term “mass media” referred to national television networks (ABC, CBS, NBC, and PBS in the United States, the BBC in the UK), major and increasingly national or even international newspapers such as the New York Times and the Wall Street Journal, and widely distributed magazines such as Time and Newsweek. Increasingly the mass media is moving to the Internet and incorporating search engines, especially the market leader Google, and certain web sites such as Wikipedia that frequently appear in the top hits from a Google search. Nearly all of the mass media is heavily or completely funded by advertising. Purely subscription-based models in which the end user pays are rare. He who pays the piper calls the tune.
Both of Professor Best’s books illustrate the extent to which what we see, hear, and read in the mass media is like a funhouse mirror. Some issues and events are blown up way out of proportion to their actual size. Others are shrunken and squashed. Some issues and events are presented accurately. Some are heavily distorted. Some issues and events are completely fake. Some real things are not shown in the mirror at all.
These problems are not likely to decrease in the Big Data era when the generation of misleading or false statistics can be automated and “personalized,” customized on a per-viewer basis to appeal to our individual prejudices, biases, and cognitive weaknesses. Rapidly improving audio and video, computer generated imagery, and virtual reality technologies are making the distorted images in the funhouse mirror more believable and emotionally appealing, more real than real — hyperreal as some call it.
Freely available open-source programs such as the R programming language contain few if any tools for detecting the sort of mistakes or manipulations of data and analysis chronicled in Professor Best’s books. Using Google or other search engines to track down original data and critically analyze some claim may be difficult or impossible if Google (for example) is promoting the claim. Similar comments apply to other major information services such as those offered by Microsoft, Yahoo, Netflix (consider the documentaries pushed by Netflix), and other major Internet/computer companies.
Unplugging from the Matrix
While Professor Best’s books do a good job of showing the reader what to look for in mistakes or deliberate manipulations of statistics, he does not provide much practical advice on how time-strapped readers can effectively analyze the many statistics and mathematical models that we are increasingly inundated with in the modern world. As I have noted, relying on Google, the current default for many people, and Wikipedia, which frequently turns up in the top few hits from Google and other search engines, has obvious limitations. In his first book, Professor Best gave no practical advice on this important issue.
In More Damned Lies and Statistics: How Numbers Confuse Public Issues, Professor Best does include a list of web sites that critique and help analyze statistics (circa 2004). However, this is a pretty short laundry list of web sites in the final chapter, which is mostly devoted to arguing for teaching “statistical literacy” in schools and colleges.
Most of us lack the time and resources to conduct the kind of analyses that Professor Best, a skilled academic, has performed on several issues such as missing children and that would, most likely, be taught in a statistical literacy course. It is not simply a matter of knowing how to critically analyze the statistics and mathematical models that are used to promote not only public policies such as vaccination or wars but also goods and services such as mortgages, drugs, and medical treatments. It is also a question of time, money, and other resources.
To safeguard our own interests and well-being, to be good citizens and make the world a better place, we need faster and cheaper methods and tools to protect against the sort of mistaken, misleading, and even fraudulent statistics described in Damned Lies and Statistics and More Damned Lies and Statistics.
More Damned Lies and Statistics is a good book but not a great book. If you have to choose between this book and the first book, Damned Lies and Statistics, I would recommend the first book. If I could only afford two books on the subject, I would get Darrell Huff’s How to Lie with Statistics and Damned Lies and Statistics. These two complement each other well (read How to Lie with Statistics first if you are new to the subject). More Damned Lies and Statistics expands upon some of the points in Damned Lies and Statistics and introduces a few new points, but repeats a lot from Damned Lies and Statistics and How to Lie with Statistics with some new examples.
The world is still waiting for a good book or open source software program, or perhaps a combination of the two, to critically analyze the rising torrent of statistics and mathematical models fueled by Big Data.
© 2015 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at firstname.lastname@example.org.