The recent New Yorker article “The Truth Wears Off: Is there something wrong with the scientific method?” by Jonah Lehrer (December 13, 2010) discusses several cases where a new scientific result was initially confirmed by several seemingly independent scientific studies and then subsequently faded away, sometimes to nothing. To quote briefly from the article:
But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis [professor of psychiatry at the University of Illinois at Chicago] has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
The article cites a number of possible explanations for these cases ranging from confirmation bias to regression to the mean. None seem entirely satisfactory, either separately or together.
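One of the explanations the article cites, regression to the mean combined with selective reporting, can be illustrated with a small simulation. The sketch below is not from the article. The effect size, sample size, number of labs, and significance cut are all hypothetical values chosen for illustration, and each study is reduced to a single Gaussian draw of its estimated effect.

```python
import random
import statistics


def simulate_decline(true_effect=0.2, n_per_study=30, n_labs=2000,
                     threshold=1.96, seed=0):
    """Compare 'published' first results with their later replications.

    All parameter values are hypothetical. Each study's estimated effect
    is modeled as one Gaussian draw around the true effect.
    """
    rng = random.Random(seed)
    # Standard error of a sample mean with unit-variance noise.
    std_err = 1.0 / n_per_study ** 0.5
    initial, replication = [], []
    for _ in range(n_labs):
        first = rng.gauss(true_effect, std_err)
        # Only statistically significant first results get "published".
        if first / std_err > threshold:
            initial.append(first)
            # The replication is an unbiased, independent measurement.
            replication.append(rng.gauss(true_effect, std_err))
    return statistics.mean(initial), statistics.mean(replication)


first_mean, replication_mean = simulate_decline()
print(f"mean published effect:   {first_mean:.3f}")
print(f"mean replication effect: {replication_mean:.3f}")
```

In this toy model the published results overstate the true effect simply because they were selected for being large, while the replications scatter around the true value. This produces an apparent decline with no change in the underlying effect, although, as noted above, it may not account for every case.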
As a graduate student, the author attended a lecture by a senior particle physicist who expressed distinct skepticism of the validity of standard statistics in particle physics, referring parenthetically to several cases of results reported at high levels of statistical significance, several standard deviations, that subsequently proved invalid. A recent example, similar to the cases described in the New Yorker article, is the saga of the pentaquark. Not one but several research groups reported evidence of the pentaquark, which then seems to have faded away without a full explanation of the multiple observations. See, for example, the online article “The rise and fall of the pentaquark” in Symmetry Magazine. Indeed, from many years of experience, particle physicists tend to view even claims of new effects or new particles at a five standard deviation level cautiously.
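For reference, the nominal meaning of “several standard deviations” can be computed directly. The sketch below assumes purely Gaussian fluctuations and the one-sided convention common in particle physics. The number of independent searches is a hypothetical parameter, and the function names are illustrative.

```python
import math


def one_sided_p_value(n_sigma):
    """Nominal probability of a Gaussian fluctuation of at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def chance_of_any_fluke(n_sigma, n_independent_searches):
    """Probability that at least one of several independent searches
    fluctuates by n_sigma or more (the number of searches is hypothetical)."""
    p = one_sided_p_value(n_sigma)
    # 1 - (1 - p)**n, written to stay accurate when p is tiny.
    return -math.expm1(n_independent_searches * math.log1p(-p))


for n_sigma in (3, 5):
    print(n_sigma, one_sided_p_value(n_sigma),
          chance_of_any_fluke(n_sigma, 1000))
```

Taken at face value, a five standard deviation fluctuation should be extremely rare even across many searches. The caution described above suggests that the nominal figure omits something, such as systematic errors or analysis choices that a Gaussian model does not capture.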
Particle physics is often considered a very “hard” science without some of the problems presumably present in “softer” sciences such as medicine, biology, psychology, or parapsychology from which most of the examples in the New Yorker article are drawn. The New Yorker article notes striking parallels between the cases in mainstream science and the work of J.B. Rhine in parapsychology where a so-called “decline effect” has repeatedly been noted.
The New Yorker article is well written and well worth reading. Nonetheless, a few comments seem in order. The article avoids discussing the possibility of fraud. One possible explanation for cases of this type is organized scientific fraud where multiple research groups collude to produce confirmatory results: for example, to ensure the approval and adoption of a new drug developed and promoted by a large pharmaceutical company. Scientific fraud is extremely difficult to prove. In most cases, all a scientist can legitimately say is that he or she was unable to replicate the results of another researcher. A suggestion of fraud would be unsupported speculation and quite possibly constitute legally actionable defamation or libel. Most proven cases of scientific fraud involve an insider, a colleague in the same laboratory or office, who blows the whistle. In many cases, the whistleblowers have suffered personally and professionally even if they were eventually vindicated.
The notion of replication seems straightforward and is heavily touted in popular science books and textbooks. Replication will probably weed out statistical flukes and gross errors that a competent researcher should have avoided anyway. However, what if the error is more subtle? The independent scientific study may simply replicate the same subtle error. For example, in particle physics, there are a number of extremely complex software packages such as the Lund Monte Carlo, the MINUIT fitting package, and the GEANT detector simulation package that are used by many different groups to simulate particle interactions and particle detectors and to analyze results. Computer programs, of course, have bugs. These bugs can be quite arcane and difficult to detect. Consequently, independent research groups may replicate the same spurious results due to a bug in a widely used software package.
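The shared-software argument can be made concrete with a toy model. Nothing below corresponds to a real package or a real bug. The rates, efficiencies, and function names are hypothetical stand-ins: two groups take independent data but correct it with the same flawed efficiency estimate.

```python
import random

TRUE_RATE = 100.0        # hypothetical true signal rate (arbitrary units)
TRUE_EFFICIENCY = 0.50   # hypothetical real detector efficiency


def shared_efficiency_estimate():
    """Stand-in for a widely used simulation package.

    The hypothetical bug: it reports 0.40 where the real value is 0.50.
    """
    return 0.40


def measured_rate(rng, efficiency_estimate):
    """One group's result: independent data, shared efficiency correction."""
    # Each group observes its own noisy count of detected signal.
    observed = rng.gauss(TRUE_RATE * TRUE_EFFICIENCY, 0.5)
    return observed / efficiency_estimate


group_a = measured_rate(random.Random(1), shared_efficiency_estimate())
group_b = measured_rate(random.Random(2), shared_efficiency_estimate())
print(f"group A: {group_a:.1f}  group B: {group_b:.1f}  true: {TRUE_RATE}")
```

In this sketch the two results agree closely with each other and both miss the true value by the same factor, so agreement between groups says little about correctness when the analysis chain is shared.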
Modern scientific research is often technically quite sophisticated. It often takes years of study and practice to master the theoretical or laboratory techniques of a field. Frontier research, where ostensibly important new results such as the pentaquark are likely to be encountered, often involves sophisticated new techniques. Consequently, researchers who are unable to replicate a reported new result must always ask themselves: am I doing something wrong? The researcher who cannot replicate a result may be accused of lack of skill or even incompetence. This is particularly a concern where the new result is reported by a high-status researcher or group, or embraced as the hot “new new thing” of the field. Hence, researchers concerned about their careers may, like Thomas More in A Man For All Seasons, adopt a policy of prudent silence.
Even so, cases like the pentaquark or the several cases in “The Truth Wears Off” continue to raise questions about the validity of standard statistical methods in the real world. The New Yorker article touches repeatedly on this concern, without reaching any firm conclusions. “The Truth Wears Off” indirectly alludes to the enormous power of modern mathematical methods, in concert with powerful computers and software, to slice and dice data to produce, consciously or unconsciously, desired results, or to construct elaborate models that will fit the data, as discussed in the author’s previous posts Frankenstein Functions and Gold Fever. In conclusion, there is both empirical evidence and theoretical reason to entertain doubts about the validity of seemingly solid, well-established statistical methods in the complex world of modern scientific research.
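A minimal sketch of how slicing and dicing can manufacture results: the simulation below draws pure noise, with no real effect anywhere, splits each experiment into subgroups, and asks how often at least one subgroup looks significant. The number of subgroups, the sample sizes, and the 1.96 cut are hypothetical choices for illustration.

```python
import math
import random


def z_score(treated, control):
    """Two-sample z statistic for a difference of means, unit variance assumed."""
    n1, n2 = len(treated), len(control)
    diff = sum(treated) / n1 - sum(control) / n2
    return diff / math.sqrt(1.0 / n1 + 1.0 / n2)


def fraction_with_spurious_finding(n_subgroups=20, n_per_arm=50,
                                   n_experiments=500, seed=0):
    """Fraction of pure-noise experiments in which at least one subgroup
    shows a nominally significant difference (|z| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        for _ in range(n_subgroups):
            # Both arms come from the same distribution: no true effect.
            treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
            control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
            if abs(z_score(treated, control)) > 1.96:
                hits += 1
                break  # one "discovery" is enough to report
    return hits / n_experiments


fraction = fraction_with_spurious_finding()
print(f"experiments with a spurious 'finding': {fraction:.0%}")
```

With twenty independent looks at a nominal 5% false positive rate, the chance of at least one spurious hit is 1 - 0.95^20, roughly 64%, and the simulated fraction should land near that value. Real analyses involve correlated and often unrecorded choices, so this is only a lower-complexity illustration of the concern.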
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].