Big Data Flubs Donald Versus Hillary

Donald Trump’s unexpected upset victory over Hillary Clinton raises troubling questions about the use of mathematical models and “Big Data” in politics. With the possible exception of the USC Dornsife/LA Times Daybreak Election Poll, nearly all pre-election polls and surveys appear to have significantly underestimated the level of popular support for Donald Trump in the United States. They underestimated it, usually by about the same amount: a full six percentage points, hardly a small discrepancy.

On October 12, The New York Times published an article, “How One 19-Year-Old Man in Illinois Is Distorting National Polling Averages,” debunking the USC/LA Times Poll. In retrospect, the “distortion” does not appear to have been a distortion at all.

At best, this widespread failure of the pre-election polls suggests a systematic bias; at worst, deliberate fraud intended to manipulate the outcome of the election. Indeed, Donald Trump was widely ridiculed for suggesting that the pre-election polls were being rigged against him (the polls, not the voting itself, which he later suggested might be rigged as well), a claim that seems more plausible now in light of the election results.

As the discussions of the USC/LA Times poll and its competitors reveal, the 2016 pre-election polls are not simple surveys of potential voters whose raw results are reported to the public. Rather, the raw results are adjusted in complex ways, supposedly to compensate for sampling errors and other biases. Apparently, these adjustments did not work very well in the 2016 Presidential Election.

There is a long history of pre-election polls getting election results wrong, sometimes by striking amounts. Who can forget the famous image of Harry Truman holding up a copy of the arch-conservative Republican Chicago Tribune announcing Thomas Dewey as the winner of the 1948 Presidential Election?

Harry Truman, Defeated by Thomas Dewey in 1948

For decades, questions have been raised about whether some polls were manipulated. Usually the suggestion is that the polls are manipulated to make a candidate appear stronger than he or she actually is, in hopes of swaying the election in favor of that candidate. Whether this stratagem actually works is debatable, as Donald Trump’s victory illustrates. Perhaps honesty is the best policy after all.

I did not vote for Donald Trump and find his erratic behavior and murky business connections alarming. The point of this article is not to endorse Trump or conservative claims of liberal media bias. The point is to emphasize the dramatic failure, once again, of supposedly sophisticated statistics, mathematical modeling, and Big Data in the current election.

At best, these sophisticated mathematical methods failed dramatically to predict the outcome of the election. At worst, they were used as an intimidating mathematical smokescreen for unsuccessful propaganda, apparently on behalf of Hillary Clinton.

We are increasingly inundated with mathematical models in modern politics. These include the models used to predict global warming. They include the controversial Value Added Models (VAM) used to evaluate, hire and fire teachers. Many other examples can be cited. Extremely powerful computers, high-bandwidth networks, the proliferation of data from sensors and other devices, and a Big Data/Machine Learning craze are combining to shift public debate from open, understandable arguments in plain English to arcane disputes about impenetrable statistics and mathematical models.

It is often extremely difficult, perhaps impossible, to evaluate these models. Global warming, for example, is a tiny effect much smaller than normal daily, seasonal, and yearly variations in temperatures. Teacher performance is difficult to evaluate due to substantial variations in students and teaching conditions beyond the control of even the best teachers.

In the case of the 2016 Presidential Election, however, we can see an example of these modern mathematical models clearly failing in real time.

© 2016 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].
