But It Worked in the Computer Simulation!

People often assume that theoretical mathematical calculations and computer simulations will work well enough that machines or experiments will succeed the first time, or at most within a few tries (or will reach similar levels of performance in other contexts). This belief is often implicit in the promotion of scientific and engineering megaprojects such as the NASA Ares/Constellation program or CERN’s Large Hadron Collider (LHC). One of the reasons for this belief is the apparent success of theoretical mathematical calculations and primitive computer simulations during the Manhattan Project, which invented the first atomic bombs in World War II, as discussed in the previous article “The Manhattan Project Considered as a Fluke”. This belief occurs in many contexts. In the debate over the Comprehensive Test Ban Treaty (CTBT), which bans all nuclear tests on Earth, proponents (sincerely or not) argued that sophisticated computer simulations could substitute for actual tests of nuclear weapons in the United States nuclear arsenal. After the terrorist attacks of September 11, 2001, federal, state, and local government officials apparently decided to dispose of most of the wreckage of the World Trade Center and rely on computer simulations to determine the cause of the three major building collapses that occurred (instead of physically reconstructing the buildings as has been done in other major accident investigations). Space entrepreneur Elon Musk apparently believed he could achieve a functioning orbital rocket on the first attempt; he did not succeed until the fourth attempt, recreating a known but extremely challenging technology. This article discusses the many reasons why theoretical mathematical calculations and computer simulations often fail, especially in frontier engineering and science where unknowns abound.

This article does not argue that theoretical mathematical calculations and computer simulations are unhelpful or should not be performed. This is clearly not the case. Occasionally, as in the Manhattan Project, theoretical mathematical calculations and computer simulations have worked right the first time, even in frontier areas of engineering and science. In frontier areas such as major inventions and scientific discoveries, this appears to be the exception rather than the rule. Research and development programs and projects that implicitly or explicitly assume that theoretical mathematical calculations and computer simulations will work right on the first attempt, or even within the first few attempts, are likely to be disappointed and may fail for this reason. Rather, in general, we should plan on combining theoretical mathematical calculations and computer simulations with a substantial number of physical tests or trials. There is evidence from the history of major inventions, such as the orbit-capable rocket, that one should plan on hundreds, even thousands, of full system tests, and many more partial system tests and component tests. This argues strongly for using scale models or other rapid prototyping methods where feasible, or for focusing research and development efforts on small-scale machines as in the computer/electronics industry today, again where feasible.

Let Me Count the Ways

There are many reasons why theoretical mathematical calculations and computer simulations fail. Indeed, given the sheer number, it is somewhat remarkable that they do work at all. This section discusses most of the major reasons for failure.

Simple Error

Scientists, engineers, and computer programmers are human beings. Even the best of the best make mistakes. This is worth some elaboration. Most scientists and engineers today are professionally trained in schools and universities until their twenties (sometimes even longer). Much of this training involves solving problems in classes, homework, and exams that typically take anywhere from seconds to, in rare cases, several full days (say eight hours per day) to solve. In the vast, vast majority of cases, these problems have been solved many, many times before by other students; it is often possible to look up, learn, and practice the appropriate method to solve the problem — something not possible with genuine frontier science and engineering problems.

An “order of magnitude” is a fancy way of saying a “factor of ten”. Two orders of magnitude is a fancy way of saying a factor of 100. Three orders of magnitude is a fancy way of saying a factor of 1000. And so on. Even the most difficult problems solved in an advanced graduate level science or engineering course are typically orders of magnitude simpler than the problems in “real life,” especially in frontier science and engineering. At a top science and engineering university such as MIT, Caltech or (fill in your alma mater here), scoring 99% (1 error in 100) is phenomenal performance. Yet a frontier engineering or science problem can easily involve thousands, even millions, of steps. The Russian mathematician Grigoriy Perelman’s arxiv.org postings, which are generally thought to have proved the Poincaré Conjecture, left many steps out as “obvious”; the detailed expositions that other mathematicians later wrote to check the argument run to hundreds of pages. A modern computer simulation, such as the highly classified nuclear weapon simulation codes involved in the Comprehensive Test Ban Treaty debate, can involve millions of lines of computer code. Even a single subtle error can invalidate a theoretical mathematical proof or calculation or a computer simulation. On complex “real world” problems, even the very best are likely to make mistakes because of the sheer size and complexity of the problems. Computer programmers spend most of their time debugging their programs.
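To make the arithmetic behind this point concrete, here is a minimal sketch of how a 99% success rate per step compounds over a long chain of steps. It assumes, purely for illustration, that every step is independent and equally likely to be correct; the function name is hypothetical and not from any library.

```python
def chance_of_no_errors(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps is correct.

    Assumption (for illustration only): steps are independent and share
    the same accuracy, so the probabilities simply multiply.
    """
    return per_step_accuracy ** steps


# Compare a 100-step exam-sized problem with much longer "real world" chains.
for steps in (100, 1_000, 1_000_000):
    p = chance_of_no_errors(0.99, steps)
    print(f"{steps:>9,} steps: chance of a flawless result = {p:.3g}")
```

Under these assumptions, a performer who is right 99% of the time gets through 100 steps without a single error only about 37% of the time, and the chance of a flawless result shrinks toward zero as the number of steps grows into the thousands.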

In computer simulations, consider a sophisticated numerical simulation program with one million (1,000,000) lines of code written by a team of top programmers with an error rate of one error per 1000 lines of code. If a computer program were implemented as a physical machine like a traditional mechanical clock (a very complex and sophisticated machine in its heyday), each line of code would be at least one moving part (gear, switch, lever, etc.). A computer program with one million lines of code is far more complex than a traditional pre-computer automobile or a nautical chronometer used to measure longitude (John Harrison’s first successful nautical chronometers had a few thousand parts). The Space Shuttle Main Engine (SSME), one of the most powerful and sophisticated engines in the world, has approximately 50,000 parts.

By one error in 1000 lines of code, we mean the programmer can write 1000 lines of code with only one error (bug) before any testing or debugging. This would be truly phenomenal performance, but let us assume it for the sake of argument. This simulation program will then start out with approximately 1000 errors (one million lines at one error per 1000 lines)! In general, it will take extensive debugging, testing, and comparison with real world data and trials to find and fix these 1000 errors. A subtle error may evade detection despite very extensive efforts.
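The estimate above, and the remark that a subtle error may survive extensive debugging, can be sketched in a few lines. The fraction of bugs that debugging finds and the Poisson model for the survivors are assumptions chosen for illustration, not figures from any study; the function names are hypothetical.

```python
import math


def expected_bugs(lines_of_code: int, bugs_per_line: float) -> float:
    """Expected number of bugs before any testing or debugging."""
    return lines_of_code * bugs_per_line


def chance_some_bug_survives(initial_bugs: float, fraction_found: float) -> float:
    """Chance that at least one bug escapes debugging.

    Assumption (for illustration only): the number of surviving bugs is
    Poisson-distributed with mean initial_bugs * (1 - fraction_found).
    """
    remaining = initial_bugs * (1.0 - fraction_found)
    return 1.0 - math.exp(-remaining)


# One million lines at one bug per 1000 lines, as in the text.
bugs = expected_bugs(1_000_000, 1 / 1000)
print(f"expected bugs before debugging: {bugs:.0f}")

# Hypothetical debugging effectiveness levels.
for found in (0.90, 0.99, 0.999):
    p = chance_some_bug_survives(bugs, found)
    print(f"debugging finds {found:.1%} of bugs: P(at least one survives) = {p:.3f}")
```

In this toy model, even a debugging effort that removes 99.9% of the original bugs still leaves better-than-even odds that at least one remains in the program.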

The modern professional training in science and engineering produces some seemingly phenomenal individuals, such as the winners of the International Math Olympiad (IMO). Most of these people perform extremely well in school and university classes, homework, exams, and so forth. If you witness their performance in an academic setting, it resembles the magical mathematics depicted in popular culture, for example in television shows such as Numb3rs or Eureka (which depict the same kind of performance on very complex real world problems). Nonetheless, they are likely to make errors on extremely complex real world problems, something they are not used to. They can become puzzled or, worse, angry when this occurs. It couldn’t be me; it must be those idiots in the next office. How did they ever graduate from MIT, Caltech, or (fill in your alma mater here)?

Many real world systems such as aircraft, rockets, particle accelerators, and the human body are complex integrated systems in which a very large number (thousands to millions) of parts must work together within very tight tolerances for the entire system to work correctly (fly, collide beams, stay alive and healthy). Even one undetected error can be fatal. This is beyond the performance level of even the very best students in school where the problems are generally simpler and the solutions are known; the proper methods can be studied and practiced prior to taking a test or exam. This near perfect performance in complex real world systems is usually achieved by an iterative process of trial and error in which some errors are found the hard way (the rocket blew up on the launch pad, the accelerator magnets exploded, the patient died 🙁 ) and eliminated.  The final example is not a snide comment; the author’s father passed away in 2008 participating in yet another unsuccessful clinical trial of a new cancer treatment.

A great deal of modern research consists of measuring some quantity to slightly greater accuracy (known disparagingly as “measuring X to another decimal point”) or computing some theoretical quantity to slightly greater accuracy. Despite the popular image of graduate students like mathematician John Nash in A Beautiful Mind or the physicist Albert Einstein studying part-time at the University of Zurich performing path-breaking research, graduate students are frequently assigned or manipulated into projects of this type in modern research, even at top research universities like MIT, Caltech, or (fill in your alma mater here). These projects often involve repeating something that has been done many times before, only just a little better (hopefully). Although the error rates are noticeably higher than in academic coursework, they are still far from representative of true frontier or breakthrough research and development. Hence, many graduate students, post-doctoral research associates, and even full professors who have built a career measuring X to another decimal point have negligible experience with the truly high error rates frequently encountered in frontier research and development.

For example, in measuring X to another decimal point, one is often reusing complex simulations or analysis software that has been developed incrementally over many years, even decades (some programs now date back to the 1960s and 1970s). Thus much of the testing and debugging has already been done, and one encounters far fewer errors. If one ventures into a frontier or breakthrough area, one may need to develop a new computer program from scratch, where the probability of serious errors at first is likely to be near one (1.0, unity) for the reasons discussed above, even for truly exceptional individuals and teams.

It is worth understanding that popular science materials such as PBS/Nova specials, Scientific American articles, or Congressional testimony by leading scientists rarely describe the research as “measuring X to another decimal point” or anything similar. Popular science materials usually focus on the quest for some “Holy Grail” such as unifying the fields in particle physics, a cure for cancer in biology and medicine, cheap access to space in aerospace, and so forth. The quest for the “Holy Grail” captures the imagination and is generally the public reason for funding the research. The Holy Grails have also proven exceedingly difficult to achieve and not necessarily amenable to throwing money and manpower at the problems. Often, exceptional intelligence as conventionally measured has also proven inadequate to find an answer. The “War on Cancer,” for example, has consumed about $200 billion in the United States alone since 1971, when President Nixon signed the National Cancer Act, a level of inflation-adjusted funding comparable to the wartime Manhattan Project continued for forty years to date.

I should add that measuring X to another decimal point can be quite important. The astronomer/astrologer Tycho Brahe successfully measured the position of the planet Mars in its path through the Zodiac to another decimal point. While it may have been possible to infer the laws of planetary motion correctly prior to this measurement, there is no question that this improved measurement was essential for Johannes Kepler to discover the correct laws of planetary motion, a major scientific breakthrough that now has practical use in the computation of the orbits of communication satellites, GPS navigation, Earth observing satellites, and so forth. Nonetheless, I will take the position that measuring X to another decimal place has gone to an unhealthy extreme in modern research. It fills curricula vitae, produces millions of published papers, rarely leads to genuine breakthroughs and practical advances, and provides poor training for students in genuine breakthroughs, amongst other things by giving a misleading sense of the actual error rates that occur in real breakthroughs.

Most Theoretical Calculations and Simulations Are Approximations

Most theoretical calculations and simulations are approximations. A few grams of matter contain on the order of 10^23 (ten raised to the twenty-third power) atoms or molecules. This is about one hundred billion trillion atoms or molecules. By definition, one mole of carbon-12 is 12 grams of carbon. One mole of a substance contains Avogadro’s number, 6.02214179(30)×10^23, of atoms or molecules. Even small machines, e.g. computer chips, weigh grams. Automobiles weigh roughly one to two thousand kilograms (one kilogram is 1000 grams). Airplanes and rockets weigh many thousands of kilograms. Nuclear power plants probably weigh millions of kilograms. Each atom or molecule has, in general, several protons and neutrons in the atomic nucleus or nuclei, and several electrons in complex quantum mechanical “orbitals”. Even with thousands of supercomputers, it is impossible to simulate matter at this level of detail. Thus, on close examination, the vast majority of theoretical mathematical calculations and computer simulations make significant approximations. Sometimes these approximations introduce serious errors, including subtle errors that are very difficult or impossible to detect in advance. The errors may become obvious only after a difference between the theory and experiment (real data, physical trials) is detected (e.g. the rocket blew up on the launch pad).
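A rough back-of-the-envelope sketch shows the scale of the problem. It counts the atoms in 12 grams of carbon-12 and estimates the memory needed just to store one classical snapshot of them. The storage cost per atom and the one-petabyte machine are illustrative assumptions, not figures from this article, and the estimate ignores the electrons and quantum mechanics entirely, so it understates the real difficulty.

```python
AVOGADRO = 6.02214179e23      # atoms per mole, the value quoted in the text
CARBON12_MOLAR_MASS_G = 12.0  # grams per mole of carbon-12, as defined in the text

# Illustrative assumptions, not figures from the article:
BYTES_PER_ATOM = 6 * 8        # 3 position + 3 velocity components as 64-bit floats
MACHINE_MEMORY_BYTES = 1e15   # a hypothetical machine with one petabyte of memory


def atoms_in_carbon12(grams: float) -> float:
    """Number of atoms in a sample of pure carbon-12."""
    return grams / CARBON12_MOLAR_MASS_G * AVOGADRO


def bytes_for_classical_state(atoms: float) -> float:
    """Bytes needed to store one position and one velocity per atom."""
    return atoms * BYTES_PER_ATOM


atoms = atoms_in_carbon12(12.0)
needed = bytes_for_classical_state(atoms)
print(f"atoms in 12 g of carbon-12: {atoms:.3e}")
print(f"bytes for one classical snapshot: {needed:.3e}")
print(f"hypothetical petabyte machines required: {needed / MACHINE_MEMORY_BYTES:.3e}")
```

Under these assumptions, merely writing down where every atom in 12 grams of carbon is and how fast it is moving would take tens of billions of petabyte-class machines, before a single time step is computed.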

Computers and Symbolic Math Cannot Reason Conceptually

The Webster’s New World Dictionary (Third College Edition) defines a concept as (page 288):

An idea or thought, especially a generalized idea of a thing or class of things; a notion.

Most human beings think almost entirely conceptually. The vast majority of human beings rarely if ever use abstract mathematical symbols to think, and then only in specialized contexts. A “cat” is a concept: a special kind of “animal,” another concept, distinguishable from, for example, a “dog,” yet another concept. Many things that scientists and engineers deal with are concepts: particle accelerators, rockets, airplanes, electrons, cancer, and so forth. In only a few special cases, such as simple geometrical forms like the perfect sphere, can we express the concept in purely symbolic mathematical terms that can be programmed on a computer.

Most major inventions or scientific discoveries started out as a concept in the inventor or discoverer’s mind: James Watt’s separate condenser for his steam engine, Kepler’s hazy notion of an elliptical orbit, Faraday’s mental picture of pressure and motion in the mysterious aether to explain electricity and magnetism, eccentric (to put it mildly) rocket pioneer Jack Parsons’s concept of combining a smooth fuel such as asphalt with a powdered oxidizer such as potassium perchlorate to overcome the severe problems with powdered explosives, and so forth. To this day, we cannot express most concepts in mathematical symbols that can be programmed on a computer. In some cases, we can simulate a specific instance of the concept on a computer or through traditional pencil and paper derivations or calculations.

Johannes Kepler was able to find a mathematical formula that corresponded to his hazy concept of an elliptical orbit in Apollonius of Perga’s Conics. He was lucky that the mathematics of the ellipse had already been worked out and corresponded closely to the motion of the planets. James Clerk Maxwell, after many years of effort, was able to find a set of differential equations, Maxwell’s Equations, that corresponded to Faraday’s mental concepts of pressure and motion in the aether. Even in cases where specific mathematics can be found (in a book, for example) or developed for a concept (from a detailed mechanical model as Maxwell did with Faraday’s ideas, for example), we still cannot represent the process of the transformation from the mental concept to the mathematics either in formal symbolic mathematics or in a computer program.

Computers and symbolic mathematics cannot reason conceptually. Most of the research in artificial intelligence (machine learning, pattern recognition, etc.) has been an attempt to find a way to do this. Most of this research tries to replicate the process by which human beings identify classes and their relationships (concepts) and correctly assign objects (cats, dogs, speech sounds, etc.) to these classes. So far, we have been unable to either understand or duplicate what human beings do, in many everyday cases effortlessly. A conceptual error is often beyond the ability of either formal symbolic mathematics or computer simulations to detect or identify; it can show up in real world tests very dramatically as in a rocket exploding on launch or a miracle cancer drug failing in clinical trials.

Conceptual reasoning is poorly understood. It is not clear how to teach it, if it can be taught, or how to measure it, if it can be measured. Very basic questions about its nature are unresolved. Conceptual reasoning appears to play a major role in many major inventions and scientific discoveries, so-called breakthroughs. In this context, it is particularly mysterious. Many inventors and discoverers describe a flash of insight, usually following many years of failure and frequently occurring on a break such as a recreational walk, in which a key concept or even the entire answer occurs to them. These are reports, anecdotal data. We cannot be absolutely sure they are true, just as with reports of UFO sightings, which are actually more common than breakthroughs. To be clear, inventors or discoverers have an obvious possible motive to make up the story of a “Eureka” experience: they in fact stole their work from someone else and need to explain a sudden leap forward in another way. There are inventions and discoveries where there are serious questions about what really happened and who did what, and where the work may well have been stolen. Even so, reports of “Eureka” experiences are extremely common in the history of invention and discovery, and they resemble less dramatic flashes of insight or creative leaps reported or experienced by many people (including the author).

These conceptual skills or phenomena may account for why some inventors and discoverers do not seem as intelligent as one might expect, and certainly not as intelligent as inventors and discoverers are depicted in popular culture, and also why platoons of the best and brightest scientists, as conventionally measured, have failed (so far) in such heavily funded efforts as the War on Cancer.

The Math is Intractable

In some cases, we believe that we have the correct math and physical theory to solve a problem. However, the math has proven intractable to solve (so far) either through traditional pencil and paper calculations and symbolic manipulations or through numerical simulation on a computer. The Navier-Stokes equations are thought to govern fluids (liquids and gases such as water and air). Nonetheless, the general solution of the Navier-Stokes equations in fluid dynamics has proven intractable to date. This is one of the reasons that the existence and smoothness of solutions to the Navier-Stokes equations is included among the Clay Mathematics Institute’s Millennium Prize Problems. Sometimes it may not even be clear that the math is intractable, resulting in reliance on spurious theoretical mathematical calculations or computer simulations.

New Physics

This article is concerned with the use of mathematics and computer simulations for real world problems, not proving theorems in pure abstract mathematics. In this context, inevitably, one is trying to predict or simulate the actual physics of the real world. How do mechanical devices, electricity, magnetism, gravity, and so forth work in the real world? That is the question. If the theoretical mathematical calculations or computer simulations are based on incorrect physics, they will probably fail. In some cases, the fundamental physics may be known but the theory derived from the fundamental laws of physics is somehow in error. In other cases, truly new physics may be involved.

One tends to assume that new physics would stand out, that it would be obvious that it is present. Yet this is not always the case. Human beings tend to be conservative. We do not embrace new ideas quickly or easily, especially as we get older. Small discrepancies and anomalies can occur and accumulate for long periods of time without the presence of new physics being recognized. This occurred, for example, with the Ptolemaic theories of the solar system. These theories had predictive power, but they kept making errors. It took about a century and a half of work by Nicolaus Copernicus, Galileo Galilei, Tycho Brahe, Johannes Kepler, Isaac Newton, and many others to overturn these theories and develop a superior, much more accurate theory. It did not happen overnight, and for solid scientific reasons: Copernicus’s original heliocentric theory was measurably inferior to the prevailing Ptolemaic theory, contrary to the impression given in science classes. Galileo’s extreme arrogance and grossly inaccurate theory of the tides did not help either.

Electricity and magnetism had been known for thousands of years, both large scale phenomena like lightning and small scale effects such as static electricity or lodestones. Nonetheless, without the battery and the ability to control and study electricity and magnetism in a laboratory, it was almost impossible to make progress or discover the central role electricity and magnetism play in chemistry and matter. New physics can be hiding in plain sight and causing anomalies that are persistently attributed to selection bias, instrument error, or other mundane causes.

Conclusion


There are many reasons that theoretical mathematical calculations or computer simulations may fail, especially in frontier science and engineering where many unknowns abound. The major reasons include:

  1. simple error (almost certain to occur on large, complex projects)
  2. most theoretical mathematical calculations and simulations are approximations
  3. symbolic math and computers cannot reason conceptually and may not detect conceptual errors
  4. the math may be intractable
  5. new physics.

In the history of invention and discovery, it is rare to find theoretical mathematical calculations or computer simulations working right the first time, as seemingly occurred in the Manhattan Project, which invented the first atomic bombs during World War II. Indeed, it often takes many full system tests or trials to achieve success and to refine the theoretical mathematical calculations or simulations to the point where they are reliable. Even after many full system tests or trials, theoretical mathematical calculations or simulations may still have significant flaws, known or unknown.

This argues for planning on many full system tests of some type in research and development. In turn, this argues strongly in favor of focussing research and development efforts on small-scale machines, or using scale models or other rapid prototyping methods where feasible. This does not mean that theoretical mathematical calculations and computer simulations should not be used. They can be helpful and, in some cases, such as the Manhattan Project may prove highly successful. However, one should not plan on the exceptional level of success apparently seen in the Manhattan Project or some other cases.

In these difficult economic times, almost everyone would like to see more immediate tangible benefits from our vast ongoing investments in research and development. If current rising oil and energy prices reflect “Peak Oil,” a dwindling supply of inexpensive oil and natural gas, then we have an urgent and growing need for new and improved energy technologies. With increasing economic problems and several bitter wars, it is easy to succumb to fear or greed. Yet it is in these difficult times that we need to think most clearly and calmly about what we are doing to achieve success.

© 2011 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems by developing complex algorithms that embody advanced mathematical and logical concepts, including video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].


