Estimating the Cost and Schedule of Mathematical Software

Mathematics and mathematical software combined with today’s powerful computers can deliver large improvements in speed and efficiency as well as new useful features. Mathematical software is in widespread use: digital video such as YouTube and Skype, digital audio such as MP3 files, JPEG images, speech recognition such as Apple’s Siri, computer-generated images in movies and video games, and the Global Positioning System (GPS), which tells people where they are. All of these are multi-billion-dollar markets today.

Mathematical software may offer solutions to many pressing problems, such as curing cancer by helping develop systems of smart drugs that perform mathematical or logical calculations to identify and selectively kill cancer cells (see the previous post, Animations of a Possible Cure for Cancer).

Mathematical software can also solve many small problems such as the need to relax with entertaining new audio/video effects for computer games and movies (see the previous post, Creating Cartoon Voices with Math).

The successful solution of problems using mathematics and mathematical software usually requires estimating the cost and schedule of mathematical software projects based on historical experience, just as a good coach and quarterback plan the strategy and plays for a successful football team based on what the players and team can actually do.

Mathematical software development is an uncommon specialty, unlike mainstream software development such as business web sites and user interface software. User interface software, for example, is often extremely easy to develop today. With modern scripting languages like Python and GUI (Graphical User Interface) builders, it is possible to create working user interfaces rapidly and with little risk if one sticks with standard GUI components such as buttons, sliders, and data entry fields. Mathematical software development is usually much harder than modern user interface software development: it takes longer per line of code, involves much more debugging, and is much less predictable.

Individuals and groups often have extremely optimistic ideas about the size and scope of mathematical software projects. Many people appear to be unaware of the size and complexity of various commonly used examples of mathematical software. For example, the widely used open-source x264 H.264 video encoder is over 62,000 lines of code developed between 2004 and 2011 with at least 18 contributors. The H.264 video compression standard is used in many YouTube videos, Blu-ray discs, and other high-performance video systems.

The popular open source Independent JPEG Group JPEG image encoder/decoder used by many commercial and open-source image editors and viewers is over 52,000 lines of code developed between 2000 and 2011 with at least 13 contributors.

The LAME MP3 audio encoder, best known as the MP3 encoder plugin for the Audacity audio editor, is over 87,000 lines of code (about 40,000 lines of algorithmic C/C++ code and about 47,000 lines of Bourne shell installer code) developed between 1998 and 2011 with at least nine primary developers and at least 21 total contributors.

A line of code is comparable to at least one moving part in a physical machine. As a point of reference, the Space Shuttle Main Engine, one of the most sophisticated engines in the world, has about 50,000 moving parts. These mathematical programs are comparable in complexity to the most sophisticated physical machines in the world and often fail catastrophically due to tiny errors just as a rocket engine will.

Software engineering expert Barry Boehm’s original software cost estimation model, Basic COCOMO (the Constructive Cost Model) in its Embedded mode, appears to give a rough estimate of the time it takes to develop these low-level mathematical programs, although there is substantial variation between estimated and actual effort.

The model is:

[tex]\mathrm{MM}\ (\mathrm{man\ months}) = 3.6\,(\mathrm{KDSI})^{1.2}[/tex]

where Boehm’s Man Month is 152 man-hours (19 man-days) and KDSI is 1000 (Kilo) Delivered Source Instructions (lines of code). Blank lines and comments are not counted.
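To make the counting rule concrete, here is a minimal C sketch of a line counter that skips blank lines and pure comment lines. The function name count_source_lines and its simplified rules are assumptions for illustration only; it is much cruder than a real tool such as CLOC and, for example, does not notice comment markers inside string literals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the two-character marker m occurs anywhere in s[0..n). */
static bool has_marker(const char *s, size_t n, const char *m)
{
    for (size_t k = 0; k + 1 < n; k++)
        if (s[k] == m[0] && s[k + 1] == m[1])
            return true;
    return false;
}

/* Hypothetical helper: count lines that are neither blank nor pure comments.
 * Simplified rules (an assumption, much cruder than CLOC):
 *  - a line whose first non-blank characters start a line comment is skipped;
 *  - a line that begins a block comment is skipped, as is every line up to
 *    and including the one that closes it;
 *  - every other non-blank line counts, even if it ends in a comment. */
size_t count_source_lines(const char *text)
{
    size_t count = 0;
    bool in_block_comment = false;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Skip leading whitespace (spaces, tabs, carriage returns). */
        size_t i = 0;
        while (i < len && (line[i] == ' ' || line[i] == '\t' || line[i] == '\r'))
            i++;
        const char *start = line + i;
        size_t rest = len - i;

        if (in_block_comment) {
            if (has_marker(start, rest, "*/"))
                in_block_comment = false;   /* the block comment closes here */
        } else if (rest == 0) {
            /* blank line: not counted */
        } else if (rest >= 2 && start[0] == '/' && start[1] == '/') {
            /* line comment: not counted */
        } else if (rest >= 2 && start[0] == '/' && start[1] == '*') {
            /* block comment: stays open unless it also closes on this line */
            in_block_comment = !has_marker(start + 2, rest - 2, "*/");
        } else {
            count++;                        /* a delivered source line */
        }

        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

Under these rules, a file containing one blank line, one comment line, and two statements counts as two source lines.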

A few quick numbers from COCOMO Embedded:

1,000 lines of code: 3.6 man-months
2,000 lines of code: 8.3 man-months
5,000 lines of code: 24.8 man-months

[Figure: Basic COCOMO (Embedded)]

If the project is using consultants at an hourly rate, one should multiply the number of man months times 152 hours times the hourly rate of the consultants. If the project is using direct employees, one should use the cost of the employees per month. Boehm’s model omits one man-day per month for the average paid time off of the employee.

Cost = (Estimated Man-Months)*(152 hours)*(hourly rate)

or

Cost = (Estimated Man-Months)*(Monthly Salary and Overhead)
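The two cost formulas can be written as a pair of C functions. This is a sketch; the function names and the example rate below are hypothetical.

```c
/* Boehm's man-month is 152 working hours (19 eight-hour days). */
#define HOURS_PER_MAN_MONTH 152.0

/* Cost when paying consultants by the hour. */
double cost_hourly(double man_months, double hourly_rate)
{
    return man_months * HOURS_PER_MAN_MONTH * hourly_rate;
}

/* Cost when using direct employees: per-person monthly salary plus overhead. */
double cost_monthly(double man_months, double monthly_salary_and_overhead)
{
    return man_months * monthly_salary_and_overhead;
}
```

For example, the 24.8 man-month estimate for a 5,000-line program, billed at a hypothetical consultant rate of $100 per hour, gives cost_hourly(24.8, 100.0), or about $377,000.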

There is substantial variation between the actual and estimated effort from this simple model. In his book Software Engineering Economics (p. 84), Boehm notes:

From a practical standpoint, it is important to note that Basic COCOMO estimates are within a factor of 1.3 of actuals only 29% of the time, and within a factor of 2 only 60% of the time.
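Boehm's accuracy figures can be turned into a range around a point estimate. The sketch below, with hypothetical names EffortRange and effort_range, assumes that "within a factor of k" means the actual effort lies between the estimate divided by k and the estimate multiplied by k.

```c
/* An interval of plausible actual effort, in man-months. */
typedef struct {
    double low;
    double high;
} EffortRange;

/* Interval [estimate / factor, estimate * factor]. Boehm reports that Basic
 * COCOMO is within a factor of 1.3 of actuals 29% of the time and within a
 * factor of 2 only 60% of the time. */
EffortRange effort_range(double estimate_mm, double factor)
{
    EffortRange r = { estimate_mm / factor, estimate_mm * factor };
    return r;
}
```

For the 5,000-line example (about 24.8 man-months), a factor of 2 gives roughly 12.4 to 49.7 man-months, and by Boehm's figures even that wide range holds only about 60% of the time.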

Barry Boehm advises against using his model for projects smaller than 2000 lines of code.

In the author’s experience, smaller projects, such as 1,000 lines of code, are still in the same ballpark, on average, as predicted by Basic COCOMO Embedded, but there is even more variation between estimated and actual effort. The author once implemented the Advanced Encryption Standard (AES), about 1,500 lines of code, in one week, much faster than either the model (which predicts almost six man-months) or the author’s experience with other projects would suggest. The effort varies even more on small projects depending on the details of the algorithm and other factors that are hard to know in advance.

The free open-source CLOC (Count Lines of Code) utility is available for the major computing platforms: Windows, Mac OS X, Linux, and other common forms of Unix. There are now many free open-source programs, such as x264 and the other examples cited above, that implement known mathematics and algorithms. By counting the lines of code in such an existing implementation, it is often possible to get a rough estimate of the size and scope of a mathematical software project that involves the same known mathematics and algorithms.

Limitations of Lines of Code

It is important to keep in mind that a line of code (LOC) is only a rough measure of the size and complexity of a computer program. In the C or C++ programming languages, each of these is a single line of code:


a = 1;

if ((a > b && a < c) || d < e) { a = sin(b + c); } else { a = tanh(a + b + c) / (d + e); }

The second example line of code would usually require more actual effort than the first example line. This is one of the reasons cost and schedule estimates based on counts of the lines of code vary a lot compared to the actual effort.

Because of the many problems with using lines of code for cost and schedule estimation, other methods such as function points have been developed. Function points are currently popular in books and articles on software cost and schedule estimation. However, function points were developed for business and user interface software. Function point estimation generally involves counting the number of inputs and outputs to the program such as data entry fields in a business program. This often predicts the actual effort well because many business and user interface programs have simple internal logic or mathematics and the effort is proportional to the number of inputs and outputs of the program. Business software usually uses only basic arithmetic, adding columns of numbers and similar simple operations.

Mathematical programs are generally extremely complex internally but often appear as only a few inputs and outputs. For example, a video compression program takes one input, the uncompressed raw video, and returns one output, the compressed video. Thus, methods such as function points tend to grossly underestimate the size and scope of mathematical software projects.

This weakness of function points has been recognized for many years and there are more advanced versions of the function point method that attempt to better estimate the size and complexity of complex algorithms hidden from the end user. However, it is still better to rely on lines of code for estimating the size and scope of mathematical software, despite the obvious limitations of using lines of code for cost and schedule estimation.

Scripting Languages (Matlab) Versus Low-Level Compiled Languages (C/C++)

One well-known way to speed up the development of mathematical software is to use mathematical scripting languages such as Matlab, Mathematica, Octave (a free open-source program that is mostly compatible with Matlab), and many others. These are scripting languages, similar to Python or PHP, that have large, well-integrated libraries of mathematical functions combined with a list data type (e.g., Mathematica) or a numerical array/matrix data type (e.g., Matlab).

In the author’s experience, the speed of development of mathematical software using Octave, MATLAB, or similar tools is generally 2-3 times faster on average than C/C++. This is mostly because the number of lines of code is reduced by a factor of 2-3. The Basic COCOMO Embedded model still gives a useful rough estimate of the actual effort required, but the number of lines of code input to the cost model is reduced!

There is a lot of variation in the increased speed of development from using Octave, Matlab, or similar tools, depending on the details of the algorithm and just plain luck. Some algorithms are well adapted to implementation in Octave/MATLAB, and development can be 10-20 times faster than in C/C++. MathWorks, which markets MATLAB, plays up cases like this. There are also some algorithms where there is no gain; the Octave/MATLAB code is pretty much the same as the C/C++ code.

Unfortunately, programs in Matlab, Octave, and similar tools often run significantly slower than compiled code written in C, C++, or similar programming languages. With languages like Matlab that use numerical arrays, the penalty is not as great as it was a decade ago. Some operations, such as the Fast Fourier Transform (FFT), often seem to be just as fast in Matlab or similar tools as in compiled versions. However, one should generally plan for a 2-3 times penalty in speed of execution. Because of speed of execution and memory usage problems, it is still often impractical to develop computationally intensive mathematical software such as video compression programs using tools such as Octave, Matlab, or Mathematica.

Conclusion

Mathematics and mathematical software can deliver large improvements in speed and efficiency as well as new useful features. Success is much more likely with estimates of the size and scope of mathematical software development based on historical experience.

Suggested Reading/References

Barry Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981

© 2012 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

Leave a Reply