The primary purpose of software estimation is not to predict a project’s outcome; it is to determine whether a project’s targets are realistic enough to allow the project to be controlled to meet them. — attributed to Steve McConnell, Software Estimation: Demystifying the Black Art
This is the third in a series of articles starting with The Scope of Mathematical Programming Projects. A number of terms are defined precisely, and concepts discussed further, in the second article in the series, LAME: A Case Study in Mathematical Programming. In the author’s experience, it is common to encounter extremely optimistic ideas about the scope of mathematical programming projects, where mathematical programming refers to programming that involves mathematics more advanced than the basic arithmetic used in business applications and other common programming projects. Mathematical programming projects are much more likely to succeed when they start from realistic ideas about the scope, cost, schedule, performance, and risk level of the project. This article discusses the scope of the software development and mathematical research and development of Maxima, a widely used free open-source computer algebra system (CAS).
Maxima is a computer algebra system derived from the DOE (United States Department of Energy) MACSYMA computer algebra system, which began development at the Massachusetts Institute of Technology (MIT) in the 1960’s. MACSYMA was one of the first computer algebra systems, a precursor to present-day commercial computer algebra systems such as Mathematica and Maple. MACSYMA was developed in large part for use in theoretical physics, hence the funding by the US Department of Energy (formerly the Atomic Energy Commission). According to the Maxima web site:
The Maxima branch of Macsyma was maintained by William Schelter from 1982 until he passed away in 2001. In 1998 he obtained permission to release the source code under the GNU General Public License (GPL).
Maxima was selected for this case study for several reasons. Maxima is a successful, widely used research tool with practical applications. It is an example of an “artificial intelligence” program that works, performing a range of functions that used to be performed only by human mathematicians: factoring polynomials, simplifying expressions, symbolic differentiation, symbolic integration, and a number of other mathematical tasks. Maxima has been hosted by the SourceForge open source web site since 2002. SourceForge maintains a record of downloads and copies of all releases of Maxima from Maxima 5.0 in May 2002 to the present version. The techniques developed to assess the scope of the LAME free open-source MP3 audio encoder project, which is also hosted on SourceForge, can be directly applied to Maxima.
Table 1: Downloads of Recent Windows Versions of Maxima
Table 2: Downloads of Recent Linux Versions of Maxima
Table 3: Downloads of Recent Source Code for Maxima
Table 4: Downloads of Recent MacOS Versions of Maxima
Table 5: Downloads of Versions of Maxima from 2002 to 2009
Table 6: Downloads of Versions of Maxima in 2002
NOTE: The data for these tables was acquired by selecting the tables displayed by the SourceForge web pages (follow the links in the table captions) with a mouse in Mozilla Firefox 3.6.13 on a Microsoft Windows XP Service Pack 2 PC, copying, and pasting the selection into the Notepad++ Version 5.2 text and code editor on Saturday, February 15, 2011. This procedure gives the cumulative downloads to date in the table in the pasted text. The text was then converted to an HTML table using the free open-source Kompozer HTML/web page editor. This same trick does not appear to work in the Safari web browser on the Macintosh; the pasted text does not include the total number of downloads. The number of downloads can be accessed by clicking on the graphic in the final column of the web page table.
Unlike many free open-source projects, Maxima does not have a prominent list of contributors on the web site or in the source code, other than frequent mention of William Schelter. As will be discussed further below, there has been extensive work on Maxima since 2001. A partial list of contributors was collected by using the DOS FIND command to search the files in Maxima for copyright notices. For example:
C:\Documents and Settings\John F. McGowan\Desktop\maxima-5.23.2\share\vector>find "Copyright" *
---------- RTEST_VECT.MAC
---------- VECT.DEM
---------- VECT.MAC
---------- VECT.USG
---------- VECTOR.DEM
---------- VECTOR.MAC
---------- VECTOR.USG
---------- VECTOR_REBUILD.LISP
;; Copyright (C) Nov. 2008 Volker van Nek
---------- VECTOR_REBUILD.MAC
Copyright (C) Nov. 2008 Volker van Nek
---------- VECTOR_REBUILD.USG
Copyright (C) Nov. 2008 Volker van Nek (van dot nek at arcor dot de)
---------- VECT_TRANSFORM.MAC
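The same kind of search can be scripted so that it covers every sub-folder rather than one folder at a time. The following is a minimal Python sketch, not the procedure used for this article (which used the DOS FIND command). The function name find_copyright_lines and the throwaway folder it is demonstrated on are hypothetical, made up for illustration.

```python
import os
import tempfile


def find_copyright_lines(root):
    """Map each file under root to its lines containing "Copyright".

    Like the DOS command find "Copyright" *, the match is case-sensitive,
    but unlike that command this walks sub-folders too.
    """
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # latin-1 never fails to decode, which suits old source files
            with open(path, "r", encoding="latin-1") as handle:
                hits = [line.strip() for line in handle if "Copyright" in line]
            if hits:
                found[os.path.relpath(path, root)] = hits
    return found


# Demonstration on a throwaway folder (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "vector"))
    with open(os.path.join(demo_root, "vector", "rebuild.mac"), "w") as f:
        f.write("/* Copyright (C) Nov. 2008 Volker van Nek */\n")
    with open(os.path.join(demo_root, "vector", "vect.dem"), "w") as f:
        f.write("/* no notice in this file */\n")
    demo_hits = find_copyright_lines(demo_root)

for path, lines in demo_hits.items():
    print(path, lines)
```

Only files with at least one matching line appear in the result, so the output is shorter than the FIND listing, which prints a header for every file it examines.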
The following contributors were listed in copyright notices in files in the src folder of Maxima 5.23.2:
Copyright Holders Listed in *.lisp files in src folder (not the numeric subfolder, which appears to contain the slatec numerical library)

"MIT"
William Schelter
J. Villate (Jaime E. Villate) http://villate.org/
Barton Willis http://www.unk.edu/facultyandstaff.aspx?id=669
Paul Foley
Andrej Vodopivec
Volker van Nek
Kevin M. Rosenberg
Dieter Kaiser
Raymond Toy
James F. Amundson (Maxima project leader for several years)
David Billinghurst

The share folder contains much more of the source code for Maxima. The following additional contributors were identified from the files in the share folder. A number of the contributors listed above also contributed to files in the share folder.

share/tensor
Viktor T. Toth
Valerij Pipin

share/simplification
Wolfgang Jenkner

share/numeric
Mario Rodriguez Riotorto

share/libfgs
Robert Dodier

share/draw
Mark H. Weaver

share/contrib
Edmond Orignac
Martin Rubey
Thomas Baruchel
Dan Stanger
Salvador Bosch Perez
This is not an exhaustive list of copyright holders (presumably contributors). Maxima is quite large and, for example, the share/contrib folder contains many sub-folders that were not searched due to time constraints.
This is twenty-two (22) individuals, including William Schelter and not including whoever “MIT” may, in fact, refer to, presumably the original developers of MACSYMA. A substantial amount of work clearly took place before William Schelter became involved in the early 1980’s. Many, many files contain a copyright notice of the form:
;;; Copyright (c) 1984,1987 by William Schelter,University of Texas ;;;;;
;;; (c) Copyright 1981 Massachusetts Institute of Technology ;;;

The MIT copyright years are variously 1980, 1981, and 1982.
The Scope of the Maxima Project
Maxima is written largely in the LISP programming language, one of the oldest programming languages and a favorite of artificial intelligence research. There is some FORTRAN and small amounts of code in other languages (TCL, Bourne Shell, a few others). Maxima was expanded extensively between 2006 and 2008. More recent work has been in adding documentation.
|Release||Date||Lines of Code||Mythical Man Months||Notional Cost|
Table 7: Scope of Selected Releases of Maxima (from 2002 to Feb. 15, 2011)
The number of lines of code was determined by using the free open-source CLOC (Count Lines of Code) utility run from a DOS batch file:
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.23.2 > maxima-5.23.2.csv
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.20.1 > maxima-5.20.1.csv
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.16.3 > maxima-5.16.3.csv
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.9.2 > maxima-5.9.2.csv
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.6 > maxima-5.6.csv
cloc --exclude-lang=HTML --force-lang="Lisp",lisp --force-lang="Lisp",mac --force-lang="Lisp",dem -csv maxima-5.0 > maxima-5.0.csv
Maxima is written in the LISP programming language. By default, CLOC expects LISP files to have the extension lsp, but the LISP files in the Maxima source have the extensions lisp, mac, and dem, so the --force-lang options tell CLOC to count files with those extensions as LISP. The HTML documentation is excluded from the count of the lines of code. The number of man-months, that is, the actual effort that may have been expended, is estimated using the “organic” version of the Basic Constructive Cost Model (COCOMO) from Barry Boehm’s Software Engineering Economics. This is a very rough estimate and should not be taken too seriously; the goal is to get a ballpark estimate of the size of the project. The estimated man-months and notional cost are cumulative numbers since the start of the Maxima/MACSYMA project in the 1960’s. Maxima includes at least one third-party library in source code form, the slatec numerical FORTRAN library, which is therefore also included in the estimate.
The comma-separated values (CSV) files of results generated by the cloc utility were analyzed using an Octave script. Octave is a free open-source numerical programming tool that is mostly compatible with MATLAB.
% analyze maxima lines of code data
%
% read the CSV reports written by cloc, one per release
max5p23p2 = csvread('maxima-5.23.2.csv');
max5p20p1 = csvread('maxima-5.20.1.csv');
max5p16p3 = csvread('maxima-5.16.3.csv');
max5p9p2 = csvread('maxima-5.9.2.csv');
max5p6 = csvread('maxima-5.6.csv');
max5p0 = csvread('maxima-5.0.csv');

% sum column 5 (lines of code) over rows index..end
index = 5;
sum5p23p2 = sum(max5p23p2(index:end, 5));
sum5p20p1 = sum(max5p20p1(index:end, 5));
sum5p16p3 = sum(max5p16p3(index:end, 5));
sum5p9p2 = sum(max5p9p2(index:end, 5));
sum5p6 = sum(max5p6(index:end, 5));
sum5p0 = sum(max5p0(index:end, 5));

% generate table of results
% (output names follow the order returned by cocomo.m)
printf("Release\tLines of Code\tMythical Man Months\tNotional Cost\n");
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p23p2/1000.0);
printf("maxima 5.23.2\t%d\t%6.2f\t% 9.2f\n", sum5p23p2, man_months, cost);
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p20p1/1000.0);
printf("maxima 5.20.1\t%d\t%6.2f\t% 9.2f\n", sum5p20p1, man_months, cost);
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p16p3/1000.0);
printf("maxima 5.16.3\t%d\t%6.2f\t% 9.2f\n", sum5p16p3, man_months, cost);
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p9p2/1000.0);
printf("maxima 5.9.2\t%d\t%6.2f\t% 9.2f\n", sum5p9p2, man_months, cost);
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p6/1000.0);
printf("maxima 5.6\t%d\t%6.2f\t% 9.2f\n", sum5p6, man_months, cost);
fflush(stdout);

[man_months, dev_time, people, cost] = cocomo(sum5p0/1000.0);
printf("maxima 5.0\t%d\t%6.2f\t% 9.2f\n", sum5p0, man_months, cost);
fflush(stdout);

disp('ALL DONE');
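For readers without Octave, the line-count step can be sketched in Python. This is an illustrative alternative, not the script used for this article. It assumes the cloc CSV report has a header row naming the columns files, language, blank, comment, and code, and that any total row is labeled SUM; check both assumptions against the report your version of cloc writes. The sample report and its numbers are invented for the example and are not Maxima's counts.

```python
import csv
import io

# Hypothetical cloc-style report; the counts are invented for illustration.
SAMPLE_REPORT = """files,language,blank,comment,code
120,Lisp,4000,6000,90000
15,Fortran 77,500,2500,9000
3,Bourne Shell,40,60,400
138,SUM,4540,8560,99400
"""


def total_code_lines(report_file):
    """Sum the code column of a cloc CSV report, skipping any SUM row."""
    total = 0
    for row in csv.DictReader(report_file):
        if row["language"] == "SUM":
            continue  # avoid counting the total row twice
        total += int(row["code"])
    return total


lines_of_code = total_code_lines(io.StringIO(SAMPLE_REPORT))
print(lines_of_code)
```

Reading the columns by name rather than by position makes the sketch less sensitive to extra header text, at the price of depending on the assumed column names.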
The script calls an Octave function, cocomo.m, that implements the Basic COCOMO model and estimates the cost by assuming a rate of $50 per hour and a man-month of 160 hours. This is an updated version of the cocomo function introduced in the first article in this series.
function [man_months, dev_time, people_required, cost] = cocomo(kloc, type, hourly_rate)
% [man_months, dev_time, people_required, cost] = cocomo(kloc [, type, hourly_rate])
%
% kloc (thousands of lines of code)
% type (type of project: organic, semi-detached, embedded)
% hourly_rate (rates in USD per hour used to calculate project cost)
%
% Implements Basic COCOMO (Constructive Cost Model), also known as COCOMO 81, from
% Software Engineering Economics by Barry Boehm
%
  if nargin < 2
    type = 'organic';
  end
  if nargin < 3
    hourly_rate = 50.0; % $50/hour
  end

  c = 2.5; % schedule coefficient, the same for all project types

  switch type
    case 'organic'
      a = 2.4; b = 1.05; d = 0.38;
    case {'semi', 'semi-detached'}
      a = 3.0; b = 1.12; d = 0.35;
    case 'embedded'
      a = 3.6; b = 1.2; d = 0.32;
    otherwise
      error('cocomo: unknown project type "%s"', type);
  end

  man_months = a*(kloc)^b;                  % effort
  dev_time = c*(man_months)^d;              % calendar months
  people_required = man_months / dev_time;  % average staffing
  cost = man_months * 160 * hourly_rate;    % 160 hours per man-month
end
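As a cross-check on the arithmetic, the same Basic COCOMO formulas can be written in Python. This sketch mirrors the Octave function above, taking its coefficients and its assumptions of $50 per hour and 160 hours per man-month; the 100 KLOC input is an arbitrary illustration, not a Maxima measurement.

```python
# Basic COCOMO coefficients (a, b, d) by project type; c is 2.5 for all types.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded": (3.6, 1.20, 0.32),
}
C = 2.5
HOURS_PER_MAN_MONTH = 160


def cocomo(kloc, project_type="organic", hourly_rate=50.0):
    """Return (man_months, dev_time, people_required, cost) for kloc thousand lines."""
    a, b, d = COEFFICIENTS[project_type]
    man_months = a * kloc ** b
    dev_time = C * man_months ** d           # calendar months
    people_required = man_months / dev_time  # average staffing
    cost = man_months * HOURS_PER_MAN_MONTH * hourly_rate
    return man_months, dev_time, people_required, cost


# Arbitrary example input: 100 thousand lines of code, organic project.
man_months, dev_time, people, cost = cocomo(100.0)
print(f"{man_months:.1f} man-months, {dev_time:.1f} months, "
      f"{people:.1f} people, ${cost:,.0f}")
```

At 100 KLOC the organic model gives roughly 300 man-months, and with the default rate the notional cost is simply the man-months multiplied by $8,000.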
The output of the maxima_anal.m Octave script was pasted into this article using the Kompozer HTML editor. Kompozer has a feature to convert text tables to HTML tables. The “Date” column was then added manually using the table editing features of Kompozer.
Maxima is clearly a substantial project, covering a calendar period from the 1960’s to the present. It probably incorporated both algorithm research and development and the implementation of known algorithms, especially during the 1960’s when it was one of the first computer algebra systems, a subject of intense research at the time.
There are many pressing problems, ranging from speech recognition for mobile and other devices to cancer to energy shortages, that may fall to mathematical research and development and mathematical programming combined with the enormous power of modern computers. As can be seen from Maxima, the dollar cost of such projects is not especially large. The guesstimated $12 million cost of Maxima is quite small compared to many publicly and privately funded activities. However, the calendar time of such projects is substantial, in this case over forty years. Very few mathematical programming projects can be completed in a calendar quarter (a few can). Generally, such projects take between six months and several years. Successful genuine research projects typically take years; Maxima/MACSYMA is a rare example of a successful artificial intelligence research and development project. Successful mathematical programming projects are much more likely with realistic targets and plans based on historical experience and data.
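The relationship between the $12 million guesstimate and code size can be made concrete by running the organic Basic COCOMO formula in reverse. The Python sketch below is only a rough consistency check under the assumptions used in this article ($50 per hour, 160 hours per man-month, organic coefficients a = 2.4 and b = 1.05). The function name implied_kloc is hypothetical, and the line counts measured with CLOC remain the authoritative figures.

```python
def implied_kloc(cost, hourly_rate=50.0, hours_per_man_month=160, a=2.4, b=1.05):
    """Invert man_months = a * kloc**b to find the code size implied by a cost."""
    man_months = cost / (hourly_rate * hours_per_man_month)
    return (man_months / a) ** (1.0 / b)


man_months = 12e6 / (50.0 * 160)   # 1500 man-months, i.e. 125 man-years
kloc = implied_kloc(12e6)
print(f"{man_months:.0f} man-months, about {kloc:.0f} thousand lines of code")
```

Spread over more than forty years, 125 man-years is on the order of three full-time people, which fits the point that the calendar time, not the dollar cost, is the substantial part.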
Software Engineering Economics
Barry W. Boehm
Prentice Hall (November 1, 1981)
Software Estimation: Demystifying the Black Art
Steve McConnell
Microsoft Press; 1 edition (March 1, 2006)
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).