Underestimation of the cost, schedule, and risk of projects is common in software development and especially prevalent in mathematical software development. It is common to encounter extremely optimistic ideas about the duration and difficulty level of mathematical software projects, ironically one of the more difficult kinds of software development, as well as magical ideas about mathematics.
There is relatively little publicly available information on the scope and difficulty level of software projects of any kind. Some information is available in books and papers by various self-styled software engineering experts such as Barry Boehm, Donald Reifer, Capers Jones, and several others. These experts usually have consulting businesses and do not disclose their raw data and make limited disclosures of the results of analyses of their data.
Open source software projects can provide an excellent source of information on some aspects, such as the number of lines of computer code, of various software and mathematical software projects. This information can be independently verified by downloading the source code of an open source project and examining it, using tools like the CLOC utility to count the lines of code if needed.
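As a rough illustration of what a line-counting tool does, here is a minimal sketch in Python (not the Octave used in Appendix I). It is not CLOC's algorithm: the function name count_lines and the single-prefix comment rule are assumptions made for this example, and real tools handle block comments, string literals, and many languages. It also shows how three common definitions of a "line of code" give three different counts for the same file.

```python
def count_lines(source: str, comment_prefix: str = "//") -> dict:
    """Count lines under three common definitions of 'lines of code'.

    Hypothetical helper for illustration only; it recognizes just
    whole-line comments that start with comment_prefix.
    """
    lines = source.splitlines()
    # Physical lines that contain something other than whitespace.
    non_blank = [ln for ln in lines if ln.strip()]
    # Non-blank lines that are not whole-line comments.
    code_only = [ln for ln in non_blank
                 if not ln.strip().startswith(comment_prefix)]
    return {
        "physical": len(lines),
        "non_blank": len(non_blank),
        "code": len(code_only),
    }


SAMPLE = """// compute a value
a = 1;

b = a * 2;  // a trailing comment still counts as code
"""

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

The same four-line snippet counts as four, three, or two lines depending on the definition, which is one reason published line counts are hard to compare across sources.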
Unfortunately, it is difficult to get accurate estimates of the actual effort expended on an open source project. It is difficult to verify if a contributor worked part-time, full time, or more than full time on the project. Some contributors may not be credited.
This article examines a data set of ninety-three NASA projects from 1971 to 1987 that was collected by Jairus Hihn of the NASA Jet Propulsion Laboratory (JPL). The data is the NASA 93 data set from the PROMISE Software Engineering Repository at the University of Ottawa.
The data lists the number of source lines of code (SLOC) for each project and the actual effort expended in staff months (SM), and classifies the projects according to the development modes of software engineering expert Barry Boehm’s COCOMO I (Constructive Cost Model). The data used for Boehm’s COCOMO I model is also available as a data set in the PROMISE repository.
A Note on Lines of Code
Lines of code is a very imperfect measure of the size and scope of a software project. For example, these are both one line of code in the C Programming Language:
a = 1;
and
a = (1.0/(sigma*sqrt(2.0*M_PI)))*exp(-(x - mean)*(x - mean)/(2.0*sigma*sigma));
There are several different definitions of lines of code used in the literature on software cost and schedule estimation. In addition, a range of alternatives to lines of code has been proposed, such as function points (currently popular).
Nonetheless, lines of code are somewhat reminiscent of Winston Churchill’s quote about democracy:
It has been said that democracy is the worst form of government except all the others that have been tried.
Function points were developed for business applications and rely heavily on counting the number of inputs and outputs to a program. This often works well for business applications, which are often relatively simple and whose complexity scales with the number of inputs and outputs. Mathematical software such as a video codec often has few inputs (one compressed file or data stream) and outputs (uncompressed video) but a very complex internal implementation (tens of thousands of lines of code). This has been recognized as a weakness of function points for some time, and there are some variations such as so-called “feature points” that attempt to address this problem.
Further, methods like function points require substantial training and study to learn and apply. They are less intuitive than lines of code. There is also much more data on software projects available in lines of code than in function points.
One good way to think about lines of code is that each line of code is like a single moving part in a complex machine like a grandfather clock. Some parts are simple like the first line of code above. Some parts are more complex like the second line of code above. In general, lines of code would correspond to moving parts if one tried to implement a computer program as a mechanical device like Victorian era English mathematician Charles Babbage’s steam driven difference engine.
In mathematical software such as video compression, speech recognition, or other advanced applications, a line of code is usually directly equivalent to a single line of a mathematical formula or equation that a math teacher or professor might write on a blackboard or dry erase board in class. Most examples of mathematics taught in high school or college math courses cover at most a dozen blackboards. These are often building blocks of the mathematical solutions to real-world problems or cutting edge research problems. Most real-world examples of mathematical software, such as the H.264, Flash, or Microsoft Silverlight video codecs used by web sites today, are many thousands of lines of code and correspond to hundreds or thousands of blackboards filled with mathematical equations and formulas.
Analysis of the NASA 93 Data
The plots below show various aspects of the NASA 93 data on the scope and effort of these software projects.
The COCOMO model divides software projects into three general categories or “modes”: embedded, semi-detached, and organic. Embedded mode projects such as flight avionics software are most similar in difficulty to mathematical software projects. Indeed, due to safety issues, flight avionics software can be more demanding, requiring higher quality, than commercial applications such as video compression for entertainment. The software productivity in lines of code per staff month is now shown for the three kinds of projects.
The next plot compares the NASA 93 data to Barry Boehm’s Basic COCOMO I model for Embedded Projects (red line) and to a straight-line fit to the logarithms of the NASA 93 data, which is a power law in the original units (green line). As can be seen, there is considerable variation between actual and estimated effort, although the models are on average roughly correct and usually within a factor of three of actual effort.
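The Basic COCOMO I curves used in the comparison are power laws of the form effort = a * KSLOC^b. The sketch below is written in Python rather than the Octave of Appendix I, and uses the same coefficients that appear in the appendix code. The names COCOMO_MODES and cocomo_effort are assumptions introduced for this example.

```python
# Basic COCOMO 81 coefficients (a, b), matching the values in the
# Appendix I Octave code: staff months = a * KSLOC ** b.
COCOMO_MODES = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(ksloc: float, mode: str = "embedded") -> float:
    """Estimated effort in staff months for a project of ksloc thousand lines."""
    a, b = COCOMO_MODES[mode]
    return a * ksloc ** b


if __name__ == "__main__":
    for mode in COCOMO_MODES:
        effort = cocomo_effort(100.0, mode)
        print(f"{mode:>13}: {effort:7.1f} staff months for 100 KSLOC")
```

Because the exponent is greater than one in every mode, the estimated effort grows faster than the size of the project: doubling the lines of code more than doubles the staff months.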
The final plot shows the relative error between the actual effort and the estimated effort using the fitted model.
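For readers who want to reproduce the idea without Octave, here is a minimal Python sketch of the same steps: fit a straight line to the logarithms of size and effort, then compute the relative error, the mean magnitude of relative error (MMRE), and the PRED(30) count. It uses only seven embedded-mode rows copied from nasa93_raw_data.txt, so its numbers are not the results for the full data set. The function names are assumptions made for this example.

```python
import math

# (KSLOC, actual staff months) for seven embedded-mode rows of
# nasa93_raw_data.txt (record ids 60, 61, 62, 65, 66, 79, 80).
PROJECTS = [(350, 720), (70, 458), (271, 2460), (137, 636),
            (150, 882), (60, 409), (100, 703)]


def fit_power_law(points):
    """Least-squares line through (log10 KSLOC, log10 SM).

    Returns (A, B) such that staff months ~ A * KSLOC ** B.
    """
    xs = [math.log10(k) for k, _ in points]
    ys = [math.log10(sm) for _, sm in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, slope


def relative_errors(points, A, B):
    """(estimated - actual) / actual for each project."""
    return [(A * k ** B - sm) / sm for k, sm in points]


def mmre(errors):
    """Mean magnitude of relative error."""
    return sum(abs(e) for e in errors) / len(errors)


def pred_count(errors, threshold=0.3):
    """Number of projects whose estimate is within threshold of actual."""
    return sum(1 for e in errors if abs(e) <= threshold)


if __name__ == "__main__":
    A, B = fit_power_law(PROJECTS)
    errors = relative_errors(PROJECTS, A, B)
    print(f"fit: SM = {A:.2f} * KSLOC^{B:.2f}")
    print(f"MMRE = {mmre(errors):.2f}, PRED(30) = {pred_count(errors)} of {len(errors)}")
```

Fitting in log space treats a factor-of-two overestimate and a factor-of-two underestimate symmetrically, which suits data that spans several orders of magnitude in size.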
Conclusion
On average, the software productivity for demanding software applications such as embedded aerospace applications tends to be quite low, in the range of two-hundred (200) lines of code per staff month (mythical man month). However, there is wide variation between actual and estimated effort. The highest productivity (defined as lines of code per staff month) among the embedded projects in the NASA 93 data set was about 700 lines of code per month, and the lowest around 50 lines of code per month. Given the difficulties in defining lines of code and measuring the quality of the delivered software, it is impossible to evaluate the significance of these variations without more detailed information on the projects.
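Productivity as used here is simply delivered lines of code divided by staff months. The following Python sketch (not the author's Octave) computes it for five embedded-mode rows copied from nasa93_raw_data.txt and then turns a productivity figure into a ballpark effort. The names productivity and staff_months_needed are assumptions for this example, and the five rows are a sample, not the whole embedded subset.

```python
from statistics import mean, median

# (KSLOC, staff months) for five embedded-mode rows of
# nasa93_raw_data.txt (record ids 60, 65, 66, 79, 80).
EMBEDDED_SAMPLE = [(350, 720), (137, 636), (150, 882), (60, 409), (100, 703)]


def productivity(ksloc: float, staff_months: float) -> float:
    """Source lines of code delivered per staff month."""
    return 1000.0 * ksloc / staff_months


def staff_months_needed(sloc: float, sloc_per_staff_month: float = 200.0) -> float:
    """Ballpark effort for a project of sloc lines at a given productivity."""
    return sloc / sloc_per_staff_month


if __name__ == "__main__":
    rates = [productivity(k, sm) for k, sm in EMBEDDED_SAMPLE]
    print("SLOC per staff month:", [round(r) for r in rates])
    print("mean:", round(mean(rates)), "median:", round(median(rates)))
    # A 10,000 line project at 200 SLOC per staff month.
    print("staff months for 10,000 SLOC:", staff_months_needed(10_000))
```

At two hundred lines per staff month, a 10,000 line project works out to 50 staff months, or a little over four staff years. This is only a rough order of magnitude, given the wide spread in the data.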
It is important to keep in mind that numbers like two-hundred lines of code per staff month do not refer to just typing two-hundred lines of code which can take as little as a few minutes. They refer to the entire software development process, usually including requirements analysis, software design, actual coding, and especially debugging to achieve the high levels of quality required for these applications.
There are several cases where a single error in a single line of mathematical software has resulted in the loss of a multi-million dollar mission or human lives. The loss of the Mariner 1 probe to Venus is frequently attributed to a small error in copying a mathematical formula into the mission’s computer software. In 1991 a subtle error in the mathematical software for a PATRIOT missile system resulted in an Iraqi SCUD missile penetrating to a US base in Dhahran, Saudi Arabia and killing 28 soldiers. On June 4, 1996 the European Space Agency’s first launch of the new Ariane 5 rocket ended in an explosion caused by a software error in converting a 64 bit floating point number to a 16 bit integer. The loss of NASA’s Mars Climate Orbiter (MCO) in 1999 has been attributed to a failure to convert between English units (pound-force seconds) and metric units (newton-seconds). Aviation and rocketry have especially demanding requirements for the quality of software.
While commercial applications of mathematical software such as video compression for entertainment are not always as demanding as mission-critical aerospace software, they can still be quite demanding. Viewers of compressed video such as Netflix, YouTube, Blu-ray, or DVD video have a limited tolerance for visible artifacts and errors in the video. Almost any error in the implementation of a video codec can introduce visible artifacts or errors, so the codecs must, in general, achieve very high levels of quality, though not necessarily perfection.
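To make the floating point to integer conversion hazard concrete, here is a small Python sketch. It is not a reconstruction of any flight software. It only shows, under the assumption of a signed 16 bit target, how an unchecked narrowing conversion silently produces a wrong value while a checked one reports the problem. The function names are hypothetical.

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Truncate toward zero and keep only the low 16 bits.

    ctypes integer types do no overflow checking, so out-of-range
    values wrap around silently.
    """
    return ctypes.c_int16(int(value)).value


def to_int16_checked(value: float) -> int:
    """Truncate toward zero, refusing values that do not fit in 16 bits."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16 bit integer")
    return truncated


if __name__ == "__main__":
    print(to_int16_unchecked(1234.9))    # in range: 1234
    print(to_int16_unchecked(40000.0))   # out of range: wraps to -25536
    try:
        to_int16_checked(40000.0)
    except OverflowError as err:
        print("checked conversion refused:", err)
```

Both functions are a single line of logic, and the difference between them is invisible until an input falls outside the expected range. That is why such errors can survive testing with typical values.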
Credits
Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: https://promise.site.uottawa.ca/SERepository
© 2012 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].
Appendix I: Source Code for Analysis
The analysis was performed using a program written in Octave, a free open source numerical programming environment that is mostly compatible with MATLAB. Here is the code. It generates additional plots beyond the ones highlighted in the body of this article. The raw data file nasa93_raw_data.txt, which is extracted from the PROMISE data file, follows the code.
% Analysis of NASA 93 software effort data
%
% (C) 2012 John F. McGowan, Ph.D.
% E-Mail: [email protected]
%
data93 = dlmread('nasa93_raw_data.txt');

% COCOMO (Barry Boehm's Constructive Cost Model) MODE CODES (1=ORGANIC, 2=SEMI-DETACHED, 3=EMBEDDED)
[e_row, e_col] = find(data93(:,7) == 3);
[semi_row, semi_col] = find(data93(:,7) == 2);
[org_row, org_col] = find(data93(:,7) == 1);

actuals = data93(:,end-1:end);
ksloc = actuals(:,1); % thousand (kilo) source lines of code
staff_months = actuals(:,2); % also known as man month, work month, person month

printf('making figure 1\n');
fflush(stdout);
figure(1);
loglog(ksloc, staff_months, 'o');
title('NASA 93 SOFTWARE PROJECT DATA');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
print('nasa93_raw_data.jpg');

logloc = log10(ksloc);
log_staff_months = log10(staff_months);
[p_nasa93, s_nasa93] = polyfit(logloc, log_staff_months, 1); % fit polynomial model to the data
pred_logloc = polyval(p_nasa93, logloc);
delta = 10.^pred_logloc - staff_months; % difference between predicted staff months and actual staff months
relative_error = delta ./ staff_months; % (Estimated Staff Months - Actual Staff Months)/Actual Staff Months

cocomo_x = 1:10:max(ksloc(:));
y = polyval(p_nasa93, log10(cocomo_x));
cocomo_org = 2.4 * (cocomo_x).^1.05; % Barry Boehm's Basic COCOMO 81 (Organic) model
cocomo_semi = 3.0 * (cocomo_x).^1.12; % Barry Boehm's Basic COCOMO 81 (Semi-detached) model
cocomo_e = 3.6 * (cocomo_x).^1.2; % Barry Boehm's Basic COCOMO 81 (Embedded) model

printf('making figure 2\n');
fflush(stdout);
figure(2);
% loglog(ksloc, staff_months, 'o', ksloc, 10.^pred_logloc, '*');
loglog(ksloc, staff_months, 'o', cocomo_x, 10.^y, '-', "linewidth", 3, cocomo_x, cocomo_e, 'r-', "linewidth", 3);
title('FIT TO NASA 93 SOFTWARE PROJECT DATA');
xlabel('Thousands of Lines of Code (KSLOC)'); % thousand source lines of code
ylabel('Staff Months (SM)'); % staff month
legend("NASA 93 DATA", "FIT 93", "COCOMO 81 (EMBEDDED)");
print('nasa93_fit.jpg');

A = 10.^p_nasa93(2);
B = p_nasa93(1);
x = 1:100:5000;
x = x / 1000.0;
y = A.*(x.^B);

printf('making figure 3\n');
fflush(stdout);
figure(3);
%plot(x,y);
hist(relative_error, 20);
title('Relative Error of Estimates');
xlabel('(Estimated Staff Months - Actual Staff Months)/Actual Staff Months');
ylabel('Number of Projects');
print('nasa93_relative_error.jpg');

max_ksloc = max(ksloc(:));
mean_ksloc = mean(ksloc(:));
min_ksloc = min(ksloc(:));
max_mm = max(staff_months(:));
mean_mm = mean(staff_months(:));
min_mm = min(staff_months(:));
mean_are = mean(abs(relative_error(:))); % known as MMRE Mean Magnitude of Relative Error
max_are = max(abs(relative_error(:)));
min_are = min(abs(relative_error(:)));

prod = 1000.0*ksloc ./ staff_months;
max_prod = max(prod(:));
mean_prod = mean(prod(:));
median_prod = median(prod(:));
min_prod = min(prod(:));
std_prod = std(prod(:)); % standard deviation of software productivity

printf('making figure 4\n');
fflush(stdout);
figure(4);
hist(prod, 20);
title('Software Productivity of NASA 93 Projects');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod.jpg');

% PRED(30) is number of actuals within 30% of predicted value
% (the comparison must be applied to the absolute value, not inside abs)
ind = find(abs(relative_error(:)) <= 0.3);
pred30 = numel(ind);

% gaussian/normal point of reference
g_data = randn(1,93);
mean_g = mean(g_data(:));
std_g = std(g_data(:));
skewness_g = skewness(g_data(:));
kurtosis_g = kurtosis(g_data(:)); % technically the kurtosis in Octave is the "excess kurtosis" which is defined so the kurtosis of the Normal distribution has an expected value of zero

mean_re = mean(relative_error(:));
std_re = std(relative_error(:));
skewness_re = skewness(relative_error(:));
kurtosis_re = kurtosis(relative_error(:));

printf('making figure 5\n');
fflush(stdout);
figure(5)
hist(g_data*std_re + mean_re, 20);
title('Normal Distribution Data');
ylabel('Number Samples');
xlabel('Scaled Relative Error');
print('nasa93_scaled_normal.jpg'); % figure 5 as JPEG

% display the distribution of the kurtosis of the normal distribution
fflush(stdout);
printf("computing kurtosis of normal distribution\n");
fflush(stdout);
g_data_k = randn(10000, 93); % 10000 test sets of 93 samples
g_kurtosis = kurtosis(g_data_k,2);
figure(6);
hist(g_kurtosis, 20);
title('Excess Kurtosis of Normal Distribution');
xlabel('Kurtosis');
ylabel('Number of Test Sets');
print('normal_kurtosis_distribution.jpg');

g_skewness = skewness(g_data_k, 2);
printf('making figure 7\n');
fflush(stdout);
figure(7)
hist(g_skewness, 20);
title('Skewness of Normal Distribution');
xlabel('Skewness');
ylabel('Number of Test Sets');
print('normal_skewness_distribution.jpg');

% tails
x = -10.0:0.1:10.0;
y = (1.0/sqrt(2*pi))*exp(-x.^2/2.0);
printf('making figure 8\n');
fflush(stdout);
figure(8)
plot(x,y,'-', 'linewidth', 3);
title('Normal Distribution (Thin Tails)');
print('normal.jpg');

y_cauchy = 1.0./(1.0 + x.^2);
norm_cauchy = 0.1*sum(y_cauchy);
y_cauchy = y_cauchy ./ norm_cauchy;
figure(9)
plot(x,y_cauchy,'-', 'linewidth', 3);
title('Cauchy Distribution (Fat Tails)');
print('cauchy.jpg');

printf('making figure 10\n');
fflush(stdout);
figure(10);
plot(x, y, '-', 'linewidth', 3, x, y_cauchy, '-g', 'linewidth', 3);
title('Normal and Cauchy Distributions Together');
legend('Normal', 'Cauchy');
legend('boxon'); % put box around legend
print('normal_cauchy.jpg');

year = data93(:,6); % year of project
printf('making figure 11\n');
fflush(stdout);
figure(11);
years = 1970:1990;
hist(year, years);
title('NASA 93 Software Projects by Year');
xlabel('Year');
ylabel('Number of Projects');
print('project_years.jpg');

printf('making figure 12\n');
fflush(stdout);
figure(12)
hist(ksloc, 50);
title('Size of NASA 93 Software Projects');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Number of Projects');
print('project_size_ksloc.jpg');

printf('making figure 13\n');
fflush(stdout);
figure(13)
hist(staff_months, 50);
title('Size of NASA 93 Software Projects');
xlabel('Staff Months');
ylabel('Number of Projects');
print('project_size_sm.jpg');

printf('making figure 14\n');
fflush(stdout);
figure(14)
staff_years = staff_months / 12.; % convert to mythical man year/staff year
hist(staff_years, 50);
title('Size of NASA 93 Software Projects');
xlabel('Staff Years');
ylabel('Number of Projects');
print('project_size_sy.jpg');

% plots for different COCOMO Modes
printf('making figure 15\n');
fflush(stdout);
figure(15)
loglog(ksloc(e_row), staff_months(e_row), 'o', cocomo_x, cocomo_e, 'r-');
title('NASA 93 DATA (EMBEDDED PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Embedded Data', 'Embedded Model', 'location', 'northwest');
legend("boxon");
print('nasa93_embedded_data.jpg'); % largest effort project is embedded as might expect

printf('making figure 16\n');
fflush(stdout);
figure(16)
loglog(ksloc(semi_row), staff_months(semi_row), 'o', cocomo_x, cocomo_semi, 'r-');
title('NASA 93 DATA (SEMI-DETACHED PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Semi Detached Data', 'Semi Detached Model', 'location', 'northwest');
legend("boxon");
print('nasa93_semi_data.jpg'); % largest size (KSLOC) project is semi-detached

printf('making figure 17\n');
fflush(stdout);
figure(17)
loglog(ksloc(org_row), staff_months(org_row), 'o', cocomo_x, cocomo_org, 'r-');
title('NASA 93 DATA (ORGANIC PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Organic Data', 'Organic Model', 'location', 'northwest');
legend("boxon");
print('nasa93_org_data.jpg');

printf('making figure 18\n');
fflush(stdout);
figure(18)
loglog(ksloc(org_row), staff_months(org_row), '*k', ksloc(semi_row), staff_months(semi_row), 'ob', ksloc(e_row), staff_months(e_row),'or');
title('NASA 93 DATA (ALL PROJECTS)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Organic', 'Semi-Detached', 'Embedded', "location", "northwest");
legend("boxon");
print('nasa93_by_mode_data.jpg');

% productivity by cocomo mode
printf('making figure 19\n');
fflush(stdout);
figure(19);
hist(prod(org_row), 20);
title('Software Productivity Organic Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_org.jpg');

printf('making figure 20\n');
fflush(stdout);
figure(20);
hist(prod(semi_row), 20);
title('Software Productivity Semi Detached Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_semi.jpg');

printf('making figure 21\n');
fflush(stdout);
figure(21);
hist(prod(e_row), 20);
title('Software Productivity Embedded Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_embedded.jpg');

printf("ALL DONE\n");
fflush(stdout);
nasa93_raw_data.txt
1,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6
2,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,24.6,117.6
3,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,7.7,31.2
4,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,8.2,36
5,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,9.7,25.2
6,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,2.2,8.4
7,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,3.5,10.8
8,erb,avionicsmonitoring,g,2,1982,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,66.6,352.8
9,gal,missionplanning,g,1,1980,2,h,l,h,xh,xh,l,h,h,h,h,n,h,h,h,n,7.5,72
10,gal,missionplanning,g,1,1980,2,n,l,h,n,n,l,l,h,vh,vh,n,h,n,n,n,20,72
11,gal,missionplanning,g,1,1984,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,6,24
12,gal,missionplanning,g,1,1980,2,n,l,h,n,n,l,l,h,vh,vh,n,h,n,n,n,100,360
13,gal,missionplanning,g,1,1985,2,n,l,h,n,n,l,l,h,vh,n,n,l,n,n,n,11.3,36
14,gal,missionplanning,g,1,1980,2,n,l,h,n,n,h,l,h,h,h,l,vl,n,n,n,100,215
15,gal,missionplanning,g,1,1983,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,20,48
16,gal,missionplanning,g,1,1982,2,n,l,h,n,n,l,l,h,n,n,n,vl,n,n,n,100,360
17,gal,missionplanning,g,1,1980,2,n,l,h,n,xh,l,l,h,vh,vh,n,h,n,n,n,150,324
18,gal,missionplanning,g,1,1984,2,n,l,h,n,n,l,l,h,h,h,n,h,n,n,n,31.5,60
19,gal,missionplanning,g,1,1983,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,15,48
20,gal,missionplanning,g,1,1984,2,n,l,h,n,xh,l,l,h,h,n,n,h,n,n,n,32.5,60
21,X,avionicsmonitoring,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,19.7,60
22,X,avionicsmonitoring,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,66.6,300
23,X,simulation,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,29.5,120
24,X,monitor_control,g,2,1986,2,h,n,n,h,n,n,n,n,h,h,n,n,n,n,n,15,90
25,X,monitor_control,g,2,1986,2,h,n,h,n,n,n,n,n,h,h,n,n,n,n,n,38,210
26,X,monitor_control,g,2,1986,2,n,n,n,n,n,n,n,n,h,h,n,n,n,n,n,10,48
27,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,15.4,70
28,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,48.5,239
29,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,16.3,82
30,X,communications,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,12.8,62
31,X,batchdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,32.6,170
32,X,datacapture,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,35.5,192
33,X,missionplanning,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,5.5,18
34,X,avionicsmonitoring,g,2,1987,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,10.4,50
35,X,avionicsmonitoring,g,2,1987,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,14,60
36,X,monitor_control,g,2,1986,2,h,n,h,n,n,n,n,n,n,n,n,n,n,n,n,6.5,42
37,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,n,n,n,n,n,n,13,60
38,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,h,n,h,h,h,n,90,444
39,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,n,n,n,n,n,n,8,42
40,X,monitor_control,g,2,1986,2,n,n,h,h,n,n,n,n,n,n,n,n,n,n,n,16,114
41,hst,datacapture,g,2,1980,2,n,h,h,vh,h,l,h,h,n,h,l,h,h,n,l,177.9,1248
42,slp,launchprocessing,g,6,1975,2,h,l,h,n,n,l,l,n,n,h,n,n,h,vl,n,302,2400
43,Y,application_ground,g,5,1982,2,n,h,l,n,n,h,n,h,h,n,n,n,h,h,n,282.1,1368
44,Y,application_ground,g,5,1982,2,h,h,l,n,n,n,h,h,h,n,n,n,h,n,n,284.7,973
45,Y,avionicsmonitoring,g,5,1982,2,h,h,n,n,n,l,l,n,h,h,n,h,n,n,n,79,400
46,Y,avionicsmonitoring,g,5,1977,2,l,n,n,n,n,l,l,h,h,vh,n,h,l,l,h,423,2400
47,Y,missionplanning,g,5,1977,2,n,n,n,n,n,l,n,h,vh,vh,l,h,h,n,n,190,420
48,Y,missionplanning,g,5,1984,2,n,n,h,n,h,n,n,h,h,n,n,h,h,n,h,47.5,252
49,Y,missionplanning,g,5,1980,2,vh,n,xh,h,h,l,l,n,h,n,n,n,l,h,n,21,107
50,Y,simulation,g,5,1983,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,78,571.4
51,Y,simulation,g,5,1984,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,11.4,98.8
52,Y,simulation,g,5,1985,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,19.3,155
53,Y,missionplanning,g,5,1979,2,h,n,vh,h,h,l,h,h,n,n,h,h,l,vh,h,101,750
54,Y,missionplanning,g,5,1979,2,h,n,h,h,h,l,h,n,h,n,n,n,l,vh,n,219,2120
55,Y,utility,g,5,1979,2,h,n,h,h,h,l,h,n,h,n,n,n,l,vh,n,50,370
56,spl,datacapture,g,2,1979,2,vh,h,h,vh,vh,n,n,vh,vh,vh,n,h,h,h,l,227,1181
57,spl,batchdataprocessing,g,2,1977,2,n,h,vh,n,n,l,n,h,n,vh,l,n,h,n,l,70,278
58,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,0.9,8.4
59,slp,operatingsystem,g,6,1974,2,vh,l,xh,xh,vh,l,l,h,vh,h,vl,h,vl,vl,h,980,4560
60,slp,operatingsystem,g,6,1975,3,n,l,h,n,n,l,l,vh,n,vh,h,h,n,l,n,350,720
61,Y,operatingsystem,g,5,1976,3,h,n,xh,h,h,l,l,h,n,n,h,h,h,h,n,70,458
62,Y,utility,g,5,1979,3,h,n,xh,h,h,l,l,h,n,n,h,h,h,h,n,271,2460
63,Y,avionicsmonitoring,g,5,1971,1,n,n,n,n,n,l,l,h,h,h,n,h,n,l,n,90,162
64,Y,avionicsmonitoring,g,5,1980,1,n,n,n,n,n,l,l,h,h,h,n,h,n,l,n,40,150
65,Y,avionicsmonitoring,g,5,1979,3,h,n,h,h,n,l,l,h,h,h,n,h,n,n,n,137,636
66,Y,avionicsmonitoring,g,5,1977,3,h,n,h,h,n,h,l,h,h,h,n,h,n,vl,n,150,882
67,Y,avionicsmonitoring,g,5,1976,3,vh,n,h,h,n,l,l,h,h,h,n,h,n,n,n,339,444
68,Y,avionicsmonitoring,g,5,1983,1,l,h,l,n,n,h,l,h,h,h,n,h,n,l,n,240,192
69,Y,avionicsmonitoring,g,5,1978,2,h,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,144,576
70,Y,avionicsmonitoring,g,5,1979,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,151,432
71,Y,avionicsmonitoring,g,5,1979,2,n,l,h,n,vh,l,n,h,h,h,h,h,l,l,l,34,72
72,Y,avionicsmonitoring,g,5,1979,2,n,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,98,300
73,Y,avionicsmonitoring,g,5,1979,2,n,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,85,300
74,Y,avionicsmonitoring,g,5,1982,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,20,240
75,Y,avionicsmonitoring,g,5,1978,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,111,600
76,Y,avionicsmonitoring,g,5,1978,2,h,vh,h,n,vh,l,n,h,h,h,h,h,l,l,l,162,756
77,Y,avionicsmonitoring,g,5,1978,2,h,h,vh,n,vh,l,n,h,h,h,h,h,l,l,l,352,1200
78,Y,operatingsystem,g,5,1979,2,h,n,vh,n,vh,l,n,h,h,h,h,h,l,l,l,165,97
79,Y,missionplanning,g,5,1984,3,h,n,vh,h,h,l,vh,h,n,n,h,h,h,vh,h,60,409
80,Y,missionplanning,g,5,1984,3,h,n,vh,h,h,l,vh,h,n,n,h,h,h,vh,h,100,703
81,hst,Avionics,f,2,1980,3,h,vh,vh,xh,xh,h,h,n,n,n,l,l,n,n,h,32,1350
82,hst,Avionics,f,2,1980,3,h,h,h,vh,xh,h,h,h,h,h,h,h,h,n,n,53,480
84,spl,Avionics,f,3,1977,3,h,l,vh,vh,xh,l,n,vh,vh,vh,vl,vl,h,h,n,41,599
89,spl,Avionics,f,3,1977,3,h,l,vh,vh,xh,l,n,vh,vh,vh,vl,vl,h,h,n,24,430
91,Y,Avionics,f,5,1977,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,165,4178.2
92,Y,science,f,5,1977,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,65,1772.5
93,Y,Avionics,f,5,1977,3,vh,h,vh,xh,xh,n,l,h,h,h,h,h,h,n,h,70,1645.9
94,Y,Avionics,f,5,1977,3,vh,h,xh,xh,xh,n,n,h,h,h,h,h,h,n,h,50,1924.5
97,gal,Avionics,f,5,1982,3,vh,l,vh,vh,xh,l,l,h,l,n,vl,l,l,h,h,7.25,648
98,Y,Avionics,f,5,1980,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,233,8211
99,X,Avionics,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,16.3,480
100,X,Avionics,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,6.2,12
101,X,science,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,3,38
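For readers without Octave, the raw data can also be read with a few lines of Python. The sketch below assumes the column layout implied by the Octave code above: the seventh field is the COCOMO mode code (1 = organic, 2 = semi-detached, 3 = embedded), the sixth is the year, and the last two fields are KSLOC and staff months. The function name parse_nasa93 is an assumption for this example, and the two sample records are copied from the listing.

```python
def parse_nasa93(text: str) -> list:
    """Parse comma-delimited NASA 93 records separated by whitespace.

    Splitting on any whitespace works whether the records are one per
    line or run together on a single line.
    """
    projects = []
    for record in text.split():
        fields = record.split(",")
        projects.append({
            "id": int(fields[0]),
            "year": int(fields[5]),
            "mode": int(fields[6]),          # 1=organic, 2=semi-detached, 3=embedded
            "ksloc": float(fields[-2]),      # thousands of source lines of code
            "staff_months": float(fields[-1]),
        })
    return projects


SAMPLE = (
    "1,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6\n"
    "60,slp,operatingsystem,g,6,1975,3,n,l,h,n,n,l,l,vh,n,vh,h,h,n,l,n,350,720\n"
)

if __name__ == "__main__":
    for project in parse_nasa93(SAMPLE):
        print(project)
    # To read the full file instead:
    # with open("nasa93_raw_data.txt") as handle:
    #     projects = parse_nasa93(handle.read())
```

Keeping the mode code alongside size and effort makes it straightforward to select only the embedded projects, as the Octave code does with find.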
I think you really missed the point of Mythical Man Month.
Mythical Man Month’s key takeaway is that adding people to a project makes the project slower — in other words, you can’t directly correlate man/staff months to development productivity since productivity doesn’t scale linearly with the number of bodies working on the task. (Thus, the whole idea of measuring a project in ‘man months’ is bunk because the idea of a ‘man month’ is mythical in the first place since the men aren’t working in isolation.)
And yet, in the sentence in this article where you specifically point to the Mythical Man Month, you’re doing exactly the opposite of what it advises by correlating staff months to lines of code, even going as far as stating there’s an average, as if that average actually means anything.
The author responds:
I am using the phrase “Mythical Man Month” in a more general way to indicate the hopefully by now well known tendency to underestimate the cost and schedule of software projects as well as the many difficulties in planning such projects. The IBM System/360 and OS/360 project that Brooks writes about is now a well known early example of the many problems frequently encountered.
I don’t see a contradiction between my use of mathematical models like COCOMO and Brooks’s argument that adding software engineers to a late project can make the project take even longer. This problem can be taken as an argument for estimating the scope correctly and hiring enough staff at the start, since attempting to recover at a later date by hiring more staff may fail.
I am careful to emphasize in both the text and illustrations the tremendous variation in how long software projects take as a function of lines of code (or other measures).
Models like COCOMO and simple averages of number of lines of code per staff month are only useful, in my opinion, for getting a ballpark or rough order of magnitude value for the scope of a project. Since I have encountered many cases where people underestimate the scope of mathematical software projects by factors of ten (10) to one-hundred (100), I think these models and numbers are useful for avoiding these kinds of gross errors.
However, it would be a mistake to think these models or numbers can give anything like precise estimates of the duration and effort of software development projects valid to within ten or twenty percent as we might expect in other often repetitive physical activities such as building a house with a standard design.
Sincerely,
John