Introduction
This is the second in a series of articles on mathematical programming, starting with “The Scope of Mathematical Programming Projects.” This article discusses LAME (LAME Ain’t an MP3 Encoder), a widely used free open-source MP3 (MPEG Audio Layer III) audio encoder. MP3 is probably the most widely used audio compression file format. It is part of the MPEG (Moving Picture Experts Group) family of audio and video compression and transmission formats. MPEG is organized under the auspices of ISO (the International Organization for Standardization). Despite appearances, ISO is not an acronym. Table 1 lists the number of downloads of each LAME release from SourceForge.
Release  Date  Downloads
3.98.4  2010-04-14  1,221,748
3.98.2  2008-09-22  2,055,166
3.98  2008-07-03  443,886
3.97  2006-09-24  2,541,738
3.97beta3  2006-08-20  193,458
3.97beta2  2005-11-28  261,121
3.97beta  2005-09-18  94,215
3.96.1  2004-07-25  680,257
3.96  2004-04-11  148,512
3.95.1  2004-01-12  118,610
3.95  2004-01-11  15,134
3.94beta  2003-12-18  16,485
3.93.1  2002-12-01  262,901
3.93  2002-11-17  30,596
3.92  2002-04-15  213,508
3.91  2001-12-29  164,299
3.90.1  2001-12-22  19,157
3.90  2001-12-21  26,509
3.70  2001-09-12  88,364
3.88beta  2001-09-12  13,523
3.89beta  2001-09-12  58,914
Table 1: Downloads of the LAME MP3 Audio Encoder from SourceForge
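One quick way to see the scale of LAME’s use is to total Table 1. The short Python sketch below does only that. It assumes each count covers downloads of that release alone, so the per-release numbers can be added without double counting; the table itself does not state this.

```python
# Cumulative SourceForge downloads per LAME release, transcribed from Table 1.
DOWNLOADS = {
    "3.98.4": 1_221_748,
    "3.98.2": 2_055_166,
    "3.98": 443_886,
    "3.97": 2_541_738,
    "3.97beta3": 193_458,
    "3.97beta2": 261_121,
    "3.97beta": 94_215,
    "3.96.1": 680_257,
    "3.96": 148_512,
    "3.95.1": 118_610,
    "3.95": 15_134,
    "3.94beta": 16_485,
    "3.93.1": 262_901,
    "3.93": 30_596,
    "3.92": 213_508,
    "3.91": 164_299,
    "3.90.1": 19_157,
    "3.90": 26_509,
    "3.70": 88_364,
    "3.88beta": 13_523,
    "3.89beta": 58_914,
}

# Sum every release and find the single most downloaded one.
total = sum(DOWNLOADS.values())
most_downloaded = max(DOWNLOADS, key=DOWNLOADS.get)

print(f"{len(DOWNLOADS)} releases, {total:,} downloads in total")
print(f"most downloaded release: {most_downloaded} ({DOWNLOADS[most_downloaded]:,})")
```

Summing the 21 rows gives roughly 8.7 million downloads, with 3.97 the most downloaded single release. These are SourceForge downloads only; copies distributed through other channels are not counted.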
LAME is probably most widely used as a plugin to add MP3 encoding to the widely used free open-source Audacity audio player and editor. By default, due to MP3 patent and licensing issues, Audacity can import but not export MP3 audio files. LAME can be installed as a shared library or dynamically linked library (DLL) that adds MP3 encoding to Audacity and other programs on a computer. LAME is chosen for this case study because it is a successful, widely used, free open-source mathematical programming project that is also relatively simple. Many successful free open-source mathematical programming projects are toolkits or libraries of many functions and algorithms, often incorporating other free open-source projects as components. LAME, by contrast, is largely a self-contained implementation of the MP3 audio compression algorithm, although it does incorporate parts of the free open-source mpg123 MP3 audio player and decoder library.
LAME is chosen in part because it is of high quality and clearly comparable to successful, high-quality commercial mathematical programming products. Indeed, it is probably more widely used than many commercial products. Mathematical programming projects tend to produce components, such as shared libraries or dynamically linked libraries, that are incorporated within other systems. From a usability point of view, one usually wants to hide complex mathematics from most end users. Exceptions are specialized technical tools whose intended end users are technically sophisticated: engineers, scientists, mathematicians, and so forth. Examples are mathematical prototyping and development tools such as Mathematica and MATLAB. Comparable free open-source examples include Octave, Scilab, SAGE, R, and similar tools. This article seeks to establish the actual scope, cost, schedule, quality, performance, and risk level of successful mathematical programming projects.
In this article, mathematical programming refers to programming that largely consists of advanced mathematics, beyond the basic arithmetic used in, for example, bookkeeping software. Some programming, such as user interface software, makes little or no use of even basic arithmetic. Many business applications certainly make heavy use of basic arithmetic. Programming projects that involve more advanced mathematics are the exception rather than the rule, especially in the commercial world. In this article, the specific mathematical methods that mathematical programming refers to include calculus, analytic geometry, linear algebra, transforms such as the Fourier transform, and so forth. In practice, mathematical programming projects typically use mathematical methods taught in late high school (11th and 12th grade) and the first two years of a good college or university in the United States. One occasionally encounters more advanced mathematical methods. Specific applications include image, audio, and video compression, speech recognition, the Global Positioning System (GPS), risk models in insurance and other financial fields, and many other specialized technical applications.
In the author’s experience, it is quite common to encounter extremely optimistic ideas about the scope of mathematical programming projects and mathematical research and development. Today, in practical and applied projects, mathematical programming and research and development usually overlap a great deal. One may encounter expectations that a project can be completed in a few weeks, or at most three calendar months (a fiscal quarter), when past experience indicates it is likely to take anywhere from six months to several years, or is even technically impossible. To the extent that such projects are actually funded and undertaken, they probably often end badly, with much wasted effort, time, money, and frustration. These fantasy projects may crowd out funding and support for more realistic efforts, whether short-term alternatives better suited to organizations with limited resources or time scales, or longer-term projects that are much more likely to succeed.
Let it be clear—and this is a judgment which the Members of the Congress must finally make—let it be clear that I am asking the Congress and the country to accept a firm commitment to a new course of action—a course which will last for many years and carry very heavy costs: 531 million dollars in fiscal ’62—an estimated seven to nine billion dollars additional over the next five years. If we are to go only half way, or reduce our sights in the face of difficulty, in my judgment it would be better not to go at all.
President John F. Kennedy, “The Goal of Sending a Man to the Moon” (May 25, 1961)
While mathematical programming projects are generally much smaller in dollar cost than the successful Apollo Moon program, it is worth recalling President Kennedy’s words. With some exceptions, serious successful mathematical programming projects tend to take a significant amount of time: six months to several years. They also often involve large amounts of especially nitpicky, painstaking debugging. Without a firm commitment based on realistic expectations, they are likely to fail, and it is probably better not to start.
With the widespread adoption of powerful computers, mathematical programming is becoming more common. Mathematical programming may enable us to harness the vast power of modern computers to solve a wide range of pressing problems, from cancer to energy shortages. Remarkably, the cost to society of unrealistic ideas about mathematical programming and research and development may already run into the trillions of dollars, given the sizable contribution of inaccurate models of the value of mortgage-backed securities to the current economic downturn.
The Development of LAME
According to the LAME project web site, LAME was originally developed by Mike Cheng from 1998 to 1999. After he retired, Mark Taylor maintained the project until early 2003. The project was then taken over by a team of developers who have maintained and expanded LAME. The LAME web site lists the following developers and some of their contributions:
Primary developers:
Robert Hegemann: Tuning, optimizations, psychoacoustics…
Alexander Leidinger: Multiplatform configuration, libraries handling, release management…
Rogério Brito: Debian packaging, debugging.
Primary developers – Retired:
Gabriel Bouvigne: Tuning, optimizations, psychoacoustics…
Takehiro Tominaga: Psychoacoustics, bitstream, optimizations, assembly code…
Mike Cheng: Maintainer of LAME v2.x.
Frank Klemm: Psychoacoustics, optimizations.
Naoki Shibata: Psychoacoustics (NSPsytune model, NSSafeJoint).
Mark Taylor: Maintainer of LAME v3.x, initial implementer of GPsycho psychoacoustic model.
Additional developers:
Roberto Amorim: Web pages and documentation.
John Dahlstrom: Adaptive ATH.
John Dee: LAME extended VBR header.
Dominique Duvivier: Speed optimizations.
Albert Faber: Author of CDex and lame_enc.dll.
Joseph Flynn: LAME DirectShow Filter.
Peter Gubanov: LAME DirectShow Filter.
Guillaume Lessard
Steve Lhomme: LAME ACM codec.
Don Melton: id3v1 and v2 code.
Darin Morrison: Presets tuning.
Josep Maria Antolín Segura: Documentation.
Kyle VanderBeek: Python bindings, website cleanup.
LAME is now hosted by SourceForge. The earliest release available on the SourceForge site is LAME 3.70, dated September 12, 2001 in Table 1. Table 2 lists the size and scope of most major releases of LAME from 3.70 to the present.
Release  Date  Lines of Code (LOC)  Mythical Man-Months (COCOMO)  Notional Dollar Cost
3.70  2001-09-12  18,573  51.4  $410,990
3.90  2001-12-21  49,897  145.3  $1,162,000
3.92  2002-04-15  55,998  164.3  $1,314,900
3.93.1  2002-12-01  61,249  179.8  $1,438,500
3.95.1  2004-01-12  80,046  239.0  $1,912,300
3.96.1  2004-07-25  81,336  243.1  $1,944,900
3.97  2006-09-24  84,907  254.4  $2,035,400
3.98.4  2010-04-14  87,694  263.3  $2,106,000
Table 2: Size and Scope of the LAME MP3 Audio Encoder Releases
Notes: The release number and date are from the SourceForge download page. The lines of code is the total number of lines of code reported by running the free open-source CLOC (Count Lines of Code) utility on the entire LAME release. The “mythical man-months” figure is the estimated actual effort in man-months from the “organic” mode of the Basic COCOMO (Constructive Cost Model) software estimation model. Both total lines of code and COCOMO are very rough ways of measuring the size and effort of a software project. The Notional Dollar Cost assumes that a COCOMO “mythical man-month” is 160 hours of paid effort at $50 per hour ($8,000 per month). Note that the lines of code, mythical man-months, and notional cost are essentially cumulative numbers since the start of the project, and so include the mpg123 project from which some of the code is derived.
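The COCOMO figures in Table 2 can be approximately reproduced in a few lines. The Python sketch below assumes the textbook Basic COCOMO organic-mode equation, effort = 2.4 × KLOC^1.05 man-months, where KLOC is thousands of lines of code. The notes above do not state the coefficients, so treat them as an assumption. The cost conversion uses the 160 hours per man-month and $50 per hour given in the notes.

```python
# Basic COCOMO, "organic" mode: effort in man-months from thousands of lines of code.
# a = 2.4 and b = 1.05 are the textbook organic-mode coefficients; that Table 2
# used exactly these values is an assumption.
ORGANIC_A = 2.4
ORGANIC_B = 1.05

HOURS_PER_MAN_MONTH = 160  # from the Table 2 notes
DOLLARS_PER_HOUR = 50      # from the Table 2 notes


def cocomo_effort(lines_of_code):
    """Estimated effort in (mythical) man-months."""
    kloc = lines_of_code / 1000.0
    return ORGANIC_A * kloc ** ORGANIC_B


def notional_cost(man_months):
    """Notional dollar cost at 160 paid hours per man-month and $50 per hour."""
    return man_months * HOURS_PER_MAN_MONTH * DOLLARS_PER_HOUR


# Lines of code per release, transcribed from Table 2.
RELEASES = [
    ("3.70", 18_573),
    ("3.90", 49_897),
    ("3.92", 55_998),
    ("3.93.1", 61_249),
    ("3.95.1", 80_046),
    ("3.96.1", 81_336),
    ("3.97", 84_907),
    ("3.98.4", 87_694),
]

if __name__ == "__main__":
    for release, loc in RELEASES:
        mm = cocomo_effort(loc)
        print(f"{release:8s}{loc:>8,d} LOC {mm:7.1f} man-months  ${notional_cost(mm):>12,.0f}")
```

Running it reproduces the man-month and cost columns of Table 2 to within about one percent. That is consistent with, though not proof of, these being the coefficients behind the table; the small differences may come from rounding or from slightly different line counts.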
The total lines of code reported by the CLOC utility combines lines of code in all programming languages: C, C++, assembler, HTML, the Bourne shell, and so forth. The actual MP3 encoder algorithm in LAME is implemented mostly in the C programming language, with some C++. Most of the code in releases 3.70 and 3.90 is code of this type. The amount of C/C++ code increased dramatically from version 3.70 to version 3.90. Since then, most of the actual code development has been in the Bourne shell, especially the Unix configure script for building and installing the LAME package.
A computer program is analogous in function to a mechanical clockwork device. In fact, clockmakers in the past built programmable robots and automatons comparable in function to simple electronic and computational devices. A line of code in a computer program is comparable to a moving part in a mechanical clockwork device, an engine, or something similar. Developing a computer program with 80,000 lines of code is comparable to designing a clockwork device or machine with 80,000 moving parts. A computer program of this size is probably more complex than a twentieth-century automobile with no embedded computers or software.
Detailed Results of CLOC for the Latest Version of LAME (3.98.4)
C:\Documents and Settings\John F. McGowan\Desktop>cloc lame-3.98.4
     320 text files.
     306 unique files.
     124 files ignored.

http://cloc.sourceforge.net v 1.53  T=34.0 s (5.6 files/s, 3510.5 lines/s)

Language            files     blank   comment      code
Bourne Shell           10      4308      5015     34297
C                      49      5837      5359     28421
m4                      2       809        58      7085
C++                    17      1904      2269      6969
HTML                   11      1172        11      5264
C/C++ Header           67      1509      2835      4455
make                   24       161        50       542
Teamcenter def          4        89         0       285
Pascal                  1        50        75       178
Visual Basic            1        19        53        86
DOS Batch               3        11        45        45
Perl                    1         5        17        33
CSS                     1         3         0        21
XML                     1         0         0        13
SUM:                  192     15877     15787     87694
One can see that the amount of Bourne shell code (34,297 lines) now actually exceeds the amount of C code (28,421 lines). LAME is very easy to install on Unix, Macintosh, and Windows computers, which is important for a successful, widely used product.
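The comparison is easy to check against the CLOC report. This Python sketch transcribes the “code” column of the report by hand (it is not produced by CLOC itself), confirms that it adds up to the SUM line, and prints each language’s share of all code lines from largest to smallest.

```python
# "code" column of the CLOC 1.53 report for LAME 3.98.4, transcribed by hand.
CLOC_CODE = {
    "Bourne Shell": 34_297,
    "C": 28_421,
    "m4": 7_085,
    "C++": 6_969,
    "HTML": 5_264,
    "C/C++ Header": 4_455,
    "make": 542,
    "Teamcenter def": 285,
    "Pascal": 178,
    "Visual Basic": 86,
    "DOS Batch": 45,
    "Perl": 33,
    "CSS": 21,
    "XML": 13,
}

total = sum(CLOC_CODE.values())  # should match the SUM line of the report

# Print languages from largest to smallest with their share of all code lines.
for language, lines in sorted(CLOC_CODE.items(), key=lambda item: item[1], reverse=True):
    print(f"{language:<15}{lines:>8,}  {100 * lines / total:5.1f}%")
```

The Bourne shell accounts for about 39 percent of the 87,694 code lines, against about 32 percent for C.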
Conclusion
The LAME MP3 audio encoder project has taken several calendar years. Most of the work appears to have been done between 1998 and 2004 (release 3.95.1). This includes both the development and implementation of the core algorithm and a substantial amount of work on packaging and installing the encoder. It is difficult to estimate the actual effort (e.g., hours actually worked) of an open-source project. Many projects are volunteer or partially volunteer efforts. Some contributors may contribute more than full time, e.g., a student working 80 hours per week, while others may work only a few hours per calendar week or even less.
Nonetheless, it is pretty clear that LAME in its current widely used, successful form is a substantial effort. It almost certainly could not be produced in a calendar quarter or less. A full time paid team of professional software developers would probably take anywhere from six months (a very aggressive target) to a few years to produce a comparable MP3 audio encoder from scratch.
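Basic COCOMO, the model used for Table 2, also has a companion equation for nominal development time, schedule = 2.5 × effort^0.38 calendar months in organic mode. The Python sketch below applies the textbook organic-mode coefficients (an assumption; the article does not state them) to the 87,694 lines CLOC counts for release 3.98.4.

```python
# Basic COCOMO, organic mode, with textbook coefficients (an assumption, not LAME data).
def cocomo_organic(lines_of_code):
    """Return (effort in man-months, nominal schedule in months, average staff)."""
    kloc = lines_of_code / 1000.0
    effort = 2.4 * kloc ** 1.05       # man-months (Basic COCOMO organic effort equation)
    schedule = 2.5 * effort ** 0.38   # nominal calendar months
    staff = effort / schedule         # average full-time headcount
    return effort, schedule, staff


# Total CLOC count for LAME 3.98.4, including build scripts and documentation.
effort, months, staff = cocomo_organic(87_694)
print(f"{effort:.0f} man-months over {months:.1f} months, about {staff:.1f} people on average")
```

The result is roughly 21 calendar months with an average of about 13 people, a little under two years. Because the line count includes the configure script, HTML, and other non-algorithm code, this is only a crude cross-check on the six-months-to-a-few-years range above, not an estimate of what LAME actually cost.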
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
The article says, “One occasionally encounters more advanced mathematical methods. Specific applications include image, audio, and video compression, speech recognition, the Global Positioning System (GPS), risk models in insurance and other financial fields, and many other specialized technical applications.”
There is another widely used specialized technical application worth mentioning: cryptography. The mathematics of strong public-key or private-key cryptography is definitely far beyond high-school math, even if your high school did cover modular arithmetic.
Great post. I’m planning to plot the downloads against dates to perform some time series analysis. There definitely seems to be some linear increase in the time series. I have a quick question: do the download values in the table represent current downloads of that version, or was that version removed when the new version became available?
One can get the download statistics for LAME by going to the link:
http://sourceforge.net/projects/lame/files/lame/
One will see a table with headings
Name  Modified  Size
3.98.4  2010-04-14
In Safari on the Mac I see a graphic under the Size column. Click on this graphic to get to the downloads page for that release. In some browsers with some configurations, a number of downloads may appear instead of the graphic.
The downloads page that one reaches by clicking the graphic gives detailed time histories of the downloads for that version. One can choose a time period of interest and see both a time history during that period and total downloads. By default it appears to show downloads for the last week. One can choose a different range to see the downloads for a different period, such as since the start of the project.
Sincerely,
John
The numbers in the LAME Download table (Table 1) appear to be cumulative downloads for each release through Saturday, February 5, 2011.
The numbers were displayed in Firefox on a Windows XP PC used to access the SourceForge LAME page on February 5, 2011. The Firefox display appears to differ from the Safari/Mac display.
The method for viewing downloads described in my previous follow-up post appears to show cumulative download numbers with a monthly resolution. In other words, if one selects a date range from 2000 to 2011-02-05 (Feb. 5, 2011), one will get the cumulative total number of downloads through the end of January 2011, as well as a table of downloads for each month from the start of 2000 through January 2011.
Sincerely,
John
Thanks for the follow-up!