Using Octave, a Free MATLAB Alternative

Introduction

Octave is a free (both free as in beer and free as in speech), MATLAB-compatible numerical programming tool available under the GNU General Public License. MATLAB, short for “Matrix Laboratory”, is currently the most widely used commercial, proprietary numerical programming tool. Even a single license for the MATLAB software is a substantial cost. MATLAB is essentially a scripting language similar to Perl, Python, or PHP with a comprehensive, highly integrated set of numerical, statistical, mathematical, and graphical functions including, for example, the Fourier transform, matrix inversion, and histograms. In part because MATLAB has become the de facto industry standard for numerical programming, Octave is of particular interest to individuals, companies, and organizations engaged in numerical and mathematical programming and research and development.

At this time (2011), there is considerable interest in using machine learning and other advanced mathematical techniques to increase sales by detecting and predicting the buying patterns of end users of mobile devices such as iPhones and Android phones, social networks such as Facebook and LinkedIn, and search engines such as Google and Yahoo. Octave has long been used for data analysis, modeling, and machine learning. In addition to its extensive built-in features, Octave has add on packages for optimization and mathematical model fitting, econometrics, signal processing, image processing, neural networks, and a number of other areas used in the “mobile social search” space. Octave is also well suited for the development and implementation of algorithms for image and video compression, audio compression, image and video processing, audio processing, speech recognition, pattern recognition, artificial intelligence, design and simulation of machines, and many other practical uses.

In general, it is faster and easier to develop mathematical algorithms in Octave and similar tools than compiled languages such as C, C++, and Fortran. Because research and development usually involves large amounts of trial and error, Octave and similar tools have been used traditionally to prototype algorithms, analyze data from experiments, and so forth, and only occasionally for “production code” or commercial products. Traditionally, once a proof of concept existed in Octave or a similar tool, it was often translated to a faster compiled language such as C, C++, or Fortran, often a time consuming, tedious, and costly process. Octave today is quite fast and modern computers are extremely fast and inexpensive, so that there are many projects, such as data mining in the backend of a web site, that can probably skip the time consuming and costly step of conversion to a compiled language. In the future, just in time (JIT) compilers and similar capabilities will probably be added to Octave and similar tools so that the remaining speed difference between Octave and traditional compiled languages such as C, C++, and Fortran will vanish.

A significant problem with using Octave and many other free programs is making Octave work cost-effectively with other programs. It is rarely possible to use Octave for real-world projects as a purely standalone program. It is often necessary to share data and transfer the control flow between Octave and other programs including operating systems, databases, spreadsheets, web sites, and so forth. Many programs, especially proprietary programs, achieve customer lock-in through custom, sometimes secret, data formats and so forth. Microsoft Office is notorious for introducing new Office file formats with each “upgrade” that are not backward compatible with earlier versions of Office. Thus, users find themselves receiving spreadsheets, documents, and so forth that they cannot read with their old version of Office, forcing an “upgrade” at substantial cost.

This proliferation of ever-changing data formats, APIs, programming languages, and miscellaneous other standards presents a continual problem for using free software. End users often spend hours, even days or weeks, figuring out how to convert from one format to another. Software developers often spend hours, days, weeks, and even months writing functions to read and write unsupported formats, whether new or old. This makes it difficult for end users or software developers to add real value by inventing substantive new features, capabilities, and algorithms.

This article covers a number of methods, specific commands, and specific programs to make Octave work cost-effectively with other programs. There are many excellent on-line and published documents on the syntax of Octave and its add-on packages, and readers with specific questions of that type are referred to the existing documentation. Instead, this article discusses some of the issues with making Octave work with other programs that are often not covered explicitly or well in the standard Octave or free software literature.

Using Simple Human Readable Files to Reduce Costs

There are a number of advantages to using simple human readable files (typically ASCII or UTF8) such as standard tab delimited files or comma separated values (CSV) files to export data, import data, and store data. One can view the files in any text editor or word processor. Most programs such as spreadsheets, databases, and many others can easily read and write tab-delimited and comma separated values (CSV) files. Many scripting languages have direct support for tab-delimited and sometimes comma separated values files. Essentially all computer programming languages can easily read and write tab-delimited files; it is just a few lines of code. Essentially all computer programming languages can read and write comma separated values files; it requires more programming, more lines of code, but it can be done. Octave can read and write tab delimited files using its built-in dlmread and dlmwrite functions. Octave can also read and write comma separated values files using its built-in csvread and csvwrite functions.
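As a minimal sketch of the round trip described above, the following Octave commands write a small matrix to a tab delimited file and to a CSV file and read both back. The file names are placeholders invented for this example, and tempdir is used only to keep the working folder clean.

```octave
% Round trip of a numeric matrix through tab delimited and CSV files.
% The file names below are placeholders chosen for this example.
A = [1 2 3; 4 5 6];

tabfile = fullfile (tempdir (), "example_data.txt");
csvfile = fullfile (tempdir (), "example_data.csv");

dlmwrite (tabfile, A, "\t");      % tab delimited
B = dlmread (tabfile, "\t");

csvwrite (csvfile, A);            % comma separated values
C = csvread (csvfile);

disp (isequal (A, B));            % both copies should match the original
disp (isequal (A, C));
```

Either file can then be opened in a text editor or spreadsheet to confirm that it contains plain rows of numbers.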

Historically, disk drives were small. Networks had very limited bandwidth. Computers had limited speed to read and process files. Hence, binary file formats had very clear advantages over bulkier human readable formats such as tab delimited files. With terabyte drives, 100 Megabit per second networks, and inexpensive 3 GHz multi-core computers, this is much less true today. Thus, there is a strong case for using simple human readable file formats such as tab delimited files to reduce high end user and development costs, avoid customer lock-in (a costly proposition if you are the customer), and limit the costly perpetual upgrade cycle to genuine improvements in the products and services.

Simple human readable files, notably tab delimited files, enable users and developers to share data between Octave and other programs quickly and easily. There are a few caveats to be aware of.

First, the basic data type in Octave is a “matrix,” originally a double precision two-dimensional array. The MATLAB/Octave “matrix” has been extended to multiple dimensions and also to support characters and strings. The character/string support in MATLAB and Octave is something of a patch. Thus, the dlmread and dlmwrite commands in Octave handle matrices or arrays of numbers fine but map character strings in tab-delimited files to zero (0). This is often not an issue for the numerical programming and analysis that Octave is usually used for. The string and text manipulation capabilities in Octave are limited, so one should consider other tools such as Perl, Python, or Ruby for heavily text or string oriented work.
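One common way around the text-handling caveat, when the only text in a file is a header row, is to tell dlmread where the numeric data starts. The sketch below assumes a file with a single header line; the file name and column names are made up for illustration.

```octave
% Read only the numeric part of a tab delimited file that has a text header.
% The file name and column names are made up for this example.
fname = fullfile (tempdir (), "with_header.txt");

fid = fopen (fname, "w");
fprintf (fid, "time\tvoltage\n");                   % text header row
fprintf (fid, "%g\t%g\n", [0 1 2; 0.5 0.7 0.9]);    % one time/voltage pair per line
fclose (fid);

% The third and fourth arguments are zero-based row and column offsets:
% start reading at row 1 (the second line of the file), column 0.
data = dlmread (fname, "\t", 1, 0);
disp (data);
```

Files that mix text and numbers within the data rows themselves are usually better handled in a text-oriented tool, as noted above.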

The precision of floating point numbers can be an issue when using human readable files such as tab delimited files, which represent floating point numbers as character strings such as “1.123456” or “1.1234567890123456”. Octave, MATLAB, and most computer programs use the IEEE-754 standard for floating point numbers, or extensions of this standard. In this standard, a single precision floating point number (e.g. a float in the C family of programming languages) carries about seven significant decimal digits, so a character string such as “1.123456” roughly captures it; nine significant digits guarantee that the string converts back to exactly the same float. An IEEE-754 double precision floating point number carries 15 to 17 significant decimal digits, so it needs a character string such as “1.1234567890123456”, with 16 digits after the decimal point; 17 significant digits guarantee an exact round trip.

Many programs and programming languages default to writing floating point numbers as character strings with six decimal digits such as “1.123456”. For example, the standard printf() function in the C programming language defaults to six decimal digits of precision when its %f format specifier for floating point numbers is used:

double value = 1.12345678901234456;
printf("my number: %f\n", value); // defaults to six decimal digits of precision

usually prints (note that the last digit is rounded):

my number: 1.123457

Thus, if a mathematical algorithm in Octave depends on double precision, it may be necessary to prepare a human readable input data file with 16 decimal digit character strings for the floating point numbers.
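The loss can be seen directly in Octave. The following sketch converts a double to a string with the default %f format and with 17 significant digits, then converts both strings back; the variable names are arbitrary.

```octave
% Compare the default six-decimal format with a 17-significant-digit format.
x = 1.1234567890123456;

s6  = sprintf ("%f", x);       % default: six digits after the decimal point
s17 = sprintf ("%.17g", x);    % 17 significant digits

x6  = str2double (s6);         % value recovered from the short string
x17 = str2double (s17);        % value recovered from the long string

printf ("%s -> round-trip error %g\n", s6,  abs (x6  - x));
printf ("%s -> round-trip error %g\n", s17, abs (x17 - x));
```

The short string recovers the value only to about six decimal places, while the 17-digit string should recover it exactly.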

In Octave, one can use the command format long to display numbers at full precision:

Octave>format long
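Note that format long changes only how numbers are displayed at the prompt, not how they are written to files. A sketch of writing a tab delimited file at full double precision, using the "precision" option of dlmwrite, follows; the file name is a placeholder.

```octave
% Write a matrix with 17 significant digits so that doubles survive the trip.
format long                       % affects display only, not file output
M = [pi, e; sqrt(2), 1/3];

fname = fullfile (tempdir (), "full_precision.txt");   % placeholder name
dlmwrite (fname, M, "delimiter", "\t", "precision", "%.17g");

M2 = dlmread (fname, "\t");
disp (isequal (M, M2));           % true when every value round-trips exactly
```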

Exchanging Binary Files Between Octave and Other Programs

Octave supports a relatively small set of binary file formats directly. This is a common problem with free software. Both users and software developers can find themselves spending hours, days, even weeks on occasion (time is money) converting to and from other binary formats used by other programs. There are however a number of free programs that can quickly convert between a wide range of binary formats, thus greatly reducing or eliminating this overhead.

The free GIMP (The GNU Image Manipulation Program) program can read and write nearly all widely used still image file formats and many obscure still image formats. GIMP is available as both pre-compiled binaries and source code for all three major computing platforms: Microsoft Windows, Linux/Unix, and Mac OS X.

The widely used, free Audacity audio-editing program can read and write raw audio files, Microsoft uncompressed WAV files, and the open Ogg Vorbis audio file format. By default Audacity can read MP3 audio files but it cannot write them due to the onerous MP3 licensing restrictions. Users can install the LAME MP3 encoding library on all three major computing platforms — Microsoft Windows, Linux/Unix, and Mac OS X — and then Audacity can use this add on library to encode audio as MP3. Audacity is available as source code and precompiled binaries for all three major computing platforms.

The widely used, free sox (Sound Exchange) audio conversion utility can read and write a broad range of audio formats. Like Audacity, it needs the LAME encoding library to be separately installed to encode MP3 files due to the MP3 licensing restrictions. Users should consider using the free Ogg Vorbis audio format to avoid the MP3 licensing issues altogether. sox is available as both source code and precompiled binaries for all three major computing platforms.

The widely used, (mostly) free ffmpeg video and image sequence conversion utility can read and write a broad range of video file formats. As with Audacity and sox, certain video encoding formats have licensing restrictions. Again, users may wish to consider the free Ogg Theora video format to avoid some of the licensing issues with better known video formats such as H.264.
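Command-line converters such as sox can also be driven from inside Octave with the system function, which keeps a conversion step in the same script as the analysis. The sketch below assumes sox is installed and on the search path; the file names are placeholders, and the command is only run if the input file actually exists.

```octave
% Build and run a sox conversion command from inside Octave.
% "recording.ogg" and "recording.wav" are placeholder file names.
infile  = "recording.ogg";
outfile = "recording.wav";

cmd = sprintf ("sox \"%s\" \"%s\"", infile, outfile);
disp (cmd);

if exist (infile, "file")
  status = system (cmd);          % 0 means sox reported success
  if status ~= 0
    error ("sox failed with status %d", status);
  end
end
```

The resulting WAV file can then be read with Octave's audio reading function, whose name depends on the Octave version (wavread in older releases, audioread in newer ones).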

Installing Octave Add-On Packages

Octave has extensive built-in capabilities. It also has a large number of add-on packages available through the Octave Forge project and web site. These include an extensive library of standard optimization functions such as the Nelder-Mead method used in model fitting, statistics, signal processing, econometrics, and so forth. Some of the Octave Forge packages are rather limited, but a number are very extensive and a number extend Octave’s compatibility with MATLAB.

For example, MATLAB has a function xcorr which computes the cross-correlation of a data series or data set. Octave does not have xcorr in its base set of built-in functions. There are a number of implementations of xcorr on the web as xcorr.m files, but xcorr is also found in the Octave Forge signal package which provides a large number of standard signal processing functions.

The author found the explanations of the Octave pkg install procedure on the Octave and Octave Forge web sites a bit confusing. Here is a fuller explanation. Octave packages are downloaded as standard *.tar.gz gzipped (compressed) Unix tar (tape archive) files. The Unix tar (tape archive) command combines a group of files and/or folders and their contents into a single huge file without compressing the files. They are loosely concatenated together. Historically, the tar command was used to store files, folders, even entire file systems on backup tapes. The Unix gzip utility compresses files. On Unix, a *.tar.gz or *.tgz file is a file that has been tarred and then compressed using gzip. The Octave packages are distributed as *.tar.gz files.

Octave has a built-in command pkg install which installs the packages. NOTE: pkg install works directly on *.tar.gz or *.tar files. There is no need to uncompress or unpack the *.tar or *.tar.gz files. On the Macintosh, the Mac OS may automatically unzip the file when downloaded, creating a *.tar file.

In Octave, simply switch to the folder where the *.tar.gz or *.tar file has been downloaded, then type

Octave> pkg install package-name.tar.gz

or

Octave> pkg install package-name.tar

if the operating system has automatically unzipped the package.

The Octave pkg install command will automatically create the folders for the new package in the Octave installation and copy the package files to these folders and perform all other setup.
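Once a package is installed, it can be checked and used from the Octave prompt. The sketch below looks for the signal package mentioned earlier and, if it is present, calls xcorr on two short made-up sequences. Recent Octave versions require an explicit pkg load before a package's functions become available; older versions may load packages automatically.

```octave
% Check whether the signal package is installed and, if so, use xcorr.
installed = pkg ("list");                       % cell array of structs
names = cellfun (@(p) p.name, installed, "UniformOutput", false);

if any (strcmp (names, "signal"))
  pkg load signal                               % needed in recent versions
  x = [0 0 1 2 3 0 0];                          % made-up test sequences
  y = [0 1 2 3 0 0 0];
  [r, lags] = xcorr (x, y);
  [rmax, imax] = max (r);
  printf ("peak correlation at lag %d\n", lags(imax));
else
  disp ("signal package not installed; use pkg install first");
end
```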

Many Octave packages contain C, C++, or Fortran source code. Some Octave distributions bundle a compiler; on other systems a compiler must already be installed. Given a working compiler, Octave can compile these files itself. There is no need for the Octave user to compile them by hand, run the make utility, or anything like that. Octave has a built-in mkoctfile command to compile C, C++, or Fortran extensions to Octave and add them to the built-in commands of Octave. The Octave pkg install command will run mkoctfile if needed to build and install the package. On the Macintosh (see below) it may be necessary to modify the mkoctfile command to install many of the Octave packages.

Extending Octave with Fast, Compiled Languages

Developers may want to extend Octave by adding C, C++, or Fortran programs to the built-in functions of Octave. This may be done because the compiled programs will be faster to execute than Octave or to add existing C, C++, or Fortran code to Octave.

mkoctfile [-options] file

The mkoctfile function compiles source code written in C, C++, or Fortran. Depending on the options used with mkoctfile, the compiled code can be called within Octave or can be used as a stand-alone application. mkoctfile can be called from the shell prompt or from the Octave prompt.
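As a rough illustration, the Octave script below writes a very small C++ extension to disk, compiles it with mkoctfile, and calls the result. The function name addone and the file name are invented for this example, and the script assumes a working C++ compiler and the Octave development headers are available; if the build fails, the error is simply reported.

```octave
% Write a tiny C++ extension to disk, compile it with mkoctfile, and call it.
% The function name "addone" and the file name are invented for this example.
src = "addone.cc";
fid = fopen (src, "w");
fputs (fid, "#include <octave/oct.h>\n");
fputs (fid, "DEFUN_DLD (addone, args, nargout, \"Return the argument plus one\")\n");
fputs (fid, "{\n");
fputs (fid, "  return octave_value (args(0).double_value () + 1.0);\n");
fputs (fid, "}\n");
fclose (fid);

try
  mkoctfile (src);          % produces addone.oct in the current folder
  disp (addone (41));       % calls the newly compiled function
catch err
  disp (["build or call failed: " err.message]);
end
```

In practice the C++ source would normally be written in an editor and compiled with mkoctfile addone.cc from either the shell or the Octave prompt; generating it from a script here just keeps the example in one piece.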

Making Octave Work with Emacs

Emacs is a widely used free source-code and text editor. Emacs is available on all three major computing platforms: Microsoft Windows, Unix/Linux, and Mac OS. The author has had good experiences using the free Aquamacs version of Emacs on the Macintosh. Emacs has an “everything including the kitchen-sink” philosophy. Emacs has extensive built-in capabilities. There is extensive and excellent on-line and published documentation on Emacs. Again, this article will only discuss some key gotchas that are likely to arise using Emacs in combination with Octave.

Emacs has a MATLAB mode for editing MATLAB or Octave source code. The MATLAB code editing mode handles indenting, highlights keywords, and so forth. Emacs also has an Objective C mode. Since Objective C “method” or “message” source files also use the .m file extension used by MATLAB and Octave, this can cause problems. The author has encountered versions of Emacs configured to use the Objective C mode for .m files and other versions of Emacs configured to use the MATLAB mode for .m files.

The Emacs user can explicitly switch to the MATLAB mode by typing Meta-X (M-x), where the Emacs Meta key is usually the Alt key (alternatively, press and release the Escape (Esc) key first), and then typing matlab-mode at the Emacs command prompt.

If the matlab mode is not preinstalled in the user’s version of Emacs, one can get an Emacs MATLAB mode add on package from the web, for example at https://www.mathworks.com/matlabcentral/fileexchange/104

Emacs can be reconfigured to use the MATLAB mode automatically for .m files instead of the Objective C mode if needed.

As discussed, in addition to source code, it is often necessary to share data between Octave and other programs. Simple human readable file formats such as the common tab delimited file format or simple uncompressed binary file formats such as Microsoft WAV audio files are often the easiest and most cost-effective way to do this. Working with files of this type sometimes requires examining the files, for example to detect or rule out errors such as non-printing characters or incorrect header information. Emacs can be used to do this.

Emacs has a couple of special modes that are useful for examining data files and sometimes source code. Emacs has a hexadecimal viewer/editor mode that can be invoked by typing hexl-mode at the Emacs command prompt. The Emacs hexl-mode is useful for examining binary file formats and occasionally human readable formats. Emacs also has a special whitespace mode that displays whitespace characters such as tab, carriage return, newline, and so forth. Type whitespace-mode at the Emacs command prompt. This can be useful for examining human readable data files and also source code files where the whitespace characters may be incorrect or confusing.

Making Octave Work on the Macintosh

There are a couple of gotchas to making Octave work on the Macintosh platform. By default, on MacOS X, Octave is not set up to display plots through the X server on the Macintosh. It is necessary to set an environment variable so that Octave can display plots. This is done by entering the following command at the Octave prompt.

Octave> setenv GNUTERM 'x11'

By default, Octave uses the gnuplot package, which it installs if needed, to display plots. This environment variable tells gnuplot to use ‘x11’, the X Windows server on the Macintosh, to display plots.
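The same setting can be written in function-call form, which is convenient in scripts and startup files. This short sketch sets the variable and reads it back to confirm that it took effect.

```octave
% Function-call form of the setenv command above, with a check.
setenv ("GNUTERM", "x11");
disp (getenv ("GNUTERM"));     % shows the value gnuplot will see
```

Placing the setenv line in the ~/.octaverc startup file should apply it automatically at the start of every Octave session.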

As discussed above, Octave has a large number of add-on packages available through the Octave Forge web site. By default, a number of these packages do not build and install correctly on MacOS. One gets a cryptic error. This is because the mkoctfile command in Octave defaults to building 64 bit versions on the Macintosh. This can be fixed by adding the lines:

CFLAGS="-m32 ${CFLAGS}"
FFLAGS="-m32 ${FFLAGS}"
CPPFLAGS="-m32 ${CPPFLAGS}"
CXXFLAGS="-m32 ${CXXFLAGS}"
LDFLAGS="-m32 ${LDFLAGS}"

in the file Octave.app/Contents/Resources/bin/mkoctfile-3.2.3 just after the “set -e” line. The problem is that in Snow Leopard (the code name for Mac OS X 10.6) compilers always try to build 64-bit programs, while the libraries shipped with Octave are 32-bit. The -m32 flag forces the compiler to build 32-bit programs.

Screen Capture with Octave

Using Octave or similar programs often involves making plots, histograms, and other graphics that the user wants or needs to share with others, attach to e-mails, embed in publications, and so forth. One quick way to do this on Microsoft Windows and Macintosh platforms is through the built-in screen capture capabilities of the operating system and the gnuplot package used by Octave.

On Microsoft Windows, there is a “Print Screen” button on keyboards that will capture the entire screen to the MS Windows clipboard. One can then use GIMP or many other image editors to create a file from the contents of the clipboard. One can then edit the image file to select desired regions and so forth. On MS Windows, the gnuplot display window has a window capture feature. One can type Control-C in the plot window to capture the plot window to the Windows clipboard. One can also select the plot window menu by clicking the icon in the upper left corner of the plot window, then “Options” from the menu, and then “Copy to Clipboard” from the Options menu.

The Macintosh has a number of keyboard shortcuts to capture either the entire screen or a region of the screen to the clipboard or to a file on the Macintosh desktop. These keyboard shortcuts are (Apple Command)-Shift-3 to capture the entire screen to a file on the Desktop, (Apple Command Key)-Shift-4 to capture a region to a file on the Desktop, (Apple Command Key)-Control-Shift-3 to capture the entire screen to the Macintosh clipboard, and (Apple Command Key)-Control-Shift-4 to capture a region to the Macintosh clipboard. The region commands turn the mouse cursor into a cross-hairs icon; the user can select a region to capture by clicking and dragging on the screen with the cross hairs.

Conclusion

Octave is free, both free as in beer and free as in speech. Octave has many virtues, notably that it is mostly compatible with MATLAB, which is currently the de facto standard for numerical and mathematical programming. Octave can be used to quickly develop software for a wide range of practical problems including the prediction of buying patterns, image and signal processing, pure and applied scientific research, and the invention and design of new machines, to name only a few. With the tools, methods, and commands discussed above, it is possible to make Octave work quickly and cost-effectively with a wide range of other programs on all three major computing platforms: MS Windows, Linux/Unix, and Macintosh.

© 2011 John F. McGowan

About the Author

John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].
