Protecting Valuable Intellectual Property in Octave

Octave is a free, open-source, high-level interpreted language, intended primarily for numerical computation and mostly compatible with MATLAB. Octave is an excellent tool for the rapid research and development of new algorithms, as well as for simulations and data analysis. A mathematical software developer can often prototype a new algorithm in Octave two to three times faster than in a compiled programming language such as C or C++. Unlike MATLAB, Octave is free both as in beer and as in speech. Anyone can download Octave and run an Octave program at no cost on the three major computing platforms: MS Windows, Mac OS X, and other forms of the Unix operating system. Because Octave is open-source, there is much less concern that the vendor will suddenly cease support, as Microsoft did with Visual FoxPro, or redesign the language into something unusable in order to sell yet another “upgrade.” End users can always build the language from source and create a development “fork” that preserves compatibility with existing code and the elegance of the original language.

The Problem

A major problem with Octave and many other scripting languages is that programs are interpreted directly from human-readable source code. Potential and actual customers and other third parties can see in detail what is being done. It is easy to reverse engineer or steal programs and algorithms written in scripting languages such as Octave.

Imagine that you are a small company operating on a shoestring budget in a loft in West Hollywood that has developed a breakthrough video special effect in Octave. You want to win a contract from a Hollywood movie studio to do the effect in the next blockbuster science fiction movie starring Angelina Jolie and Brad Pitt as quarreling lovers caught in an alien invasion. The famous Hollywood movie studio wants to evaluate the algorithm in-house, to make sure you are not cheating with Photoshop on the glamor shot of Angelina in a skin-tight black leather jumpsuit that they sent you. The problem is that the famous Hollywood studio that you are pitching to would steal your algorithm in a microsecond if they could. You are confronted with the cost, time, and general difficulty of converting your hot new video special effect algorithm into a compiled language such as C or C++. Meanwhile your competitors at Really Cool FX in Pasadena may come out with the same algorithm while you are struggling to convert it to C or C++.

You could be a quantitative finance wizard operating out of a poorly ventilated office in Jersey City, New Jersey with a spectacular view of scenic downtown Jersey City visible through your tiny west-facing window. You would like to sell your hot new nanosecond trading algorithm to a Too Big To Fail bank so you can move to a plush, well-ventilated corner office across the Hudson River in New York City’s financial district, but the bank insists they must thoroughly evaluate the algorithm in-house. Probably enough said right there.

You might be an idealistic junior faculty member at a prestigious, but very low paying major research university in San Francisco. You have developed the breakthrough algorithm in quantitative biology that will cure cancer — in Octave. Now, you are completely above crass materialistic concerns and plan to follow the illustrious example of Jonas Salk in refusing to patent the polio vaccine :-), donate regularly to the Free Software Foundation, and have an autographed poster of Richard Stallman in your tiny cramped office, but nonetheless you would like to get tenure and move out of your landlady’s attic. You know full well that the eminent full professor down the hall who got passed over for last year’s Nobel Prize would steal your idea in a picosecond if he could; it is common knowledge in the department that his didn’t-quite-get-the-Nobel-Prize work was actually stolen from his former graduate student who is now driving a taxicab in New York City. How do you demonstrate your breakthrough algorithm without giving away the secret and get tenure?

The Solution

Fortunately, one can obfuscate Octave code, removing nearly all human-readable information, much as a compiler does when it translates a program written in C or C++ into a machine-readable binary executable. This raises the bar for stealing your ideas and algorithms considerably. In general, code obfuscation removes all comments, indentation and other formatting that clarifies what is going on, and replaces all human readable variable and function names with random strings of characters that convey no meaning to a human reader. Note that the human readable information is completely removed from the obfuscated code. Some schemes to protect programs written in scripting languages use encryption. The program is encrypted but if someone can find or determine the encryption key, they can recover the entire original program including comments, human-readable names, and so forth.
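The two steps described above, dropping comments and formatting and then renaming identifiers, can be sketched in a few lines of Octave. This is only an illustration under stated assumptions, not the author's obfuscation function: the name obfuscate_sketch and its arguments are hypothetical, the caller must supply the list of user-defined names, and the input is assumed to contain no transpose operator ('), no escaped quotes inside strings, and no line continuations. It also assumes an Octave version recent enough to provide randi and the 'split' option of regexp.

```octave
1;  % not a function file: lets the function below live in a script

% Hypothetical helper, not the author's obfuscator.
% src   : Octave source text as one string
% names : cell array of user-defined identifiers to rename
function [out, new_names] = obfuscate_sketch(src, names)
  % Random 12-letter uppercase replacement for each name.
  new_names = cell(size(names));
  for i = 1:numel(names)
    new_names{i} = char(randi([65 90], 1, 12));
  end

  % Tokens of interest, tried in this order at each position:
  % 'single-quoted string', "double-quoted string", comment, identifier.
  pat = '''[^'']*''|"[^"]*"|[%#][^\n]*|[A-Za-z_]\w*';
  [tok, gap] = regexp(src, pat, 'match', 'split');

  out = gap{1};
  for i = 1:numel(tok)
    t = tok{i};
    hit = find(strcmp(t, names), 1);
    if t(1) == '%' || t(1) == '#'
      t = '';                  % comments are dropped entirely
    elseif ~isempty(hit)
      t = new_names{hit};      % user identifiers lose their meaning
    end                        % strings, keywords, built-ins pass through
    out = [out, t, gap{i+1}];
  end

  % Remove indentation, trailing blanks, and blank lines.
  out = strtrim(regexprep(out, '\s*\n\s*', "\n"));
end

src = sprintf('%s\n', ...
  '% demo script', ...
  'total = 0;   % running sum', ...
  'for k = 1:3', ...
  '  total = total + k;', ...
  'end', ...
  'printf("total is %d\n", total);');

[obf, new_names] = obfuscate_sketch(src, {'total', 'k'});
disp(obf);   % same logic, no comments, meaningless names
eval(obf);   % behaves like the original script
```

String literals are matched before comments and identifiers so that a % or a variable name inside a string is left alone. A real obfuscator would need a proper Octave tokenizer to lift the assumptions listed above.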

A Simple Example

This is a simple script in Octave.

mytest.m

% test script

disp('hello world'); % test comment

myflag = 1;

printf(\
"this is a \
test\n");
fflush(stdout);

myflag = myflag + 1;
myflag2 = myflag++;
printf("myflag2 is %d\n", myflag2);
fflush(stdout);

if flag > 1
disp('hi');
else
disp('no');
end

for counter = 1:10
disp(counter); % test
end

pivalue = pi;
disp(pivalue)

disp('ALL DONE');

This script generates the following output under Octave 3.2.4 running on a Windows XP Service Pack 2 PC:

octave-3.2.4.exe:18> mytest
hello world
this is a test
myflag2 is 2
no
1
2
3
4
5
6
7
8
9
10
3.1416
ALL DONE

Here is an obfuscated version of the same Octave script generated by an obfuscation function written by the author in Octave:

mytest_obfuscated.m

disp ( 'hello world' ); ; UQWSKDTZQWRO=1 ; ; printf ( "this is a test\n" ); ; fflush ( stdout ); ; UQWSKDTZQWRO=UQWSKDTZQWRO+1 ; ; BSJRZMSBRYXD=UQWSKDTZQWRO++; ; printf ( "myflag2 is %d\n" , BSJRZMSBRYXD ); ; fflush ( stdout ); ; if flag>1 ; disp ( 'hi' ); ; else ; disp ( 'no' ); ; end ; for RBVZQAHJSNWB=1:10 ; disp ( RBVZQAHJSNWB ); ; end ; VIENISLJPENX=pi ; ; disp ( VIENISLJPENX ) ; disp ( 'ALL DONE' ); ;

Note: On a Windows PC using Firefox, one can select the obfuscated code above by selecting the first few characters at the start of the line above (e.g. disp) and then hitting Shift-End on the keyboard. Then copy and paste to Octave to run the obfuscated code.

This script generates the following output (the same as the original script) under Octave 3.2.4 running on a Windows XP Service Pack 2 PC:

octave-3.2.4.exe:22> mytest_obfuscated
hello world
this is a test
myflag2 is 2
no
1
2
3
4
5
6
7
8
9
10
3.1416
ALL DONE

Note that the reserved keywords such as “if” and built-in Octave functions such as “printf” are not obfuscated. It is actually possible to make the obfuscated code even more unreadable than the example above. This is intended as a simple illustration. The obstacles to reverse engineering and theft introduced by code obfuscation are greater for longer programs and more complex algorithms.
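The article does not say how the obfuscator decides which names to leave alone. One plausible rule, offered here purely as an assumption, is to rename an identifier only if Octave does not already know it as a keyword, a built-in, or a function on the path. The helper name is_renamable is hypothetical; iskeyword and exist are standard Octave functions.

```octave
1;  % not a function file: lets the function below live in a script

% Hypothetical helper: true if NAME looks safe to rename.
% Runs in its own function workspace, so exist() does not see the
% caller's variables, only built-ins and files on the load path.
function tf = is_renamable(name)
  tf = ~iskeyword(name) && exist(name) == 0;
end

candidates = {'if', 'end', 'printf', 'fflush', 'pi', 'myflag', 'pivalue'};
for i = 1:numel(candidates)
  if is_renamable(candidates{i})
    verdict = 'rename';
  else
    verdict = 'keep';
  end
  printf('%-8s %s\n', candidates{i}, verdict);
end
```

On a stock installation the keywords and built-ins should come back as "keep" and the two user-defined names as "rename". A user name that happens to match a file on the load path would be kept as well, which errs on the safe side.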

Conclusion

A major problem with Octave and other scripting languages is that it is easy for potential or actual customers or other third parties to reverse engineer or steal algorithms or other sensitive information from a program written in a human-readable scripting language. This can be a serious problem for algorithm developers using Octave. It is much less of a problem with compiled languages such as C or C++, in which, however, it is usually slower and more costly to develop algorithms than in Octave. Compilers generate unreadable binary files that are difficult, though not impossible, to reverse engineer.

Computer programs can obfuscate Octave code, automatically removing human-readable information such as comments, variable and function names, indentation, and so forth. This is very close to the information that compilers remove when they convert a program written in a compiled programming language such as C or C++ to a binary executable. In some ways, this is more secure than encrypting the code, since the information is actually removed entirely from the obfuscated code, whereas encryption can be broken, often by simply stealing the encryption key. Code obfuscation raises the bar substantially for reverse engineering or stealing an algorithm or other critical intellectual property implemented in Octave. The same comments apply to other scripting languages such as Python, Perl, and Ruby.

© 2011 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems by developing complex algorithms that embody advanced mathematical and logical concepts, including video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
