An associative array, also known as a dictionary, map, mapping, or hash table, is a powerful data structure that is built into many modern programming languages such as Python, Perl, Ruby, and many others. An associative array is a form of content addressable memory (CAM). For example, when you see someone’s face, you often remember many other facts. This is John Smith. John Smith is your neighbor and lives at 123 Elm Street. John’s wife is Amanda who has long blond hair and so on. As associative array associates an object, often called a key, with another object, often called a value.
In many programming languages the key is a string of characters such as "John Smith"
and the value is another string such as "Address:123 Elm Street;Wife:Amanda"
and so forth. In some object oriented computer programming languages, the key can be any kind of object and the value can be any kind of object such as the key <John Smith's Face Object>
and the value <John Smith's Identifying Information Object>
. An associative array is largely equivalent to a single table in a relational database (RDBMS). In principle, a network of associative arrays can represent complex abstract knowledge and reasoning.
Technically, associative arrays are usually implemented using a hash table. A hash table uses modulo arithmetic to map an object such as a string of characters to the numerical index of an array of values. This array of values is not a simple one dimensional array because there can be collisions where two keys, objects such as strings of characters, “hash” to the same array index. If this happens, there are various methods such as hanging a linked list of elements off the array to handle the collision. Using a hash table means that the time to look up a value is almost constant (O(1)) regardless of the size of the hash table.
The hash table may have millions of entries, but it takes the same small number of operations to look up the associated value. First, compute the array index using modulo arithmetic on the numeric value of the “key”. Second, handle any collisions. The hash table should be large enough that collisions are rare. In principle, an associative array could be implemented in other ways, but hash tables of some kind are generally the fastest, most flexible way to implement an associative array. Hence, the terms associative array, dictionary, map, mapping, and hash table are often used interchangeably.
Octave is a free, open-source numerical programming environment that is mostly compatible with MATLAB. Octave is largely built around matrices (two dimensional arrays) and N (2,3,etc.) dimensional arrays. By default, matrices and N-D arrays in Octave are of the type double (usually a 64 bit IEEE-754 double precision floating point number).
This is due to the history of MATLAB (short for Matrix Laboratory) which started life as an interactive environment for performing two dimensional matrix algebra and computations. At first glance, Octave lacks associative arrays which is a significant deficiency for some types of programming including some types of mathematical programming. Octave does, in fact, have associative arrays. This article shows how to use associative arrays in Octave and use associative arrays to implement cellular automata, a type of discrete mathematical model.
Associative Arrays in Octave
While Octave lacks an explicitly identified associative array, dictionary, map, mapping, or hash table data type, Octave does have data structures or structs similar to structures in the C family of programming languages. For example, in Octave one can define a structure interactively like this:
octave-3.2.3:2> a.key = 'value' a = { key = value } octave-3.2.3:3> a.key2 = 'value2' a = { key = value key2 = value2 }
The structure a now has two fields “key” and “key2” with values ‘value’ and ‘value2’. In Octave, the data structures are implemented using a hash table and can act as a fully functional associative array. Octave provides several functions to access and manipulate structures, making it easy to use a structure in Octave as an associative array.
Built-in Function: struct ("field", value, "field", value,...) Built-in Function: isstruct (expr) Built-in Function: rmfield (s, f) Function File: [k1,..., v1] = setfield (s, k1, v1,...) Function File: [t, p] = orderfields (s1, s2) Built-in Function: fieldnames (struct) Built-in Function: isfield (expr, name) Function File: [v1,...] = getfield (s, key,...) Function File: substruct (type, subs,...)
All of these functions are useful. Most important for the purposes of this article are struct("field", value, "field", value,...)
which creates a data structure explicitly, setfield(structure_name, key, value,...)
which assigns a value to a key, and getfield(structure_name, key)
which retrieves the value associated with key. These are used in the examples in this article to implement cellular automata.
Cellular Automata
A cellular automaton (plural, cellular automata, sometimes abbreviated as CA) is a discrete mathematical model. A typical cellular automaton consists of a grid of cells with one or more dimensions. Often, the cells have two possible values, 0 and 1, which are often displayed as black and white pixels graphically. The cellular automaton evolves over time in discrete time steps. With each time step or update, a cell changes to 0 or 1 based on the values of the cells in its neighborhood. A cellular automaton has a rule that specifies how it updates.
Cellular automata have been used in entertainment (they make pretty pictures), mathematics, physics, and a number of other fields. Probably the most well known cellular automaton is the “Game of Life”, a two dimensional cellular automaton with many entertaining and interesting properties invented by the mathematician John Conway in the 1970’s.
Stephen Wolfram, the creator of the Mathematica computer algebra system and mathematical research tool, has had a long standing interest in cellular automata. His book, A New Kind of Science, speculates that the universe might be a sort of cellular automaton and be “computationally undecidable” (in layman’s terms, math and science don’t have all the answers). Matthew Cook, who assisted Wolfram in the research for A New Kind of Science, proved that a particularly simple cellular automaton known as “rule 110” can function as a general purpose computer just like more complex systems such as the Pentium computer chip. An implementation of the “rule 110” cellular automaton is one of the examples in this article.
The rule for a cellular automaton can be easily represented as an associative array that maps each possible neighborhood to a new value. This is very simple and intuitive. It is easy to implement cellular automata in programming languages with built-in associative array data types. This is illustrated in Octave in the examples in this article.
automata.m
% Description: % % implementation of rule 30 and rule 110 cellular automata using an % associative array in Octave % % Author: John F. McGowan, Ph.D. % E-Mail: jmcgowan11@earthlink.net % universe_size = 600; seed_start = universe_size/2; rule30 = struct("III", "O", "IIO", "O", "IOI", "O", "IOO", "I", ... "OII", "I", "OIO", "I", "OOI", "I", "OOO", "O"); % use Octave struct as an associative array (also knowns as dictionary or mapping or hash table) myseed = repmat('O', 1, universe_size); myseed(seed_start) = 'I'; myimage = simulate_ca(myseed, rule30, seed_start); figure(1); imshow(myimage); title('Rule 30 Cellular Automaton'); print('rule30.jpg'); % rule 110 % proven to be Turing Complete % rule110 = struct("III", "O", "IIO", "I", "IOI", "I", "IOO", "O", ... "OII", "I", "OIO", "I", "OOI", "I", "OOO", "O"); myimage = simulate_ca(myseed, rule110, seed_start); figure(2) imshow(myimage); title('Rule 110 Cellular Automaton'); print('rule110.jpg'); disp('ALL DONE');
which uses the function simulate_ca defined in the following code:
simulate_ca.m
function [result] = simulate_ca(myseed, rule, niter) % simulate_ca(myseed, rule, niter) % % simulate <niter> iterations of a 1D cellular automaton specified by <rule> % rule starting with the seed <myseed> % % Author: John F. McGowan, Ph.D. % E-Mail: jmcgowan11@earthlink.net % rule_name = inputname(2); update = repmat('O', 1, length(myseed)); previous = myseed; nx = length(myseed); ny = niter + 1 result = zeros(ny, nx, "uint8"); lut = ones(1, 26)*255; % white background lut(9) = 0; % map I to min gray scale level printf("rule: %s\n", rule_name); fflush(stdout); printf("%s\n", previous); fflush(stdout); index = uint8(previous) - uint8('A') + 1; binary = lut(index); result(1,:) = binary; for iter = 1:niter % iterations for i = 1:length(myseed)-3 % loop over cells key = previous(i:i+2); update(i+1) = getfield(rule, key); end % loop over cells printf("%s\n", update); fflush(stdout); previous = update; index = uint8(previous) - uint8('A') + 1; binary = lut(index); result(iter+1,:) = binary; end % iterations end % function simulate_ca
The Timing of Associative Arrays in Octave
The code below tests the timing of associative arrays in Octave. As expected if a hash table is used to implement an associative array, the lookup time is largely independent of the size of the associative array in Octave, which is good. As with many features in Octave and other mathematical scripting tools, the lookup time is quite slow.
For example, on a 3GHz Macintosh running OS X, the lookup time varied between 1 and 10 milliseconds. This is much slower than a compiled implementation of a hash table in the C programming language or another fast compiled language. Although languages such as Octave are slowly closing the speed of execution gap with compiled languages such as C, the compiled languages still win handily in some cases.
dictionary_time.m
% % measure timing of associative arrays in Octave % demonstrate lookup time is not slowed by number of keys (fields) % in the associative array/dictionary. Works like a hash table. % % Author: John F. McGowan, Ph.D. % E-Mail: jmcgowan11@earthlink.net % % clear small_dict; small_dict = struct('key', 'value'); % make sure a is defined clear big_dict; big_dict = struct('key', 'value'); % make sure a is defined offset = uint8('A') -1; n = 3; if(n > 26) % 26 characters in English alphabet n = 26; end [total, user, system] = cputime(); for i = 1:n for j = 1:n for k = 1:n key = char([i j k] + offset); value = ['value' key]; % printf("key: %s value: %s\n", key, value); small_dict = setfield(small_dict, key, value); end end end [total1, user1, system1] = cputime(); printf("CPU TIME: %f\n", total1 - total); printf("creating big dictionary\n"); fflush(stdout); % create big dictionary (associative array) n = 26; if(n > 26) % 26 characters in English alphabet n = 26; end [total, user, system] = cputime(); for i = 1:n printf("creating fields starting with %s\n", char(i + offset)); fflush(stdout); for j = 1:n for k = 1:n key = char([i j k] + offset); value = ['value' key]; % printf("key: %s value: %s\n", key, value); big_dict = setfield(big_dict, key, value); end end end [total1, user1, system1] = cputime(); printf("CPU TIME: %f\n", total1 - total); % measure access time [total, user, system] = cputime(); blatz = getfield(small_dict, 'AAA'); [total1, user1, system1] = cputime(); printf("%f seconds to retrieve %s for AAA from small dict\n", total1 - total, blatz); [total, user, system] = cputime(); blatz = getfield(big_dict, 'AAA'); [total1, user1, system1] = cputime(); printf("%f seconds to retrieve %s for AAA from big dict\n", total1 - total, blatz); disp('ALL DONE');
Conclusion
There are associative arrays, also known as dictionaries, maps, mappings, or hash tables, in Octave. This is poorly documented both in the official documentation and most online information about Octave. One can perform the same tasks and implement the same algorithms with the associative arrays (data structures or structs) in Octave that one can with the explicitly identified associative array data types in Perl, Python, Ruby, Java, and many other modern programming languages. Associative arrays are very useful for implementing certain kinds of mathematics in Octave such as, but not limited to, cellular automata.
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
Admin’s message: Looking for some great mathematical reads? Check out our list of must-read math books.
Very interesting. Nice crash course about associative array 🙂
Agreed. I love John’s series of articles on Octave.