This article shows how to build a simple recommendation engine using GNU Octave, a high-level interpreted language, primarily intended for numerical computations, that is mostly compatible with MATLAB. A recommendation engine is a program that recommends items such as books and movies for customers, typically of a web site such as Amazon or Netflix, to purchase. Recommendation engines frequently use statistical and mathematical methods to estimate what items a customer would like to buy or would benefit from purchasing.
From a purely business point of view, one would like to maximize the profit from a customer, discounted for time (a dollar today is worth more than a dollar next year), over the duration that the customer is a customer of the business. In a long term relationship with a customer, this probably means that the customer needs to be happy with most purchases and most recommendations.
Recommendation engines are “hot” right now. There are many attempts to apply advanced statistics and mathematics to predict what customers will buy, what purchases will make customers happy and buy again, and what purchases deliver the most value to customers. Data scientists are trying to apply a range of methods with fancy technical names such as principal component analysis (PCA), neural networks, and support vector machines (SVM) — amongst others — to predicting successful purchases and personalizing recommendations for individual customers based on their stated preferences, purchasing history, demographics and other factors.
This article presents a simple recommendation engine using Pearson’s product moment correlation coefficient, also known as the linear correlation coefficient. The engine uses the correlation coefficient to identify customers with similar purchasing patterns, and presumably tastes, and recommends items purchased by one customer to the other similar customer who has not purchased those items.
Quick Installation Instructions
This article contains sample code for a simple recommendation engine written in GNU Octave. There are four files: simulate_purchases.m, recommend_purchases.m, csvreadfix.m, and randi.m. The randi.m files implements the randi integer random generation function for earlier versions of GNU Octave.
Download and install GNU Octave 3.6.2 if possible. If you have GNU Octave 3.6 or later, you will not need the randi.m file. The simulation and recommendation software runs much faster under GNU Octave 3.6.2 than GNU Octave 3.2.4 on the author’s Windows 7 laptop.
Download the three files simulate_purchases.m, recommend_purchases.m, and csvreadfix.m to a working directory (folder). It is probably prudent to use a separate directory/folder for running the engine. Download randi.m if needed.
Launch GNU Octave (3.6 if possible). Change current directory to your working directory with the downloaded files.
To run simulation and recommendation with default settings, simply enter:
octave-prompt> simulate_purchases; ... octave-prompt> recommend_purchases;
The figure below shows the working directory after running the simulation which creates the purchase data files SIM_xxx which are in comma separated values (CSV) format.
Simulating Purchases
For illustration and development purposes, a simulator was developed that generates simulated purchasing data. The recommendation is for a hypothetical on-line video service that sells one-time views of movies and other video for a small amount, ninety-nine cents for example. This simple recommendation engine does not need to know the price of the items. The online video service has four categories of movies: science-fiction, romantic comedy, action/adventure, and horror. A real online video service would hopefully have a larger selection of categories. The recommendation engine relies on the purchasing history and does not need to know the movie categories.
The simulator and recommendation engine (see below) are GNU Octave functions. These were developed and tested on an HP laptop running Windows 7 using GNU Octave version 3.6.2. Note that the simulation engine uses the randi function which generates random integers uniformly distributed in the range from 1 to N. randi is a new function in Octave not found in earlier versions of Octave. An implementation of a randi function for earlier versions of Octave can be found in the appendix. randi emulates a random number generation function in MATLAB of the same name.
The simulator returns the simulated purchasing data in Octave “matrices” and saves the data to text files — comma separated values or CSV files with a file extension of CSV. The simulated data includes a list of purchases — customer CUSTOMER_ID bought movie MOVIE_ID, a list of simulated customer names, a list of simulated movie names, and the movie category or class — movie MOVIE_ID class CLASS_ID (science fiction, romantic comedy, action, or horror).
simulate_purchases.m
function [purchase_record, cust_names, cust_prefs, movie_name, movie_class] = simulate_purchases(ncust, npurchases, stem, debug) % [purchase_record, cust_names, cust_prefs, movie_name, movie_class] = simulate_purchases(ncust, npurchases, stem, debug) % % Inputs: % ncust -- number of customers (default 100) % npurchases -- number of purchases to simulate (default 100000) % stem -- stem for file names (default 'SIM' for simulated data) % debug -- debug trace flag (default=false) % % Outputs: % purchase_record -- record of purchases (customer id, movie id) % cust_names -- names of customers % cust_prefs -- movie class (scifi, romcom, action, horror) preferences of customers, 1-5 rating % movie_names -- names of movies % movie_class -- movie class (scifi=1, romcom=2, action=3, horror=4) % % (C) 2012 by John F. McGowan, Ph.D. % % simulate_purchases.m (A Simulator for a Recommendation Engine for an On Line Movie Service) % % Simulates a series of on-line purchases of movies by a group of % customers. It then infers the preferences of the simulated % customers and recommends movies they may like. % % This is an Octave script. % GNU Octave is a high-level interpreted language, primarily intended % for numerical computations. It provides capabilities for the % numerical solution of linear and nonlinear problems, and for % performing other numerical experiments. It also provides extensive % graphics capabilities for data visualization and % manipulation. Octave is normally used through its interactive % command line interface, but it can also be used to write % non-interactive programs. The Octave language is quite similar to % Matlab so that most programs are easily portable. % % URL: https://www.gnu.org/software/octave/ % % NOTE: to install io and xlswrite in Octave 3.6.2 on Windows 7 PC % % download and install the io package windows installer from SourceForge % Octave> pkg load io % Octave> savepath (defaults to .octaverc % % set seed for random number generation % this way can generate the same simulated data each time if needed % VALIDATE THE ARGUMENTS TO THE FUNCTION if nargin < 1 ncust = 100; % start with 100 customers end if ischar(ncust) error('FIRST ARGUMENT IS STRING -- SHOULD BE NUMBER OF CUSTOMERS'); return; end if ~isnumeric(ncust) error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE NUMBER OF CUSTOMERS'); return; end if nargin < 2 npurchases = 100000; end if ischar(npurchases) error('SECOND ARGUMENT IS A STRING -- SHOULD BE NUMBER OF PURCHASES'); return end if ~isnumeric(npurchases) error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE NUMBER OF PURCHASES'); return; end if nargin < 3 stem = 'SIM'; end if ~ischar(stem) error('STEM IS NOT A STRING'); return; end if nargin < 4 debug = false; end if ~islogical(debug) error('ARGUMENT FOUR (DEBUG TRACE FLAG) IS NOT LOGICAL TYPE'); return; end disp('SIMULATING CUSTOMER PURCHASES'); fflush(stdout); rand('seed', 1234567); myseed = rand('seed'); % simulate customers nclasses = 4; class = { 'scifi', 'romcom', 'action', 'horror'}; cust_prefs = randi(5, [ncust nclasses]); % customer preferences by % type of movie cust_cdf = cumsum(cust_prefs, 2); % cumulative sum first_names = { 'John', 'James', 'Jane', 'Jennifer', 'Scott', 'Steve', ... 'Sally', 'Serena', 'Tom', 'Tiffany', 'Adrian', ... 'Ariel', 'Bob', 'Betty', 'Charles', 'Christie', ... 'Derek', 'Dana', 'Edward', 'Ellen', 'Frank', 'Fay', ... 'Greg', 'Gayle', 'Henry', 'Helen', 'Izzy', 'Iris', ... 'Jack', 'Janet', 'Kyle', 'Kay', 'Lincoln', 'Letitia', ... 'Mark', 'May', 'Neville', 'Nell', 'Oscar', 'Ophelia', ... 'Peter', 'Petrana', 'Quimby', 'Quelle', 'Roger', ... 'Rozanne', 'Selden', 'Samantha', 'Terrence', 'Tania', ... 'Ulrich', 'Uma', 'Victor', 'Violet', 'Warner', ... 'Wilma', 'Xipe', 'Xyla', 'Yorick', 'Yvette', 'Zeb', 'Zoe' }; last_names = { 'Smith', 'Jones', 'Russell', 'Walker', 'Edwards', ... 'Allison', 'Barker', 'Coates', 'Denning', 'Essex', ... 'Foley', 'Granger', 'Hart', 'Ivington', 'Johns', ... 'King', 'Lawson', 'McGowan', 'Newhall', 'O''Henry', ... 'Petersen', 'Quinne', 'Rogers', 'Stevens', 'Thomas', ... 'Unitis', 'Volcker', 'Watts', 'Xibalba', 'Yates', ... 'Zoller'}; cust_names = {}; for i=1:ncust name = [first_names{randi(length(first_names))} ' ' ... last_names{randi(length(last_names))}]; cust_names{i} = name; end % generate a library of movies for the customers to purchase nmovies = nclasses*100; movie_class = randi(nclasses, [nmovies 1]); % class of movie sf_movies = find(movie_class == 1); rm_movies = find(movie_class == 2); a_movies = find(movie_class == 3); h_movies = find(movie_class == 4); num_sf = length(sf_movies); num_rm = length(rm_movies); num_a = length(a_movies); num_h = length(h_movies); n = [num_sf num_rm num_a num_h]; maxn = max(n); idxmat = zeros(4, maxn); idxmat(1,1:num_sf) = sf_movies; idxmat(2,1:num_rm) = rm_movies; idxmat(3,1:num_a) = a_movies; idxmat(4,1:num_h) = h_movies; % Make Titles for Movies % Science Fiction Movie Titles sf_start = { 'Zardoz', 'Star', 'Space', 'Mutant', 'Atomic', 'Computer', ... 'Nerd', 'Time', 'Galaxy' }; sf_end = { 'Wars', 'Traveller', 'Genesis', 'Quest', 'Escape', 'Visit', ... 'Explorer' }; % Romantic Comedy (rom-com) Movie Titles rm_start = { 'Harry', 'Tom', 'Peter', 'Gregory', 'Alastair', 'Dirk', ... 'Drake', 'Scott', 'Harrison', 'Berthold' }; rm_middle = { 'Meets', 'Loves', 'Hates', 'and', 'versus' }; rm_end = { 'Julia', 'Sally', 'Aurora', 'Meg', 'Michelle', 'Summer', ... 'Cameron', 'Jennifer', 'Jenny', 'Heather', 'Wendy', ... 'Rachel', 'Natalie', 'Portia'}; % Action Adventure Movie Titles a_start = { 'Ninja', 'Barbarian', 'Warrior', 'Killer', 'Sword', ... 'Kung Fu', 'Knight' }; a_end = { 'Battle', 'War', 'Raid', 'Apocalypse', 'Armageddon', ... 'Rescue', 'Assassin' }; % Horror Movie Titles h_start = { 'The', '', 'Halloween', 'Terror', 'Horror', 'Nightmare', ... 'Satan''s', 'Devil''s'}; h_end = { 'Dog', 'Cat', 'Car', 'Vampire', 'Werewolf', 'Wolf', 'Mummy', ... 'Mutant', 'Demon', 'Apocalypse', 'Zombie', 'Walking Dead', ... 'Armageddon', 'Omen', 'Mother-In-Law', 'Stepfather', ... 'Witch', 'Warlock', 'Ghoul', 'Ghost', 'Banker' }; movie_name = {}; for i = 1:length(movie_class) class = movie_class(i); if class == 1 name = [sf_start{randi(length(sf_start))} ' ' ... sf_end{randi(length(sf_end))}]; elseif class == 2 name = [rm_start{randi(length(rm_start))} ' ' ... rm_middle{randi(length(rm_middle))} ' ' ... rm_end{randi(length(rm_end))}]; elseif class == 3 name = [a_start{randi(length(a_start))} ' ' ... a_end{randi(length(a_end))}]; elseif class == 4 name = [h_start{randi(length(h_start))} ' ' ... h_end{randi(length(h_end))}]; else error('ERROR:INVALID CLASS'); end movie_name{i} = name; end % simulate purchases if nargin < 2 npurchases = 100000; end movies_purchased = []; movies_purchased_class = []; purchasers = []; customer_matrix = zeros(ncust, nmovies); mark = time(); for purchase = 1:npurchases % progress message newtime = time(); delta = newtime - mark; if delta > 2 printf("processsing purchase %d/%d\n", purchase, npurchases); fflush(stdout); mark = time(); end % choose a customer purchaser = randi(ncust); purchasers = [purchasers purchaser]; % choose movie class for purchase score = cust_cdf(purchaser,end)*rand(1, 1); idx = find(cust_cdf(purchaser,:) > score); class = idx(1); movies_purchased_class = [movies_purchased_class class]; index = randi(n(class)); movie_purchased = idxmat(class, index); movies_purchased = [movies_purchased movie_purchased]; customer_matrix(purchaser, movie_purchased)++; end % loop over purchases % Purchase History Simulated disp('Writing Movie Names to Files'); fflush(stdout); %xlswrite('movie_purchases.xls', 'movie_name', 'Name_Sheet'); purchase_record = [purchasers' movies_purchased']; if debug disp(sprintf('stem set to %s\n', stem)); fflush(stdout); end csvwrite([stem '_movie_class.csv'], movie_class); % movie class csvwrite([stem '_customer_preferences.csv'], cust_prefs); % customer prefernces for movie classes (rating 1-5 for each movie class) csvwrite([stem '_movie_name.csv'], char(movie_name)); % names of the movies csvwrite([stem '_customer_name.csv'], char(cust_names)); % names of the customers csvwrite([stem '_purchases.csv'], purchase_record); % (customer id, movie id) csvwrite([stem '_seed.csv'], myseed); % save the random number generator seed used to generate this purchasing data disp('ALL DONE'); end % function simulate_purchases
Recommending Purchases
Once there is enough simulated or real purchasing data, the recommendation engine uses Pearson’s product moment correlation coefficient to identify customers with similar buying patterns. For example, John Smith and Phil Jones might purchase only science fiction movies. In this case, their purchases would be highly correlated. The engine recommends movies that Phil Jones has purchased but not John Smith to John Smith, and vice versa.
The recommendation engine works by building a customer matrix with the customer on one axis and the movie on the other axis. The matrix entries are the number of times the customer has bought a particular movie. Customers with similar or identical tastes such as John Smith and Phil Jones are likely to buy overlapping sets of movies. The engine computes the correlation coefficient for each pair of customers from the customer matrix. If the correlation coefficient is greater than 0.5 and has a p-value, a measure of statistical significance, less than 0.05 (statistician Ronald Fisher’s famous arbitrary cutoff), then the engine makes a recommendation.
The p-value is theoretically the probability that the correlation is due to chance alone; there is no true correlation in the results. A p-value less than 0.05 means there is a less than five percent probablity that the correlation is due to chance alone. The p-value is really more complicated than this simple explanation and the problems with using the p-value are discussed in more detail in the previous article How to Hang Yourself with Statistics.
A recommendation engine for movies is not a life or death application like building a bridge or a nuclear power plant. We would almost never except a five percent probability of error in a life or death application. But in an application like recommending movies, some errors are probably acceptable. Amazon and Netflix make many unsuccessful recommendations but customers keep coming back and buying. This is an application where it is probably acceptable to use the p-value despite its known limitations.
NOTE: The colon : is used heavily in Octave (and MATLAB). Wordpress displays the sequence colon right-parenthesis as smiley face :). The reader will see a few smiley-faces displayed in the source code below. Simply select, copy, and paste the code into a text or code editor. This will correctly paste the colon right-parenthesis sequence into the text or code editor — not a smiley face.
recommend_purchases.m
function [purchase_recommendations] = recommend_purchases(stem, debug) % [purchase_recommendations] = recommend_purchases([stem, debug]) % % Inputs: % stem -- stem for file names of files with simulate or actual purchasing data (default='SIM') % debug -- debug trace flag (default=FALSE) % % Outputs: % purchase_recommendations -- recommended purchases for customers with enough data to make prediction % format of each row is [customer id movieid1 movieid2 ....] % % (C) 2012 by John F. McGowan, Ph.D. % % This is a GNU Octave script. % GNU Octave is a high-level interpreted language, primarily intended % for numerical computations. It provides capabilities for the % numerical solution of linear and nonlinear problems, and for % performing other numerical experiments. It also provides extensive % graphics capabilities for data visualization and % manipulation. Octave is normally used through its interactive % command line interface, but it can also be used to write % non-interactive programs. The Octave language is quite similar to % Matlab so that most programs are easily portable. % % URL: https://www.gnu.org/software/octave/ % % % if nargin < 1 stem = 'SIM'; % default value end % validate argument if ~ischar(stem) error('STEM IS NOT A STRING --- SHOULD BE A STRING -- IDENTIFIER XXX in XXX_NNN.CSV FOR FILES WITH PURCHASING DATA'); return; end if nargin < 2 debug = false; %debug trace flag end if ~islogical(debug) error('DEBUG FLAG IS NOT A LOGICAL -- SHOULD BE TRUE OR FALSE'); return; end cust_names = csvreadfix([stem '_customer_name.csv']); if isempty(cust_names) error('Unable to open data file\n'); return; end movie_name = csvreadfix([stem '_movie_name.csv']); if isempty(movie_name) error('Unable to open data file\n'); return; end % numeric data so use regular csvread function call cust_prefs = csvread([stem '_customer_preferences.csv']); [ncust blatz] = size(cust_names); [nmovies blatz] = size(movie_name); fname = [stem '_purchases.csv']; purchase_record = csvread(fname); [npurchases blatz] = size(purchase_record); purchasers = purchase_record(:,1); % column one is the purchaser/customer id movies_purchased = purchase_record(:,2); % colume two is the movie id if debug disp(sprintf('number of customers: %d\n', ncust) ); disp(sprintf('number of movies: %d\n', nmovies) ); fflush(stdout); end customer_matrix = zeros(ncust, nmovies); % populate the customer matrix for i = 1:npurchases customer = purchasers(i); movieid = movies_purchased(i); customer_matrix(customer, movieid)++; end % analyze purchases now and recommend new purchases for customers %[u s v] = svd(customer_matrix); % correlations bewteen customers version_string = version(); version_cell = strsplit(version_string, '.'); version_value = char(version_cell)'; version_number = str2num(version_value); if version_number >= 360 mycor = corr(customer_matrix'); % columns are customers else % corrcoef is deprecated mycor = corrcoef(customer_matrix'); end mask = eye(ncust); mask = ~mask; mycor = mask .* mycor; similars = zeros(ncust, 3); printf('finding most similar customers by correlation\n'); fflush(stdout); mark = time(); for customer = 1:ncust % progress message newtime = time(); if ((newtime - mark) > 1) printf('processing customer %d / %d\n', customer, ncust); fflush(stdout); mark = time(); end % find a different customer with the most similar buying patterns to the current customer % mx = max(mycor(customer, :)); % maximum correlation other than self customer_index = find(mycor(customer, :) == mx); % % compute p-values test_result = cor_test(customer_matrix(customer, :), customer_matrix(customer_index, :)); similars(customer, 1) = customer_index; % the id of the other customer similars(customer, 2) = mx; % maximum correlation coefficient similars(customer, 3) = test_result.pval; % p-value statistical significance for correlation coefficient if debug printf('%d: %d %f %f\n', customer, customer_index, mx, test_result.pval); fflush(stdout); end end % loop printf('done with computing correlations between customers\n'); fflush(stdout); % ids of customers with very similar tastes % % default to Fisher's 0.05 p-value cutoff % customer_similar = find(similars(:,2) > 0.5 & similars(:,3) < 0.05); matches = similars(customer_similar, 1); % check customer preferences from the purchasing simulation cust_prefs(customer_similar,:) cust_prefs(matches,:) if debug disp('PAUSING'); fflush(stdout); pause(10); end % make some recommendations purchase_recommendations = zeros(length(customer_similar), nmovies+1); for customer_idx = 1:length(customer_similar) printf('processing item %d\n', customer_idx); fflush(stdout); customer = customer_similar(customer_idx); printf('processing customer %d (%s)\n', customer, cust_names(customer,:) ); fflush(stdout); customer_bought = customer_matrix(customer,:); match_purchases = customer_matrix(matches(customer_idx), :); recommendations = find( customer_bought == 0 & match_purchases > 0); printf('suggested movies for customer %d %s\n', customer, cust_names(customer,:) ); fflush(stdout); recommendations; purchase_recommendations(customer_idx, 1) = customer; purchase_recommendations(customer_idx, 2:length(recommendations)+1) = recommendations; for k=1:length(recommendations) printf('%s\n', movie_name(recommendations(k),:) ); fflush(stdout); end % loop over recommendations end % for loop disp('ALL DONE'); end % function recommend_purchases
The recommendation function calls csvreadfix to read some of the data from the data files generated by the simulator.
csvreadfix.m
function [strings] = csvreadfix(fname, debug) % [strings] = csvreadfix(fname, debug) % % Inputs: % fname -- file name of file to read % debug -- debug trace flag (default=FALSE) % % read text lines from CSV file written using GNU Octave csvwrite % % (C) 2012 by John F. McGowan, Ph.D. % if ~ischar(fname) error('FIRST ARGUMENT IS NOT A STRING -- SHOULD BE NAME OF A CSV DATA FILE (E.G. MYDATA.CSV) '); return; end if nargin < 2 debug = false; end if ~islogical(debug) error('SECOND ARGUMENT IS NOT A LOGICAL -- SHOULD BE DEBUG TRACE FLAG (TRUE OR FALSE)'); return; end fid = fopen(fname, 'r'); if fid == -1 strings = []; str = sprintf('Unable to open %s\n', fname); error(str); return; end line = fgets(fid); strings = []; while (line ~= -1) elements = strsplit(line, ','); string = char(elements)'; string = strtrim(string); % remove leading and trailing white space [nrows ncols] = size(string); if nrows > 1 string = string(1,:); end if debug disp('debug'); disp('size(string)'); disp(size(string)); disp(string); fflush(stdout); pause(2); end strings = [strings; string]; if debug disp(strings); fflush(stdout); end line = fgets(fid); end % while loop fclose(fid); end % function
Conclusion
This article shows how to build a simple recommendation engine based on Pearson’s product moment correlation coefficient using GNU Octave. It uses correlation coefficients to identify customers with similar buying patterns and presumably tastes in movies. There are many additional ways to apply correlations to purchasing data to make recommendations.
There are many more advanced statistical and mathematical methods that can be applied to recommendation engines. These include principal components analysis (PCA), clustering algorithms, neural networks, support vector machines (SVM), and many other techniques from artificial intelligence, signal processing, and mathematical modeling. Many of these more advanced methods are related to one another and, in some cases, are actually the same method with a different name or slight cosmetic differences.
© 2012 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
Appendix I: RANDI FUNCTION
The randi function is new in recent versions of GNU Octave. It is not present in earlier versions such as Octave 3.2. Below is the source code for a GNU Octave randi function for earlier versions of Octave. It is probably better to upgrade to version 3.6 of GNU Octave if possible. randi emulates a random number generation function in MATLAB.
function [result] = randi(imax, n, m, debug) % [result] = randi(image [, n, m, debug]) % % (C) 2012 by John F. McGowan, Ph.D. if ~isnumeric(imax) error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE IMAX WHERE RANDI RETURNS INTEGER FROM 1 to IMAX'); return; end if nargin < 4 debug = false; end if debug disp(sprintf('randi called with arguments %d\n', imax)); fflush(stdout); end if nargin == 1 result = floor(imax .* rand(1,1)) + 1; return; end if nargin == 2 if debug disp('n'); disp(n); end [nrows ncols] = size(n); if debug disp('size(n)'); disp(size(n)); fflush(stdout); end if debug disp(sprintf('%d\n', n)); fflush(stdout); end result = floor(imax .* rand(n)) + 1; return; end if nargin == 3 if debug disp(sprintf('%d %d\n', n, m)); end result = floor(imax .* rand(n,m)) + 1; return; end end % function randi
The algo takes similarity corelation but I think past purchases should also be considered for same gener. And is there anything we can do with cookies?
Also, for that smiley poblem in code, you can use ascii character to make bracket or colon and wp wont display it.
From my reading, the past purchases are very much taken into account, as that is what provides the correlation coefficient to identify other ‘similar’ customers.
I.e. if one person’s purchase history focuses on a given ‘genre’, so will that of ‘similar’ customers. Assuming those similar customers also bought other items within the same genre, the engine would thus tend to recommend items within the genre the person’s purchase history indicates.
However, in the absence of rich purchase histories of a large number of customers, the engine would fail, so one would have to resort to recommending based on some intuitive grouping (‘genre’) as you say, presuambly.