Recommendation Systems

Introduction

For this topic, we will analyse a specific example of a basic recommendation system in order to keep things simple. Imagine a film and TV streaming service such as Netflix: how does it accurately suggest what you should watch?

Training data

Before we even start looking at new users to recommend films to, we need to establish a model of which kinds of films are enjoyed by which kinds of people. To keep our example simple, we are going to do this by taking six critics' reviews of six different films. The rating that each critic gave each film is shown in the table below.

\begin{array}{c|cccccc}
 & \text{Critic 1} & \text{Critic 2} & \text{Critic 3} & \text{Critic 4} & \text{Critic 5} & \text{Critic 6} \\
\hline
\text{Film 1} & 3 & 7 & 7 & 5 & 5 & 7 \\
\text{Film 2} & 7 & 5 & 5 & 6 & 8 & 7 \\
\text{Film 3} & 4 & 5 & 5 & 8 & 8 & 8 \\
\text{Film 4} & 9 & 3 & 0 & 5 & 8 & 4 \\
\text{Film 5} & 9 & 8 & 8 & 9 & 10 & 7 \\
\text{Film 6} & 7 & 8 & 4 & 8 & 9 & 8
\end{array}

This can then be expressed as a matrix $X_{mc}$, and each critic can be expressed as an individual vector of the form $X_c = (x_{c1}, \dots, x_{cM})$.

\left[ \begin{array}{cccccc}
3 & 7 & 7 & 5 & 5 & 7 \\
7 & 5 & 5 & 6 & 8 & 7 \\
4 & 5 & 5 & 8 & 8 & 8 \\
9 & 3 & 0 & 5 & 8 & 4 \\
9 & 8 & 8 & 9 & 10 & 7 \\
7 & 8 & 4 & 8 & 9 & 8
\end{array} \right]

Each column vector is called a 'feature vector'. We always work with column (vertical) vectors because they are more convenient to compute with.
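As a minimal sketch (assuming NumPy, which the text does not specify), the ratings table can be stored as a matrix whose columns are the critics' feature vectors:

```python
import numpy as np

# Rows are Films 1-6, columns are Critics 1-6, exactly as in the table above.
X = np.array([
    [3, 7, 7, 5, 5, 7],   # Film 1
    [7, 5, 5, 6, 8, 7],   # Film 2
    [4, 5, 5, 8, 8, 8],   # Film 3
    [9, 3, 0, 5, 8, 4],   # Film 4
    [9, 8, 8, 9, 10, 7],  # Film 5
    [7, 8, 4, 8, 9, 8],   # Film 6
])

# Critic 1's feature vector is the first column, kept as a column (vertical) vector.
critic_1 = X[:, 0].reshape(-1, 1)
print(critic_1.ravel())  # [3 7 4 9 9 7]
```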

Introducing new data

Now that we have our table of training data in a form we can compute with, we can turn our attention to new user data. Here we have a similar table, but showing two users who have each seen and rated only two or three of the six films.

\begin{array}{c|cc}
 & \text{User 1} & \text{User 2} \\
\hline
\text{Film 1} & - & - \\
\text{Film 2} & - & 6 \\
\text{Film 3} & - & 9 \\
\text{Film 4} & 2 & - \\
\text{Film 5} & - & - \\
\text{Film 6} & 7 & 6
\end{array}

Our goal is to predict the ratings that these users would give to the films they haven’t seen to fill in the gaps in the table.

We also want to convert this table of data into vector form to allow us to do calculations with it.
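Continuing the earlier sketch (again assuming NumPy, with np.nan standing in for the films a user has not yet rated), the user table might be represented as:

```python
import numpy as np

# Rows are Films 1-6, columns are User 1 and User 2; np.nan means "not rated yet".
users = np.array([
    [np.nan, np.nan],  # Film 1
    [np.nan, 6],       # Film 2
    [np.nan, 9],       # Film 3
    [2,      np.nan],  # Film 4
    [np.nan, np.nan],  # Film 5
    [7,      6],       # Film 6
])

user_1 = users[:, 0]       # User 1 as a column of the table
rated = ~np.isnan(user_1)  # mask of the films User 1 has actually rated
print(user_1[rated])       # [2. 7.]
```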

Graphing vectors

For visual purposes, we can display all of the vectors we have looked at above in a 2D vector space.

As humans, we can see visually that the two user points are closer to critic x and critic y, and therefore which critics these users align with most. For computers, we have to perform a series of calculations to determine which points are closest to which. This becomes especially apparent at higher volumes of data and in higher dimensions, where the data can no longer be visualised in a meaningful way.

Vector comparison

In this two-dimensional case, the distance between two data points is called the Euclidean distance and is given by:

\bm{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \qquad \bm{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}

r(\bm{u},\bm{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2}
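As a worked example using the data above, take Films 4 and 6 (the two films User 1 has rated) as the two components, so User 1 sits at (2, 7), Critic 1 at (9, 7) and Critic 3 at (0, 4):

r\left(\begin{pmatrix} 2 \\ 7 \end{pmatrix}, \begin{pmatrix} 9 \\ 7 \end{pmatrix}\right) = \sqrt{(2-9)^2 + (7-7)^2} = 7 \qquad r\left(\begin{pmatrix} 2 \\ 7 \end{pmatrix}, \begin{pmatrix} 0 \\ 4 \end{pmatrix}\right) = \sqrt{(2-0)^2 + (7-4)^2} = \sqrt{13} \approx 3.6

So, judged on these two films alone, User 1 sits much closer to Critic 3 than to Critic 1.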

This can also be generalised to cases in any number of dimensions $D$:

r(\bm{u},\bm{v}) = \sqrt{\sum\limits_{k=1}^{D} (u_k - v_k)^2}
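One simple way to put all of this together is a nearest-neighbour sketch (an illustration of the idea rather than how a real streaming service necessarily does it): for each user, compute the Euclidean distance to every critic over the films the user has actually rated, pick the closest critic, and borrow that critic's ratings for the unseen films. The arrays X and users and the two helper functions below are our own illustrative names, not part of the original text.

```python
import numpy as np

# Critic ratings (films x critics) and user ratings from the earlier sketches.
X = np.array([[3, 7, 7, 5, 5, 7],
              [7, 5, 5, 6, 8, 7],
              [4, 5, 5, 8, 8, 8],
              [9, 3, 0, 5, 8, 4],
              [9, 8, 8, 9, 10, 7],
              [7, 8, 4, 8, 9, 8]])
users = np.array([[np.nan, np.nan],
                  [np.nan, 6],
                  [np.nan, 9],
                  [2, np.nan],
                  [np.nan, np.nan],
                  [7, 6]])

def euclidean_distance(u, v):
    """General D-dimensional Euclidean distance r(u, v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sqrt(np.sum((u - v) ** 2))

def predict_ratings(user, X):
    """Fill a user's missing ratings by copying them from the nearest critic."""
    rated = ~np.isnan(user)
    # Distance to each critic, measured only over the films the user has rated.
    distances = [euclidean_distance(user[rated], X[rated, c])
                 for c in range(X.shape[1])]
    nearest = int(np.argmin(distances))
    prediction = user.copy()
    prediction[~rated] = X[~rated, nearest]  # borrow the nearest critic's ratings
    return nearest, prediction

nearest, filled = predict_ratings(users[:, 0], X)
print(f"User 1 aligns most with Critic {nearest + 1}")
print(filled)  # User 1's ratings with the gaps filled in
```

Taking the single closest critic is the simplest choice; averaging the ratings of the few nearest critics, weighted by distance, is a common refinement of the same idea.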