Artificial Intelligence 🤖
Improvements

Improving the results of Collaborative Filtering

Filtering Out the Noise

Let's figure out what went wrong with item-based collaborative filtering. We computed correlation scores between movies based on their user-rating vectors, and the results weren't the best; some of the recommended movies were pretty obscure. One thing that may be happening here is that we are affected by noise. Given that a lot of people watch Star Wars, if even a small number of those people also watched some other obscure film, we'd end up with a good correlation. This probably isn't what we want. Sure, for the two people in the world who watched the movie "Full Speed" and liked it in addition to Star Wars, it may be a good recommendation, but it's probably not a good recommendation for the rest of the world. We need some sort of confidence level in our similarities, which we can get by enforcing a minimum bound on how many people rated a given movie. We can't judge that a given movie is good based on the behavior of just one or two people.

To improve the recommendations, we filter out movies with fewer than 100 ratings. We construct a new DataFrame that counts how many ratings exist for each movie, along with the average rating, which we'll use later. After that, we can join this data with our original set of movies similar to Star Wars, and sort by similarity:

import numpy as np
# Count ratings (size) and compute the mean rating for every movie
movieStats = ratings.groupby('title').agg({'rating': [np.size, np.mean]})
# Boolean mask: only movies with at least 100 ratings
popularMovies = movieStats['rating']['size'] >= 100
movieStats[popularMovies].sort_values([('rating', 'mean')], ascending=False)[:15]
# Join the popular movies with their similarity scores to Star Wars
df = movieStats[popularMovies].join(pd.DataFrame(similarMovies, columns=['similarity']))
df.sort_values(['similarity'], ascending=False)[:15]
title                                               size  mean      similarity
Star Wars (1977)                                    584   4.359589  1.000000
Empire Strikes Back, The (1980)                     368   4.206522  0.748353
Return of the Jedi (1983)                           507   4.007890  0.672556
Raiders of the Lost Ark (1981)                      420   4.252381  0.536117
Austin Powers: International Man of Mystery (1997)  130   3.246154  0.377433
Sting, The (1973)                                   241   4.058091  0.367538
Indiana Jones and the Last Crusade (1989)           331   3.930514  0.350107
Pinocchio (1940)                                    101   3.673267  0.347868
Frighteners, The (1996)                             115   3.234783  0.332729
L.A. Confidential (1997)                            297   4.161616  0.319065

This leaves us with a better set of movie recommendations. Ideally, we'd also filter out Star Wars itself, because we don't care about a movie's similarity to itself, and ideally we'd experiment with different cut-off values.
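Dropping the movie itself is a one-liner. Here's a sketch on a hypothetical miniature version of the df built above (the real one has the full rating statistics joined in):

```python
import pandas as pd

# Miniature stand-in for df: indexed by title, with a 'similarity' column
df = pd.DataFrame({'similarity': [1.000000, 0.748353, 0.672556]},
                  index=['Star Wars (1977)',
                         'Empire Strikes Back, The (1980)',
                         'Return of the Jedi (1983)'])

# Exclude the movie we're computing recommendations for
recs = df.drop('Star Wars (1977)').sort_values('similarity', ascending=False)
print(recs)
```

Since the titles are the DataFrame's index, drop() removes that row by label before we sort.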

Comprehensive Collaborative Filtering

We can go beyond recommending movies similar to just one you liked. We can look at all the movies you've rated, offer suggestions based on a broader understanding of your taste, and use that to produce the best recommendations for any given user in our dataset. We start by computing a pivot table of users against the movies they rated, as before:
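The pivot step itself isn't shown above; assuming the ratings DataFrame has user_id, title, and rating columns as in the earlier item-based example, it would look something like this (sketched here on a tiny synthetic sample):

```python
import pandas as pd

# Tiny synthetic stand-in for the real MovieLens ratings DataFrame
ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'title':   ['Star Wars (1977)', '12 Angry Men (1957)',
                'Star Wars (1977)', '101 Dalmatians (1996)',
                '12 Angry Men (1957)'],
    'rating':  [5.0, 5.0, 4.0, 2.0, 4.0],
})

# One row per user, one column per movie, NaN wherever a user didn't rate that movie
userRatings = ratings.pivot_table(index='user_id', columns='title', values='rating')
print(userRatings)
```

The resulting matrix is mostly NaN, since any one user has rated only a small fraction of the catalog.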

title    'Til There Was You (1997)  1-900 (1994)  101 Dalmatians (1996)  12 Angry Men (1957)
user_id
0        NaN                        NaN           NaN                    NaN
1        NaN                        NaN           2.0                    5.0
2        NaN                        NaN           NaN                    NaN
3        NaN                        NaN           NaN                    NaN
4        NaN                        NaN           NaN                    NaN

This time, we can use the corr() function from pandas, which computes a correlation score for every pair of columns in the entire matrix.

corrMatrix = userRatings.corr()
corrMatrix.head()
title                      'Til There Was You (1997)  1-900 (1994)  101 Dalmatians (1996)  12 Angry Men (1957)
title
'Til There Was You (1997)  1.0                        NaN           -1.000000              -0.500000
1-900 (1994)               NaN                        1.0           NaN                    NaN
101 Dalmatians (1996)      -1.0                       NaN           1.000000               -0.049890
12 Angry Men (1957)        -0.5                       NaN           -0.049890              1.000000

Now, just like earlier, we have to deal with spurious results: we don't want to look at relationships that are based on a small amount of behavior. We'll use the min_periods argument to throw out results where fewer than 100 users rated a given movie pair. We can also specify the correlation method explicitly, e.g. Pearson:

corrMatrix = userRatings.corr(method='pearson', min_periods=100)
corrMatrix.head()

Running that will get rid of the spurious relationships that are based on just a handful of people.

title                      'Til There Was You (1997)  1-900 (1994)  101 Dalmatians (1996)  12 Angry Men (1957)
title
'Til There Was You (1997)  NaN                        NaN           NaN                    NaN
1-900 (1994)               NaN                        NaN           NaN                    NaN
101 Dalmatians (1996)      NaN                        NaN           1.0                    NaN
12 Angry Men (1957)        NaN                        NaN           NaN                    1.0

This is a little different from what we did for item similarities, where we simply threw out any movie rated by fewer than 100 people. Here, we're throwing out movie similarities where fewer than 100 people rated both movies in the pair. That's why we see many more NaN values in the preceding matrix.

In fact, even a movie's correlation with itself gets thrown out: the movie 1-900 (1994) was, presumably, rated by fewer than 100 people, so it gets tossed entirely. The movie 101 Dalmatians (1996), however, survives with a correlation score of 1.
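This pair-wise behavior of min_periods can be demonstrated on a toy DataFrame (a sketch with made-up columns a, b, c): a correlation is computed only when at least min_periods rows have values in both columns, and that includes a column paired with itself.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, None],
                   'b': [1.0, 2.0, None, None],
                   'c': [3.0, 1.0, 2.0, 4.0]})

# Require at least 3 overlapping values before a pair gets a correlation.
# 'a' and 'c' overlap on 3 rows (kept); 'a' and 'b' overlap on only 2 (NaN);
# 'b' has only 2 values total, so even its diagonal entry becomes NaN.
corr = df.corr(method='pearson', min_periods=3)
print(corr)
```

This mirrors what happened to 1-900 (1994) above: too few ratings, so its entire row and column are NaN, diagonal included.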

Generating Recommendations

Now let's produce some movie recommendations for user ID 0. This guy really likes Star Wars and The Empire Strikes Back, but hated Gone with the Wind. We extract his ratings from the userRatings DataFrame, and use dropna() to get rid of missing data:
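The extraction step might look like this (a sketch; the userRatings pivot table is mocked up here with a hypothetical miniature version containing user 0's row):

```python
import pandas as pd

# Minimal stand-in for the real userRatings pivot table
userRatings = pd.DataFrame(
    {'Empire Strikes Back, The (1980)': [5.0, 4.0],
     'Gone with the Wind (1939)':       [1.0, None],
     'Star Wars (1977)':                [5.0, 5.0],
     'Return of the Jedi (1983)':       [None, 4.0]},
    index=pd.Index([0, 1], name='user_id'))
userRatings.columns.name = 'title'

# Pull out user 0's row and drop the movies he never rated
myRatings = userRatings.loc[0].dropna()
print(myRatings)
```

The result is a small Series indexed by title, which is exactly what the recommendation loop below iterates over.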

title
Empire Strikes Back, The (1980)    5.0
Gone with the Wind (1939)          1.0
Star Wars (1977)                   5.0
Name: 0, dtype: float64

We loop over each movie the user rated, and build up a list of candidate recommendations by retrieving each movie's list of similar movies from our correlation matrix. We then scale those correlation scores by how well the user rated the movie they are similar to, i.e. movies similar to ones the user liked count more than movies similar to ones he hated:

simCandidates = pd.Series(dtype='float64')
for i in range(0, len(myRatings.index)):
    print("Adding sims for " + myRatings.index[i] + "...")
    # Retrieve similar movies to this one that I rated
    sims = corrMatrix[myRatings.index[i]].dropna()
    # Now scale its similarity by how well I rated this movie
    sims = sims.map(lambda x: x * myRatings.iloc[i])
    # Add the scores to the list of similarity candidates
    # (Series.append was removed in pandas 2.0; use pd.concat instead)
    simCandidates = pd.concat([simCandidates, sims])

# Glance at our results so far:
print("sorting...")
simCandidates.sort_values(inplace=True, ascending=False)
print(simCandidates.head(10))
Adding sims for Empire Strikes Back, The (1980)...
Adding sims for Gone with the Wind (1939)...
Adding sims for Star Wars (1977)...
sorting...
Empire Strikes Back, The (1980)                       5.000000
Star Wars (1977)                                      5.000000
Empire Strikes Back, The (1980)                       3.741763
Star Wars (1977)                                      3.741763
Return of the Jedi (1983)                             3.606146
Return of the Jedi (1983)                             3.362779
Raiders of the Lost Ark (1981)                        2.693297
Raiders of the Lost Ark (1981)                        2.680586
Austin Powers: International Man of Mystery (1997)    1.887164
Sting, The (1973)                                     1.837692
dtype: float64

These results are good. However, we're seeing duplicate values: if a movie was similar to more than one movie the user rated, it comes back more than once in the results, so we want to combine those entries together.

Grouping Duplicate Recommendations

Some of the same movies came up more than once, because they were similar to more than one movie the user rated. We'll use groupby() to add together the scores from movies that show up more than once, so they'll count more.

simCandidates = simCandidates.groupby(simCandidates.index).sum()
simCandidates.sort_values(inplace = True, ascending = False)
simCandidates.head(10)
Empire Strikes Back, The (1980)              8.877450
Star Wars (1977)                             8.870971
Return of the Jedi (1983)                    7.178172
Raiders of the Lost Ark (1981)               5.519700
Indiana Jones and the Last Crusade (1989)    3.488028
Bridge on the River Kwai, The (1957)         3.366616
Back to the Future (1985)                    3.357941
Sting, The (1973)                            3.329843
Cinderella (1950)                            3.245412
Field of Dreams (1989)                       3.222311
dtype: float64

This is looking really good! The last thing we need to do is filter out the movies that the user has already rated/watched, because it doesn't make sense to recommend movies you've already seen.

filteredSims = simCandidates.drop(myRatings.index)
filteredSims.head(10)
Return of the Jedi (1983)                    7.178172
Raiders of the Lost Ark (1981)               5.519700
Indiana Jones and the Last Crusade (1989)    3.488028
Bridge on the River Kwai, The (1957)         3.366616
Back to the Future (1985)                    3.357941
Sting, The (1973)                            3.329843
Cinderella (1950)                            3.245412
Field of Dreams (1989)                       3.222311
Wizard of Oz, The (1939)                     3.200268
Dumbo (1941)                                 2.981645
dtype: float64

Improving the recommendation results

So, we've built a movie recommender system. Great! But there's always room to make it better:

  1. Tweak Existing Heuristics
    • How to weight each recommendation based on the user's rating of the movie it came from
    • How to combine duplicate recommendations
    • What threshold to pick for the minimum number of people that rated two given movies
    • Which correlation method to use for the correlation matrix, e.g. pearson, kendall, or spearman
    • Changing the min_periods value: raising it will return mostly blockbusters, while lowering it will surface newer, little-known movies
  2. Penalize Bad Recommendations
    • Let's say you hated "Gone with the Wind," but the system keeps recommending similar movies. Maybe those recommendations should receive an outright penalty instead of just a lower weight.
  3. Remove Outliers
    • Some users rate a huge number of movies and might be skewing your results. Consider removing them from your dataset.
  4. Train/Test Evaluation
    • Evaluate the results of this recommender engine using train/test techniques.
    • Instead of an arbitrary recommendation score that sums up the correlation scores of each individual movie, scale it down to a predicted rating for each given movie.
    • That would be a quantitative, principled way to measure the error of this recommender engine.
  5. Real-world Testing
    • The ultimate test is real-world results. If your goal is to make people watch or buy more, then real-world controlled experiments are your best bet.
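The penalty idea in point 2 could be sketched by centering ratings on a neutral midpoint, so that similarities to hated movies subtract from a candidate's score instead of adding a small positive amount. This is a hypothetical tweak, not the chapter's original scheme; the movie names and NEUTRAL constant are made up for illustration:

```python
import pandas as pd

# Hypothetical correlations of candidate movies to one the user rated
sims = pd.Series({'Wuthering Heights (1992)': 0.8,
                  'Casablanca (1942)':        0.5})

user_rating = 1.0   # the user hated this movie
NEUTRAL = 3.0       # midpoint of the 1-5 rating scale (an assumed choice)

# Original scheme: even a rating of 1 still adds a small positive score
weighted = sims * user_rating
# Penalizing scheme: ratings below neutral make a negative contribution
penalized = sims * (user_rating - NEUTRAL)
print(penalized)
```

With this change, the more strongly a candidate correlates with a movie the user hated, the more its total score drops, instead of merely growing slowly.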