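In a movie recommendation system, cosine similarity is often used to score how similar two items are, for example movies described by their genres. The recommender built here applies the same measure to users, comparing their movie-rating vectors.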
Key takeaways:
Recommendation systems enhance user experience through personalized suggestions in areas like e-commerce and movie streaming.
In R, a movie recommender is built by creating a dataset, calculating cosine similarity between users, and recommending unseen movies based on similar users' ratings.
Cosine similarity identifies users with similar preferences. The implementation returns 0 for users with no ratings to avoid division by zero.
Recommendation systems are a class of algorithms and techniques used in information filtering and decision support systems to give users personalized suggestions. They are common in domains such as e-commerce, movie streaming, and social media because they improve the user experience. Most of these systems are built on collaborative filtering, content-based filtering, or a combination of the two, and both approaches are relatively easy to implement.
Recommendation systems are especially valuable in movie streaming, so it is worth knowing how to build one ourselves. Here, we walk through the key steps to build a simple movie recommender in R.
First, we construct a simple dataset of users and the ratings they gave to various movies. We define the data in a data frame and then reshape it with the `spread` function into a user-item matrix (a wide data frame with one row per user and one column per movie) that later steps can use. Running the code below shows what the input data looks like in this form.
```r
library(tidyr)

# User IDs range from 1 to 5, movie names are A to E, and ratings are 1 to 5
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)

# Create a user-item matrix: 5 rows (one per user) and 6 columns
# (the user ID column plus one column per movie); missing ratings become 0
user_item_matrix <- spread(ratings, movie, rating, fill = 0)

# Each row holds one user's ratings; each movie column holds the ratings for that movie
print(user_item_matrix)
```
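Note: The `spread` function comes from the tidyr package, so we must load it with `library(tidyr)` first (installing it with `install.packages("tidyr")` if needed). In current versions of tidyr, `spread()` is superseded by `pivot_wider()`, but it still works.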
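One detail worth keeping in mind: the table returned by `spread()` keeps `user` as its first column (columns `user`, `A`, `B`, `C`, `D`, `E`), so a full row mixes the user ID in with the ratings. The sketch below hard-codes the rows for users 1 and 4 as plain named vectors (an assumption made so it runs without tidyr) and uses a hypothetical compact helper, `cos_sim`, to show how leaving the ID in skews cosine similarity. Users 1 and 4 have no movies in common, yet with their IDs included their similarity comes out at about 0.13 instead of 0.

```r
# Hypothetical compact cosine helper (no zero-norm guard needed for these vectors)
cos_sim <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Rows for users 1 and 4 as they appear in the spread() output (hard-coded here)
row_user1 <- c(user = 1, A = 5, B = 4, C = 0, D = 0, E = 0)
row_user4 <- c(user = 4, A = 0, B = 0, C = 0, D = 2, E = 1)

# ID column treated as if it were a rating: the IDs (1 * 4) create a non-zero dot product
with_id <- cos_sim(row_user1, row_user4)

# Ratings only: the users share no rated movie, so the similarity is 0
without_id <- cos_sim(row_user1[-1], row_user4[-1])

print(round(c(with_id = with_id, without_id = without_id), 4))
```

In practice, this means dropping the first column (for example, with `user_item_matrix[, -1]`) before computing similarities.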
Next, we implement a basic cosine similarity function that measures how similar two vectors are, in this case the rating vectors of two users. The function must also guard against division by zero: a user with no ratings has a norm of 0, and dividing by it would produce `NaN` in R, so the function returns 0 in that case.
```r
# Cosine similarity function
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)  # Handling division by zero
  }
  return(dot_product / (norm_user1 * norm_user2))
}
```
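Cosine similarity divides the dot product of the two rating vectors by the product of their norms (lengths): cos(u, v) = (u · v) / (‖u‖ × ‖v‖). Because ratings are never negative, the result lies between 0 (no movies rated in common) and 1 (ratings in exactly the same proportions).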
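As a worked example of cosine similarity, the sketch below computes the similarity between users 1 and 2 from the sample data, once by hand and once with the same `cosine_similarity` function. The rating vectors are typed in directly (movies A to E, 0 for unrated), which is an assumption made to keep the example self-contained. The only movie both users rated is A, so the dot product is 5 × 4 = 20, the norms are √41 and 5, and the similarity is 20 / (5√41), or about 0.625.

```r
# Cosine similarity function, as defined in this article
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)  # Handling division by zero
  }
  return(dot_product / (norm_user1 * norm_user2))
}

user1 <- c(A = 5, B = 4, C = 0, D = 0, E = 0)
user2 <- c(A = 4, B = 0, C = 3, D = 0, E = 0)

# By hand: dot product 5 * 4 = 20; norms sqrt(5^2 + 4^2) = sqrt(41) and sqrt(4^2 + 3^2) = 5
by_hand <- 20 / (sqrt(41) * 5)
by_function <- cosine_similarity(user1, user2)

# A user with no ratings has norm 0, so the guard returns 0 instead of NaN
no_ratings <- cosine_similarity(user1, c(A = 0, B = 0, C = 0, D = 0, E = 0))

print(round(c(by_hand = by_hand, by_function = by_function, no_ratings = no_ratings), 4))
```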
Finally, the `recommend_movies` function uses `sapply()` to compute the cosine similarity between the target user and every user in the matrix. It then sorts the users in descending order of similarity, so the ratings of the closest neighbors are checked first and the search can stop early once enough movies are found.
```r
# Function to recommend movies for a given user ID
recommend_movies <- function(user_id, user_item_matrix) {
  # Keep only the movie columns; the first column is the user ID, not a rating
  rating_matrix <- as.matrix(user_item_matrix[, -1])
  user_ratings <- rating_matrix[user_id, ]

  # Calculate cosine similarity between the selected user and all users
  similarities <- sapply(1:nrow(rating_matrix), function(i) {
    cosine_similarity(user_ratings, rating_matrix[i, ])
  })

  # Sort users by similarity in descending order
  similar_users <- order(similarities, decreasing = TRUE)

  # Find movies the user has not seen (i.e., with a rating of 0)
  unseen_movies <- which(user_ratings == 0)

  # Recommend movies from the most similar users; union() avoids duplicate movies
  recommended_movies <- numeric(0)
  for (user in similar_users) {
    similar_user_ratings <- rating_matrix[user, ]
    recommended_movies <- union(
      recommended_movies,
      unseen_movies[unseen_movies %in% which(similar_user_ratings > 0)]
    )
    # Limit the number of recommendations to five
    if (length(recommended_movies) >= 5) {
      break
    }
  }

  # Return the names of the recommended movies for the given user ID
  return(names(user_ratings)[recommended_movies])
}
```
The function then walks through the users from most to least similar. For each one, it adds to the recommendation list every movie that the target user has not rated (`user_ratings` equal to 0) but the similar user has rated (`similar_user_ratings > 0`). `union()` keeps the list free of duplicates, and the loop stops once five movies have been collected. Finally, the function returns the names of the recommended movies.
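The `sapply()` call in `recommend_movies` computes one similarity at a time. As an alternative sketch (a vectorized variant, not the method used in this article), all pairwise similarities can be computed at once with matrix products: `tcrossprod(R)` (that is, `R %*% t(R)`) holds every dot product between users, and dividing by the outer product of the row norms turns those dot products into cosine similarities. The rating matrix is typed in directly from the sample data, without the user ID column, and the row names `user1` to `user5` are assumptions for readability.

```r
# Sample ratings as a users x movies matrix (0 = not rated), no ID column
rating_matrix <- matrix(
  c(5, 4, 0, 0, 0,
    4, 0, 3, 0, 0,
    0, 3, 2, 0, 0,
    0, 0, 0, 2, 1,
    0, 0, 0, 1, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("user", 1:5), c("A", "B", "C", "D", "E"))
)

# Entry [i, j] is the dot product of user i's and user j's ratings
dot_products <- tcrossprod(rating_matrix)

# Divide each dot product by the product of the two users' norms
norms <- sqrt(rowSums(rating_matrix^2))
similarity_matrix <- dot_products / outer(norms, norms)

# Users with no ratings would give 0 / 0 = NaN; treat them as similarity 0
similarity_matrix[is.nan(similarity_matrix)] <- 0

# Rank user 1's neighbours, leaving user 1 out
sims_user1 <- similarity_matrix[1, -1]
neighbours <- names(sort(sims_user1, decreasing = TRUE))
print(round(sims_user1, 4))
print(neighbours)
```

Users 2 and 3 come out as user 1's closest neighbors, while users 4 and 5 have a similarity of 0 because they share no rated movie with user 1.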
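Combining these steps gives the final user-based collaborative filtering recommender shown below. Instead of simply collecting movies from neighbors, this version predicts a rating for each unseen movie as the similarity-weighted average of the ratings other users gave it, and ranks the movies by that prediction. Run it to see the output for any user ID.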
```r
library(tidyr)

# Sample ratings data
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)

# Create a user-item matrix (rows = users, columns = movies, 0 = not rated)
user_item_matrix <- spread(ratings, movie, rating, fill = 0)

# Cosine similarity function
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)  # Handling division by zero
  }
  return(dot_product / (norm_user1 * norm_user2))
}

# Function to recommend movies for a given user ID
recommend_movies <- function(user_id, user_item_matrix) {
  # Drop the "user" ID column so that only ratings enter the similarity
  rating_matrix <- as.matrix(user_item_matrix[, -1])
  user_ratings <- rating_matrix[user_id, ]

  # Calculate similarity between the selected user and all other users
  similarities <- sapply(1:nrow(rating_matrix), function(i) {
    cosine_similarity(user_ratings, rating_matrix[i, ])
  })
  # Exclude the user itself
  similarities[user_id] <- -Inf
  # Sort users by similarity in descending order
  similar_users <- order(similarities, decreasing = TRUE)

  # Find movies the user has not seen (rating = 0)
  unseen_movies <- which(user_ratings == 0)

  # Predict a rating for each unseen movie as the similarity-weighted
  # average of the ratings given by other users
  movie_scores <- numeric(0)
  for (movie_index in unseen_movies) {
    weighted_sum <- 0
    similarity_sum <- 0
    for (user in similar_users) {
      if (user != user_id && rating_matrix[user, movie_index] > 0) {
        weighted_sum <- weighted_sum + similarities[user] * rating_matrix[user, movie_index]
        similarity_sum <- similarity_sum + abs(similarities[user])
      }
    }
    # Only score movies rated by at least one user with non-zero similarity
    if (similarity_sum > 0) {
      movie_scores[colnames(rating_matrix)[movie_index]] <- weighted_sum / similarity_sum
    }
  }

  # Recommend movies with the highest predicted ratings
  recommended_movies <- names(sort(movie_scores, decreasing = TRUE))
  return(recommended_movies)
}

# Test the function for User 1
user_id <- 1
recommended_movies <- recommend_movies(user_id, user_item_matrix)
print(paste("Recommended movies for User", user_id, ":"))
print(recommended_movies)
```

Note: We can change the input user ID (ranging from 1 to 5) to generate different results. Because `user_id` is used as a row index, this relies on the IDs 1 to 5 matching the row positions in the matrix.

For user 1, the recommender suggests movie C. Users 2 and 3 each share one rated movie with user 1 (A and B, respectively), and both rated C, so C gets a predicted rating of about 2.55. Users 4 and 5 have a similarity of 0 with user 1, so movies D and E get no prediction. Let's break down the code.
Sample data: We create a data frame called `ratings` with three columns: `user`, `movie`, and `rating`. It contains the ratings that different users gave to different movies.
User-item matrix: The `spread` function converts the `ratings` data frame into a user-item matrix where rows represent users, columns represent movies, and cells contain ratings. Missing ratings are filled with `0`.
`cosine_similarity()`: This function calculates the cosine similarity between two users. The steps are:
Compute the dot product of the two user vectors.
Compute the norms (magnitudes) of the two user vectors.
Handle the case where one of the norms is zero to avoid division by zero.
Return the cosine similarity, which is the dot product divided by the product of the norms.
`recommend_movies()`: This function recommends movies for a given user:
Retrieve the ratings for the given user.
Compute the cosine similarity between the given user and every user, then set the user's own similarity to `-Inf` so it is excluded.
Sort users based on similarity in descending order.
Identify movies that the given user has not seen (a rating of 0).
For each unseen movie, add up the ratings of the users who rated it, weighted by their similarity, and divide by the sum of the absolute similarities to get a predicted rating.
Return the names of the movies, ordered from the highest to the lowest predicted rating.
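The final code's prediction step (a similarity-weighted average of other users' ratings) can be checked by hand. The sketch below reproduces it for user 1 and movie C using the sample data: users 2 and 3 are the only users who rated C (3 and 2, respectively), and their cosine similarities to user 1 are 20 / (5√41) and 12 / (√13 √41). The similarities are typed in as numbers rather than computed from a matrix, which is an assumption made to keep the arithmetic visible.

```r
# Cosine similarities to user 1 (user 2 shares movie A, user 3 shares movie B)
sim_user2 <- 20 / (5 * sqrt(41))          # dot product 5 * 4 = 20
sim_user3 <- 12 / (sqrt(13) * sqrt(41))   # dot product 4 * 3 = 12
sims <- c(sim_user2, sim_user3)

# Ratings that users 2 and 3 gave to movie C
ratings_c <- c(3, 2)

# Weighted sum of ratings divided by the sum of absolute similarities
predicted_c <- sum(sims * ratings_c) / sum(abs(sims))
print(round(predicted_c, 2))
```

Because user 2 is more similar to user 1 than user 3 is, the prediction (about 2.55) lands closer to user 2's rating of 3 than to user 3's rating of 2.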
Test: We set `user_id` to 1, call the `recommend_movies` function to get movie recommendations for user 1, and print the recommended movies.
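To try every user ID at once without tidyr, here is a compact base-R sketch that mirrors the final recommender's logic (cosine similarity, excluding the user itself, and a similarity-weighted average over unseen movies) using matrix products instead of loops. The function name `recommend_vectorized` is hypothetical, and the rating matrix is typed in from the sample data with row `i` standing for user ID `i`.

```r
# Sample ratings as a users x movies matrix (0 = not rated); row i is user ID i
rating_matrix <- matrix(
  c(5, 4, 0, 0, 0,
    4, 0, 3, 0, 0,
    0, 3, 2, 0, 0,
    0, 0, 0, 2, 1,
    0, 0, 0, 1, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("A", "B", "C", "D", "E"))
)

# Hypothetical vectorized variant of recommend_movies()
recommend_vectorized <- function(user_id, rating_matrix) {
  # Cosine similarity between user_id and every user, via one matrix product
  norms <- sqrt(rowSums(rating_matrix^2))
  sims <- as.vector(rating_matrix %*% rating_matrix[user_id, ]) /
    (norms * norms[user_id])
  sims[!is.finite(sims)] <- 0  # users with no ratings get similarity 0
  sims[user_id] <- 0           # a user is not their own neighbour

  # Per movie: sum of similarity * rating, and sum of |similarity| over raters
  rated <- (rating_matrix > 0) * 1
  weighted_sum <- as.vector(sims %*% rating_matrix)
  similarity_sum <- as.vector(abs(sims) %*% rated)

  # Score only unseen movies that have at least one rater with non-zero similarity
  keep <- rating_matrix[user_id, ] == 0 & similarity_sum > 0
  scores <- weighted_sum[keep] / similarity_sum[keep]
  names(scores) <- colnames(rating_matrix)[keep]
  names(sort(scores, decreasing = TRUE))
}

recommendations <- lapply(1:5, recommend_vectorized, rating_matrix = rating_matrix)
names(recommendations) <- paste("User", 1:5)
print(recommendations)
```

Under this logic, users 1, 2, and 3 are recommended C, B, and A, respectively. Users 4 and 5 get no recommendations: they share no rated movies with users 1 to 3, so every movie they have not seen was rated only by users with a similarity of 0.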
This is one of the simpler ways to implement a movie recommender in R. Recommenders like this one can increase user engagement, help users discover new content, and filter content efficiently.