Building a movie recommender in R

Key takeaways:

  • Recommendation systems enhance user experience through personalized suggestions in areas like e-commerce and movie streaming.

  • In R, a movie recommender is built by creating a dataset, calculating cosine similarity between users, and recommending unseen movies based on similar users' ratings.

  • Cosine similarity is used to find users with similar preferences; the implementation must guard against division by zero for users with no ratings.

Recommendation systems are a class of algorithms and techniques used in information filtering and decision support systems to provide personalized suggestions to users. They are common in domains such as e-commerce, movie streaming, and social media because they improve the user experience. Most of these systems are built on collaborative filtering or content-based filtering, both of which are relatively easy to implement.

Step-by-step procedure

Recommendation systems are central to movie streaming, so it is worth knowing how a movie recommender is implemented. Here, we discuss the key steps to build one in R.

Step 1: Create a sample dataset

First, we construct a small dataset of users and their ratings of various movies. We define the data in a data frame, reshape it with the spread function so that each movie becomes a column, and then convert the rating columns into a numeric matrix for use in later steps. The user column is dropped during the conversion; otherwise, the user IDs would be treated as ratings in the similarity calculation. We can run the code below to see what the input data looks like in matrix form.

library(tidyr)
# User IDs range from 1 to 5, movie names from A to E, and ratings from 1 to 5
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)
# Reshape to one row per user and one column per movie; unrated movies are filled with 0
user_item_wide <- spread(ratings, movie, rating, fill = 0)
# Drop the user column and convert to a 5-by-5 numeric matrix (5 users, 5 movies)
user_item_matrix <- as.matrix(user_item_wide[, -1])
rownames(user_item_matrix) <- user_item_wide$user
# Rows represent users and columns represent movies
print(user_item_matrix)

Note: The spread function comes from the tidyr package, so we must load tidyr first. In recent tidyr releases, spread is superseded by pivot_wider but remains available.
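If tidyr is not available, roughly the same user-item matrix can be sketched in base R. This is an illustrative alternative rather than the approach used in the rest of this article, and the name base_matrix is an assumption introduced here.

```r
# Same sample ratings as in the step above
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)

# tapply() cross-tabulates the ratings by user and movie. Each (user, movie)
# pair occurs at most once here, so sum() just returns that single rating.
base_matrix <- tapply(ratings$rating, list(ratings$user, ratings$movie), sum)

# Pairs with no rating come back as NA; replace them with 0 to mark "unrated"
base_matrix[is.na(base_matrix)] <- 0

print(base_matrix)
```

The result has users as row names and movies as column names, so it can be indexed by position in the same way as the matrix built with spread.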

Step 2: Implement cosine similarity

Next, we implement a basic cosine similarity function that measures how closely two rating vectors (here, two users) point in the same direction. If a user has no ratings, the norm of that user's vector is zero. In R, dividing zero by zero yields NaN rather than raising an error, so the function returns 0 explicitly in that case.
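For reference, the standard definition of cosine similarity between two rating vectors is written out below. The symbols u, v, and n (the number of movies) are notation introduced here, not names taken from the code.

```latex
\[
\operatorname{sim}(u, v)
  = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}
         {\sqrt{\sum_{i=1}^{n} u_i^2} \, \sqrt{\sum_{i=1}^{n} v_i^2}}
\]
```

With the sample data, user 1 has ratings (5, 4, 0, 0, 0) and user 2 has ratings (4, 0, 3, 0, 0) over movies A to E, so their similarity is 20 / (√41 × 5) ≈ 0.62.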

# Cosine similarity function
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0) # Avoid 0/0 (NaN) when a user has no ratings
  }
  return(dot_product / (norm_user1 * norm_user2))
}
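As a quick check of the function, the sketch below applies it to a few hand-written rating vectors taken from the sample data. The function is repeated so that the example runs on its own; the names user1_ratings, user2_ratings, and user4_ratings are introduced here for illustration only.

```r
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)
  }
  return(dot_product / (norm_user1 * norm_user2))
}

# Ratings over movies A to E; 0 means the movie is unrated
user1_ratings <- c(A = 5, B = 4, C = 0, D = 0, E = 0)
user2_ratings <- c(A = 4, B = 0, C = 3, D = 0, E = 0)
user4_ratings <- c(A = 0, B = 0, C = 0, D = 2, E = 1)

sim_12 <- cosine_similarity(user1_ratings, user2_ratings)  # both rated movie A
sim_14 <- cosine_similarity(user1_ratings, user4_ratings)  # no rated movies in common
sim_zero <- cosine_similarity(user1_ratings, rep(0, 5))    # a user with no ratings

print(round(c(sim_12, sim_14, sim_zero), 3))
```

Users 1 and 2 score about 0.62 because they share movie A, while users with no rated movies in common, or with no ratings at all, score 0.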

Step 3: Evaluate the recommended movies of each user

Finally, we calculate the cosine similarity pairwise between the target user and every other user. The users are then sorted by similarity in descending order so that the most similar users are consulted first when collecting recommendations.

# Function to recommend movies for a given user ID
# user_item_matrix: numeric matrix with one row per user and one column per movie
recommend_movies <- function(user_id, user_item_matrix) {
  user_ratings <- user_item_matrix[user_id, ]
  # Calculate cosine similarity between the selected user and every user
  similarities <- sapply(1:nrow(user_item_matrix), function(i) {
    cosine_similarity(user_ratings, user_item_matrix[i, ])
  })
  # Exclude the selected user, who is trivially most similar to themselves
  similarities[user_id] <- -Inf
  # Sort users by similarity in descending order
  similar_users <- order(similarities, decreasing = TRUE)
  # Find movies the user has not seen (i.e., with a rating of 0)
  unseen_movies <- which(user_ratings == 0)
  # Collect unseen movies from the most similar users; union() avoids duplicates
  recommended_movies <- numeric(0)
  for (user in similar_users) {
    # Stop once the remaining users have no similarity to the selected user
    if (similarities[user] <= 0) {
      break
    }
    similar_user_ratings <- user_item_matrix[user, ]
    recommended_movies <- union(recommended_movies, unseen_movies[unseen_movies %in% which(similar_user_ratings > 0)])
    # Limit the number of recommendations to five
    if (length(recommended_movies) >= 5) {
      break
    }
  }
  # Return the names of the recommended movies
  return(names(user_ratings)[recommended_movies])
}

The recommendations are then assembled by walking through the similar users in order. Any movie that the target user has not seen (a rating of 0 in user_ratings) and that a similar user has rated (similar_user_ratings > 0) is added to the list of recommended movies. The list is returned once the similar users have been processed or five movies have been collected.
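To make the ranking step concrete, here is a small sketch that computes the similarity of user 1 to every user and orders the results. The matrix is typed in directly so that the example runs on its own, and the names ratings_matrix and target are assumptions made for this illustration.

```r
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)
  }
  return(dot_product / (norm_user1 * norm_user2))
}

# Rows are users 1 to 5, columns are movies A to E; 0 means unrated
ratings_matrix <- matrix(
  c(5, 4, 0, 0, 0,
    4, 0, 3, 0, 0,
    0, 3, 2, 0, 0,
    0, 0, 0, 2, 1,
    0, 0, 0, 1, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("A", "B", "C", "D", "E"))
)

target <- 1
similarities <- sapply(1:nrow(ratings_matrix), function(i) {
  cosine_similarity(ratings_matrix[target, ], ratings_matrix[i, ])
})
similarities[target] <- -Inf  # do not compare the user with themselves

# Indices of users, most similar first
similar_users <- order(similarities, decreasing = TRUE)

print(round(similarities, 3))
print(similar_users[1:2])
```

Users 2 and 3 come first, with similarities of about 0.62 and 0.52. Users 4 and 5 share no rated movies with user 1, so their similarity is 0.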

Code example

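Putting these steps together gives the final code for the movie recommender in R, which uses user-based collaborative filtering. Instead of simply collecting unseen movies, this version scores each unseen movie by the similarity-weighted average of the ratings given by other users and ranks the movies by that score. Run it to see the output for any user ID.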

library(tidyr)
# Sample ratings data
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)
# Create a user-item matrix (rows are users, columns are movies, 0 means unrated)
user_item_wide <- spread(ratings, movie, rating, fill = 0)
user_item_matrix <- as.matrix(user_item_wide[, -1]) # Drop the user column
rownames(user_item_matrix) <- user_item_wide$user
# Cosine similarity function
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0) # Avoid 0/0 (NaN) when a user has no ratings
  }
  return(dot_product / (norm_user1 * norm_user2))
}
# Function to recommend movies for a given user ID
recommend_movies <- function(user_id, user_item_matrix) {
  user_ratings <- user_item_matrix[user_id, ]
  # Calculate similarity between the selected user and every user
  similarities <- sapply(1:nrow(user_item_matrix), function(i) {
    cosine_similarity(user_ratings, user_item_matrix[i, ])
  })
  # Exclude the user itself
  similarities[user_id] <- -Inf
  # Sort users by similarity in descending order
  similar_users <- order(similarities, decreasing = TRUE)
  # Find movies the user has not seen (rating = 0)
  unseen_movies <- which(user_ratings == 0)
  # Calculate predicted ratings for unseen movies
  movie_scores <- numeric(length(unseen_movies))
  names(movie_scores) <- names(unseen_movies)
  for (movie in names(unseen_movies)) {
    weighted_sum <- 0
    similarity_sum <- 0
    for (user in similar_users) {
      # Only users who rated this movie contribute (this never includes user_id)
      if (user_item_matrix[user, movie] > 0) {
        weighted_sum <- weighted_sum + similarities[user] * user_item_matrix[user, movie]
        similarity_sum <- similarity_sum + abs(similarities[user])
      }
    }
    if (similarity_sum > 0) {
      movie_scores[movie] <- weighted_sum / similarity_sum
    }
  }
  # Recommend unseen movies with a positive predicted rating, highest first
  movie_scores <- movie_scores[movie_scores > 0]
  recommended_movies <- names(sort(movie_scores, decreasing = TRUE))
  return(recommended_movies)
}
# Test the function for User 1
user_id <- 1
recommended_movies <- recommend_movies(user_id, user_item_matrix)
print(paste("Recommended movies for User", user_id, ":"))
print(recommended_movies)

Note: We can change the input user ID (ranging from 1 to 5) to generate different results. With this sample data, users 4 and 5 get no recommendations, because the only user similar to each of them has rated no movie that they have not already seen.

For user 1, the recommender suggests movie C. Users 2 and 3 are the only users who share rated movies with user 1, and C is the one movie they have rated that user 1 has not seen. Let’s break down the process behind this suggestion.

  • The ratings data frame: We create a data frame called ratings with three columns: user, movie, and rating. It contains the ratings given by different users to different movies.

  • The spread call: The spread function reshapes the ratings data frame into a user-item layout where rows represent users, columns represent movies, and cells contain ratings. Missing ratings are filled with 0.

  • The cosine_similarity function: This function calculates the cosine similarity between two users. The steps are:

    • Compute the dot product of the two user vectors.

    • Compute the norms (magnitudes) of the two user vectors.

    • Handle the case where one of the norms is zero to avoid division by zero.

    • Return the cosine similarity, which is the dot product divided by the product of the norms.

  • The recommend_movies function: This function recommends movies for a given user:

    • Retrieve the ratings for the given user.

    • Compute the cosine similarity between the given user and all other users, and exclude the given user from the comparison.

    • Sort users based on similarity in descending order.

    • Identify movies that the given user has not seen.

    • For each unseen movie, predict a rating as the similarity-weighted average of the ratings from users who have rated it.

    • Rank the movies by predicted rating, highest first.

    • Return the names of the ranked movies.

  • The test call: We set user_id to 1, call the recommend_movies function to get movie recommendations for User 1, and print the recommended movies.
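The similarity-weighted average described above can be checked by hand. As a sketch, the snippet below computes the predicted rating of movie C from the point of view of user 1, using the two users who rated C. The similarity values are derived from the sample ratings, and the variable names are assumptions introduced for this illustration.

```r
# Cosine similarities of user 1 to users 2 and 3, from the sample ratings
sim_user2 <- 20 / (sqrt(41) * sqrt(25))  # shared movie A: 5 * 4 = 20
sim_user3 <- 12 / (sqrt(41) * sqrt(13))  # shared movie B: 4 * 3 = 12

# Ratings that users 2 and 3 gave to movie C
rating_user2 <- 3
rating_user3 <- 2

# Similarity-weighted average of the two ratings
predicted_c <- (sim_user2 * rating_user2 + sim_user3 * rating_user3) /
  (abs(sim_user2) + abs(sim_user3))

print(round(predicted_c, 2))
```

The result, about 2.55, lies between the two ratings and slightly closer to 3, because user 2 is the more similar of the two users.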

Conclusion

This is one of the simpler ways to implement a movie recommender in R. A movie recommender can increase user engagement, help users discover new content, and filter content efficiently.

Frequently asked questions


What is the methodology for a movie recommendation system?

A common methodology is to compute similarity scores with cosine similarity, either between users based on their ratings, as in this article, or between movies based on attributes such as genre.


What two techniques do recommender systems use?

Recommender systems use two main techniques: collaborative filtering and content-based filtering.


Which algorithms are best for recommender systems?

No single algorithm is best in every setting. Commonly used approaches include:

  1. Collaborative filtering (both user-based and item-based)
  2. Content-based filtering
  3. Matrix factorization (e.g., SVD)
  4. Hybrid models combining multiple techniques

Copyright ©2025 Educative, Inc. All rights reserved