Building a movie recommender in R

Key takeaways:
Recommendation systems enhance user experience through personalized suggestions in areas like e-commerce and movie streaming.
In R, a movie recommender is built by creating a dataset, calculating cosine similarity between users, and recommending unseen movies based on similar users' ratings.
Cosine similarity is used to find users with similar preferences. The similarity function returns 0 when a user has no ratings, which avoids division-by-zero errors.

Step-by-step procedure

Recommendation systems are central to movie streaming, so it is useful to understand how a movie recommender is actually implemented. Here, we walk through the key steps to build one in R.

Step 1: Create a sample dataset

First, we construct a small dataset of users and their ratings of various movies. We define the data in a data frame and then reshape it with the spread function from tidyr into a user-item table, with one row per user and one column per movie, so it can be used in later steps. Run the code below to see what the input data looks like in this form.

library(tidyr)
# User IDs range from 1 to 5, movie names are A to E, and ratings are 1 to 5
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)
# Create a user-item table; the output has 5 rows (one per user) and 6 columns (user plus the 5 movies)
user_item_matrix <- spread(ratings, movie, rating, fill = 0)
# Each row represents a user, and each movie column holds that user's rating (0 = not rated)
print(user_item_matrix)
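
As a complement to the spread call above, here is a minimal base R sketch that builds the same ratings as a plain numeric matrix with no user column, which is the shape a similarity calculation needs. The names users, movies, and rating_matrix are illustrative and not part of the original code. The sketch assumes each user rates a movie at most once; a repeated user-movie pair would simply overwrite the earlier rating.

```r
# Same sample ratings as in Step 1
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)

users <- sort(unique(ratings$user))
movies <- sort(unique(ratings$movie))

# Start with all zeros (0 = not rated): one row per user, one column per movie
rating_matrix <- matrix(
  0,
  nrow = length(users), ncol = length(movies),
  dimnames = list(as.character(users), movies)
)

# Fill in each known rating at its (user row, movie column) position
positions <- cbind(match(ratings$user, users), match(ratings$movie, movies))
rating_matrix[positions] <- ratings$rating

print(dim(rating_matrix))    # 5 5
print(rating_matrix["1", ])  # User 1 rated A = 5 and B = 4; C, D, E are 0
```

Because the row names carry the user IDs, the matrix holds only ratings, so no ID column can leak into a dot product or norm.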

With the data in place, a first version of the recommender collects unseen movies from the most similar users. It relies on the cosine_similarity function defined in the complete code further down.

# Function to recommend movies for a given user ID
recommend_movies <- function(user_id, user_item_matrix) {
  # Drop the user ID column so only ratings enter the similarity calculation
  ratings_matrix <- as.matrix(user_item_matrix[, -1])
  row_id <- match(user_id, user_item_matrix$user)
  user_ratings <- ratings_matrix[row_id, ]
  
  # Calculate cosine similarity between the selected user and every user
  similarities <- sapply(1:nrow(ratings_matrix), function(i) {
    cosine_similarity(user_ratings, ratings_matrix[i, ])
  })
  # Exclude the user itself, then sort users by similarity in descending order
  similarities[row_id] <- -Inf
  similar_users <- order(similarities, decreasing = TRUE)
  
  # Find movies the user has not seen (i.e., with a rating of 0)
  unseen_movies <- which(user_ratings == 0)
  
  # Collect unseen movies from the most similar users; union() avoids duplicates
  recommended_movies <- integer(0)
  for (user in similar_users) {
    # Stop once the remaining users share no rated movies with the selected user
    if (similarities[user] <= 0) break
    similar_user_ratings <- ratings_matrix[user, ]
    recommended_movies <- union(recommended_movies, unseen_movies[unseen_movies %in% which(similar_user_ratings > 0)])
    
    # Limit the number of recommendations to five
    if (length(recommended_movies) >= 5) {
      break
    }
  }
  # Return the names of the recommended movies for the given user ID
  return(names(user_ratings)[recommended_movies])
}

The complete code below combines the dataset, the cosine similarity function, and a recommender that scores each unseen movie by a similarity-weighted average of other users' ratings.

library(tidyr)
# Sample ratings data
ratings <- data.frame(
  user = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movie = c("A", "B", "A", "C", "B", "C", "D", "E", "D", "E"),
  rating = c(5, 4, 4, 3, 3, 2, 2, 1, 1, 5)
)
# Create a user-item table (a user column plus one column per movie; 0 = not rated)
user_item_matrix <- spread(ratings, movie, rating, fill = 0)
# Cosine similarity function
cosine_similarity <- function(user1, user2) {
  dot_product <- sum(user1 * user2)
  norm_user1 <- sqrt(sum(user1^2))
  norm_user2 <- sqrt(sum(user2^2))
  
  if (norm_user1 == 0 || norm_user2 == 0) {
    return(0)  # A user with no ratings has a zero norm; avoid dividing by zero
  }
  
  return(dot_product / (norm_user1 * norm_user2))
}
# Function to recommend movies for a given user ID
recommend_movies <- function(user_id, user_item_matrix) {
  # Drop the user ID column so only ratings enter the similarity calculation
  ratings_matrix <- as.matrix(user_item_matrix[, -1])
  row_id <- match(user_id, user_item_matrix$user)
  user_ratings <- ratings_matrix[row_id, ]
  
  # Calculate similarity between the selected user and every user
  similarities <- sapply(1:nrow(ratings_matrix), function(i) {
    cosine_similarity(user_ratings, ratings_matrix[i, ])
  })
  
  # Exclude the user itself
  similarities[row_id] <- 0
  
  # Find movies the user has not seen (rating = 0)
  unseen_movies <- names(user_ratings)[user_ratings == 0]
  
  # Calculate predicted ratings for unseen movies
  movie_scores <- numeric(0)
  for (movie in unseen_movies) {
    # Users who rated this movie and have some similarity to the selected user
    raters <- which(ratings_matrix[, movie] > 0 & similarities > 0)
    
    if (length(raters) > 0) {
      # Similarity-weighted average of their ratings
      weights <- similarities[raters]
      movie_scores[movie] <- sum(weights * ratings_matrix[raters, movie]) / sum(weights)
    }
  }
  
  # No similar user rated any unseen movie
  if (length(movie_scores) == 0) {
    return(character(0))
  }
  
  # Recommend movies with the highest predicted ratings first
  recommended_movies <- names(sort(movie_scores, decreasing = TRUE))
  
  return(recommended_movies)
}
# Test the function for User 1
user_id <- 1
recommended_movies <- recommend_movies(user_id, user_item_matrix)
print(paste("Recommended movies for User", user_id, ":"))
print(recommended_movies)

The code recommends movie C to User 1. Users 2 and 3 are the only users who share a rated movie with User 1, and C is the only movie they rated that User 1 has not seen. Let's break down the process behind this suggestion.

The ratings data frame: We create a data frame called ratings with three columns: user, movie, and rating. It contains the ratings given by different users to different movies.
The spread call: The spread function converts the ratings data frame into a user-item table where rows represent users, the columns after user represent movies, and cells contain ratings. Missing ratings are filled with 0.
The cosine_similarity function: This function calculates the cosine similarity between two users. The steps are:
- Compute the dot product of the two user vectors.
- Compute the norms (magnitudes) of the two user vectors.
- Handle the case where one of the norms is zero to avoid division by zero.
- Return the cosine similarity, which is the dot product divided by the product of the norms.
The recommend_movies function: This function recommends movies for a given user:
- Drop the user column and retrieve the ratings for the given user.
- Compute the cosine similarity between the given user and every user, then set the user's own similarity to 0 so it is ignored.
- Identify movies that the given user has not seen.
- For each unseen movie, average the ratings from users with positive similarity, weighting each rating by that user's similarity.
- Return the names of the scored movies, highest predicted rating first. Movies rated only by users with zero similarity get no score and are not recommended.
The test call: We set user_id to 1, call the recommend_movies function to get movie recommendations for User 1, and print the result, which is movie C.
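
To make the cosine similarity steps concrete, here is a small worked example in base R using the sample ratings of Users 1, 2, and 3. The helper name cosine is illustrative only. It omits the zero-norm check because every user in this example has at least one rating. The last step shows how a similarity-weighted average would score movie C, which User 1 has not rated.

```r
# Rating vectors over movies A to E (0 = not rated)
user1 <- c(A = 5, B = 4, C = 0, D = 0, E = 0)
user2 <- c(A = 4, B = 0, C = 3, D = 0, E = 0)
user3 <- c(A = 0, B = 3, C = 2, D = 0, E = 0)

# Dot product divided by the product of the two norms
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

sim_12 <- cosine(user1, user2)  # 20 / (sqrt(41) * 5), about 0.62
sim_13 <- cosine(user1, user3)  # 12 / (sqrt(41) * sqrt(13)), about 0.52

# Similarity-weighted average of the ratings Users 2 and 3 gave movie C
predicted_c <- (sim_12 * user2["C"] + sim_13 * user3["C"]) / (sim_12 + sim_13)

print(round(c(sim_12 = sim_12, sim_13 = sim_13), 2))
print(round(unname(predicted_c), 2))  # about 2.55
```

User 2 agrees with User 1 on movie A and User 3 on movie B, so both get a positive similarity. User 2's rating of C counts slightly more because User 2 is the closer match.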

Conclusion

This is one of the simpler ways to implement a movie recommender in R. Even a basic recommender like this can increase user engagement, help users discover new content, and filter a large catalog efficiently.

Frequently asked questions

What is the methodology for a movie recommendation system?

A common methodology is to compute cosine similarity scores, either between users based on their ratings (as in this article) or between movies based on attributes such as genre.

What two techniques do recommender systems use?

Recommender systems use two main techniques: collaborative filtering and content-based filtering.

Which algorithms are best for recommender systems?

No single algorithm is best for every dataset. Widely used options include:

Collaborative filtering (both user-based and item-based)
Content-based filtering
Matrix factorization (e.g., SVD)
Hybrid models combining multiple techniques
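
As a rough illustration of the item-based variant of collaborative filtering listed above, the sketch below computes cosine similarity between movies instead of between users, using the same sample ratings written out as a matrix. It is not part of the user-based code in this article, and the names rating_matrix, norms, and item_similarity are illustrative.

```r
# Sample ratings: rows are Users 1 to 5, columns are movies A to E (0 = not rated)
rating_matrix <- matrix(
  c(5, 4, 0, 0, 0,
    4, 0, 3, 0, 0,
    0, 3, 2, 0, 0,
    0, 0, 0, 2, 1,
    0, 0, 0, 1, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("A", "B", "C", "D", "E"))
)

# Norm of each movie column (every movie here has at least one rating)
norms <- sqrt(colSums(rating_matrix^2))

# crossprod() gives all pairwise dot products between movie columns;
# dividing by the outer product of the norms turns them into cosine similarities
item_similarity <- crossprod(rating_matrix) / outer(norms, norms)

# Movies ranked by similarity to movie A, excluding A itself
print(round(sort(item_similarity["A", -1], decreasing = TRUE), 2))
```

An item-based recommender would then suggest movies that are most similar to the ones a user has already rated highly.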