What is item-based collaborative filtering?

Recommendation systems are deployed in applications related to e-learning, online shopping, social media, etc., to provide personalized experiences to their users while harnessing the power of big data and machine learning.

Recommendation systems

Recommendation systems are broadly categorized into the following three types:

  • Content-based filtering

  • Collaborative filtering

    • User-based collaborative filtering

    • Item-based collaborative filtering

  • Hybrid filtering

Collaborative filtering and its types

Collaborative filtering builds a recommendation model from users’ past behavior. For any target user, this behavior can be compared in two ways: through user-user similarity or through item-item similarity.

  • User-based collaborative filtering: This technique identifies users who have similar tastes to the target user and suggests items that similar users have enjoyed.

  • Item-based collaborative filtering: This method finds items that share similarities with the ones the target user has shown interest in and then suggests those similar items to the user.

Item-based collaborative filtering

How does item-based collaborative filtering work?

1- Creating a user-item interaction matrix

Let’s suppose there is a dataset of user-item interactions where users have rated movies on a scale of 1 (lowest) to 5 (highest). The data looks as shown in the following table.


User    Movie1  Movie2  Movie3  Movie4
User1   4       5       3       -
User2   -       3       -       4
User3   5       4       -       2

In this matrix, each row represents a user, and each column represents a movie. The users have provided the ratings, and “-” indicates that a user hasn’t rated a particular movie.
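As a rough illustration, the matrix above could be built from a log of individual ratings. This is a minimal sketch, assuming the ratings arrive as (user, movie, rating) records; the names `records`, `ratings_log`, and `user_item` are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical long-format rating log: one row per (user, movie, rating)
records = [
    ("User1", "Movie1", 4), ("User1", "Movie2", 5), ("User1", "Movie3", 3),
    ("User2", "Movie2", 3), ("User2", "Movie4", 4),
    ("User3", "Movie1", 5), ("User3", "Movie2", 4), ("User3", "Movie4", 2),
]
ratings_log = pd.DataFrame(records, columns=["user", "movie", "rating"])

# Pivot into a user-item matrix; user-movie pairs without a rating become NaN
user_item = ratings_log.pivot(index="user", columns="movie", values="rating")
print(user_item)

# One simple (and lossy) choice: encode missing ratings as 0
user_item_filled = user_item.fillna(0)
```

Encoding missing ratings as 0 keeps the arithmetic simple, but it treats "not rated" the same as a very low score, so it is a design choice rather than a requirement.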

2- Calculating item similarities

The next step for item-based collaborative filtering is to calculate the similarity between items. One common similarity metric is cosine similarity.

The similarity calculations for all pairs of movies result in an item similarity matrix like the one shown below.


Movie   Movie1  Movie2  Movie3  Movie4
Movie1  1       0.88    0.62    0.35
Movie2  0.88    1       0.71    0.63
Movie3  0.62    0.71    1       0
Movie4  0.35    0.63    0       1
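The values in this matrix can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming unrated entries are encoded as 0 (the same convention the code example later in this Answer uses); the helper name `cosine` is illustrative.

```python
import numpy as np

# Ratings from the user-item table; 0 stands for "not rated" (an assumption)
ratings = np.array([
    [4, 5, 3, 0],  # User1
    [0, 3, 0, 4],  # User2
    [5, 4, 0, 2],  # User3
], dtype=float)

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each movie is represented by its column of ratings
movie1, movie2 = ratings[:, 0], ratings[:, 1]
print(round(cosine(movie1, movie2), 2))  # 0.88

# Full item-item matrix, one pair at a time
n_items = ratings.shape[1]
similarity = np.array([
    [cosine(ratings[:, i], ratings[:, j]) for j in range(n_items)]
    for i in range(n_items)
])
print(np.round(similarity, 2))
```

Movie3 and Movie4 have a similarity of 0 because no user has rated both of them.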

3- Generating recommendations

Finally, to make recommendations for a user, we take their ratings and multiply them by the respective similarity scores. We then sum up the weighted scores and divide by the sum of the absolute similarities.

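For example, to predict User2’s ratings for Movie1 and Movie3, which they haven’t rated, we use the following formula:

$$\hat{r}_{u,i} = \frac{\sum_{j \in R_u} \text{sim}(i, j) \cdot r_{u,j}}{\sum_{j \in R_u} \lvert \text{sim}(i, j) \rvert}$$

Where:

  • $\hat{r}_{u,i}$ is the predicted rating of user $u$ for item $i$.

  • $R_u$ is the set of items that user $u$ has rated.

  • $\text{sim}(i, j)$ is the similarity between items $i$ and $j$.

  • $r_{u,j}$ is the rating that user $u$ gave to item $j$.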

Based on the provided ratings and the item similarity matrix, the predicted ratings for User2 will be:

  • Movie1: ≈ 3.3

  • Movie3: ≈ 3.0
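These two numbers can be checked by hand using the rounded values from the item similarity matrix. The sketch below is only an illustration of the weighted average described in this step; the function name `predict_rating` and the dictionary layout are assumptions made for the example.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings the user has already given.

    user_ratings: {item: rating} for the items the user has rated
    similarities: {item: similarity between that item and the target item}
    """
    numerator = sum(similarities[item] * rating for item, rating in user_ratings.items())
    denominator = sum(abs(similarities[item]) for item in user_ratings)
    return numerator / denominator if denominator else 0.0

# User2's known ratings from the rating table
user2 = {"Movie2": 3, "Movie4": 4}

# Rounded similarities taken from the item similarity matrix
sims_movie1 = {"Movie2": 0.88, "Movie4": 0.35}
sims_movie3 = {"Movie2": 0.71, "Movie4": 0.0}

# Movie1: (0.88 * 3 + 0.35 * 4) / (0.88 + 0.35) = 4.04 / 1.23
print(round(predict_rating(user2, sims_movie1), 1))  # 3.3

# Movie3: (0.71 * 3 + 0 * 4) / (0.71 + 0) = 2.13 / 0.71
print(round(predict_rating(user2, sims_movie3), 1))  # 3.0
```

Because Movie3 has zero similarity to Movie4, its prediction depends only on User2's rating for Movie2.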

4- Providing recommendations

Based on the predicted ratings, we can now provide recommendations for User2 by ranking the movies in order of their predicted ratings and suggesting the top ones.
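The ranking step can be sketched as a simple sort. The helper `top_n` and the cutoff `n` are hypothetical names chosen for this illustration; the predicted ratings are the rounded values for User2 from the previous step.

```python
def top_n(predicted, n=2):
    """Return the n items with the highest predicted rating, best first."""
    ranked = sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Rounded predicted ratings for User2
predicted_user2 = {"Movie1": 3.3, "Movie3": 3.0}

for movie, score in top_n(predicted_user2):
    print(movie, score)
```

With these values, Movie1 is suggested ahead of Movie3.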

Code example

import numpy as np
import pandas as pd

# Step 1: Create user-item interaction matrix (0 means "not rated")
interaction_matrix = np.array([
    [4, 5, 3, 0],
    [0, 3, 0, 4],
    [5, 4, 0, 2]
])

# Step 2: Calculate item similarities (cosine similarity)
def cosine_similarity(item_matrix):
    norm = np.linalg.norm(item_matrix, axis=0)
    similarity_matrix = np.dot(item_matrix.T, item_matrix) / (norm[:, None] * norm[None, :])
    np.fill_diagonal(similarity_matrix, 0)  # Set diagonal elements to 0 to avoid self-similarity
    return similarity_matrix

# Step 3: Generate recommendations for a specific user
def generate_recommendations(user_ratings, item_similarity_matrix):
    # Initialize arrays to store the weighted sum and absolute similarity sum
    weighted_sum = np.zeros(item_similarity_matrix.shape[0])
    abs_similarity_sum = np.zeros(item_similarity_matrix.shape[0])

    # Iterate through each item
    for item_id, rating in enumerate(user_ratings):
        if rating != 0:  # Only use items the user has rated
            # Find similar items (non-zero similarity) and their similarity scores
            similar_items = np.where(item_similarity_matrix[item_id] != 0)[0]
            sim_scores = item_similarity_matrix[item_id, similar_items]
            # Update the weighted sum and absolute similarity sum
            weighted_sum[similar_items] += rating * sim_scores
            abs_similarity_sum[similar_items] += np.abs(sim_scores)

    # Calculate the final recommendations
    recommendations = np.zeros_like(user_ratings, dtype=float)
    non_zero_indices = np.where(abs_similarity_sum != 0)[0]
    recommendations[non_zero_indices] = weighted_sum[non_zero_indices] / abs_similarity_sum[non_zero_indices]

    # Exclude items already rated by the user
    recommendations[user_ratings != 0] = 0
    return recommendations

# Step 4: Provide recommendations for User2
user2_ratings = interaction_matrix[1]  # User2's ratings
item_similarity_matrix = cosine_similarity(interaction_matrix)
user2_recommendations = generate_recommendations(user2_ratings, item_similarity_matrix)

# Display the non-zero recommendations
movie_names = ["Movie1", "Movie2", "Movie3", "Movie4"]
non_zero_recommendations = pd.DataFrame({
    "Movie": [movie_names[i] for i in np.where(user2_recommendations != 0)[0]],
    "PredictedRating": user2_recommendations[user2_recommendations != 0]
})
print("Movie Recommendations for User2")
print(non_zero_recommendations)

Code explanation

  • Step 1: We define an interaction matrix where every row represents a user and every column represents an item. A 0 stands for a missing rating.

  • Step 2: The cosine_similarity() function calculates the item similarities as described earlier, except that the diagonal elements are set to 0 instead of 1. This keeps an item from being treated as similar to itself, so an item’s own rating never contributes to its own score.

  • Step 3: The generate_recommendations() function takes the user’s ratings and the item similarity matrix. It iterates through each item the user has rated, accumulates the weighted sum and the absolute similarity sum for the similar items, and divides the first by the second to get the predicted ratings. Items the user has already rated are then excluded.

  • Step 4: We calculate the recommendations for User2 (as an example), using the functions created above.

  • Final print statements: We display the predicted ratings of the unseen items for User2.
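One edge case worth noting: cosine_similarity() divides by the column norms, so an item that nobody has rated (an all-zero column) would produce a division by zero. The variant below is a hedged sketch of one way to guard against that; the name `safe_cosine_similarity` is hypothetical and is not part of the code above.

```python
import numpy as np

def safe_cosine_similarity(item_matrix):
    """Item-item cosine similarity that tolerates all-zero (unrated) columns."""
    item_matrix = np.asarray(item_matrix, dtype=float)
    norm = np.linalg.norm(item_matrix, axis=0)
    denom = np.outer(norm, norm)
    dot = item_matrix.T @ item_matrix
    # Divide only where the denominator is non-zero; other entries stay 0
    similarity = np.divide(dot, denom, out=np.zeros_like(dot), where=denom != 0)
    np.fill_diagonal(similarity, 0)  # Same convention as cosine_similarity()
    return similarity

# The third item has no ratings at all
example = np.array([
    [4, 5, 0],
    [0, 3, 0],
    [5, 4, 0],
])
print(np.round(safe_cosine_similarity(example), 2))
```

Here the unrated item simply gets a similarity of 0 to every other item, so it neither receives nor influences any prediction.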

Conclusion

This Answer gives a simple overview without diving into all the details. Real applications can build on it by using more data, applying more advanced techniques, and considering additional features. In practice, item-based collaborative filtering benefits from continual tuning and optimization.


HowDev By Educative. Copyright ©2025 Educative, Inc. All rights reserved