Recommendation systems are widely used on retail websites. They provide personalized recommendations based on factors such as a user's search history, demographic information, and past purchases, making them an important component of these sites. Before a recommendation system is integrated into a website, its performance needs to be tested using the metrics discussed below.
Several evaluation metrics are used for testing the performance of a recommendation system. Let's start with the accuracy metrics.
To understand MAP@K, we first need to go over Precision@K.
Precision@K is the proportion of relevant items in a list of the top K recommended items. Suppose the recommendation system has a Precision@K of 70% for K = 10. Then, 7 of the top 10 recommended items are relevant. The following shows the formula to calculate Precision@K.
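In symbols, following the definition above:

$$\text{Precision@}K = \frac{\text{number of relevant items in the top } K \text{ recommendations}}{K}$$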
MAP@K (mean average precision at K) is an evaluation metric for a ranked list of recommended items. The recommender is rewarded for placing relevant items near the top of the list. It is computed by averaging the Precision@k values at the ranks where relevant items appear, and then averaging that value across users. In short, MAP@K is the mean of the users' average Precision@K.
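One common formulation (the normalizing term varies slightly across libraries) is:

$$\text{AP@}K(u) = \frac{1}{\min(K, R_u)} \sum_{k=1}^{K} \text{Precision@}k(u)\cdot \mathrm{rel}_u(k), \qquad \text{MAP@}K = \frac{1}{|U|} \sum_{u \in U} \text{AP@}K(u)$$

Here, $R_u$ is the number of relevant items for user $u$, $\mathrm{rel}_u(k)$ is 1 if the item at rank $k$ is relevant to $u$ and 0 otherwise, and $U$ is the set of users.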
Before moving on to MAR@K, we first need to go over Recall@K.
Recall@K is the proportion of all relevant items that appear in the top K recommendations. Suppose Recall@K is 60% for K = 10. This means that 60% of all relevant items can be found in the top 10 recommendations. Although this may seem similar to Precision@K, it's not. Precision@K measures how many of the recommended items are relevant, while Recall@K measures how many of all the relevant items were recommended.
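In symbols:

$$\text{Recall@}K = \frac{\text{number of relevant items in the top } K \text{ recommendations}}{\text{total number of relevant items}}$$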
MAR@K (mean average recall at K) is the mean of Recall@K across users. Like MAP@K, it evaluates a ranked list, but it measures, on average, what proportion of all the relevant items appear among the top K recommendations.
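As a minimal sketch of how these ranking metrics fit together, here is a small, self-contained Python implementation. The function names and the input format (a ranked list of recommended item IDs plus a set of relevant item IDs per user) are our own assumptions, not a specific library's API:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)

def average_precision_at_k(recommended, relevant, k):
    """Average of Precision@i at each rank i where a relevant item appears."""
    if not relevant:
        return 0.0
    score, hits = 0.0, 0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # Precision@i at this relevant position
    return score / min(len(relevant), k)

def map_at_k(all_recommended, all_relevant, k):
    """MAP@K: mean of the per-user average precision."""
    return sum(
        average_precision_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ) / len(all_recommended)

def mar_at_k(all_recommended, all_relevant, k):
    """MAR@K: mean of the per-user Recall@K."""
    return sum(
        recall_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ) / len(all_recommended)

# Example: two users, top-5 recommendations each (placeholder data)
recs = [["a", "b", "c", "d", "e"], ["f", "g", "h", "i", "j"]]
rels = [{"a", "c", "x"}, {"g", "j"}]
print(map_at_k(recs, rels, k=5))  # mean average precision
print(mar_at_k(recs, rels, k=5))  # mean average recall
```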
The following describes two predictive metrics for recommendation systems.
Mean absolute error (MAE) measures the average error in the predictions. It is the average of the absolute differences between the actual and predicted ratings. Its formula is given as follows.
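$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| r_i - \hat{r}_i \right|$$

Here, $r_i$ is the actual rating, $\hat{r}_i$ is the predicted rating, and $N$ is the number of predictions.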
Root mean squared error (RMSE) is the square root of the average squared difference between the actual and predicted values. This metric is commonly used for continuous rating values. We can calculate RMSE using the following formula.
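$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( r_i - \hat{r}_i \right)^2}$$

Both metrics take only a few lines with NumPy; the rating arrays below are placeholder data:

```python
import numpy as np

actual = np.array([4.0, 3.5, 5.0, 2.0])     # actual ratings (placeholder data)
predicted = np.array([3.8, 3.0, 4.5, 2.5])  # predicted ratings (placeholder data)

mae = np.mean(np.abs(actual - predicted))
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```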
Coverage is the percentage of items in the training set that the recommender is able to suggest on a test set.
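Assuming the usual catalog-coverage definition, this can be written as:

$$\text{Coverage} = \frac{|\text{unique items the recommender suggests}|}{|\text{items in the training set}|} \times 100\%$$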
The following sections cover the personalization metrics.
Personalization is the dissimilarity between different users' recommendation lists. It evaluates how many of the same items the recommender suggests to different users: we compute the cosine similarity between users' recommendation vectors and subtract the average from one. A higher score means more personalized recommendations.
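A minimal sketch, assuming each user's recommendations are encoded as a binary vector over the item catalog (a representation choice of ours, not a requirement):

```python
import numpy as np
from itertools import combinations

def personalization(rec_matrix):
    """1 - mean pairwise cosine similarity between users' recommendation vectors.

    rec_matrix: binary array of shape (n_users, n_items); 1 marks a recommended item.
    """
    sims = []
    for u, v in combinations(range(len(rec_matrix)), 2):
        a, b = rec_matrix[u], rec_matrix[v]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1 - np.mean(sims)

# Two users with partially overlapping top-3 lists over a 5-item catalog
recs = np.array([[1, 1, 1, 0, 0],
                 [0, 1, 1, 0, 1]])
print(personalization(recs))  # higher means more personalized
```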
Intra-list similarity is the average cosine similarity between all pairs of items in one user's recommendation list, computed over the items' feature vectors. To calculate the intra-list similarity for the model, we compute it for every user's list and then take the average.
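A corresponding sketch, assuming each item is described by a feature vector (the item_features matrix below is placeholder data):

```python
import numpy as np
from itertools import combinations

def intra_list_similarity(rec_list, item_features):
    """Average pairwise cosine similarity between the items in one user's list."""
    sims = []
    for i, j in combinations(rec_list, 2):
        a, b = item_features[i], item_features[j]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.mean(sims)

# Placeholder item feature vectors (rows) and one user's recommended item indices
item_features = np.array([[1.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 1.0]])
user_recs = [0, 1, 2]
print(intra_list_similarity(user_recs, item_features))
# To score the whole model, average this value over all users' lists.
```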
We can improve the recommender system by modifying the training data to remove already popular items. Users don't need recommendations for popular items; they can search for them independently. We can also normalize the ratings assigned to each item by the user's average rating, which accounts for differences in how individuals rate items.
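As a hedged illustration of both adjustments using pandas; the column names (user_id, item_id, rating) and the top-1% popularity cutoff are assumptions for this sketch:

```python
import pandas as pd

# Placeholder interaction data; the column names are assumptions for this sketch
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": ["a", "b", "a", "c", "a"],
    "rating":  [5.0, 3.0, 4.0, 2.0, 5.0],
})

# Drop the most popular items (here, any item in the top 1% by interaction count)
counts = ratings["item_id"].value_counts()
cutoff = counts.quantile(0.99)
popular = counts[counts >= cutoff].index
filtered = ratings[~ratings["item_id"].isin(popular)]

# Mean-center each user's ratings to account for individual rating scales
filtered = filtered.assign(
    rating=filtered["rating"] - filtered.groupby("user_id")["rating"].transform("mean")
)
print(filtered)
```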
Position bias: Items placed near the top of a ranked list of recommendations are more likely to be selected by users, regardless of their relevance. As a result, items with higher relevance but lower rankings get less engagement.
Popularity bias: Popular items tend to dominate the top of ranked lists. Since users can easily find these items on their own, they don't need to be recommended and can be removed from the training data.
Degenerate feedback loop: This loop can arise when the model learns from feedback on its own recommendations. The recommender system might then suggest the same items to a user repeatedly, reducing engagement with newer, undiscovered items.
Let’s test your knowledge of the things covered above.
Test your knowledge
Which evaluation metric measures the ratio of the relevant recommended items to all the possible recommended items?
Unlock your potential: Recommendation system series, all in one place!
If you've missed any part of the series, you can always go back and check out the previous Answers:
What is a recommendation system?
Understand the basic definition and workings of recommendation systems.
What are the types of recommendation systems?
Explore the different types of recommendation systems and how they function.
What is collaborative filtering?
Learn about collaborative filtering, a popular technique used in recommendation systems.
What is content-based filtering?
Discover how content-based filtering works to provide personalized recommendations.
What is a hybrid recommendation system?
Learn about hybrid systems that combine different recommendation approaches.
What are the evaluation metrics for recommendation systems?
Understand the key metrics used to evaluate the effectiveness of recommendation systems.