Recommendation systems are widely used on retail websites. They provide personalized recommendations based on factors such as a user's search history, demographic information, and past purchases, making them an important component of these sites. Before a recommendation system is integrated into a website, its performance needs to be tested using the metrics discussed below.
Several evaluation metrics are used for testing the performance of a recommendation system. Let's start with the accuracy metrics.
To understand MAP@K, we first need to go over Precision@K.
Precision@K is the proportion of relevant items in a list of the top K recommended items. Suppose the recommendation system has a Precision@K of 70% for K = 10. Then, 7 of the top 10 recommended items are relevant. The following shows the formula to calculate Precision@K.
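In symbols, following the definition above:

$$\text{Precision@}K = \frac{\text{number of relevant items in the top } K \text{ recommendations}}{K}$$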
MAP@K (mean average precision at K) is an evaluation metric for a ranked list of recommended items. The recommender is rewarded for placing relevant items near the top of the list. It is computed by averaging the Precision@k values at the ranks where relevant items appear, and then averaging that value across users. In short, MAP@K is the mean of the users' average Precision@K.
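One common formulation (the normalizing term varies slightly across libraries) is:

$$\text{AP@}K(u) = \frac{1}{\min(K, R_u)} \sum_{k=1}^{K} \text{Precision@}k(u)\cdot \mathrm{rel}_u(k), \qquad \text{MAP@}K = \frac{1}{|U|} \sum_{u \in U} \text{AP@}K(u)$$

Here, $R_u$ is the number of relevant items for user $u$, $\mathrm{rel}_u(k)$ is 1 if the item at rank $k$ is relevant to $u$ and 0 otherwise, and $U$ is the set of users.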
Before moving on to MAR@K, we first need to go over Recall@K.
Recall@K is the proportion of all relevant items that appear in the top K recommendations. Suppose Recall@K is 60% for K = 10. This means that 60% of all relevant items can be found in the top 10 recommendations. Although this may seem similar to Precision@K, it's not. Precision@K measures how many of the recommended items are relevant, while Recall@K measures how many of all the relevant items were recommended.
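In symbols:

$$\text{Recall@}K = \frac{\text{number of relevant items in the top } K \text{ recommendations}}{\text{total number of relevant items}}$$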
MAR@K (mean average recall at K) is the mean of Recall@K across users. Like MAP@K, it evaluates a ranked list, but it measures, on average, what proportion of all the relevant items appear among the top K recommendations.
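As a minimal sketch of how these ranking metrics fit together, here is a small, self-contained Python implementation. The function names and the input format (a ranked list of recommended item IDs plus a set of relevant item IDs per user) are our own assumptions, not a specific library's API:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)

def average_precision_at_k(recommended, relevant, k):
    """Average of Precision@i at each rank i where a relevant item appears."""
    if not relevant:
        return 0.0
    score, hits = 0.0, 0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # Precision@i at this relevant position
    return score / min(len(relevant), k)

def map_at_k(all_recommended, all_relevant, k):
    """MAP@K: mean of the per-user average precision."""
    return sum(
        average_precision_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ) / len(all_recommended)

def mar_at_k(all_recommended, all_relevant, k):
    """MAR@K: mean of the per-user Recall@K."""
    return sum(
        recall_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ) / len(all_recommended)

# Example: two users, top-5 recommendations each (placeholder data)
recs = [["a", "b", "c", "d", "e"], ["f", "g", "h", "i", "j"]]
rels = [{"a", "c", "x"}, {"g", "j"}]
print(map_at_k(recs, rels, k=5))  # mean average precision
print(mar_at_k(recs, rels, k=5))  # mean average recall
```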
The following describes two predictive metrics for recommendation systems.
Mean absolute error (MAE) measures the average error in the predictions. It is the average of the absolute differences between the actual and predicted ratings. Its formula is given as follows.
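$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| r_i - \hat{r}_i \right|$$

Here, $r_i$ is the actual rating, $\hat{r}_i$ is the predicted rating, and $N$ is the number of predictions.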
Root mean squared error (RMSE) is the square root of the average squared difference between the actual and predicted values. This metric is commonly used for continuous rating values. We can calculate RMSE using the following formula.
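$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( r_i - \hat{r}_i \right)^2}$$

Both metrics take only a few lines with NumPy; the rating arrays below are placeholder data:

```python
import numpy as np

actual = np.array([4.0, 3.5, 5.0, 2.0])     # actual ratings (placeholder data)
predicted = np.array([3.8, 3.0, 4.5, 2.5])  # predicted ratings (placeholder data)

mae = np.mean(np.abs(actual - predicted))
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```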
Coverage is the percentage of items in the training set that the recommender is able to suggest on a test set.
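Assuming the usual catalog-coverage definition, this can be written as:

$$\text{Coverage} = \frac{|\text{unique items the recommender suggests}|}{|\text{items in the training set}|} \times 100\%$$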
The following sections cover the personalization metrics.
Personalization is the dissimilarity between different users' recommendation lists. It evaluates how many of the same items the recommender suggests to different users: we compute the cosine similarity between users' recommendation vectors and subtract the average from one. A higher score means more personalized recommendations.
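A minimal sketch, assuming each user's recommendations are encoded as a binary vector over the item catalog (a representation choice of ours, not a requirement):

```python
import numpy as np
from itertools import combinations

def personalization(rec_matrix):
    """1 - mean pairwise cosine similarity between users' recommendation vectors.

    rec_matrix: binary array of shape (n_users, n_items); 1 marks a recommended item.
    """
    sims = []
    for u, v in combinations(range(len(rec_matrix)), 2):
        a, b = rec_matrix[u], rec_matrix[v]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1 - np.mean(sims)

# Two users with partially overlapping top-3 lists over a 5-item catalog
recs = np.array([[1, 1, 1, 0, 0],
                 [0, 1, 1, 0, 1]])
print(personalization(recs))  # higher means more personalized
```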
Intra-list similarity is the average cosine similarity between all pairs of items in one user's recommendation list, computed over the items' feature vectors. To calculate the intra-list similarity for the model, we compute it for every user's list and then take the average.
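A corresponding sketch, assuming each item is described by a feature vector (the item_features matrix below is placeholder data):

```python
import numpy as np
from itertools import combinations

def intra_list_similarity(rec_list, item_features):
    """Average pairwise cosine similarity between the items in one user's list."""
    sims = []
    for i, j in combinations(rec_list, 2):
        a, b = item_features[i], item_features[j]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.mean(sims)

# Placeholder item feature vectors (rows) and one user's recommended item indices
item_features = np.array([[1.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 1.0]])
user_recs = [0, 1, 2]
print(intra_list_similarity(user_recs, item_features))
# To score the whole model, average this value over all users' lists.
```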
We can improve the recommender system by modifying the training data to remove already popular items. Users don't need recommendations for popular items; they can search for them independently. We can also normalize the ratings assigned to each item by the user's average rating, which accounts for differences in how individuals rate items.
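As a hedged illustration of both adjustments using pandas; the column names (user_id, item_id, rating) and the top-1% popularity cutoff are assumptions for this sketch:

```python
import pandas as pd

# Placeholder interaction data; the column names are assumptions for this sketch
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": ["a", "b", "a", "c", "a"],
    "rating":  [5.0, 3.0, 4.0, 2.0, 5.0],
})

# Drop the most popular items (here, any item in the top 1% by interaction count)
counts = ratings["item_id"].value_counts()
cutoff = counts.quantile(0.99)
popular = counts[counts >= cutoff].index
filtered = ratings[~ratings["item_id"].isin(popular)]

# Mean-center each user's ratings to account for individual rating scales
filtered = filtered.assign(
    rating=filtered["rating"] - filtered.groupby("user_id")["rating"].transform("mean")
)
print(filtered)
```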
Position bias: Items placed near the top of a ranked list of recommendations are more likely to be selected by users, regardless of their relevance. As a result, items with higher relevance but lower rankings get less engagement.
Popularity bias: Popular items tend to dominate the top of ranked lists. Since users can easily find these items on their own, they don't need to be recommended and can be removed from the training data.
Degenerate feedback loop: This loop can arise when the model learns from feedback on its own recommendations. The recommender system might then suggest the same items to a user repeatedly, reducing engagement with newer, undiscovered items.
Let’s test your knowledge of the things covered above.
Test your knowledge
Which evaluation metric measures the ratio of the relevant recommended items to all the possible recommended items?
Unlock your potential: Recommendation system series, all in one place!
If you've missed any part of the series, you can always go back and check out the previous Answers:
What is a recommendation system?
Understand the basic definition and workings of recommendation systems.
What are the types of recommendation systems?
Explore the different types of recommendation systems and how they function.
What is collaborative filtering?
Learn about collaborative filtering, a popular technique used in recommendation systems.
What is content-based filtering?
Discover how content-based filtering works to provide personalized recommendations.
What is a hybrid recommendation system?
Learn about hybrid systems that combine different recommendation approaches.
What are the evaluation metrics for recommendation systems?
Understand the key metrics used to evaluate the effectiveness of recommendation systems.