Evaluation metrics are quantitative measures of a machine learning model's performance. They are essential for determining whether a model is performing well or poorly on a specific task.
METEOR (Metric for Evaluation of Translation with Explicit Ordering) is a metric used to measure the quality of a candidate text based on the unigram overlap between the candidate and the reference text, taking both precision and recall into account along with word order.
The following are the steps to calculate the METEOR score:

1. Calculate the unigram precision and recall.
2. Compute the F-score.
3. Compute the chunk penalty.
4. Calculate the METEOR score.
We calculate the unigram precision as the ratio between the overlapping unigrams between the candidate and reference summary and the total number of unigrams in the candidate summary.
The unigram recall is calculated as the ratio between the overlapping unigrams between the candidate and reference summary and the total number of unigrams in the reference summary.
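To make these definitions concrete, here is a minimal sketch that counts overlapping unigrams by hand; the sentences are illustrative and not part of the library example later in this lesson.

# A minimal sketch of unigram precision and recall (illustrative sentences).
from collections import Counter

reference = "the cat sat on the mat".split()
candidate = "the cat is on the mat".split()

# The overlap clips each word's count to its frequency in the reference,
# so repeated words are not over-counted.
overlap = sum((Counter(candidate) & Counter(reference)).values())

precision = overlap / len(candidate)  # overlapping unigrams / candidate length
recall = overlap / len(reference)     # overlapping unigrams / reference length

print(precision, recall)  # 0.833... 0.833... (5 of the 6 words match in each)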
After calculating the unigram precision and recall, we compute the weighted F-score by taking their harmonic mean, with recall weighted higher than precision:

F-score = (10 × P × R) / (R + 9 × P)

where,

P: Unigram precision

R: Unigram recall

Note: The recall is weighted nine times higher than the precision so that the candidate summary is rewarded for covering the meaning of the reference rather than for exact word-to-word matches alone.
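Continuing the illustrative example, a short sketch of this weighted harmonic mean:

# A minimal sketch of the METEOR F-score (recall weighted 9x over precision).
precision = 5 / 6
recall = 5 / 6

f_score = (10 * precision * recall) / (recall + 9 * precision)
print(f_score)  # 0.833... (equals P and R here because the two are identical)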
A chunk is a set of consecutive words that appears in both the candidate and the reference summary in the same order. The precision, recall, and F-score are based on individual unigram matches and ignore word order, so METEOR applies a penalty that grows with the number of chunks the matched unigrams fragment into:

Penalty = 0.5 × (c / u_m)^3

where,

c: Number of chunks

u_m: Number of matched unigrams
What would the number of chunks be if the candidate summary were exactly the same as the reference summary?
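For intuition, here is a minimal sketch of the penalty with the standard parameters (a weight of 0.5 and an exponent of 3); the chunk count is worked out by hand for the illustrative sentences above. Note that if the candidate were exactly the same as the reference, all matches would fall into a single chunk, so c would be 1 and the penalty would be at its smallest.

# A minimal sketch of the chunk penalty (standard METEOR parameters).
matched_unigrams = 5  # u_m from the illustrative example
chunks = 2            # the matches form two runs: "the cat" and "on the mat"

penalty = 0.5 * (chunks / matched_unigrams) ** 3
print(penalty)  # ≈ 0.032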
After computing the F-score and the chunk penalty, we are now ready to calculate the METEOR score:

METEOR score = F-score × (1 − Penalty)
METEOR scores are given on a scale of 0 to 1, with higher values indicating greater similarity between the candidate and the reference summary.
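Putting the pieces together for the illustrative sentences, the final score works out as follows:

# A minimal end-to-end sketch combining the values computed above.
f_score = 5 / 6
penalty = 0.032

meteor = f_score * (1 - penalty)
print(meteor)  # ≈ 0.807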
Now, let’s see how to calculate the METEOR score using Python.
import nltk
nltk.download('wordnet')

reference_summary = [['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']]
candidate_summary = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']

METEORscore = nltk.translate.meteor_score.meteor_score(reference_summary, candidate_summary)
print(METEORscore)
Let’s walk through the code above.
Line 1: We import the nltk library, which is widely used in the field of NLP.
Line 2: We download the wordnet corpus from the nltk library, which the METEOR implementation uses for matching.
Line 4: We define a list named reference_summary and set the tokenized sentence “Machine learning is a subset of artificial intelligence” as the reference summary.
Line 5: We define a candidate_summary variable and set its value to the tokenized sentence “Machine learning is seen as a subset of artificial intelligence.”
Line 7: We use the meteor_score() function from the nltk.translate.meteor_score module to calculate the METEOR score.
Line 8: We print the METEOR score for the provided candidate summary.
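As a side note, recent versions of nltk expose METEOR's weighting and penalty parameters as keyword arguments, so the defaults can be overridden; here is a brief sketch, assuming the alpha, beta, and gamma keyword arguments of your installed nltk version and reusing the variables defined in the code above.

# A sketch of overriding METEOR's parameters in nltk
# (reuses reference_summary and candidate_summary from the code above).
from nltk.translate.meteor_score import meteor_score

score = meteor_score(
    reference_summary,
    candidate_summary,
    alpha=0.9,  # weight of recall vs. precision (default 0.9)
    beta=3.0,   # chunk penalty exponent (default 3.0)
    gamma=0.5,  # chunk penalty weight (default 0.5)
)
print(score)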