In data-mining applications such as clustering, it is essential to measure how similar or dissimilar objects are to each other.
In the simplest case, such as comparing the values of a nominal (categorical) attribute, a similarity measure for two objects returns 1 if they are similar and 0 if they are dissimilar.
A dissimilarity measure works the opposite way: it returns 1 if the objects are dissimilar and 0 if they are similar.
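The sketch below shows the two measures in Python. It assumes the objects are compared on a single nominal value, and the function names are hypothetical.

```python
def nominal_similarity(a, b):
    """Return 1 if the two nominal values match, else 0."""
    return 1 if a == b else 0


def nominal_dissimilarity(a, b):
    """Return 1 if the two nominal values differ, else 0 (the opposite of similarity)."""
    return 1 - nominal_similarity(a, b)


print(nominal_similarity("A", "A"), nominal_dissimilarity("A", "A"))  # 1 0
print(nominal_similarity("A", "B"), nominal_dissimilarity("A", "B"))  # 0 1
```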
Similarity and dissimilarity measures also help with outlier detection. Objects that are highly dissimilar to all the others can be flagged as potential outliers and then examined or removed during data cleaning.
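The sketch below illustrates one simple way to flag potential outliers: average each object's dissimilarity to every other object and compare that score against a threshold. This scoring rule is an assumption, not a method given in the text. The toy values, the absolute-difference dissimilarity, and the threshold of 20 are all hypothetical.

```python
def mean_dissimilarity(values, dissim):
    """For each object, average its dissimilarity to every other object."""
    n = len(values)
    return [
        sum(dissim(values[i], values[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def flag_outliers(values, dissim, threshold):
    """Return indices of objects whose mean dissimilarity exceeds the threshold."""
    scores = mean_dissimilarity(values, dissim)
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy numeric values (illustrative only); absolute difference as the dissimilarity.
values = [10, 11, 12, 50]
print(flag_outliers(values, lambda x, y: abs(x - y), threshold=20))  # [3]
```

In practice the threshold depends on the data and on the scale of the dissimilarity.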
Similarity and dissimilarity are collectively referred to as proximity.
Similarity can often be expressed as a function of dissimilarity.
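The sketch below shows two common conversions. These are assumptions, since the text does not fix a particular function:

- When the dissimilarity d is normalized to [0, 1], sim = 1 - d.
- When d is an unbounded non-negative distance, sim = 1 / (1 + d) is one option.

The function names are hypothetical.

```python
def similarity_from_dissimilarity(d):
    """Convert a dissimilarity normalized to [0, 1] into a similarity: sim = 1 - d."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return 1.0 - d


def similarity_from_distance(d):
    """Map an unbounded non-negative distance into (0, 1]: sim = 1 / (1 + d)."""
    if d < 0:
        raise ValueError("d must be non-negative")
    return 1.0 / (1.0 + d)


print(similarity_from_dissimilarity(0.25))  # 0.75
print(similarity_from_distance(3))          # 0.25
```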
Similarity and dissimilarity measures can be calculated as:
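The exact formula is not reproduced here. As an assumption, one widely used choice for objects described by several nominal attributes is the simple matching measure, sketched below:

- The dissimilarity is d = (p - m) / p, where p is the number of attributes and m is the number of attributes whose values match.
- The similarity is sim = m / p = 1 - d.

With a single attribute, this reduces to the 0/1 rule described above. The function names and sample tuples are illustrative.

```python
def simple_matching_dissimilarity(x, y):
    """d = (p - m) / p, where p is the number of attributes and m the number of matches."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p


def simple_matching_similarity(x, y):
    """sim = m / p, which equals 1 - d."""
    return 1 - simple_matching_dissimilarity(x, y)


# Illustrative objects with two nominal attributes (grade, progress).
x = ("A", "Excellent")
y = ("A", "Fair")
z = ("B", "Fair")
print(simple_matching_dissimilarity(x, y))  # 0.5
print(simple_matching_dissimilarity(x, z))  # 1.0
print(simple_matching_similarity(x, y))     # 0.5
```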
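A dissimilarity matrix stores a collection of proximities that are available for all pairs of n objects.
In a dissimilarity matrix, d(i, j) is the dissimilarity between objects i and j. It is close to 0 when the two objects are highly similar and grows as they differ more. Since d(i, i) = 0 and d(i, j) = d(j, i), the matrix is symmetric, and usually only its lower triangle is shown.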
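A dissimilarity matrix satisfies d(i, i) = 0 and d(i, j) = d(j, i). The sketch below therefore computes only the lower triangle of the n-by-n matrix and mirrors it into the upper triangle. The pairwise function is passed in as a parameter. The absolute-difference example and the name `dissimilarity_matrix` are illustrative assumptions.

```python
def dissimilarity_matrix(objects, dissim):
    """Build an n-by-n dissimilarity matrix.

    d(i, i) = 0 and d(i, j) = d(j, i), so only pairs with j < i are
    computed; each value is mirrored into the upper triangle.
    """
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            matrix[i][j] = matrix[j][i] = float(dissim(objects[i], objects[j]))
    return matrix


# Toy example: absolute difference between numbers as the dissimilarity.
for row in dissimilarity_matrix([1, 4, 6], lambda a, b: abs(a - b)):
    print(row)
# [0.0, 3.0, 5.0]
# [3.0, 0.0, 2.0]
# [5.0, 2.0, 0.0]
```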
Let’s look at an example and compute the similarity and dissimilarity measures.
Obj Id | Grade | Progress | Numeric |
---|---|---|---|
1 | A | Excellent | 45 |
2 | B | Fair | 22 |
3 | C | Good | 64 |
4 | A | Excellent | 28 |
When constructing a dissimilarity matrix, we assign 1 to a pair of dissimilar objects (their values differ) and 0 to a pair of similar objects (their values match).
For a similarity matrix, it is the other way around.
The proximity measure for the Grade attribute is calculated below.
The dissimilarity matrix values are calculated as shown below:
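The sketch below computes these values for the Grade column of the table (A, B, C, A for objects 1 to 4). It assigns 1 to a pair with differing grades and 0 to a pair with matching grades. It prints the lower triangle, since the matrix is symmetric. The helper name is hypothetical.

```python
grades = ["A", "B", "C", "A"]  # Grade column for objects 1-4


def grade_dissimilarity(a, b):
    # 1 if the grades differ, 0 if they are equal
    return 1 if a != b else 0


n = len(grades)
dissim = [[grade_dissimilarity(grades[i], grades[j]) for j in range(n)] for i in range(n)]

# Print the lower triangle, row by row.
for i in range(n):
    print(" ".join(str(dissim[i][j]) for j in range(i + 1)))
# 0
# 1 0
# 1 1 0
# 0 1 1 0
```

Only objects 1 and 4 share a grade (A), so theirs is the only off-diagonal 0.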
The similarity matrix values are shown below:
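A matching sketch for the similarity matrix of the Grade column is shown below. Each entry is 1 when the grades match and 0 otherwise, which equals 1 - d for this attribute. As before, only the lower triangle is printed.

```python
grades = ["A", "B", "C", "A"]  # Grade column for objects 1-4
n = len(grades)

# Similarity for a nominal attribute: 1 if the grades match, 0 otherwise (sim = 1 - d).
sim = [[1 if grades[i] == grades[j] else 0 for j in range(n)] for i in range(n)]

for i in range(n):
    print(" ".join(str(sim[i][j]) for j in range(i + 1)))
# 1
# 0 1
# 0 0 1
# 1 0 0 1
```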
The matrices from the example problem are given below: