What are similarity and dissimilarity measures?

Overview

In certain data-mining applications, such as clustering, it is essential to determine how similar or dissimilar objects are to one another.

A similarity measure for two objects $i$ and $j$ returns 1 if they are similar and 0 if they are dissimilar. Values in between indicate partial similarity.

A dissimilarity measure works the opposite way: it returns 1 if the objects are dissimilar and 0 if they are similar.

Similarity and dissimilarity measures also help in dealing with outliers. An object that is highly dissimilar to all other objects is a potential outlier, so these measures make such objects easy to identify and, where appropriate, remove.

Similarity and dissimilarity are collectively referred to as proximity.

Similarity can often be expressed as a function of dissimilarity; for example, $sim(i, j) = 1 - dis(i, j)$.

For objects described by nominal (categorical) attributes, similarity and dissimilarity can be calculated as follows:

$dis(i, j) = 1 - \frac{m}{p} = \frac{p - m}{p}$

$sim(i, j) = 1 - dis(i, j) = \frac{m}{p}$

  • $i$ and $j$ are the two objects being compared; they also index the row and column of the dissimilarity matrix.
  • $m$ is the number of matches, i.e., the number of attributes for which $i$ and $j$ are in the same state.
  • $p$ is the total number of attributes.
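A minimal Python sketch of these two formulas follows. It assumes each object is represented as a tuple of nominal attribute values; the function names `dis` and `sim` simply mirror the notation above, and the objects `x`, `y`, and `z` are hypothetical, with both attributes treated as nominal purely for illustration.

```python
from collections.abc import Sequence


def dis(obj_i: Sequence[object], obj_j: Sequence[object]) -> float:
    """Simple-matching dissimilarity for nominal attributes: (p - m) / p."""
    if len(obj_i) != len(obj_j) or len(obj_i) == 0:
        raise ValueError("objects need the same, non-zero number of attributes")
    p = len(obj_i)  # total number of attributes
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # attributes in the same state
    return (p - m) / p


def sim(obj_i: Sequence[object], obj_j: Sequence[object]) -> float:
    """Similarity as the complement of dissimilarity: 1 - dis(i, j) = m / p."""
    return 1 - dis(obj_i, obj_j)


# Hypothetical objects with two attributes (Grade, Progress), treated as nominal.
x = ("A", "Excellent")
y = ("B", "Fair")  # no attribute matches x: m = 0
z = ("A", "Fair")  # only Grade matches x: m = 1

print(dis(x, y), sim(x, y))  # 1.0 0.0
print(dis(x, z), sim(x, z))  # 0.5 0.5
```

The pair `x`, `z` shows the partial case: one of two attributes matches, so both measures come out to 0.5.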

Dissimilarity matrix

A dissimilarity matrix is an $n \times n$ matrix that stores the proximities for all pairs of the $n$ objects in a data set.

In a dissimilarity matrix, the entry $d(i, j)$ is the measured dissimilarity, or difference, between objects $i$ and $j$.
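Because $d(i, j) = d(j, i)$ and every diagonal entry $d(i, i)$ is 0 (an object does not differ from itself), the matrix is commonly written in the following lower-triangular form:

```latex
\begin{bmatrix}
0 & & & & \\
d(2,1) & 0 & & & \\
d(3,1) & d(3,2) & 0 & & \\
\vdots & \vdots & \vdots & \ddots & \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
```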

(Figure: Dissimilarity matrix)

Example

Let’s work through an example and compute the similarity and dissimilarity measures.

Object ID   Grade   Progress    Numeric
1           A       Excellent   45
2           B       Fair        22
3           C       Good        64
4           A       Excellent   28

When constructing the dissimilarity matrix, we assign 1 to a pair of objects whose values differ and 0 to a pair whose values match. For the similarity matrix, the assignment is reversed.

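Below, we calculate the proximity measures for the Grade attribute only. With a single attribute, $p = 1$ and $m$ is either 1 (the grades match) or 0 (they differ), so $dis(i, j) = 1 - m/p$ reduces to 0 for matching grades and 1 for differing grades.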

Calculating proximity measures

The dissimilarity matrix values are calculated as shown below:

$dis(2, 1) = dis(\text{B}, \text{A}) = 1$

$dis(3, 1) = dis(\text{C}, \text{A}) = 1$

$dis(3, 2) = dis(\text{C}, \text{B}) = 1$

$dis(4, 1) = dis(\text{A}, \text{A}) = 0$

$dis(4, 2) = dis(\text{A}, \text{B}) = 1$

$dis(4, 3) = dis(\text{A}, \text{C}) = 1$
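These lower-triangular entries can also be generated programmatically. The sketch below is a minimal illustration, assuming the Grade column is held in a plain Python list; `grade_dissimilarity_matrix` is a hypothetical helper name, not part of any library.

```python
def grade_dissimilarity_matrix(values: list[str]) -> list[list[int]]:
    """Build an n x n dissimilarity matrix for a single nominal attribute.

    With one attribute, p = 1, so dis(i, j) = 1 - m/p is
    0 when the values match and 1 when they differ.
    """
    n = len(values)
    matrix = [[0] * n for _ in range(n)]  # diagonal entries stay 0
    for i in range(n):
        for j in range(i):  # compute each pair once, below the diagonal
            d = 0 if values[i] == values[j] else 1
            matrix[i][j] = d
            matrix[j][i] = d  # mirror it, since dis(i, j) = dis(j, i)
    return matrix


grades = ["A", "B", "C", "A"]  # Grade column for objects 1 to 4
dis_matrix = grade_dissimilarity_matrix(grades)

# Print the lower triangle using the 1-based object IDs from the table.
for i in range(1, len(grades)):
    for j in range(i):
        print(f"dis({i + 1},{j + 1}) = {dis_matrix[i][j]}")
```

Running it prints the same six values listed above, from `dis(2,1) = 1` through `dis(4,3) = 1`.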

The corresponding similarity matrix values are shown below:

$sim(2, 1) = 1 - dis(2, 1) = 0$

$sim(3, 1) = 1 - dis(3, 1) = 0$

$sim(3, 2) = 1 - dis(3, 2) = 0$

$sim(4, 1) = 1 - dis(4, 1) = 1$

$sim(4, 2) = 1 - dis(4, 2) = 0$

$sim(4, 3) = 1 - dis(4, 3) = 0$

The matrices from the example problem are given below:

(Figure: Proximity measures)
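For reference, assembling the values computed above into lower-triangular form (rows and columns ordered by object ID, 1 to 4) gives the following dissimilarity and similarity matrices for the Grade attribute. The diagonal of the similarity matrix is 1, since every object is identical to itself.

```latex
\text{dis} =
\begin{bmatrix}
0 & & & \\
1 & 0 & & \\
1 & 1 & 0 & \\
0 & 1 & 1 & 0
\end{bmatrix}
\qquad
\text{sim} =
\begin{bmatrix}
1 & & & \\
0 & 1 & & \\
0 & 0 & 1 & \\
1 & 0 & 0 & 1
\end{bmatrix}
```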
