How to calculate a dissimilarity matrix for mixed attribute types

Dissimilarity measures for mixed attribute types quantify the difference between data points regardless of whether their attributes are nominal, ordinal, or numeric. A dissimilarity matrix built from such a measure is essential for tasks like clustering and classification, because it captures the proximity between objects of different types and reveals patterns and structures in the data.

Example

Suppose we have a table of five objects with a nominal attribute (product name), an ordinal attribute (priority, where Urgent is assigned the ordinal value 3, High Priority the value 2, and Low Priority the value 1), and a numeric attribute. The table is as follows:

| Object Identifier | Test I (Nominal) | Test II (Ordinal) | Test III (Numeric) |
|---|---|---|---|
| 1 | Product A | Low Priority | 45 |
| 2 | Product B | Urgent | 93 |
| 3 | Product B | High Priority | 65 |
| 4 | Product C | High Priority | 74 |
| 5 | Product A | Low Priority | 23 |

The steps to find the proximity measure for mixed attributes are as follows:

Step 1: Find the dissimilarity matrix individually

  • For nominal attributes:

In the example above, we have five objects. The dissimilarity between objects i and j on nominal attributes is calculated as follows:

d(i, j) = (p − m) / p

where,

  • p is the total number of nominal attributes. In our example, p = 1.

  • m is the total number of matches between the two objects.

Let’s find the dissimilarity between two objects. For Objects 2 and 3, both values are Product B, so m = 1 and d(2, 3) = (1 − 1) / 1 = 0.

The dissimilarity matrix for nominal attributes is as follows:

Dissimilarity matrix for nominal attributes
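As a minimal sketch of this step, the nominal matrix can be reproduced in a few lines of Python, using the product names from the example table:

```python
# Nominal dissimilarity: d(i, j) = (p - m) / p, where p is the number of
# nominal attributes (1 in this example) and m is the number of matches.
products = ["Product A", "Product B", "Product B", "Product C", "Product A"]

def nominal_dissimilarity(a, b, p=1):
    m = 1 if a == b else 0  # matches across the p nominal attributes
    return (p - m) / p

# Lower-triangular part of the dissimilarity matrix (the diagonal is 0)
n = len(products)
matrix = [[nominal_dissimilarity(products[i], products[j]) for j in range(i)]
          for i in range(n)]
print(matrix[2][1])  # Objects 3 and 2 both have "Product B", so 0.0
```

Only the lower triangle is stored because a dissimilarity matrix is symmetric with zeros on the diagonal.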

  • For ordinal attributes:

We have already assigned priorities to each data point. Next, normalize these priorities to fall in the range 0.0 to 1.0 with the help of the following formula:

Z = (X − X_min) / (X_max − X_min)

where X_max = 3, X_min = 1, and X is the ordinal value.

The normalized table for this example looks like this:

| Object Identifier | Test II Priorities | Test II Normalized Values |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 3 | 1 |
| 3 | 2 | 0.5 |
| 4 | 2 | 0.5 |
| 5 | 1 | 0 |

With the normalized ranks, let’s calculate the dissimilarity between pairs of data points using the Euclidean distance formula. The Euclidean distance between two points x1 and x2 in 1D space is given by:

d(x1, x2) = |x1 − x2|

In our case:

  • Distance between Objects 1 and 2: |0 − 1| = 1.

The dissimilarity matrix for ordinal attributes is as follows:

Dissimilarity matrix for ordinal attributes
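A short Python sketch of the ordinal step, using the Test II ranks from the table above:

```python
# Ordinal dissimilarity: normalize the ranks to [0, 1], then take the
# absolute difference between each pair of normalized values.
ranks = [1, 3, 2, 2, 1]  # Test II ordinal values from the example table
r_min, r_max = min(ranks), max(ranks)
z = [(r - r_min) / (r_max - r_min) for r in ranks]  # [0.0, 1.0, 0.5, 0.5, 0.0]

# Lower-triangular dissimilarity matrix over the normalized ranks
n = len(z)
matrix = [[abs(z[i] - z[j]) for j in range(i)] for i in range(n)]
print(matrix[1][0])  # Objects 2 and 1: |1.0 - 0.0| = 1.0
```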

  • For numeric attributes:

Numeric attributes are variables with numerical values. We need to normalize these values to a standard scale so that attributes with large ranges do not dominate the comparison. The Manhattan distance, commonly used for numeric attributes, works best on normalized values: normalization mitigates scale discrepancies and leads to more meaningful distance computations.

The formula for the Manhattan distance is as follows:

d(A, B) = Σ_{i=1}^{n} |A_i − B_i|

Here’s what the formula represents:

  • d(A, B): Manhattan distance between points A and B.

  • n: The number of dimensions (attributes) in the data.

  • A_i and B_i: The values of the ith attribute for points A and B, respectively.

According to the Manhattan distance formula, the distance between objects is as follows:

  • Distance between Objects 2 and 1: |93 − 45| = 48

  • Distance between Objects 3 and 1: |65 − 45| = 20

  • Distance between Objects 3 and 2: |65 − 93| = 28

The matrix is as follows:

In our case, min = 23 and max = 93, so the range is max − min = 70. Divide the distances by 70 to normalize them to [0, 1].

Dissimilarity matrix for numeric attributes
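The numeric step can be sketched in Python as well, using the Test III values from the example table:

```python
# Numeric dissimilarity: Manhattan distance divided by the attribute's
# range (max - min = 93 - 23 = 70) to normalize each distance into [0, 1].
values = [45, 93, 65, 74, 23]  # Test III values from the example table
rng = max(values) - min(values)  # 70

# Lower-triangular normalized distance matrix
n = len(values)
matrix = [[abs(values[i] - values[j]) / rng for j in range(i)]
          for i in range(n)]
print(round(matrix[1][0], 2))  # Objects 2 and 1: 48 / 70, about 0.69
```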

Step 2: Combining attributes

Now, combine the different attributes into a single dissimilarity matrix. The dissimilarity d(x, y) between objects x and y is defined as:

d(x, y) = ( Σ_f δ^f_xy · d^f_xy ) / ( Σ_f δ^f_xy )

where the sum runs over all p attributes and d^f_xy is the dissimilarity between x and y on attribute f.

Here, δ^f_xy = 0 if:

  • a_xf or a_yf is missing, where a_xf denotes the value of attribute f for object x and a_yf denotes the value of attribute f for object y.

  • a_xf = a_yf = 0 and attribute f is asymmetric binary, i.e., a binary attribute whose two states are not equally important, so a shared value of 0 carries no information.

Otherwise, δ^f_xy = 1.

As we can see, there are no missing values and no asymmetric binary attributes, so δ^f_xy = 1 for all pairs of objects. And p = 3, as there are three attributes.

Now, apply the formula.

| Objects | Calculation |
|---|---|
| Objects 2 and 1 | ((1 * 1) + (1 * 1) + (1 * 0.68)) / 3 = 0.89 |
| Objects 3 and 1 | ((1 * 1) + (1 * 0.5) + (1 * 0.29)) / 3 = 0.60 |
| Objects 3 and 2 | ((1 * 0) + (1 * 0.5) + (1 * 0.40)) / 3 = 0.30 |
| Objects 4 and 1 | ((1 * 1) + (1 * 0.5) + (1 * 0.41)) / 3 = 0.64 |
| Objects 4 and 2 | ((1 * 1) + (1 * 0.5) + (1 * 0.27)) / 3 = 0.59 |
| Objects 4 and 3 | ((1 * 1) + (1 * 0) + (1 * 0.13)) / 3 = 0.38 |
| Objects 5 and 1 | ((1 * 0) + (1 * 0) + (1 * 0.31)) / 3 = 0.10 |
| Objects 5 and 2 | ((1 * 1) + (1 * 1) + (1 * 1)) / 3 = 1.00 |
| Objects 5 and 3 | ((1 * 1) + (1 * 0.5) + (1 * 0.60)) / 3 = 0.70 |
| Objects 5 and 4 | ((1 * 1) + (1 * 0.5) + (1 * 0.73)) / 3 = 0.74 |

The final dissimilarity matrix is as follows:

Dissimilarity matrix for mixed attributes
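The whole pipeline can be sketched end to end in Python; with no missing values and no asymmetric binaries, every δ is 1 and the mixed measure is simply the average of the three per-attribute dissimilarities:

```python
# Mixed dissimilarity: d(x, y) = sum_f(delta * d_f) / sum_f(delta),
# with delta = 1 for every attribute in this example.
nominal = ["Product A", "Product B", "Product B", "Product C", "Product A"]
ordinal = [0.0, 1.0, 0.5, 0.5, 0.0]  # normalized Test II values
numeric = [45, 93, 65, 74, 23]       # Test III values
rng = max(numeric) - min(numeric)    # 70

def mixed_dissimilarity(i, j):
    d_nom = 0.0 if nominal[i] == nominal[j] else 1.0
    d_ord = abs(ordinal[i] - ordinal[j])
    d_num = abs(numeric[i] - numeric[j]) / rng
    return (d_nom + d_ord + d_num) / 3  # p = 3 attributes, all deltas are 1

print(round(mixed_dissimilarity(4, 0), 2))  # Objects 5 and 1: 0.1
```

The results match the table above, e.g., Objects 5 and 2 give (1 + 1 + 1) / 3 = 1.00.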

As a result, we can say that:

  • Object 1 is highly similar to Object 5 with a dissimilarity score of 0.10.

  • Object 5 is highly dissimilar to Object 2 with a dissimilarity score of 1.00.

  • Object 3 is moderately similar to Object 1 with a dissimilarity score of 0.60.

Conclusion

In the context of dissimilarity for attributes of mixed types, extracting meaningful patterns involves quantifying differences between diverse data types such as numerical and categorical variables. This dissimilarity measure proves valuable in tasks like clustering heterogeneous datasets, facilitating effective feature selection, and enhancing the performance of machine learning models by capturing the nuanced relationships within multifaceted attribute sets.


Copyright ©2025 Educative, Inc. All rights reserved