How to calculate a dissimilarity matrix for mixed-type attributes

A dissimilarity measure for mixed attribute types quantifies how different two data points are when their attributes are not all of one kind, for example a mix of nominal, ordinal, and numeric values. The dissimilarity matrix holds one such score for every pair of objects. It is the usual input for proximity-based tasks such as clustering and classification, where it helps reveal patterns and structure in the data.

Example

Suppose we have a table of five objects. Each object has a nominal attribute (Test I, the product), an ordinal attribute (Test II, one of three priorities), and a numeric attribute (Test III). The priorities are Urgent (assigned the ordinal value of 3), High Priority (assigned the ordinal value of 2), and Low Priority (assigned the ordinal value of 1). The data table and the normalized Test II values are listed after the conclusion.

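Step 1: Find the dissimilarity matrix individually

Each attribute type is compared with its own measure, and every score is scaled to lie between 0 and 1.

Test I (nominal): the dissimilarity is 0 when two objects have the same product and 1 otherwise.
Test II (ordinal): a rank $r$ out of $M = 3$ levels is normalized to $(r - 1) / (M - 1)$, giving 0, 0.5, and 1; the dissimilarity is the absolute difference between two normalized values.
Test III (numeric): the dissimilarity is based on the Manhattan distance:

$$d(A, B) = \sum_{i=1}^{n} |A_i - B_i|$$

Here’s what the formula represents:

$d(A, B)$: The Manhattan distance between points A and B.
$n$: The number of dimensions (attributes) in the data.
$A_i$ and $B_i$: The values of the $i$th attribute for points A and B, respectively.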

According to the Manhattan distance formula, the distance between objects is as follows:

Distance between Object 2 and 1: $|93 - 45| = 48$
Distance between Object 3 and 1: $|65 - 45| = 20$
Distance between Object 3 and 2: $|65 - 93| = 28$
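
The same calculation can be sketched in Python for every pair of objects. This is a minimal illustration, not a reference implementation: the names `test_iii`, `manhattan`, `distances`, and `normalized` are made up for this example, and dividing by the attribute's range (max minus min) is an assumption, chosen because it reproduces the numeric terms used in the combined calculation (for example, 20 / 70 ≈ 0.29 for Objects 3 and 1).

```python
from itertools import combinations

# Test III (numeric) values from the example table, keyed by object identifier.
test_iii = {1: 45, 2: 93, 3: 65, 4: 74, 5: 23}


def manhattan(a, b):
    """Manhattan distance between two equal-length sequences of numbers."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Each object has a single numeric attribute, so every point is a 1-tuple.
distances = {
    (j, i): manhattan((test_iii[j],), (test_iii[i],))
    for i, j in combinations(sorted(test_iii), 2)
}

# Assumption: scale by the attribute's range (max - min) to land in [0, 1].
value_range = max(test_iii.values()) - min(test_iii.values())
normalized = {pair: d / value_range for pair, d in distances.items()}

for pair in sorted(distances):
    print(pair, distances[pair], round(normalized[pair], 2))
```

The keys are `(j, i)` pairs with `j > i`, matching the "Object 2 and 1" ordering used in the text.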

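Dividing each distance by the range of Test III, $93 - 23 = 70$, scales it to a value between 0 and 1. For Objects 3 and 1, for example, $20 / 70 \approx 0.29$.

Step 2: Combining attributes

The overall dissimilarity between two objects is the weighted average of the three per-attribute dissimilarities:

$$d(i, j) = \frac{\sum_{f=1}^{p} w_f \, d_{ij}^{(f)}}{\sum_{f=1}^{p} w_f}$$

Here $p = 3$ is the number of attributes, $d_{ij}^{(f)}$ is the dissimilarity between objects $i$ and $j$ on attribute $f$, and every weight $w_f$ is 1. Each row below lists the nominal, ordinal, and numeric terms in that order: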

Objects	Calculation
Object 2 and 1	((1 × 1) + (1 × 1) + (1 × 0.69)) / 3 = 0.90
Object 3 and 1	((1 × 1) + (1 × 0.5) + (1 × 0.29)) / 3 = 0.60
Object 3 and 2	((1 × 0) + (1 × 0.5) + (1 × 0.40)) / 3 = 0.30
Object 4 and 1	((1 × 1) + (1 × 0.5) + (1 × 0.41)) / 3 = 0.64
Object 4 and 2	((1 × 1) + (1 × 0.5) + (1 × 0.27)) / 3 = 0.59
Object 4 and 3	((1 × 1) + (1 × 0) + (1 × 0.13)) / 3 = 0.38
Object 5 and 1	((1 × 0) + (1 × 0) + (1 × 0.31)) / 3 = 0.10
Object 5 and 2	((1 × 1) + (1 × 1) + (1 × 1)) / 3 = 1.00
Object 5 and 3	((1 × 1) + (1 × 0.5) + (1 × 0.60)) / 3 = 0.70
Object 5 and 4	((1 × 1) + (1 × 0.5) + (1 × 0.73)) / 3 = 0.74
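
The table above can be reproduced with a short Python sketch. It assumes the three per-attribute rules that the calculations imply: a 0/1 mismatch for the nominal attribute, the rank difference divided by the number of levels minus one for the ordinal attribute, and the absolute difference divided by the range for the numeric attribute. The names `objects`, `mixed_dissimilarity`, and `matrix` are illustrative only.

```python
from itertools import combinations

# Example data: (nominal product, ordinal rank, numeric value) per object.
objects = {
    1: ("Product A", 1, 45),
    2: ("Product B", 3, 93),
    3: ("Product B", 2, 65),
    4: ("Product C", 2, 74),
    5: ("Product A", 1, 23),
}

NUM_LEVELS = 3  # Low Priority = 1, High Priority = 2, Urgent = 3
numeric_values = [row[2] for row in objects.values()]
NUMERIC_RANGE = max(numeric_values) - min(numeric_values)  # 93 - 23 = 70


def mixed_dissimilarity(a, b, weights=(1, 1, 1)):
    """Weighted average of the per-attribute dissimilarities of a and b."""
    nominal = 0.0 if a[0] == b[0] else 1.0         # same product or not
    ordinal = abs(a[1] - b[1]) / (NUM_LEVELS - 1)  # difference of normalized ranks
    numeric = abs(a[2] - b[2]) / NUMERIC_RANGE     # range-scaled distance
    parts = (nominal, ordinal, numeric)
    return sum(w * d for w, d in zip(weights, parts)) / sum(weights)


# Lower-triangular dissimilarity matrix, keyed by (j, i) with j > i.
matrix = {
    (j, i): mixed_dissimilarity(objects[j], objects[i])
    for i, j in combinations(sorted(objects), 2)
}

for pair in sorted(matrix):
    print(f"Object {pair[0]} and {pair[1]}: {matrix[pair]:.2f}")
```

Because the sketch keeps full precision for the numeric term instead of rounding it to two decimals first, a result can differ from a hand calculation in the last digit.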

As a result, we can say that:

Objects 1 and 5 are the most similar pair, with a dissimilarity score of 0.10.
Objects 2 and 5 are the most dissimilar pair, with the maximum dissimilarity score of 1.00.
Objects 1 and 3 fall in between, with a dissimilarity score of 0.60.

Conclusion

For attributes of mixed types, a single dissimilarity score is built by measuring each attribute with a method suited to its type, scaling every result to the range 0 to 1, and averaging the results. The resulting dissimilarity matrix can be used directly by distance-based methods, such as clustering, on heterogeneous datasets that combine numerical and categorical variables.


Data table for the example:

Object Identifier	Test I (Nominal)	Test II (Ordinal)	Test III (Numeric)
1	Product A	Low Priority	45
2	Product B	Urgent	93
3	Product B	High Priority	65
4	Product C	High Priority	74
5	Product A	Low Priority	23

Test II priorities and their normalized values:

Object Identifier	Test II Priority (Ordinal Value)	Test II Normalized Value
1	1	0
2	3	1
3	2	0.5
4	2	0.5
5	1	0
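
The normalized values in this table are consistent with mapping a rank r out of M levels to (r - 1) / (M - 1). A minimal Python sketch under that assumption follows; `PRIORITY_RANK`, `normalize_rank`, and `ordinal_dissimilarity` are hypothetical names introduced for illustration.

```python
# Ordinal values from the example: Low Priority = 1, High Priority = 2, Urgent = 3.
PRIORITY_RANK = {"Low Priority": 1, "High Priority": 2, "Urgent": 3}

# Test II labels from the data table, keyed by object identifier.
test_ii = {
    1: "Low Priority",
    2: "Urgent",
    3: "High Priority",
    4: "High Priority",
    5: "Low Priority",
}


def normalize_rank(rank, num_levels):
    """Map a rank in 1..num_levels onto the interval [0, 1]."""
    return (rank - 1) / (num_levels - 1)


normalized = {
    obj: normalize_rank(PRIORITY_RANK[label], len(PRIORITY_RANK))
    for obj, label in test_ii.items()
}


def ordinal_dissimilarity(i, j):
    """Absolute difference between the normalized ranks of objects i and j."""
    return abs(normalized[i] - normalized[j])


print(normalized)
print(ordinal_dissimilarity(3, 1))
```

Once the ranks are on a 0 to 1 scale, the ordinal attribute can be compared in the same way as a numeric one.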
