How to create a dendrogram in Python

A dendrogram is a tree diagram that visualizes the hierarchical relationships between similar entities. In Python, a dendrogram is typically created to illustrate the output of hierarchical clustering. Hierarchical clustering is an unsupervised learning algorithm that groups objects into clusters based on their similarity. It can work bottom-up (agglomerative), where each object starts in its own cluster and the closest clusters are merged step by step, or top-down (divisive), where one large cluster is split repeatedly.

In the resulting diagram, the clusters differ from one another, while the endpoints (leaves) within the same cluster are similar to each other. A real-world example of a hierarchy is the organization of files and folders on a computer hard drive.

Dendrogram example

An example of hierarchical clustering is shown below. In the first image, different data points are represented on a plane while the second image illustrates the relevant clusters.

Figure: Hierarchical clustering of data points
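
As a rough illustration of how points on a plane end up in clusters, the sketch below groups six made-up points with scipy's linkage() and fcluster() functions. The coordinates and the choice of two clusters are assumptions for illustration only and are not the data of the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two visibly separate groups of three points each
points = np.array([
    [1, 1], [1, 2], [2, 1],      # group near the origin
    [9, 9], [9, 10], [10, 9],    # group far from the origin
])

# Build the hierarchy bottom-up with average linkage
Z = linkage(points, method='average', metric='euclidean')

# Cut the hierarchy so that at most two flat clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # one cluster label per point
```

The first three points receive one label and the last three receive another, which matches the two groups visible in the coordinates.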

Can we have different dendrograms for the same dataset?

Yes! We can generate different dendrograms for the same dataset by using different linkage methods (e.g., single linkage, complete linkage) or different distance metrics (e.g., Euclidean, Manhattan).
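
A minimal sketch of this effect, assuming scipy's linkage() function and a few made-up points: the same data is clustered with three linkage methods, and then with two distance metrics ('cityblock' is scipy's name for the Manhattan distance). The variable names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical one-dimensional data, one observation per row
points = np.array([[0.0], [1.0], [3.0], [7.0]])

# Same data, three linkage methods
heights = {}
for method in ('single', 'complete', 'average'):
    Z = linkage(points, method=method, metric='euclidean')
    # Column 2 of the linkage matrix holds the distance of each merge
    heights[method] = Z[:, 2]
    print(method, heights[method])

# Same method, two distance metrics, on hypothetical two-dimensional data
points_2d = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
first_merge = {}
for metric in ('euclidean', 'cityblock'):
    Z = linkage(points_2d, method='single', metric=metric)
    first_merge[metric] = Z[0, 2]  # height of the lowest link
    print(metric, first_merge[metric])
```

For these points, single linkage merges at distances 1, 2, and 4, complete linkage at 1, 3, and 7, and average linkage at 1, 2.5, and about 5.67. The lowest link sits at 5 with the Euclidean metric and at 7 with the Manhattan metric, so each combination would draw a different dendrogram.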

Reading the Dendrogram

Dendrograms describe the relationships between clusters and the instances they contain. We can therefore read a dendrogram by looking at the height at which objects are joined: the lower the link, the more similar the joined objects are.

In the figure above, instances 3, 4, and 5 are closest to each other, followed by 1 and 2, and so on. Thus, the height of the link joining 3, 4, and 5 is the smallest. The next smallest height is that of the link joining 1 and 2.
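
The same reading can be done numerically, because the link heights drawn in a dendrogram come from the linkage matrix returned by scipy's linkage() function. The sketch below uses four made-up one-dimensional points (not the data of the figure) and prints each merge with its height.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: points 2 and 3 are very close, points 0 and 1 less so
points = np.array([[0.0], [3.0], [10.0], [11.0]])

Z = linkage(points, method='single', metric='euclidean')

# Each row of Z records one merge: the two cluster ids joined,
# the height (distance) of the link, and the size of the new cluster.
# Ids 0 to 3 are the original points; ids 4 and up are merged clusters.
merges = []
for a, b, height, size in Z:
    merges.append((sorted((int(a), int(b))), float(height), int(size)))
    print(f"join {int(a)} and {int(b)} at height {height:.1f} ({int(size)} points)")
```

Points 2 and 3 are joined first, at height 1, so their link is the lowest in the dendrogram. Points 0 and 1 follow at height 3, and the two resulting clusters (ids 4 and 5) are joined last, at height 7.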

Creating a dendrogram in Python

The following code snippet creates a dendrogram from random data points in Python. For this purpose, the linkage() and dendrogram() functions of the scipy.cluster.hierarchy module are used.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Generate random coordinates
x = np.random.randint(1, 50, 10)
y = np.random.randint(1, 50, 10)

# Scatter plot of the randomly generated points
fig, ax = plt.subplots(dpi=800)
ax.scatter(x, y, c='red', marker='o')
ax.set_title('Scatter plot of randomly generated points')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
fig.savefig("output/scatter_plot.png")  # assumes the output/ directory already exists
plt.show() # Display the scatter plot
plt.close(fig)

# Prepare data for clustering
coord_points = list(zip(x, y))
clusters = linkage(coord_points, method='average', metric='euclidean')

# Plot dendrogram
fig, ax = plt.subplots(dpi=800)
dendrogram(clusters, ax=ax)
ax.set_title('Sample Dendrogram')
ax.set_xlabel('Points')
ax.set_ylabel('Euclidean distance')

# Save the dendrogram plot to a file
fig.savefig("output/dendrogram.png")
plt.close(fig)

Let’s understand the code above:

  • Line 1: We import the numpy library to generate random numbers, which act as the points for hierarchical clustering.
  • Line 2: We import the linkage and dendrogram functions from scipy.cluster.hierarchy.
    • The linkage function performs hierarchical (agglomerative) clustering.
    • The dendrogram function visualizes the hierarchical clustering encoded by the linkage matrix.
  • Line 3: We import matplotlib.pyplot to create the scatter plot and the dendrogram.
  • Lines 5–7: We create two arrays of 10 random integers each, ranging from 1 to 49 (the upper bound of 50 is exclusive), to act as coordinates.
  • Lines 10–17: We generate, save, and display a scatter plot of the points generated above.
  • Line 20: We combine the two arrays of random integers into a list of (x, y) coordinates.
  • Line 21: We cluster the (x, y) points using average linkage with the Euclidean distance metric.
  • Lines 24–28: We plot the resulting dendrogram and label its axes.
  • Lines 31–32: We save the dendrogram to a file and close the figure.
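
Because the points are random, every run of the code above produces a different dendrogram. The sketch below is one possible way to make the result repeatable and to inspect it without drawing anything. It swaps np.random.randint for numpy's seeded default_rng generator (the seed 42 is an arbitrary assumption) and uses the no_plot option of dendrogram().

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Assumption: a fixed seed (42 is arbitrary) so every run gives the same points
rng = np.random.default_rng(42)
coord_points = rng.integers(1, 50, size=(10, 2))  # 10 (x, y) points, values 1 to 49

clusters = linkage(coord_points, method='average', metric='euclidean')

# no_plot=True skips the drawing and only returns the computed layout
layout = dendrogram(clusters, no_plot=True)

# 'leaves' lists the point indices from left to right along the x-axis
print(layout['leaves'])
```

The returned dictionary describes the same tree that dendrogram() would draw, so it can be used to check the leaf order before producing the final figure.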

Copyright ©2025 Educative, Inc. All rights reserved