In Unsupervised Machine Learning, many clustering algorithms are used to group objects for analysis and for finding patterns. One commonly known technique is Agglomerative Clustering, where objects that are close to each other are placed in one group. In the beginning, every object is a single cluster (leaf), and the algorithm keeps merging clusters until a single cluster (root) remains. The clustering process forms a tree-like structure called a dendrogram.
Agglomerative clustering is a common type of Hierarchical Clustering and is also called Agglomerative Nesting (AGNES). It follows a bottom-up approach while clustering objects.
The algorithm includes the following steps:

1. Treat each object as a singleton cluster.
2. Compute the distance between every pair of clusters.
3. Merge the two closest clusters into a single cluster.
4. Repeat steps 2 and 3 until only one cluster (the root) remains.
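The bottom-up merging procedure can be sketched in a few lines of NumPy. The `naive_agglomerative` helper below is a hypothetical illustration written for this article (it uses single linkage, i.e., the minimum distance between members of two clusters), not a function from any library:

```python
import numpy as np

def naive_agglomerative(points):
    """Repeatedly merge the closest pair of clusters until one remains."""
    # every point starts as its own singleton cluster
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # find the pair of clusters whose closest members are nearest (single linkage)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return merges

# four illustrative points: two tight pairs far apart
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]])
for left, right, dist in naive_agglomerative(pts):
    print(left, right, round(dist, 3))
```

With four points, exactly three merges happen: each tight pair is merged first, and the two resulting clusters are merged last.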
A dendrogram is used to represent the hierarchical relationship between objects. The height of the dendrogram shows the order in which objects are clustered together. In the above example, the first cluster is formed by the closest objects, i.e., E and F. So, the blue-colored link merging them together is of the shortest height. Afterward, objects A and B are merged using a pink-colored link that is of second shortest height.
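The figure itself is not reproduced here, but the merge order can be verified numerically with SciPy's `linkage` function. The coordinates below are illustrative stand-ins for points A through F, chosen so that E and F are closest and A and B are second closest:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# hypothetical coordinates for points A, B, C, D, E, F (indices 0..5)
points = np.array([[0.0, 0.0], [0.2, 0.0],   # A, B
                   [3.0, 3.0], [3.5, 3.0],   # C, D
                   [6.0, 0.5], [6.1, 0.4]])  # E, F

Z = linkage(points, method="single")
# each row of Z records one merge: [cluster_i, cluster_j, distance, size],
# sorted by merge distance, so Z[0] is the closest pair (E and F here)
print(Z[:, :3])
```

The first row merges indices 4 and 5 (E and F) at the smallest distance, and the second row merges 0 and 1 (A and B), mirroring the shortest and second-shortest links in the dendrogram.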
The distance between clusters can be measured using various metrics, including Euclidean, Manhattan, and Minkowski distance. The most commonly used is Euclidean Distance. The distance between two points (x1, y1) and (x2, y2) can be found using the Euclidean formula:

d = sqrt((x1 - x2)^2 + (y1 - y2)^2)
The formula is further explained by the example given below:
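As a quick worked example (the points here are made up for illustration), the distance between (1, 2) and (4, 6) forms a 3-4-5 right triangle:

```python
import math

x1, y1 = 1, 2
x2, y2 = 4, 6

# d = sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = sqrt(25)
d = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
print(d)  # 5.0
```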
Firstly, import all the required libraries. Then, generate a 2D array holding the coordinates of all the data points. After you initialize the Agglomerative Clustering model, call its fit method. Lastly, plot the dendrogram to see the clustering results.
The AgglomerativeClustering function takes distance_threshold and n_clusters as parameters.

distance_threshold: The linkage distance threshold above which clusters will not be merged; it sets the level at which the dendrogram tree is cut.

n_clusters: The number of clusters to find. It must be None when distance_threshold is set.

For more parameter details, follow the link.
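As a quick illustration of the n_clusters parameter (the data points below are made up for the example): asking for a fixed number of clusters makes the model return a label per sample instead of a full merge hierarchy.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# two tight pairs of points, far apart from each other
X = np.array([[1.0, 1.0], [1.2, 0.8],
              [8.0, 8.0], [8.1, 7.9]])

# ask for exactly two clusters instead of using a distance cutoff
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)
print(labels)  # each nearby pair shares a label
```

Because n_clusters and distance_threshold are mutually exclusive, set only one of them: the main example below uses distance_threshold=0 with n_clusters=None to build the full tree for the dendrogram.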
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram

# generate coordinates array for six samples
X = np.array([[1.3, 4.8], [2.3, 5.5], [3.6, 1.3],
              [6.1, 5.1], [6.2, 2.5], [6.7, 3.4]])

# instantiate Agglomerative Clustering instance
clustering_model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)

# call fit method with array of sample coordinates passed as a parameter
trained_model = clustering_model.fit(X)

# a method for generating the dendrogram
def plot_dendrogram(model, **kwargs):
    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    # create the linkage matrix and then plot the dendrogram
    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    # plot the corresponding dendrogram
    dendrogram(linkage_matrix, **kwargs)

# plot dendrogram to visualize clusters
plot_dendrogram(trained_model)
plt.show()
Note: To run the above code cell, use sklearn version >= 0.22