What is the curse of dimensionality and how to overcome it?

Problem diagnosis

Machine learning algorithms such as k-NN (k-nearest neighbors) work on the inductive bias that points that are close to each other in the feature space share the same class label.

This measure of proximity maps well onto real-life examples. However, a fundamental issue with this approach arises when data spreads into higher dimensions. An increasing number of dimensions in the feature space means that seemingly close data instances are in fact quite different and far from one another. Since k-NN uses a fixed rule, taking into account the $k$ nearest data instances, the algorithm must always return some relatively close points. However, when we describe these data points with a greater number of features, the relatively close points turn out to be distant from the test instance in absolute terms, despite being nearer than the rest of the training data. This problem with data spread over many dimensions is called the curse of dimensionality.

The illustration above depicts this problem. The points clustered together when $k = 3$ in 3 dimensions are definitely close on a relative scale compared to every other point in the dataset. However, their nearness is not that great when the points are viewed in isolation.

This erodes the inductive bias of k-NN itself, bringing the suitability of the algorithm into question.
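
To see this effect numerically, the short simulation below (a minimal sketch using NumPy; the sample size, random seed, and choice of dimensions are illustrative assumptions, not from the original text) samples points uniformly in a unit hypercube and compares the distance from a test instance to its $k$-th nearest neighbor with the median distance to all points. As the number of dimensions grows, the ratio approaches 1, meaning the "nearest" neighbors are barely closer than typical points.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (illustrative)
n, k = 1000, 7                  # uniformly sampled points and neighbors considered

for d in (3, 10, 100):
    points = rng.random((n, d))             # training points in the unit hypercube
    query = rng.random(d)                   # a single test instance
    dists = np.sort(np.linalg.norm(points - query, axis=1))
    ratio = dists[k - 1] / np.median(dists) # k-th nearest vs. a "typical" distance
    print(f"d={d:>3}  k-th nearest={dists[k - 1]:.3f}  "
          f"median={np.median(dists):.3f}  ratio={ratio:.2f}")
```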

Comparative analysis

Let our initial hypothesis be that the data points are sampled uniformly within a container in a hyperspace of $d$ dimensions, each of unit length. Moreover, within this hyperspace, let there be a smaller subspace that encapsulates the $k$ nearest points to a given test point. Let the length of each of the subspace's dimensions be $x$, which is the minimum length required to encapsulate the most distant of those nearest neighbors.

Since the training instances are sampled uniformly, the ratio of the volume of the hyperspace to the volume of the subspace is approximately the same as the ratio of the total number of points $n$ to $k$.

The side length $x$ can then be approximated using the following relationship:

$$\frac{1^d}{x^d} \approx \frac{n}{k} \;\;\Longrightarrow\;\; x \approx \left(\frac{k}{n}\right)^{1/d}$$

Assuming $k = 7$ and $n = 1000$, while $d = 3$, we can approximate $x$ to be 0.191.

When we increase the number of dimensions from 3 to 100 for the same values of $k$ and $n$, $x$ grows to approximately 0.95.
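
This back-of-the-envelope calculation can be reproduced in a few lines of Python (a small helper written purely for illustration; the function name is not from the original text):

```python
def neighborhood_side(k: int, n: int, d: int) -> float:
    """Approximate side length x of the subspace holding the k nearest of n uniform points."""
    return (k / n) ** (1 / d)

print(neighborhood_side(k=7, n=1000, d=3))    # ~0.191
print(neighborhood_side(k=7, n=1000, d=100))  # ~0.95
```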

From this, we can deduce the following: as the number of dimensions increases, the neighborhood containing the $k$ nearest points stretches until it spans nearly the entire space, so the distances to the nearest neighbors become too large for k-NN's inductive bias of proximity to remain reasonable.

Overcoming the curse of dimensionality

Real-world datasets have a tremendous number of dimensions. Classical problems such as image classification or email filtering come with many supplementary features, drawing us into the curse of dimensionality almost every time.

We can address this by reducing our dimensions so that only the most useful and informative features are used. This way, we can work in spaces with low intrinsic dimensionality (the actual number of features needed to classify data points efficiently), which are not subject to the curse of dimensionality.
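
As one illustration of this idea, the sketch below (an example using scikit-learn's PCA and KNeighborsClassifier on synthetic data; the dataset shape, component count, and other parameters are assumptions made for the example) projects high-dimensional points onto a few principal components before running k-NN, so that proximity is measured in a space of low intrinsic dimensionality.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic data: 200 samples with 5 informative features and 95 noise features.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
informative = labels[:, None] + 0.3 * rng.standard_normal((200, 5))
noise = rng.standard_normal((200, 95))
X = np.hstack([informative, noise])

# Reduce to a handful of principal components before applying k-NN,
# so the classifier measures distances in a low-dimensional space.
model = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=7))
model.fit(X, labels)
print(f"Training accuracy: {model.score(X, labels):.2f}")
```

Feature selection, i.e., keeping only the most informative of the original features, is an alternative to projection when the interpretability of individual features matters.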
