Difference between invariant and equivariant to translation

A CNN (convolutional neural network) is an artificial neural network used to process images by extracting features and classifying them.

Two important properties of a CNN are:

  • Invariant to Translation
  • Equivariant to Translation

Invariant to translation

The prefix “in” in invariant means that there is no variation at all. So, when we say that a CNN is invariant to translation, we mean that if we move an object within the input image, the network still detects it and can still tell which class the input belongs to. The system outputs the same result regardless of the object’s position in the image.

For example, if we want to classify an image containing an apple, we identify the apple based on its features, such as its round shape, its stem, and its color. We don’t need to know its exact location in the image; the distinctive features alone are enough to recognize it wherever it appears.

Invariant to translation

We train our model on an input image where the object is in the middle, and when we give the CNN an image in which the object is not in the middle, it will still detect that object. The output remains the same no matter how we translate the image.

In mathematical terms

A function f is said to be invariant to a function t if

f(t(I)) = f(I)

Here,

  • t is the translation function
  • f is the function (model) or feature extractor
  • I is the image

This equation says that if we apply a translation t to our input image, t(I), and then pass it through our model, f(t(I)), the output does not change from what we get by passing in the original image, f(I).
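To make this concrete, below is a minimal sketch in PyTorch (an assumption; the article does not prescribe a framework) of a toy model f whose output is translation invariant: a convolution followed by global max pooling. The pooling step discards where features occur, so a circularly shifted copy of the input produces exactly the same output. The layer sizes and the random 16×16 “image” are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# f: a toy model -- one convolution, then global max pooling.
# Circular padding keeps the demo exact at the image borders.
f = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular"),
    nn.ReLU(),
    nn.AdaptiveMaxPool2d(1),  # global max pooling throws away positional information
    nn.Flatten(),
)

I = torch.randn(1, 1, 16, 16)                    # toy input image
t_I = torch.roll(I, shifts=(3, 5), dims=(2, 3))  # t(I): translate (circularly shift) the image

print(torch.allclose(f(I), f(t_I)))              # True -> f(t(I)) == f(I)
```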

Equivariant to translation

The prefix “equi” in equivariant means “equal” or “varying in the same proportion.” Equivariance to translation is another fundamental property of a CNN. The network does not require an object to sit at a fixed position to recognize it; if the object in the input is translated, the corresponding feature maps in the output are translated by the same amount. Translational equivariance is achieved in a CNN through weight sharing: the same convolutional kernel is applied at every position. In other words, the network works equally well at every position, but its output shifts along with the position of the target.

This property is helpful if we have multiple instances of an object in the input image: the network detects every instance, wherever it appears.

For example, if we have multiple instances of an apple in the input image, where

  • one instance sits near the top-left corner of the image,
  • and another instance sits near the bottom-right corner,

our model will detect both instances, because the shared kernel produces the same response at each translated position (as sketched below).
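The sketch below illustrates this with one shared kernel. The 3×3 plus-shaped “apple” patch, the 12×12 image, and the patch positions are made-up values for the illustration; the point is that the same weights, applied everywhere, make the response map peak at both instances.

```python
import torch
import torch.nn.functional as F

# A toy "apple" pattern and an image containing two copies of it.
apple = torch.tensor([[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]])

image = torch.zeros(1, 1, 12, 12)
image[0, 0, 1:4, 1:4] = apple    # first instance near the top-left
image[0, 0, 7:10, 6:9] = apple   # second instance near the bottom-right

kernel = apple.view(1, 1, 3, 3)  # one shared filter, reused at every position
response = F.conv2d(image, kernel, padding=1)

# The strongest responses sit at the centres of both instances.
peaks = (response == response.max()).nonzero()[:, 2:]
print(peaks.tolist())            # [[2, 2], [8, 7]]
```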

Equivariant to translation

In mathematical terms

A function f is said to be equivariant to a function t if

f(t(I)) = t(f(I))

Here,

  • t is the translation function
  • f is the function (model) or feature extractor
  • I is the image

This formula says that applying the translation t to the image and then passing it through our model, f(t(I)), gives the same result as passing the image through the model first, f(I), and then applying the translation to its output, t(f(I)).
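A quick numerical check of this identity is sketched below, under two simplifying assumptions: f is a single convolutional layer with circular padding, and t is a circular shift, so the equality holds exactly even at the image borders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# f: one convolutional layer; weight sharing is what makes it equivariant.
f = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular")

def t(x):
    # The translation t: circularly shift the tensor along its spatial dimensions.
    return torch.roll(x, shifts=(3, 5), dims=(2, 3))

I = torch.randn(1, 1, 16, 16)  # toy input image

# Shifting the input and then convolving gives the same feature maps as
# convolving first and then shifting the output.
print(torch.allclose(f(t(I)), t(f(I))))  # True -> f(t(I)) == t(f(I))
```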
