Difference between invariant and equivariant to translation

A CNN (convolutional neural network) is an artificial neural network used to process images by extracting features and classifying them.

Two important properties of a CNN are:

  • Invariant to Translation
  • Equivariant to Translation

Invariant to translation

The prefix “in” in invariant means that there is no variation at all. So, when we say that a CNN is invariant to translation, we mean that if we move an object within the input image, the network still detects it and can still tell which class the input belongs to. The system outputs the same result regardless of the object’s position in the image.

For example, if we want to classify an image containing an apple, we identify the apple based on its features, such as its round shape, its stem, and its color. We don’t need to know its exact location in the image; the distinctive features alone are enough to recognize it wherever it appears.

Invariant to translation

We train our model on an input image where the object is in the middle, and when we give the CNN an image in which the object is not in the middle, it will still detect that object. The output remains the same no matter how we translate the image.

In mathematical terms

A function f is said to be invariant to a function t if

f(t(I)) = f(I)

Here,

  • t is the translation function
  • f is the function (model) or feature extractor
  • I is the image

This equation says that if we apply a translation t to our input image, t(I), and then pass it through our model, f(t(I)), the output does not change from what we get by passing in the original image, f(I).
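To make this concrete, below is a minimal sketch in PyTorch (an assumption; the article does not prescribe a framework) of a toy model f whose output is translation invariant: a convolution followed by global max pooling. The pooling step discards where features occur, so a circularly shifted copy of the input produces exactly the same output. The layer sizes and the random 16×16 “image” are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# f: a toy model -- one convolution, then global max pooling.
# Circular padding keeps the demo exact at the image borders.
f = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular"),
    nn.ReLU(),
    nn.AdaptiveMaxPool2d(1),  # global max pooling throws away positional information
    nn.Flatten(),
)

I = torch.randn(1, 1, 16, 16)                    # toy input image
t_I = torch.roll(I, shifts=(3, 5), dims=(2, 3))  # t(I): translate (circularly shift) the image

print(torch.allclose(f(I), f(t_I)))              # True -> f(t(I)) == f(I)
```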

Equivariant to translation

The prefix “equi” in equivariant means “equal” or “varying in the same proportion.” Equivariance to translation is another fundamental property of a CNN. The network does not require an object to sit at a fixed position to recognize it; if the object in the input is translated, the corresponding feature maps in the output are translated by the same amount. Translational equivariance is achieved in a CNN through weight sharing: the same convolutional kernel is applied at every position. In other words, the network works equally well at every position, but its output shifts along with the position of the target.

This property is helpful if we have multiple instances of an object in the input image: the network detects every instance, wherever it appears.

For example, if we have multiple instances of an apple in the input image, where

  • one instance sits near the top-left corner of the image,
  • and another instance sits near the bottom-right corner,

our model will detect both instances, because the shared kernel produces the same response at each translated position (as sketched below).
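The sketch below illustrates this with one shared kernel. The 3×3 plus-shaped “apple” patch, the 12×12 image, and the patch positions are made-up values for the illustration; the point is that the same weights, applied everywhere, make the response map peak at both instances.

```python
import torch
import torch.nn.functional as F

# A toy "apple" pattern and an image containing two copies of it.
apple = torch.tensor([[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]])

image = torch.zeros(1, 1, 12, 12)
image[0, 0, 1:4, 1:4] = apple    # first instance near the top-left
image[0, 0, 7:10, 6:9] = apple   # second instance near the bottom-right

kernel = apple.view(1, 1, 3, 3)  # one shared filter, reused at every position
response = F.conv2d(image, kernel, padding=1)

# The strongest responses sit at the centres of both instances.
peaks = (response == response.max()).nonzero()[:, 2:]
print(peaks.tolist())            # [[2, 2], [8, 7]]
```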

Equivariant to translation

In mathematical terms

A function f is said to be equivariant to a function t if

f(t(I)) = t(f(I))

Here,

  • t is the translation function
  • f is the function (model) or feature extractor
  • I is the image

This formula says that applying the translation t to the image and then passing it through our model, f(t(I)), gives the same result as passing the image through the model first, f(I), and then applying the translation to its output, t(f(I)).
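A quick numerical check of this identity is sketched below, under two simplifying assumptions: f is a single convolutional layer with circular padding, and t is a circular shift, so the equality holds exactly even at the image borders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# f: one convolutional layer; weight sharing is what makes it equivariant.
f = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular")

def t(x):
    # The translation t: circularly shift the tensor along its spatial dimensions.
    return torch.roll(x, shifts=(3, 5), dims=(2, 3))

I = torch.randn(1, 1, 16, 16)  # toy input image

# Shifting the input and then convolving gives the same feature maps as
# convolving first and then shifting the output.
print(torch.allclose(f(t(I)), t(f(I))))  # True -> f(t(I)) == t(f(I))
```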
