CNN (Convolutional Neural Network) is an artificial neural network used to process images by extracting features and image classification.
The two different properties of CNN are:
The prefix “in” in invariant means that there is no variance at all. So, when we say that CNN is invariant to translation, it means that if we move an object in our input image, it will still be detected, and we can still tell which class that input belongs to. And the system should output the same result regardless of the object’s position in the image.
For example, if we want to detect the position of the object apple in an image, we have to identify the apple based on its features, such as the arrangement of its key characteristics (like the presence of eyes, stem, or specific patterns). We don’t need to know its exact location but we can infer its position from these distinctive features.
We train our model on an input image where the object is in the middle, and when we give CNN the image in which the object is not in the middle, it will still detect that object. The output remains the same no matter how we translate the image.
A function is known to be invariant to function if
Here,
This equation explains that if we apply translation on our input image and then pass it through our model , the output doesn’t change upon applying .
The prefix “equi” in equivariant means “equivalent” or “varying in a similar proportion.” Equivariant to translation is another fundamental property of CNN. The CNN does not require a fixed position of an object to recognize it in an image; this means that if the input changes, the output changes accordingly. The property of translational equivariance is accomplished in CNN by the idea of weight sharing. Equivariance implies that the system functions admirably across positions, yet its output shifts with the place of the target.
This property is helpful if we have multiple instances of an object in the input image. It will detect all the instances of the object in that image.
For example, if we have multiple instances of an apple in the input image, where
Our model will detect all the instances of the apple as they all are equivariant to translation.
A function is known to be equivariant to a function if
Here,
This formula explains that if we apply translation on the image and then pass it through our model , the results will be the same if we pass the image to our model first and then apply the translation function .
Free Resources