Introduction to autoencoders using PyTorch

Components

Autoencoders have two main parts: the encoder and the decoder. Together, they learn a compact representation of the data, which is useful for many tasks in generative AI.

Encoders

The encoder’s role is to transform complex input data into a more compact, typically lower-dimensional, representation. It does this with multiple layers and activation functions such as ReLU, which help it identify and emphasize the essential patterns in the data.

Note: ReLU (Rectified Linear Unit) is an activation function that outputs zero for negative inputs and leaves positive inputs unchanged.
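As a quick illustration of the note above, the following minimal sketch applies ReLU to a handful of values. The sample values are arbitrary and chosen only for demonstration.

```python
import torch
import torch.nn.functional as F

# Arbitrary sample values, chosen only for illustration
values = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# ReLU computes max(0, x) element-wise
activated = F.relu(values)

print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```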

import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Encoder, self).__init__()
        # Three fully connected layers: input -> hidden -> hidden -> output
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Apply ReLU after each layer; the result is the encoded representation
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x

Unlocking data simplification with the Encoder
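To see the compression in terms of tensor shapes, here is a minimal sketch that passes a random batch through an encoder. It assumes hypothetical sizes (64 input features, 32 hidden units, 16 encoded features) and uses nn.Sequential as a compact stand-in with the same layer stack as the Encoder class.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen for illustration only
input_dim, hidden_dim, latent_dim = 64, 32, 16

# Same layer stack as the Encoder class, written with nn.Sequential
encoder = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, latent_dim), nn.ReLU(),
)

batch = torch.randn(8, input_dim)  # 8 random samples with 64 features each
codes = encoder(batch)             # encoded (compressed) representations

print(batch.shape)  # torch.Size([8, 64])
print(codes.shape)  # torch.Size([8, 16])
```

Each 64-feature sample comes out as 16 non-negative values, since the final ReLU zeroes out anything below zero.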

Decoders

The decoder is the counterpart to the encoder: it reverses the encoder’s transformation. While the encoder compresses complex input data into a compact representation, the decoder reconstructs from that representation an output resembling the original input, restoring as much of the original detail and pattern as the compact representation allows.

The decoder’s architecture typically mirrors that of the encoder in reverse. It comprises an initial layer that takes the encoded representation, one or more hidden layers, and an output layer that produces data with the same dimensions as the original input. Beyond recovering the input, this reconstruction step underlies applications such as image generation and data restoration.

import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Decoder, self).__init__()
        # input_dim is the size of the encoded representation;
        # output_dim is the size of the reconstructed data
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Apply ReLU after each layer; the final ReLU keeps outputs non-negative
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x

Decoder unveiling data's hidden complexity
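Because the Decoder above applies ReLU to its final layer, its reconstructions can never be negative. That suits non-negative data such as pixel intensities, but it may be limiting for data that has been standardized around zero. The sketch below shows one possible variant that leaves the final layer without an activation. The class name LinearOutputDecoder and the sizes are hypothetical, introduced here only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearOutputDecoder(nn.Module):
    """Hypothetical Decoder variant: no activation on the final layer."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        return self.output_layer(x)  # raw values, which may be negative

torch.manual_seed(0)
decoder = LinearOutputDecoder(input_dim=16, hidden_dim=32, output_dim=64)
codes = torch.rand(8, 16)        # stand-in for encoder outputs (non-negative)
reconstruction = decoder(codes)

print(reconstruction.shape)               # torch.Size([8, 64])
print(bool((reconstruction < 0).any()))   # whether any output is negative
```

Which output activation fits best depends on the range of the data being reconstructed, so treat this as one option rather than a recommendation.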

Building an autoencoder

To construct the autoencoder, we combine the two components we’ve created: the encoder and the decoder. The output of the encoder serves as the input to the decoder, and this junction forms the bottleneck layer. The latent space in autoencoders, often referred to as the "bottleneck" layer, is a low-dimensional representation of the input data that captures its essential features. Because everything the decoder receives must pass through this bottleneck, the model has to learn the essential features of the input data in order to recreate it.

Let’s put our components into action and assemble the autoencoder model.

import torch
import torch.nn.functional as F
import torch.nn as nn
from torchsummary import summary
# Define the Encoder class
class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Encoder, self).__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x
# Define the Decoder class
class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Decoder, self).__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x  # No need for reshape
# Define the Autoencoder class
class Autoencoder(nn.Module):
    def __init__(self, encoder_dim, decoder_dim):
        super(Autoencoder, self).__init__()
        self.encoder = Encoder(*encoder_dim)
        self.decoder = Decoder(*decoder_dim)
  
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
# Define the dimensions for the encoder and decoder
input_dim = 64
hidden_dim = 32
output_dim = 16  # size of the encoder output (the bottleneck), smaller than input_dim
encoder_dim = (input_dim, hidden_dim, output_dim)
decoder_dim = (output_dim, hidden_dim, input_dim)
# Create an instance of the Autoencoder model
model = Autoencoder(encoder_dim, decoder_dim)
batch_size = 1  # You can adjust the batch size as needed
input_shape = (batch_size, input_dim)
summary(model, input_shape)

Code explanation

Lines 1–4: Import the modules needed to build and summarize the neural network model.
Lines 6–16: Define the Encoder class, the encoding part of the autoencoder. It uses nn.Linear for the input, hidden, and output layers and applies ReLU after each one in the forward method.
Lines 18–28: Define the Decoder class, the decoding part, which mirrors the structure of the Encoder using nn.Linear and ReLU.
Lines 30–39: Define the Autoencoder class, which combines the Encoder and Decoder and passes the input through both in the forward method.
Lines 41–45: Set the dimensions (input_dim, hidden_dim, output_dim) and pack them into the tuples passed to the Encoder and Decoder. The decoder tuple reverses the order, so the decoder maps the encoder’s output back to input_dim features.
Line 47: Instantiate the Autoencoder model with the specified dimensions.
Lines 48–49: Set batch_size to 1 and define input_shape as the tuple (1, 64). Note that summary interprets this tuple as the shape of a single sample and adds its own batch dimension when reporting shapes.
Line 50: Call summary to print a model summary showing each layer, its output shape, and the number of trainable parameters, which gives a quick view of the model’s architecture and size.
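The code above only builds and summarizes the model. As a rough sketch of how such a model might then learn to reconstruct its input, the loop below minimizes the mean squared error between the input and the reconstruction. Everything specific here is an assumption made for illustration: the synthetic data, the layer sizes, the nn.Sequential stand-in for the Autoencoder class, the Sigmoid output (chosen because the synthetic data lies between 0 and 1), and the optimizer settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, used only to make the sketch runnable
input_dim, hidden_dim, latent_dim = 64, 32, 16

# Synthetic data with low-dimensional structure, values in (0, 1)
factors = torch.randn(256, 4)
mixing = torch.randn(4, input_dim)
data = torch.sigmoid(factors @ mixing)

# Compact stand-in for the Autoencoder class: encoder followed by decoder
model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, latent_dim), nn.ReLU(),    # bottleneck
    nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),  # outputs in (0, 1)
)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for step in range(200):
    optimizer.zero_grad()
    reconstruction = model(data)
    loss = criterion(reconstruction, data)  # the target is the input itself
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

No labels are involved: the input doubles as the target, which is what makes autoencoder training unsupervised.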

Applications

Autoencoders are used across a range of domains. A few examples:

Autoencoders can be used to transfer the style of one image onto another, creating unique artistic effects.
In the audio domain, autoencoders can generate new sound samples or denoise existing ones.
Autoencoders can help colorize black and white images by predicting color information based on the provided grayscale input.
Autoencoders can compress data while retaining critical information, making them useful for efficient storage and transmission.
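To make the compression use case concrete, the sketch below counts how many values need to be stored before and after encoding. The single-layer encoder and decoder are untrained stand-ins and the sizes are hypothetical, so this only illustrates the bookkeeping, not reconstruction quality.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen for illustration only
input_dim, latent_dim = 64, 16

encoder = nn.Linear(input_dim, latent_dim)  # untrained stand-in encoder
decoder = nn.Linear(latent_dim, input_dim)  # untrained stand-in decoder

samples = torch.rand(100, input_dim)
with torch.no_grad():
    codes = encoder(samples)   # what would be stored or transmitted
    restored = decoder(codes)  # what the receiver would reconstruct

ratio = samples.numel() / codes.numel()
print(samples.numel(), codes.numel(), ratio)  # 6400 1600 4.0
```

In practice, the achievable ratio is a trade-off: a smaller latent size stores less but generally reconstructs less faithfully.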

