Introduction to autoencoders using PyTorch

Autoencoders are a fundamental building block of generative AI. They are neural networks built around one basic idea: encode the input into a compact form, then decode that form to reproduce the original input. In this Answer, we’ll look at the components of an autoencoder, build one in PyTorch, and review some of its applications.

Architecture of an autoencoder

Components

Autoencoders have two main parts: the encoder and the decoder. Together, they learn a compact representation of the data, which can then be used for many tasks in generative AI.

Encoders

The encoder’s main role is to transform complex input data into a more compact, typically lower-dimensional, representation. It does this with multiple layers and activation functions such as ReLU, which help it pick out the essential patterns in the data.

Note: ReLU (Rectified Linear Unit) is an activation function that outputs zero for negative inputs and leaves non-negative inputs unchanged, that is, ReLU(x) = max(0, x).
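As a small illustration of the note above, the snippet below applies ReLU to a handful of values. The sample values are made up for the example; `F.relu` is the same function used in the encoder and decoder code in this Answer.

```python
import torch
import torch.nn.functional as F

# Sample values chosen for illustration: negatives, zero, and positives.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# ReLU computes max(0, x) element-wise.
y = F.relu(x)

print(y.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative entries are replaced by zero, while zero and positive entries pass through unchanged.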

ReLU activation function

The encoder consists of an input layer, followed by one or more hidden layers, and ends with an output layer. Together, these layers produce a condensed representation of the input that retains its most important information, which makes it useful for data compression, noise reduction, and feature extraction.

import torch.nn.functional as F
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)    # input -> hidden
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)  # hidden -> hidden
        self.output_layer = nn.Linear(hidden_dim, output_dim)  # hidden -> encoded output

    def forward(self, x):
        # Apply ReLU after every linear layer
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x
Unlocking data simplification with the Encoder
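To see the compression in action, the sketch below passes a batch of random values through the encoder above and prints the shapes before and after. The sizes (64 input features, a 16-dimensional output) and the random batch are assumptions made for this example only; the class is repeated so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Same three-layer encoder as above, repeated so the snippet is self-contained."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        return F.relu(self.output_layer(x))


# Hypothetical sizes: 64 input features compressed to a 16-dimensional code.
encoder = Encoder(input_dim=64, hidden_dim=32, output_dim=16)

batch = torch.rand(8, 64)  # 8 random samples standing in for real data
code = encoder(batch)

print(batch.shape)  # torch.Size([8, 64])
print(code.shape)   # torch.Size([8, 16])
```

Each 64-value sample is reduced to 16 values. The weights are untrained here, so the code is not yet meaningful; only the shapes are.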

Decoders

The decoder is the counterpart to the encoder: it reverses the encoder’s process. While the encoder transforms complex input data into a compact representation, the decoder reconstructs an approximation of the original input from that representation, restoring as much of its detail and patterns as it can.

The decoder’s architecture mirrors that of the encoder in reverse. It typically comprises an input layer that receives the encoded representation, one or more hidden layers, and an output layer that produces the reconstruction. Beyond recovering the original data, this reconstruction step enables applications such as image generation and data restoration.

import torch.nn.functional as F
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)    # encoded input -> hidden
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)  # hidden -> hidden
        self.output_layer = nn.Linear(hidden_dim, output_dim)  # hidden -> reconstruction

    def forward(self, x):
        # Apply ReLU after every linear layer, including the output layer
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x
Decoder unveiling data's hidden complexity
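One consequence of applying ReLU on the decoder's output layer is that the reconstruction can never contain negative values. The sketch below checks this on random codes; the sizes and the random input are assumptions for illustration. In practice, this design fits data scaled to a non-negative range such as [0, 1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Decoder(nn.Module):
    """Same three-layer decoder as above, repeated so the snippet is self-contained."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        return F.relu(self.output_layer(x))


# Hypothetical sizes: a 16-dimensional code expanded back to 64 features.
decoder = Decoder(input_dim=16, hidden_dim=32, output_dim=64)

code = torch.randn(8, 16)  # random codes, including negative entries
reconstruction = decoder(code)

print(reconstruction.shape)               # torch.Size([8, 64])
print(bool((reconstruction >= 0).all()))  # True
```

If the data can be negative (for example, after standardization), one option is to drop the final ReLU so the output layer stays linear.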

Building an autoencoder

To construct the autoencoder, we combine the two components we’ve created: the encoder and the decoder. The output of the encoder serves as the input to the decoder, and this connection forms the bottleneck layer. The bottleneck, also called the latent space, is a low-dimensional representation of the input data that captures its essential features. This arrangement forces the model to learn the essential features of the input in order to recreate it.

Let’s put our components into action and assemble the autoencoder model.

import torch
import torch.nn.functional as F
import torch.nn as nn
from torchsummary import summary

# Define the Encoder class
class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x

# Define the Decoder class
class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.relu(self.output_layer(x))
        return x

# Define the Autoencoder class
class Autoencoder(nn.Module):
    def __init__(self, encoder_dim, decoder_dim):
        super().__init__()
        self.encoder = Encoder(*encoder_dim)
        self.decoder = Decoder(*decoder_dim)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

# Define the dimensions for the encoder and decoder
input_dim = 64
hidden_dim = 32
output_dim = 16  # Encoder output (bottleneck) size; kept smaller than input_dim
encoder_dim = (input_dim, hidden_dim, output_dim)
decoder_dim = (output_dim, hidden_dim, input_dim)

# Create an instance of the Autoencoder model
model = Autoencoder(encoder_dim, decoder_dim)

batch_size = 1  # You can adjust the batch size as needed
input_shape = (batch_size, input_dim)
summary(model, input_shape)

Code explanation

  • Lines 1–4: Importing necessary modules and libraries for building and summarizing the neural network model.

  • Lines 7–18: Defining the Encoder class, representing the autoencoder’s encoding part. It uses nn.Linear for input, hidden, and output layers, with ReLU activation in the forward method.

  • Lines 21–32: Defining the Decoder class for the decoding part, mirroring the structure of the Encoder using nn.Linear and ReLU.

  • Lines 35–44: Defining the Autoencoder class, combining both Encoder and Decoder and passing input through them in the forward method.

  • Lines 47–51: Setting dimensions (input_dim, hidden_dim, output_dim) for Encoder and Decoder layers.

  • Line 54: Instantiating the Autoencoder model with specified dimensions.

  • Lines 56–57: Setting the batch_size to 1 and defining the input_shape as a tuple representing input sample shape, (1, 64).

  • Line 58: Using summary to generate a model summary displaying layer details, input/output shapes, and trainable parameters, providing insights into the model's architecture and size.
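The model above is only defined and summarized, not trained. As a minimal sketch of how training could look, the loop below minimizes the mean squared error between the input and its reconstruction. The random placeholder data, the 16-dimensional bottleneck, the Adam optimizer, the learning rate, and the number of steps are all assumptions chosen for illustration, not settings taken from this Answer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_layer = nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        return F.relu(self.output_layer(x))


class Decoder(Encoder):
    """The decoder in this Answer has the same layer structure as the encoder."""


class Autoencoder(nn.Module):
    def __init__(self, encoder_dim, decoder_dim):
        super().__init__()
        self.encoder = Encoder(*encoder_dim)
        self.decoder = Decoder(*decoder_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)  # reproducible weights and data

input_dim, hidden_dim, latent_dim = 64, 32, 16  # illustrative sizes
model = Autoencoder((input_dim, hidden_dim, latent_dim),
                    (latent_dim, hidden_dim, input_dim))

# Placeholder data in [0, 1); a real project would load a dataset here.
data = torch.rand(256, input_dim)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for step in range(100):
    optimizer.zero_grad()
    reconstruction = model(data)
    loss = criterion(reconstruction, data)  # the input is also the target
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because the target is the input itself, no labels are needed. With real data, the loop would typically iterate over mini-batches from a DataLoader rather than the full tensor at once.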

Applications

Let’s explore the diverse applications of autoencoders, demonstrating their versatility in various domains.

  • Autoencoders can be used to transfer the style of one image onto another, creating unique artistic effects.

  • In the audio domain, autoencoders can generate new sound samples or denoise existing ones.

  • Autoencoders can help colorize black and white images by predicting color information based on the provided grayscale input.

  • Autoencoders can compress data while retaining critical information, making them useful for efficient storage and transmission.
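As a rough sketch of the denoising use case mentioned above, the example below trains a small autoencoder to map noisy inputs back to their clean versions. Everything here is an assumption made for illustration: the compact nn.Sequential model standing in for the Encoder and Decoder classes, the random placeholder data, the Gaussian noise level of 0.1, and the optimizer settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Compact stand-in for an encoder/decoder pair with a 16-dimensional bottleneck.
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),  # bottleneck
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 64),
)

clean = torch.rand(256, 64)  # placeholder "clean" data in [0, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Corrupt the input with fresh Gaussian noise at every step.
    noisy = clean + 0.1 * torch.randn_like(clean)

    optimizer.zero_grad()
    denoised = model(noisy)
    loss = criterion(denoised, clean)  # noisy input, clean target
    loss.backward()
    optimizer.step()

print(denoised.shape)  # torch.Size([256, 64])
```

The only change from ordinary autoencoder training is that the input is corrupted while the target stays clean, which pushes the model to keep the underlying signal and discard the noise.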

Copyright ©2025 Educative, Inc. All rights reserved