Implementation of activation functions in PyTorch

An activation function in PyTorch is a fundamental element of a neural network: much like an on-off switch for each neuron, it helps our network learn and make sense of data. Activation functions play a crucial role in neural networks because they introduce nonlinearity into the model. But what does that mean, and why is it important? Let’s explore it in a simple way.

Why do we use activation functions?

Here are the reasons why we use activation functions in a neural network:

  • Introducing nonlinearity: Neural networks aim to learn complex patterns and relationships in data. If we only used linear operations (such as matrix multiplications and additions), the whole network would be a composition of linear functions. In such a case, no matter how deep our network is, it could be represented by a single layer, severely limiting its learning capabilities. Activation functions add essential nonlinearity to our networks, allowing them to learn and represent more complex patterns and relationships.

  • Learning from errors: During training, neural networks adjust their parameters (weights and biases) to minimize errors, typically represented as a loss function. Activation functions play a role in understanding and reacting to these errors. Based on the input, they determine which neurons fire (activate) and which ones do not. This selective activation helps the network focus on certain features of the data and adjust accordingly.

If we don’t use activation functions, our neural network essentially becomes a linear model. Imagine trying to recognize intricate patterns in images or make sense of complex data without nonlinear activation functions—it would be like trying to paint a masterpiece with only black and white paint. Activation functions add the colors that allow our network to express the rich nuances in data, making it a powerful tool for various tasks like image recognition, natural language processing, and more.
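To make this concrete, here is a minimal sketch (with arbitrary layer sizes, and biases omitted for brevity) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between
fc1 = nn.Linear(4, 8, bias=False)
fc2 = nn.Linear(8, 3, bias=False)

# A single linear layer whose weight is the product of the two weight matrices
combined = nn.Linear(4, 3, bias=False)
with torch.no_grad():
    combined.weight.copy_(fc2.weight @ fc1.weight)

x = torch.rand(5, 4)
print(torch.allclose(fc2(fc1(x)), combined(x), atol=1e-6))  # True: the stack is just one linear map

Inserting a nonlinear activation (e.g., ReLU) between fc1 and fc2 breaks this equivalence, and that is exactly what lets the network model nonlinear relationships.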

Syntax

In PyTorch, implementing activation functions within a neural network involves a standard pattern that starts with the following:

  • We import the necessary libraries, including PyTorch and its neural network module nn.

  • We define the structure of the neural network by creating the layers and specifying the input and output sizes. This includes defining fully connected (linear) layers.

Note: We can choose the activation function we want to use (e.g., ReLU, sigmoid, tanh, and softmax) from the nn module or create custom activation functions.

  • We create a forward method that specifies how data flows through the network. In this method, we apply the linear transformation to the input data and then apply the activation function to the output of the linear layer.

import torch
import torch.nn as nn

# Define a layer and choose an activation function
fc1 = nn.Linear(input_size, output_size)  # input_size and output_size are integers we define
activation_function = nn.ReLU()           # or nn.Sigmoid(), nn.Tanh(), nn.Softmax(dim=1), etc.

# Apply the linear layer, then the activation, in the forward pass
def forward(x):
    x = fc1(x)
    x = activation_function(x)
    return x
Syntax to implement activation function
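As an aside, the same activations are also available in functional form through torch.nn.functional, which is often used for stateless activations inside forward. A minimal sketch of the same pattern (the layer size 16 → 8 is an arbitrary example):

import torch
import torch.nn as nn
import torch.nn.functional as F

fc1 = nn.Linear(16, 8)  # example sizes

def forward(x):
    x = fc1(x)
    x = F.relu(x)  # functional counterpart of nn.ReLU; torch.sigmoid and torch.tanh work similarly
    return x

print(forward(torch.rand(4, 16)).shape)  # torch.Size([4, 8])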

Implementation in PyTorch

In PyTorch, we can easily implement an activation function using built-in functions. Let’s take the ReLU (rectified linear unit) as an example. ReLU is one of the most commonly used activation functions.
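Before building a full network, here is a small sketch of what nn.ReLU does on its own: negative inputs are clamped to zero, and positive inputs pass through unchanged (the sample values are arbitrary).

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])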

import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # Fully connected layer 1
        self.relu = nn.ReLU()                           # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2
        self.sigmoid = nn.Sigmoid()                     # Sigmoid activation function

    def forward(self, x):
        out = self.fc1(x)        # Apply the first fully connected layer
        out = self.relu(out)     # Apply the ReLU activation function
        out = self.fc2(out)      # Apply the second fully connected layer
        out = self.sigmoid(out)  # Apply the sigmoid activation function
        return out

# Define network parameters
input_size = 64    # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 2    # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.randint(0, 2, (32, output_size), dtype=torch.float32)  # Random binary target values

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (binary cross-entropy) and optimizer (Adam)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Backpropagate to compute gradients
    optimizer.step()       # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')

Code explanation

  • Lines 5–11: We define the neural network architecture using a class called SimpleNN. This network has the following components:

    • nn.Linear: A fully connected (linear) layer that transforms the input data.

    • nn.ReLU: The rectified linear unit (ReLU) activation function.

    • nn.Sigmoid: The sigmoid activation function.

  • Lines 13–18: The forward method of the SimpleNN class applies linear transformations, ReLU activation, and sigmoid activation to the input data in sequence.

  • Lines 26–27: We generate the random input data and binary target values.

  • Lines 30–34: We create an instance of the SimpleNN model by providing the defined parameters. Then, we define the loss function and the optimizer:

    • criterion: We define the binary cross-entropy loss (BCELoss), which is commonly used for binary classification tasks and matches the sigmoid outputs of the network.

    • optimizer: We define the Adam optimizer, which updates the model's parameters during training.

  • Lines 37–49: We perform a training loop for a specified number of epochs (10 in this example). In each epoch, the following steps are performed:

    • Forward pass: We pass the input data through the model to obtain predictions and compute the loss by comparing the model’s predictions with the target values.

    • Backward pass and optimization: We calculate gradients and update the model’s parameters using the optimizer. Finally, we print the loss for each epoch to monitor training progress.
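After training, one possible way to use the model for predictions is to switch it to evaluation mode and disable gradient tracking. The short sketch below reuses the model and input_data defined in the code above and applies a 0.5 threshold to the sigmoid outputs (the threshold is an assumption for illustration):

model.eval()                                     # switch to evaluation mode
with torch.no_grad():                            # no gradients are needed for inference
    probabilities = model(input_data)            # sigmoid outputs in [0, 1]
    predictions = (probabilities > 0.5).float()  # threshold to get binary predictions
print(predictions[:5])                           # predictions for the first five samples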

Conclusion

In the code widget above, we have used two of the most commonly used activation functions, namely ReLU and Sigmoid. However, we can utilize other activation functions such as LeakyReLU, Softmax, Tanh, and ELU (exponential linear unit function).
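As a quick illustration, each of these is available as a ready-made nn module and can be applied directly to a tensor (the sample values below are arbitrary):

import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 2.0])

print(nn.LeakyReLU(negative_slope=0.01)(x))  # keeps a small slope for negative inputs
print(nn.Tanh()(x))                          # squashes values into (-1, 1)
print(nn.ELU()(x))                           # smooth exponential curve for negative inputs
print(nn.Softmax(dim=0)(x))                  # outputs are positive and sum to 1 along dim 0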

We can refer to this Educative Answer for a more detailed explanation of the code and of how to build a neural network.
