An activation function in PyTorch is a fundamental element of a neural network; much like an on-off switch, it helps the network learn and make sense of data. Activation functions play a crucial role in neural networks because they introduce nonlinearity into the model. But what does that mean, and why is it important? Let’s explore it in a simple way.
Here are the main reasons why we use activation functions in neural networks.
Introducing nonlinearity: Neural networks aim to learn complex patterns and relationships in data. If we only used linear operations (like adding or multiplying), the whole network would be a combination of linear functions. In such a case, no matter how deep our network is, it could be represented by a single layer, severely limiting its learning capabilities. Activation functions add essential nonlinearity to our networks, allowing them to learn and represent more complex patterns and relationships.
Learning from errors: During training, neural networks adjust their parameters (weights and biases) to minimize errors, typically represented as a loss function. Activation functions play a role in understanding and reacting to these errors. Based on the input, they determine which neurons fire (activate) and which ones do not. This selective activation helps the network focus on certain features of the data and adjust accordingly.
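As a minimal, self-contained sketch of this selective activation, consider how ReLU (the rectified linear unit, used here purely as an illustration) decides which neurons fire: negative entries of a toy pre-activation vector are zeroed out, while positive entries pass through. The values below are illustrative only.

import torch
import torch.nn as nn

# A toy vector of pre-activation values: some negative, some positive
pre_activation = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = nn.ReLU()
activated = relu(pre_activation)

print(activated)  # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
# Negative inputs are suppressed (those neurons do not fire);
# positive inputs pass through unchanged.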
If we don’t use activation functions, our neural network essentially becomes a linear model. Imagine trying to recognize intricate patterns in images or make sense of complex data without nonlinear activation functions—it would be like trying to paint a masterpiece with only black and white paint. Activation functions add the colors that allow our network to express the rich nuances in data, making it a powerful tool for various tasks like image recognition, natural language processing, and more.
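To make this concrete, here is a small sketch (layer sizes are arbitrary, chosen only for illustration) showing that two stacked nn.Linear layers with no activation in between can be reproduced exactly by a single nn.Linear layer:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between
fc1 = nn.Linear(8, 16)
fc2 = nn.Linear(16, 4)

x = torch.rand(5, 8)  # a small batch of 5 samples
deep_output = fc2(fc1(x))

# A single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2 does the same job
combined = nn.Linear(8, 4)
with torch.no_grad():
    combined.weight.copy_(fc2.weight @ fc1.weight)
    combined.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

shallow_output = combined(x)
print(torch.allclose(deep_output, shallow_output, atol=1e-6))  # expected: True

Placing a nonlinearity such as ReLU between fc1 and fc2 breaks this equivalence, which is exactly why depth starts to pay off.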
In PyTorch, implementing activation functions within a neural network follows a standard pattern:
We import the necessary libraries, including PyTorch and its neural network module nn.
We define the structure of the neural network by creating the layers and specifying the input and output sizes. This includes defining fully connected (linear) layers.
Note: We can choose the activation function we want to use (e.g., ReLU, sigmoid, tanh, or softmax) from the nn module or create custom activation functions.
We create a forward method that specifies how data flows through the network. In this method, we apply the linear transformation to the input data and then apply the activation function to the output of the linear layer.
import torch
import torch.nn as nn

fc1 = nn.Linear(input_size, output_size)  # input_size and output_size are placeholders we choose
activation_function = nn.ReLU()           # e.g., nn.ReLU(); could also be nn.Sigmoid(), nn.Tanh(), ...

def forward(x):
    x = fc1(x)                   # linear transformation
    x = activation_function(x)   # nonlinearity applied to the layer's output
    return x
In PyTorch, we can easily implement an activation function using built-in functions. Let’s take the ReLU (rectified linear unit) as an example. ReLU is one of the most commonly used activation functions.
import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2
        self.sigmoid = nn.Sigmoid()  # Sigmoid activation function

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        out = self.sigmoid(out)  # Apply the sigmoid activation function
        return out

# Define network parameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 2  # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.randint(0, 2, (32, output_size), dtype=torch.float32)  # Random binary target values

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (Binary Cross-Entropy Loss) and optimizer (Adam)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')
Lines 5–11: We define the neural network architecture using a class called SimpleNN. This network has the following components:
nn.Linear: A fully connected (linear) layer that transforms the input data.
nn.ReLU: The rectified linear unit (ReLU) activation function.
nn.Sigmoid: The sigmoid activation function.
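To see these three building blocks in isolation (using the same sizes as the widget above, purely for illustration), a quick standalone check of the shapes and value ranges looks like this:

import torch
import torch.nn as nn

fc = nn.Linear(64, 128)       # maps 64 input features to 128 outputs per sample
relu = nn.ReLU()
sigmoid = nn.Sigmoid()

x = torch.rand(32, 64)        # a batch of 32 samples, 64 features each
h = relu(fc(x))
print(h.shape)                # torch.Size([32, 128])
print((h >= 0).all().item())  # True -- ReLU leaves no negative values

p = sigmoid(torch.rand(32, 2))
print(p.min().item() > 0 and p.max().item() < 1)  # True -- sigmoid squashes values into (0, 1)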
Lines 13–18: The forward method of the SimpleNN class applies linear transformations, ReLU activation, and sigmoid activation to the input data in sequence.
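One detail worth noting: we never call forward directly. Calling the model instance itself (model(x)) goes through nn.Module's __call__ machinery, which in turn runs forward. A quick check, assuming the SimpleNN class from the widget above is already defined in the same script:

import torch

model = SimpleNN(input_size=64, hidden_size=128, output_size=2)  # class defined above
x = torch.rand(4, 64)   # a small batch of 4 samples

out = model(x)          # invokes forward() under the hood
print(out.shape)        # torch.Size([4, 2])
print(out.min().item() >= 0 and out.max().item() <= 1)  # True -- sigmoid keeps outputs in (0, 1)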
Lines 26–27: We generate random input data and binary target values.
Lines 30–34: We create an instance of the SimpleNN model by providing the defined parameters. Then, we define the loss function and the optimizer:
criterion: We define the binary cross-entropy loss BCELoss, which is commonly used for binary classification tasks.
optimizer: We define the Adam optimizer, which updates the model's parameters during training.
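As a side note, the value BCELoss returns is just the averaged binary cross-entropy formula. A tiny sketch with made-up predictions and targets (illustrative numbers only) shows that the built-in loss and the hand-computed formula agree:

import torch
import torch.nn as nn

# Toy sigmoid outputs (already in (0, 1)) and matching binary targets
predictions = torch.tensor([[0.9, 0.2],
                            [0.4, 0.8]])
targets = torch.tensor([[1.0, 0.0],
                        [0.0, 1.0]])

criterion = nn.BCELoss()
loss = criterion(predictions, targets)

# Binary cross-entropy by hand: -mean(y * log(p) + (1 - y) * log(1 - p))
manual = -(targets * torch.log(predictions)
           + (1 - targets) * torch.log(1 - predictions)).mean()

print(loss.item(), manual.item())  # the two values match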
Lines 37–49: We perform a training loop for a specified number of epochs (10 in this example). In each epoch, the following steps are performed:
Forward pass: We pass the input data through the model to obtain predictions and compute the loss by comparing the model’s predictions with the target values.
Backward pass and optimization: We calculate gradients and update the model’s parameters using the optimizer. Finally, we print the loss for each epoch to monitor training progress.
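If the zero_grad / backward / step sequence feels opaque, the following minimal sketch (a single linear layer and an arbitrary loss, chosen only to show the mechanics) makes the effect of each call visible:

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
optimizer = torch.optim.Adam(layer.parameters(), lr=0.01)
criterion = nn.MSELoss()  # any loss works for demonstrating the mechanics

x = torch.rand(8, 3)
y = torch.rand(8, 1)

optimizer.zero_grad()                     # start from clean gradients
loss = criterion(layer(x), y)
loss.backward()                           # fills .grad for every parameter
print(layer.weight.grad.shape)            # torch.Size([1, 3])

weights_before = layer.weight.clone()
optimizer.step()                          # Adam updates the parameters using those gradients
print(torch.equal(weights_before, layer.weight))  # False -- the weights moved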
In the code widget above, we have used two of the most commonly used activation functions, namely ReLU and Sigmoid. However, we can utilize other activation functions such as LeakyReLU, Softmax, Tanh, and ELU (exponential linear unit).
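As a quick way to compare these alternatives, the sketch below applies each one to the same toy tensor; swapping one into SimpleNN would simply mean replacing nn.ReLU() (or nn.Sigmoid()) with the module of our choice:

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

activations = {
    "LeakyReLU": nn.LeakyReLU(negative_slope=0.01),
    "Softmax": nn.Softmax(dim=0),  # normalizes the vector so its entries sum to 1
    "Tanh": nn.Tanh(),
    "ELU": nn.ELU(alpha=1.0),
}

for name, activation in activations.items():
    print(f"{name:10s} {activation(x)}")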
For a more detailed explanation of how to build a neural network, see this Educative Answer.