Implementation of the loss function in PyTorch

Imagine we’re learning to play a game, and we want to minimize the number of mistakes we make. We might keep track of the difference between our actual performance and the expected or desired performance. This difference is similar to what a loss function does in a neural network. It measures how far off our predictions are from the actual targets.

In the context of a neural network, the loss function quantifies how well the model is performing. It helps the network adjust its parameters (weights and biases) during training to make better predictions.

The illustration shows where a loss function fits in the training flow of a neural network, as implemented in a framework such as PyTorch. Here’s a breakdown of the components:

x (Input data): This is the input to the neural network. It represents the features of your data that the model will use to make predictions. For instance, in an image classification task, x could represent pixel values.
Neural network (Hidden layers): The network processes the input data x through several hidden layers with interconnected neurons. These layers apply transformations and weights to the input to extract features and make predictions.
y_pred (Predicted output): This is the output of the neural network after processing x. It represents the predicted value(s) or class probabilities depending on the type of problem (regression or classification). It is the model’s best guess based on its current parameters.
y (Actual label/Target): This is the ground truth or the actual value corresponding to the input x. In a supervised learning task, it’s the correct label or output that the model is trying to predict.
Loss function (J(y_pred, y)): The loss function calculates the difference between the predicted output (y_pred) and the actual target (y). This error or loss helps quantify how well the model is performing. A common loss function for classification is Cross-Entropy Loss, while Mean Squared Error (MSE) is used for regression tasks.
Loss value: The calculated loss is fed back into the model to adjust its parameters (during backpropagation) in order to minimize this error over time, improving predictions for future inputs.
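The components above can be traced in a few lines of PyTorch. This is only a minimal sketch: the tensor shapes, the layer sizes, and the names network and loss_fn are assumptions chosen for illustration, not part of the original illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # Fixed seed so the sketch is repeatable

# x: a batch of 4 examples with 3 input features each (made-up data)
x = torch.rand(4, 3)
# y: the actual class label for each example (2 classes: 0 or 1)
y = torch.tensor([0, 1, 1, 0])

# Neural network: one hidden layer, sizes chosen only for illustration
network = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))

# y_pred: the network's raw scores (logits) for each class
y_pred = network(x)

# Loss function J(y_pred, y): cross-entropy for a classification task
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(y_pred, y)

# Loss value: backpropagation computes a gradient for every parameter
loss.backward()
print(f"Loss value: {loss.item():.4f}")
```

After loss.backward(), each parameter of network holds a gradient in its .grad attribute, which an optimizer would then use to update the weights.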

Cross-entropy loss

Cross-entropy loss is a commonly used loss function in machine learning, especially for classification problems. It measures the performance of a classification model whose output is a probability value between $0$ and $1$. The cross-entropy loss increases as the predicted probability diverges from the actual label.
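To see how quickly the loss grows, recall that for a single example the cross-entropy reduces to the negative log of the probability the model assigned to the correct class. The sketch below uses plain Python, and the helper name single_example_loss and the probability values are made up for illustration.

```python
import math

def single_example_loss(p_true_class):
    """Cross-entropy for one example: -log of the probability given to the true class."""
    return -math.log(p_true_class)

# The lower the probability assigned to the correct class, the larger the loss
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:<4}  loss = {single_example_loss(p):.4f}")
```

A confident correct prediction (p close to 1) gives a loss near zero, while a confident wrong prediction (p close to 0) is penalized heavily.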

Mathematical representation

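Here’s the mathematical representation of cross-entropy loss for binary and multiclass classification problems.

For binary classification (where the outcomes are labeled as 0 or 1), the cross-entropy loss for a single example is given by:

$$L = -\left[\, y \log(p) + (1 - y) \log(1 - p) \,\right]$$

where $y$ is the actual label (0 or 1) and $p$ is the predicted probability that the label is 1.

For multiclass classification with $C$ classes, the cross-entropy loss for a single example is given by:

$$L = -\sum_{c=1}^{C} y_c \log(p_c)$$

where $y_c$ is 1 if $c$ is the correct class and 0 otherwise, and $p_c$ is the predicted probability of class $c$.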
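As a rough check of the binary case, the sketch below evaluates the standard binary cross-entropy expression by hand and compares it with PyTorch's nn.BCELoss, which averages the per-example losses by default. The probabilities and labels are made-up values.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels for four examples
p = torch.tensor([0.9, 0.2, 0.7, 0.4])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# Binary cross-entropy per example: -[y*log(p) + (1-y)*log(1-p)]
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
manual_mean = manual.mean()

# nn.BCELoss expects probabilities and averages over the batch by default
builtin = nn.BCELoss()(p, y)

print(manual_mean.item(), builtin.item())
```

The two printed values should agree up to floating-point rounding.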

To use a built-in loss function in PyTorch:

1. Import the necessary libraries: torch, which is used to create tensors, and its neural network module torch.nn (imported as nn), which provides the loss functions.
2. Create the loss function that suits our data from the nn module (nn.LossFunction() here is a placeholder for the chosen class, such as nn.MSELoss()) and call it with the required inputs as tensors: predicted_values, the output predicted by our model, and real_values, the actual target values.

Note: We can choose the loss function we want to use from the nn module, e.g., cross-entropy loss (CrossEntropyLoss), mean squared error (MSELoss), or binary cross-entropy loss (BCELoss), or create a custom loss function.
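The two steps above might look like this with nn.MSELoss picked as the loss function. The tensor values are made up, and loss_function is just an illustrative variable name.

```python
import torch
import torch.nn as nn

# Load a loss function from the nn module (MSELoss chosen only as an example)
loss_function = nn.MSELoss()

# Made-up tensors standing in for the model output and the ground truth
predicted_values = torch.tensor([2.5, 0.0, 2.0, 8.0])
real_values = torch.tensor([3.0, -0.5, 2.0, 7.0])

# Mean of the squared differences: (0.25 + 0.25 + 0 + 1) / 4
loss = loss_function(predicted_values, real_values)
print(loss.item())
```

Swapping in another loss is a matter of changing the class, although the expected input shapes and dtypes differ between losses (for example, nn.CrossEntropyLoss takes integer class indices as targets).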

Implementation

In PyTorch, we can easily implement a loss function using the built-in module torch.nn. Let’s consider the example of cross-entropy loss, which is used for multiclass classification. For binary classification, we would use binary cross-entropy (BCE) loss instead.

import torch
import torch.nn as nn

# Define the neural network architecture
class Simple_NN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Simple_NN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        return out

# Define network hyperparameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 10  # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.empty(32, dtype=torch.long).random_(output_size)

# Create an instance of the Simple_NN model
model = Simple_NN(input_size, hidden_size, output_size)

# Define the loss function (Cross-entropy loss) and optimizer (Adam)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
n_epochs = 10  # Define the number of training epochs
for epoch in range(n_epochs):
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{n_epochs}], Loss: {loss.item()}')

Code explanation

criterion = nn.CrossEntropyLoss(): This line creates an instance of the cross-entropy loss function. In PyTorch, nn.CrossEntropyLoss() is commonly used for multiclass classification, where each target is a class index. It combines a log-softmax activation and a negative log-likelihood loss, so the model should output raw scores (logits) rather than softmax probabilities.
loss = criterion(output, target): This line calculates the loss. It takes the model’s output (output) and the target or ground truth values (target) and computes the loss value. The output tensor contains the raw class scores predicted by the neural network, and target holds the true class index for each input.
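The relationship between nn.CrossEntropyLoss and its two building blocks can be checked directly. The sketch below is an illustration on made-up logits and targets, using the functional forms F.log_softmax and F.nll_loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # Fixed seed so the sketch is repeatable

logits = torch.randn(5, 3)  # Raw scores for 5 examples and 3 classes
target = torch.tensor([0, 2, 1, 1, 0])  # True class index per example

# One step: cross-entropy applied directly to the raw logits
combined = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax over the class dimension, then negative log-likelihood
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(combined, two_step))
```

This is why the model in the example ends with a plain linear layer: adding a softmax before nn.CrossEntropyLoss would apply the normalization twice.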

During the training loop:

The model makes predictions using the input data.
The loss is calculated by comparing these predictions to the true targets using the defined criterion (nn.CrossEntropyLoss()).
Calling loss.backward() backpropagates the loss to compute a gradient for each parameter, and the optimizer (torch.optim.Adam) then uses these gradients to update the model's parameters.

This process iterates over multiple epochs, and in each epoch, the loss is calculated, and the model parameters are updated to minimize this loss, ultimately improving the model’s performance. The printed loss values indicate how well the model fits the training data as the training progresses. The aim is to minimize this loss over the training iterations.

Conclusion

The loss function, which is critical in guiding the network during training, informs the model how to adjust its parameters (weights) to improve performance over time. By iteratively reducing the loss, the model becomes better at making accurate predictions, demonstrating the core principle of optimization in machine learning.
