A neural network is an AI method that enables computers and devices to process data in a way inspired by the human brain. It consists of interconnected nodes, or neurons, arranged in layers that process and transmit information, much as neurons do in the brain. Neural networks are designed to recognize complex patterns and relationships in data. Through this layered structure, they form an adaptive system that lets computers learn from mistakes and gradually improve their performance.
Think of a neuron as a tiny decision-maker: it takes input, processes it, and produces an output. In a neural network, these decision-makers are artificial neurons.
A neural network is organized into layers. The input layer receives data, the hidden layers process it, and the output layer provides the final result. Each layer has many neurons.
Neurons have parameters called weights and biases. Weights adjust the importance of each input, and the bias shifts the input to the activation function. These parameters are what the network adjusts in order to learn effectively.
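As a minimal sketch with made-up numbers (not tied to any particular network), a single neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

import torch

# One artificial neuron: weighted sum of inputs plus bias, then activation.
# All values below are illustrative.
inputs = torch.tensor([0.5, -1.0, 2.0])   # three input features
weights = torch.tensor([0.8, 0.2, -0.5])  # importance assigned to each input
bias = torch.tensor(0.1)                  # shifts the weighted sum

weighted_sum = torch.dot(inputs, weights) + bias  # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
output = torch.relu(weighted_sum)                 # ReLU maps negative input to 0
print(weighted_sum.item(), output.item())         # -0.7 0.0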
The first step in building a neural network using PyTorch is to import the torch library, as shown below:
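import torch
import torch.nn as nn  # Provides layers, activations, and loss functions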
Then, we define a class that represents our neural network, SimpleNN in our case. This class acts as a blueprint for the network we want to create. Next, we set up the initial configuration of our neural network by specifying the number of input features (input_size), hidden neurons (hidden_size), and output neurons (output_size). This is done in the __init__ method.
Inside the SimpleNN class, we create the building blocks of our neural network, i.e., the input layer, the activation function (ReLU), and the output layer, as shown below:
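class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2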
Imagine our neural network as a series of connected parts.
self.fc1 represents the first part: a linear layer that takes the input data and transforms it using weights and biases to produce intermediate values.
self.relu represents the activation function. It acts as a filter that sets negative values to zero and passes positive values through unchanged.
self.fc2 is the second linear layer. It takes the filtered data from the activation function and transforms it to produce the final output.
Finally, we need to specify how data flows through the network. This is done by defining the forward pass in the neural network class, as shown below:
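def forward(self, x):
    out = self.fc1(x)  # Apply the first fully connected layer
    out = self.relu(out)  # Apply the ReLU activation function
    out = self.fc2(out)  # Apply the second fully connected layer
    return out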
We connect the different parts: we first pass the input x through self.fc1, then through the activation function self.relu, and finally through self.fc2. Each step transforms the data, and the final result we return is the output of our network after the input has passed through all these layers.
Training a neural network involves setting it up with the right structure, defining how to measure its mistakes (loss), adjusting its parameters to minimize those mistakes (optimization), and iteratively improving its predictions through backpropagation. PyTorch provides a powerful and accessible framework to accomplish these steps and build intelligent systems.
We need to know how wrong our predictions are. A loss function measures the difference between the predicted output and the actual label. Common loss functions include mean squared error (MSE) and cross-entropy loss; we select an appropriate one based on our task.
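Here is a minimal sketch, with made-up tensors, of how these two losses can be computed in PyTorch:

import torch
import torch.nn as nn

# MSE: typical for regression; compares continuous predictions with targets.
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0, 1.0])
actual = torch.tensor([3.0, -0.5, 1.0])
print(mse(pred, actual))  # mean of squared differences, here about 0.1667

# Cross-entropy: typical for classification; expects raw logits and class indices.
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes
label = torch.tensor([0])                  # the true class index
print(ce(logits, label))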
To minimize loss, we use an optimization algorithm like gradient descent. It adjusts the weights and biases in a way that reduces the loss, step by step.
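To make this concrete, here is a minimal sketch of a single gradient-descent step on one parameter, using an illustrative toy loss:

import torch

w = torch.tensor(1.0, requires_grad=True)  # a single trainable parameter
lr = 0.1                                   # learning rate (step size)

loss = (w - 3.0) ** 2  # toy loss, minimized at w = 3
loss.backward()        # computes d(loss)/dw = 2 * (w - 3) = -4
with torch.no_grad():
    w -= lr * w.grad   # move against the gradient: 1.0 - 0.1 * (-4) = 1.4
print(w.item())        # 1.4, one step closer to the minimum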
Backpropagation is the crucial step. Once we calculate the loss, we propagate this information backward through the network, adjusting the weights and biases accordingly. It is like learning from mistakes and getting better at predicting.
Let’s see the implementation of how to train a neural network using PyTorch.
import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        return out

# Define network hyperparameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 10  # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.empty(32, dtype=torch.long).random_(output_size)

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (Cross Entropy Loss) and optimizer (Adam)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')
Lines 28–32: We create an instance of the SimpleNN model with the specified input, hidden, and output sizes. Then, we define the loss function, CrossEntropyLoss(), to calculate the loss between the actual and predicted labels, and the Adam optimizer to minimize the loss.
Lines 35–39: We set up a training loop to train the neural network for a specified number of epochs, num_epochs. Inside the loop, we pass the input data through the model to obtain predictions and calculate the loss by comparing the model's predictions with the target values.
Line 42: We clear the gradients of the model's parameters that are stored by the optimizer. Calling the zero_grad() function sets these gradients to zero for the current iteration.
Note: PyTorch accumulates gradients by default. If we don't clear them before the next backward pass, the new gradients are added to the existing ones, which leads to incorrect gradient information and can make optimization ineffective or unstable. The short sketch after this walkthrough demonstrates this behavior.
Line 43: We compute the gradients of the loss with respect to the model's parameters using backpropagation. These gradients are stored in each parameter's grad attribute and are used in the next step.
Line 44: We update the model’s weights in the direction that minimizes the loss.
Line 47: We print the loss for each epoch to monitor the training progress.
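The following minimal sketch (a standalone toy example, not part of the training code above) shows why clearing gradients matters: PyTorch adds new gradients to the existing ones on every backward pass.

import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3.0).backward()
print(w.grad)  # tensor(3.) -- the gradient of 3*w with respect to w

(w * 3.0).backward()
print(w.grad)  # tensor(6.) -- the new gradient was added to the old one

w.grad.zero_()  # what optimizer.zero_grad() does for each parameter
print(w.grad)   # tensor(0.) -- ready for a fresh backward pass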
Quiz: What is the primary function of the torch.nn.Module class in PyTorch?
A) Data preprocessing
B) Visualization of neural networks
C) Defining and managing neural network layers
D) Loading pretrained models
Answer: C. torch.nn.Module is the base class that custom networks such as SimpleNN subclass to define and manage their layers and parameters.