Activation functions in neural networks decide whether a neuron should be activated, helping the network learn complex patterns during training.
The hyperbolic tangent function (Tanh) is a popular activation function in neural networks and deep learning. It’s a scaled and shifted version of the Sigmoid function. Like Sigmoid, it’s also S-shaped, but instead of Sigmoid’s output range of 0 to 1, Tanh’s output ranges from -1 to 1.
The mathematical formula for the Tanh function is as follows:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

It takes any real value as input and outputs values in the range of -1 to 1.
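As a quick sanity check on the "scaled and shifted Sigmoid" relationship, the identity tanh(x) = 2 * sigmoid(2x) - 1 can be verified numerically. The snippet below is a minimal sketch and is not part of the original example.

import torch

# Verify that tanh(x) equals 2 * sigmoid(2x) - 1 on a range of inputs
x = torch.linspace(-5, 5, steps=11)
print(torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1, atol=1e-6))  # prints True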
The above graph describes the hyperbolic tangent (Tanh) function, mapping all real-valued inputs to outputs between -1 and 1.
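If you want to reproduce a graph like this yourself, a minimal sketch using torch and matplotlib is shown below (matplotlib is assumed to be installed; this snippet is not part of the original example).

import torch
import matplotlib.pyplot as plt

# Evaluate tanh over a range of inputs and plot the curve
x = torch.linspace(-6, 6, steps=200)
y = torch.tanh(x)

plt.plot(x.numpy(), y.numpy())
plt.title("Tanh activation function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.ylim(-1.1, 1.1)
plt.grid(True)
plt.show()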
Let’s see the implementation of the Tanh activation function in the following neural network using PyTorch.
import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # Fully connected layer 1
        self.tanh = nn.Tanh()                           # Tanh activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.fc1(x)        # Apply the first fully connected layer
        out = self.tanh(out)     # Apply the Tanh activation function
        out = self.fc2(out)      # Apply the second fully connected layer
        out = self.sigmoid(out)  # Apply the Sigmoid activation function at the end
        return out

# Define network parameters
input_size = 64    # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 2    # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.randint(0, 2, (32, output_size), dtype=torch.float32)  # Random binary target values

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (binary cross-entropy loss) and optimizer (Adam)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Backpropagate to compute gradients
    optimizer.step()       # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')
Line 9: We create an instance of the Tanh activation function using the nn.Tanh() command and store it as an attribute of the SimpleNN class named self.tanh.
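The same computation is also available as the functional torch.tanh. The small check below (not part of the original example) shows that the module and the functional form produce identical results.

import torch
import torch.nn as nn

# nn.Tanh() as a module and torch.tanh as a function give the same output
x = torch.randn(4)
print(torch.allclose(nn.Tanh()(x), torch.tanh(x)))  # prints True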
Line 15: We apply the Tanh activation function to the output of the first fully connected layer. It maps input values to a range from -1 to 1.
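A small standalone demonstration of this squashing behavior (not part of the original example):

import torch
import torch.nn as nn

# Even very large inputs are squashed into the open interval (-1, 1)
tanh = nn.Tanh()
x = torch.tensor([-100.0, -2.0, 0.0, 2.0, 100.0])
print(tanh(x))  # tensor([-1.0000, -0.9640,  0.0000,  0.9640,  1.0000])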
For more details on how to build a neural network, take a look at this Educative Answer.
Insights for machine learning engineers:
While Tanh is generally preferred over the Sigmoid function because it is zero-centered, it still suffers from the vanishing gradient problem for large-magnitude inputs. This can make training very deep networks tricky.
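The derivative of tanh is 1 - tanh(x)^2, which approaches 0 as the magnitude of x grows. The short autograd sketch below (an illustration, not from the original example) makes the shrinking gradient visible:

import torch

# Compute d(tanh)/dx at increasingly large inputs; the gradient vanishes
for value in [0.0, 2.0, 5.0, 10.0]:
    x = torch.tensor(value, requires_grad=True)
    torch.tanh(x).backward()
    print(f"x = {value:>4}: d tanh/dx = {x.grad.item():.8f}")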
Even though ReLU and its variants are more popular in deep networks due to their efficiency, Tanh is still prevalent in tasks that benefit from treating positive and negative inputs distinctly, such as certain NLP tasks, or when data is strictly bounded and you need to maintain a zero mean.
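For comparison, here is a hypothetical variant of the SimpleNN class above with the hidden Tanh swapped for ReLU (the name SimpleNNReLU is ours, not from the original example); whether this helps depends on the task and the data.

import torch.nn as nn

class SimpleNNReLU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNNReLU, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # Fully connected layer 1
        self.relu = nn.ReLU()                           # ReLU instead of Tanh in the hidden layer
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2
        self.sigmoid = nn.Sigmoid()                     # Sigmoid kept for the binary output

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out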