What is the tanh activation function?

The tanh activation function, also called the hyperbolic tangent activation function, is a mathematical function commonly used in the hidden layers of artificial neural networks. It maps any real-valued input to an output between -1 and 1, and is defined as the ratio of the difference between the exponential of the input and the exponential of its negation to the sum of those two exponentials.

Mathematically, the tanh activation function can be represented as:

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}

Where:

  • x represents the input value.

  • e is a mathematical constant approximately equal to 2.71828.
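As a quick sanity check, we can implement this formula directly and compare it against numpy's built-in np.tanh(). This is a minimal sketch; the tanh_from_formula helper is our own name, not a library function.

import numpy as np

# Compute tanh directly from its definition (tanh_from_formula is our own helper)
def tanh_from_formula(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh_from_formula(x))                            # outputs lie between -1 and 1; tanh(0) = 0
print(np.allclose(tanh_from_formula(x), np.tanh(x)))   # True: matches numpy's built-in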

The use of the tanh function

In a neural network architecture, the tanh function is advantageous for the following reasons.

Nonlinear relationship between the input and output data

The tanh function is smooth and differentiable, and it introduces nonlinearity into the network, meaning that its output is not a simple linear function of its input. This allows the network to learn more complex, nonlinear relationships between the input and output data. Let's illustrate this using a code example.

Code example

import numpy as np
import matplotlib.pyplot as plt

# Define input values
x = np.linspace(-5, 5, 100)

# Compute the output of the tanh function for the input values
y = np.tanh(x)

# Plot the output of the tanh function
plt.plot(x, y)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('tanh function output')
plt.show()

Code explanation

  • Line 1: We import the numpy module and assign it the alias np.

  • Line 2: We import the matplotlib.pyplot module and assign it the alias plt.

  • Line 5: We define an array of input values for the tanh function using the np.linspace() function, which generates 100 evenly spaced points between -5 and 5 inclusive.

  • Line 8: We compute the output of the tanh function for each input value in the x array using the np.tanh() function from numpy, and assign it to the variable y.

  • Lines 11–15: We generate a plot of the tanh function using the plt.plot() function, with x and y as the input and output values respectively. We set the x-axis label to Input, the y-axis label to Output, and the title of the plot to tanh function output. Finally, we display the plot using the plt.show() function.
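To see why this nonlinearity matters, note that stacking purely linear layers always collapses into a single linear map, while inserting tanh between them does not. The following sketch illustrates this with arbitrarily chosen random weight matrices (a toy setup of our own, not a full network):

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))
W2 = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Two stacked linear layers are equivalent to one linear layer
linear_stack = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(linear_stack, collapsed))    # True: still a linear map

# Inserting tanh between the layers breaks this equivalence
nonlinear_stack = W2 @ np.tanh(W1 @ x)
print(np.allclose(nonlinear_stack, collapsed)) # False in general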

Symmetry around the origin

The tanh function is symmetric around the origin: it is an odd function, so tanh(-x) = -tanh(x), producing negative outputs for negative inputs and positive outputs for positive inputs. Let's illustrate this using a code example.

Code example

import numpy as np
import matplotlib.pyplot as plt

# Define the tanh function
def tanh(x):
    return np.tanh(x)

# Create a range of input values from -5 to 5
x = np.arange(-5, 5, 0.1)

# Compute the output of the tanh function for each input value
y = tanh(x)

# Plot the output of the tanh function
plt.plot(x, y)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('tanh Function')
plt.show()

Code Explanation

  • Line 1: We import the numpy module and assign it the alias np.

  • Line 2: We import the matplotlib.pyplot module and assign it the alias plt.

  • Lines 4–6: We define the tanh function, which takes an input x and returns the hyperbolic tangent of x, computed using the np.tanh() function from numpy.

  • Line 9: We create an array of input values for the tanh function using the np.arange() function, ranging from -5 up to (but not including) 5 with a step size of 0.1.

  • Line 12: We compute the output of the tanh function for each input value in the x array by calling the tanh() function defined earlier, and assign it to the variable y.

  • Lines 15–19: We generate a plot of the tanh() function using the plt.plot() function, with x and y as the input and output values respectively. We set the x-axis label to Input, the y-axis label to Output, and the title of the plot to tanh Function. Finally, we display the plot using the plt.show() function.
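We can also verify the symmetry numerically rather than visually. Since tanh is an odd function, tanh(-x) should equal -tanh(x) for every input:

import numpy as np

x = np.linspace(-5, 5, 100)
# tanh is an odd function: tanh(-x) == -tanh(x) for every x
print(np.allclose(np.tanh(-x), -np.tanh(x)))   # True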

Limitation of the tanh function

The main limitation of the tanh function is that it suffers from the vanishing gradient problem. As the input becomes very large or very small, the gradient of the function approaches zero, making it difficult for the network to update the weights of the earlier layers during backpropagation. This is a significant problem in deep neural networks with many layers, since the gradients can become extremely small by the time they reach the earlier layers, leading to slow convergence and poor performance. Let's illustrate this using a code example.

Code example

import numpy as np
import matplotlib.pyplot as plt

# Define the tanh function
def tanh(x):
    return np.tanh(x)

x = np.linspace(-5, 5, 100)

# Compute the gradients of the tanh function
tanh_grad = 1 - np.tanh(x)**2

# Plot the gradients of the tanh function
plt.plot(x, tanh_grad, label='tanh')
plt.xlabel('Input')
plt.ylabel('Gradient')
plt.title('Gradients of the tanh activation function')
plt.legend()
plt.show()

As we can see from the plot, the gradient of the tanh function approaches zero as the input becomes very large or very small.

Code explanation

  • Lines 1–2: We import the numpy module with the alias np and the matplotlib.pyplot module with the alias plt.

  • Lines 5–6: We define the tanh() function that takes in an array of values x and returns the hyperbolic tangent of that array using the np.tanh() function.

  • Line 8: We define the input values using the np.linspace() function, which returns an array of 100 evenly spaced numbers between -5 and 5.

  • Line 11: We compute the gradients of the tanh function using the formula 1 - np.tanh(x)**2, which is the derivative of tanh(x), and assign the output to tanh_grad.

  • Lines 14–19: We plot the gradients of the tanh function using the plt.plot() function. We set the input values x as the x-axis values and the tanh_grad as the y-axis values. We label the plot as tanh. We set the x-axis label to Input, the y-axis label to Gradient, and the title to Gradients of the tanh activation function. Finally, we add a legend to the plot using the plt.legend() function and display the plot using the plt.show() function.
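To make the vanishing gradient more concrete, we can chain several tanh activations and multiply their local derivatives, as backpropagation does. The sketch below uses random scalar weights in a toy setup of our own; with these weights, the product of derivatives typically shrinks toward zero as it passes back through the layers:

import numpy as np

rng = np.random.default_rng(42)
n_layers = 10
a = 2.0     # initial input value (chosen arbitrarily)
grad = 1.0

for layer in range(n_layers):
    w = rng.normal()                   # random scalar weight for this toy layer
    z = w * a                          # pre-activation
    a = np.tanh(z)                     # activation output, fed to the next layer
    grad *= (1 - np.tanh(z)**2) * w    # chain rule: local tanh derivative times weight
    print(f"Layer {layer + 1}: gradient magnitude = {abs(grad):.2e}")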

A solution to this limitation

To mitigate the vanishing gradient problem, alternative activation functions such as the rectified linear unit (ReLU), Leaky ReLU, the exponential linear unit (ELU), Maxout, and other ReLU variants have been shown to work well in neural network architectures and can lessen the vanishing gradients that occur when using the tanh activation function.
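As a rough comparison, the gradient of ReLU is exactly 1 for all positive inputs, so it does not flatten toward zero the way the tanh gradient does. A minimal sketch plotting both gradients (the relu_grad variable is our own name):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 100)

tanh_grad = 1 - np.tanh(x)**2        # derivative of tanh: flattens toward 0 for large |x|
relu_grad = (x > 0).astype(float)    # derivative of ReLU: 0 for x < 0, 1 for x > 0

plt.plot(x, tanh_grad, label='tanh gradient')
plt.plot(x, relu_grad, label='ReLU gradient')
plt.xlabel('Input')
plt.ylabel('Gradient')
plt.title('tanh vs. ReLU gradients')
plt.legend()
plt.show()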

Summary

In this answer, we looked at the tanh activation function, a commonly used activation function in neural network architectures. We covered two reasons why it is used: it introduces a nonlinear relationship between the input and output data, and it is symmetric around the origin. Its main limitation is the vanishing gradient problem, which can be mitigated by using alternative activation functions such as ReLU and its variants.
