Long short-term memory (LSTM) and the gated recurrent unit (GRU) were introduced as variants of recurrent neural networks (RNNs) to tackle the vanishing gradient problem, which occurs when gradients shrink exponentially as they are propagated back through many time steps during training. Both architectures add gating mechanisms that govern what information is kept and what is discarded, allowing the network to identify the relevant parts of a sequence and retain only the necessary details over long spans.
Let’s understand how they work.
An LSTM uses a set of gates that regulate how information in a data sequence enters, is stored in, and exits the network. A typical LSTM cell contains three gates: forget, input, and output. These gates act as filters, and each has its own learned weights. The forget gate decides which parts of the previous cell state to keep and which to discard, the input gate adds new information to the cell state, and the output gate determines what is emitted as the hidden state from the current cell state.
A GRU works much like an LSTM but with fewer parameters. It is a type of recurrent neural network that uses two gates, update and reset, which are vectors that decide what information is passed on to the output. The reset gate controls how much of the past state to keep, while the update gate controls how much of the new state is simply a copy of the old state.
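For comparison, here is an analogous NumPy sketch of a single GRU step. As above, the weight layout is an illustrative assumption, and note that some texts swap the roles of z and (1 - z) in the final blend.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step. W, U, b hold the weights for the update ('z'),
    reset ('r'), and candidate ('h') transforms."""
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])   # update gate: how much of the new state is a copy of the old
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])   # reset gate: how much of the past state to keep
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])  # candidate state
    h_t = (1 - z) * h_prev + z * h_cand                    # blend the old state with the candidate
    return h_t
```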
| GRU | LSTM |
|---|---|
| Uses two gates: update and reset | Uses three gates: forget, input, and output |
| Has fewer parameters | Has more parameters |
| Consumes less memory | Consumes more memory |
| Processes data more quickly | Slower in comparison |
| Less complex architecture | More complex architecture |
If low memory consumption and fast processing are the main concerns, a GRU is worth considering: it processes data using less memory and in less time, and its less complex architecture also reduces the computational cost.
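As a rough illustration of the parameter difference, the back-of-the-envelope count below compares a single LSTM layer (four gated transforms) with a single GRU layer (three). The sizes 128 and 256 are arbitrary, and framework-specific details such as duplicated bias vectors are ignored.

```python
def lstm_param_count(input_size, hidden_size):
    # Four transforms (forget, input, output, candidate), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

def gru_param_count(input_size, hidden_size):
    # Three transforms (update, reset, candidate) instead of four.
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

print(lstm_param_count(128, 256))  # 394240
print(gru_param_count(128, 256))   # 295680
```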