Example-driven implementation of a bidirectional LSTM

A bidirectional long short-term memory (LSTM) network, a kind of recurrent neural network (RNN) architecture, considers information from both past and future states when processing a sequence.

Since information only flows from the past to the future, a standard LSTM considers only the context up to the current time step. A bidirectional LSTM, by contrast, maintains two separate hidden states for each time step:

  1. Forward direction: One hidden state processes the sequence from past to future.

  2. Backward direction: The other hidden state processes the sequence in reverse, from future to past.

This allows the model to gather information from both directions and better understand the context of each time step in a sequence. This is particularly useful for work in natural language processing (NLP), because the words that come before and after a word can affect its meaning.
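As a minimal sketch of these two directions (assuming TensorFlow's Keras API, not the implementation discussed later), applying a Bidirectional LSTM layer to a dummy batch shows that the output feature dimension is twice the number of units, because the forward and backward hidden states are concatenated at every time step:

import numpy as np
import tensorflow as tf

# Dummy batch: 2 sequences, 5 time steps, 3 features (hypothetical shapes)
x = np.random.rand(2, 5, 3).astype("float32")

# A bidirectional LSTM with 4 units per direction
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True))

y = bi_lstm(x)
print(y.shape)  # (2, 5, 8): forward and backward outputs are concatenated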

Implementation

Any RNN or conventional LSTM can detect temporal patterns by moving forward from the past to the future. Put another way, the cell state learns only from its history up to a given time step; it cannot see into the future. Picture the cell state as a truck of memories traveling from left to right.
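To make this limitation concrete, here is a small sketch (assuming TensorFlow's Keras API): changing only a future time step of the input leaves the LSTM's outputs at all earlier time steps unchanged.

import numpy as np
import tensorflow as tf

lstm = tf.keras.layers.LSTM(4, return_sequences=True)

# One sequence of 5 time steps with 3 features (hypothetical shapes)
x1 = np.random.rand(1, 5, 3).astype("float32")
x2 = x1.copy()
x2[0, -1, :] = 0.0              # alter only the last (future-most) time step

y1 = lstm(x1).numpy()
y2 = lstm(x2).numpy()

# Outputs at the first four steps are identical: the cell state never
# sees information from later time steps.
print(np.allclose(y1[0, :4], y2[0, :4]))  # True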

The bidirectional RNN, proposed by Schuster and Paliwal (1997), was a key addition. In a bidirectional RNN, a mirror RNN layer is added alongside the original layer: the original layer receives the input sequence in its original order, while the mirror layer receives it in reversed order.

As a result, the cell states can learn from all available input data, past and future. The figure that follows demonstrates this: in the top lane, historical data flows into the cell state from left to right, just as in a conventional LSTM layer, while in the bottom lane, data from the future flows from right to left back into the cell states.

A bi-directional LSTM

The ability of a bidirectional LSTM layer to store collective information from the past and future gives it tremendous power.
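As a sketch of the mirror-layer idea, the two passes can also be wired up explicitly with two LSTM layers, one of which reads the sequence backward; this is conceptually what the Keras Bidirectional wrapper does for us (the shapes below are hypothetical):

import tensorflow as tf

inputs = tf.keras.Input(shape=(None, 3))          # any length, 3 features

# Original layer: reads the sequence from past to future
fwd = tf.keras.layers.LSTM(4, return_sequences=True)(inputs)

# Mirror layer: reads the sequence from future to past
bwd = tf.keras.layers.LSTM(4, return_sequences=True,
                           go_backwards=True)(inputs)
bwd = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(bwd)  # realign time steps

# Each time step now sees context from both directions
merged = tf.keras.layers.Concatenate()([fwd, bwd])
model = tf.keras.Model(inputs, merged)
print(model.output_shape)   # (None, None, 8)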

We can take a semi-online approach to the present sheet-break problem: a prediction is made by observing a window of sensor data (a sketch of how such windows might be prepared appears after the code walkthrough below). Viewed through the lens of LSTMs, most problems are in fact either offline or semi-online. The code below builds a bidirectional LSTM network.

Note: It will take a couple of minutes for the code below to execute completely.

# Imports assumed for this snippet (TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Bidirectional, LSTM,
                                     Dropout, Flatten, Dense)
# `pm` (custom performance metrics) and `sp` (plotting helpers) are
# user-defined modules assumed to be available here.

# Define the model: two bidirectional LSTM layers, a dropout layer,
# and a sigmoid output for binary classification.
model = Sequential()
model.add(Input(shape=(TIMESTEPS, N_FEATURES),
                name='input'))
model.add(
  Bidirectional(
      LSTM(units=16,
           activation='tanh',
           return_sequences=True),
      name='bi_lstm_layer_1'))
model.add(Dropout(0.5))
model.add(
  Bidirectional(
      LSTM(units=8,
           activation='tanh',
           return_sequences=True),
      name='bi_lstm_layer_2'))
model.add(Flatten())
model.add(Dense(units=1,
                activation='sigmoid',
                name='output'))

model.summary()

# Compile with the Adam optimizer, binary cross-entropy loss, and
# built-in plus custom metrics.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[
                'accuracy',
                tf.keras.metrics.Recall(),
                pm.F1Score(),
                pm.FalsePositiveRate()
              ])

# Train for 100 epochs and keep the training history.
history = model.fit(x=X_train_scaled,
                    y=y_train,
                    batch_size=128,
                    epochs=100,
                    validation_data=(X_valid_scaled,
                                     y_valid),
                    verbose=1).history

# Plot the F1 score, loss, and recall/false-positive-rate curves
# and save them to files.
plt, fig = sp.plot_metric(history,
                          metric='f1_score',
                          ylim=[0., 1.],
                          filename='score.png')

plt, fig = sp.plot_metric(history,
                          metric='loss',
                          filename='score1.png')

plt, fig = sp.plot_model_recall_fpr(history,
                                    filename='score2.png')

Bi-directional LSTM network

Here is a walkthrough of the bidirectional LSTM network code.

  • Model definition: We use the Keras API to create a sequential model. The model consists of two bidirectional LSTM layers, the first of which is followed by a dropout layer. The output is then flattened and connected to a dense layer with a sigmoid activation function.

  • model.summary(): We display a summary of the model architecture.

  • model.compile(): We compile the model with the Adam optimizer and the binary cross-entropy loss, tracking accuracy and recall along with the custom F1-score and false-positive-rate metrics.

  • model.fit(): We train the model on the training data (X_train_scaled and y_train) for 100 epochs with a batch size of 128, and validate it on the validation data (X_valid_scaled and y_valid). The training history is stored in the history variable (a sketch of how such windowed training arrays might be prepared follows this list).

  • sp.plot_metric() and sp.plot_model_recall_fpr(): We plot the F1 score, loss, and recall/false-positive-rate curves and save them to files.
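The model expects X_train_scaled and X_valid_scaled to be 3-D arrays of shape (samples, TIMESTEPS, N_FEATURES), built by sliding a fixed-length window over the scaled sensor readings. The following is a minimal sketch of such a windowing step, assuming hypothetical sensor_data and labels arrays; it is only an illustration of the expected shapes, not the preprocessing used above.

import numpy as np

def make_windows(sensor_data, labels, timesteps):
    """Slide a fixed-length window over the series and pair each
    window with the label at its final time step (hypothetical helper)."""
    X, y = [], []
    for end in range(timesteps, len(sensor_data) + 1):
        X.append(sensor_data[end - timesteps:end])
        y.append(labels[end - 1])
    return np.array(X), np.array(y)

# Hypothetical example: 1,000 readings of 4 sensors, binary labels
sensor_data = np.random.rand(1000, 4)
labels = np.random.randint(0, 2, size=1000)

TIMESTEPS, N_FEATURES = 5, sensor_data.shape[1]
X_windows, y_windows = make_windows(sensor_data, labels, TIMESTEPS)
print(X_windows.shape)   # (996, 5, 4): samples x TIMESTEPS x N_FEATURES
print(y_windows.shape)   # (996,)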

Because each bidirectional layer runs two LSTMs, one over the original sequence and one over the reversed sequence, it has twice as many parameters as a regular LSTM layer with the same number of units. This can be seen in the model summary when we execute the code above.
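As a quick check of this doubling, the following standalone sketch compares the parameter counts of a plain LSTM layer and its bidirectional counterpart with the same number of units (the feature count F is a hypothetical value):

import tensorflow as tf

F = 4   # hypothetical number of input features

plain = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, F)),
    tf.keras.layers.LSTM(16)
])
bidir = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, F)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))
])

# A single LSTM layer has 4 * (units * (units + F) + units) parameters;
# the bidirectional wrapper holds one such LSTM per direction.
print(plain.count_params())   # 1344
print(bidir.count_params())   # 2688, i.e., exactly twice as many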

Conclusion

Bidirectional LSTM networks process sequence data both forward and backward, allowing them to capture context from past and future states. This ability is especially helpful in situations where word context matters, such as natural language processing. The example-driven implementation above shows how to build and train a bidirectional LSTM model with TensorFlow and Keras for improved prediction performance.

