Time series prediction using LSTM

Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture used primarily for sequence prediction problems. Sequence prediction problems involve predicting the next element or several elements in a series, based on previously observed elements; examples include forecasting stock prices, predicting the next word in a sentence, or anticipating the future trajectory of a moving object. LSTM is particularly effective for tasks where the context of the input sequence is required to predict the output.

This Answer will focus on using LSTM for time series prediction, a common sequence prediction problem.

Understanding LSTM

LSTM networks are a type of RNN designed to remember long-term relationships in sequence data. The core idea behind the LSTM network is that it uses a system of gates that manage the information entering and exiting the memory units in the network. These gates determine which parts of a sequence should be retained or discarded, thereby improving the network's performance on sequence prediction problems.
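To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step. It follows the commonly used formulation with forget, input, and output gates; the function name `lstm_step`, the weight layout, and the random weights are assumptions for illustration only and do not reflect how Keras stores its parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative layout, not the Keras one).

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases.
    The four blocks of rows are: forget gate, input gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    forget_gate = sigmoid(z[0:H])          # how much of the old memory to keep
    input_gate = sigmoid(z[H:2 * H])       # how much new information to write
    output_gate = sigmoid(z[2 * H:3 * H])  # how much of the memory to expose
    candidate = np.tanh(z[3 * H:4 * H])    # proposed new memory content
    c_t = forget_gate * c_prev + input_gate * candidate
    h_t = output_gate * np.tanh(c_t)
    return h_t, c_t

# Hypothetical sizes: 1 input feature, 4 hidden units, random untrained weights.
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for value in [0.1, 0.2, 0.3]:  # a short toy sequence
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h.shape, c.shape)
```

The memory cell `c` is only ever scaled by the forget gate and added to, which is what lets information persist across many time steps.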

Note: To learn about Long Short-Term Memory (LSTM) in more detail, refer to this Answer.

Time series prediction

Time series prediction involves predicting future values based on previously observed values. It is widely used in weather forecasting, stock market predictions, and sales forecasting. LSTM networks are well-suited to this task because they can learn the temporal dependencies of the input sequence and use this learning to make predictions. Temporal dependencies in time series prediction refer to the relationships between points in the series at different times, where current outcomes often depend on past data.
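One simple way to see a temporal dependency is to correlate a series with a shifted copy of itself. The sketch below does this on a synthetic monthly series; the data, the helper name `lag_correlation`, and the chosen lags are assumptions for illustration and are not taken from the airline dataset.

```python
import numpy as np

def lag_correlation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    return float(np.corrcoef(series[:-lag], series[lag:])[0, 1])

# Synthetic series: a yearly cycle (period 12) plus a little random noise.
t = np.arange(120)
rng = np.random.default_rng(0)
series = 10.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

corr_12 = lag_correlation(series, 12)  # same month, one year earlier
corr_6 = lag_correlation(series, 6)    # opposite point in the yearly cycle
print(f"lag 12: {corr_12:.2f}, lag 6: {corr_6:.2f}")
```

A strong positive correlation at lag 12 and a strong negative one at lag 6 mean that past values carry information about future ones, which is the structure a sequence model can exploit.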

LSTM for time series prediction

Let's consider a simple example of time series prediction using LSTM in Python. We'll use the Keras library to build and train our LSTM model. The dataset used in this example is the international airline passengers dataset, which shows the total number of airline passengers every month from 1949 to 1960.

Let's see how to implement it!

Step 1: Importing necessary libraries

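First, we import the libraries the example needs: numpy, pandas, matplotlib, Keras, and scikit-learn. The import statements open the complete listing further below.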

Plotting the loaded dataset gives a first look at the data. The x-axis depicts time in months, while the y-axis shows the passenger count. The plot is a way of visualizing the data to understand its patterns and trends.

In the context of time series forecasting, such plots are useful to observe seasonality (repeating patterns over time), trend (overall direction of the data up or down over time), and noise (random variation in the data).

For instance, if there's a consistent increase in the number of passengers over the years, that's a trend. We can see that there is an increasing trend in the dataset. If there are consistent peaks and valleys in passengers at certain times of the year, that's seasonality.
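As a rough illustration of how trend and seasonality can be separated, the following sketch smooths a synthetic monthly series with a 12-month moving average. The series, the helper name `moving_average`, and the window length are assumptions chosen for the example; this is not part of the original walkthrough.

```python
import numpy as np

def moving_average(series, window):
    """Average over each run of `window` consecutive values."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Synthetic monthly data: a linear upward trend plus a yearly cycle.
months = np.arange(48)
trend = 100.0 + 2.0 * months
seasonality = 15.0 * np.sin(2 * np.pi * months / 12)
series = trend + seasonality

# Averaging over a full year cancels the yearly cycle and leaves the trend.
smoothed = moving_average(series, window=12)
print(len(smoothed), np.diff(smoothed)[:3])
```

Because the window spans exactly one seasonal period, the peaks and valleys average out and the smoothed curve rises at the same steady rate as the underlying trend.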

Step 3: Preprocessing the dataset

Next, we preprocess the dataset by normalizing it and splitting it into training and testing sets.

# Convert to Numpy Array and Normalize
passenger_array = passenger_dataframe.values.astype('float32')
scaler_toolbox = MinMaxScaler(feature_range=(0, 1))
normalized_passenger_data = scaler_toolbox.fit_transform(passenger_array)
# Divide into Training and Test Segments
partition_size = int(len(normalized_passenger_data) * 0.67)
remainder_size = len(normalized_passenger_data) - partition_size
train_partition, test_partition = normalized_passenger_data[0:partition_size,:], normalized_passenger_data[partition_size:len(normalized_passenger_data),:]

def organize_data(sequence_data, history_length=1):
    input_data, target_data = [], []
    for idx in range(len(sequence_data)-history_length-1):
        fragment = sequence_data[idx:(idx+history_length), 0]
        input_data.append(fragment)
        target_data.append(sequence_data[idx + history_length, 0])
    return np.array(input_data), np.array(target_data)
history_length = 1
train_input, train_target = organize_data(train_partition, history_length)
test_input, test_target = organize_data(test_partition, history_length)
train_input = np.reshape(train_input, (train_input.shape[0], 1, train_input.shape[1]))
test_input = np.reshape(test_input, (test_input.shape[0], 1, test_input.shape[1]))
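To see what this kind of windowing produces, here is a small self-contained sketch on a five-value toy series. The helper `make_windows` is a hypothetical stand-in for illustration: unlike `organize_data`, it keeps the final window, and it reshapes to one feature per time step rather than one time step with `history_length` features.

```python
import numpy as np

def make_windows(series, history_length):
    """Pair each run of `history_length` values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - history_length):
        inputs.append(series[start:start + history_length])
        targets.append(series[start + history_length])
    return np.array(inputs), np.array(targets)

toy_series = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X, y = make_windows(toy_series, history_length=2)

# X rows are [10, 20], [20, 30], [30, 40]; y holds the next values 30, 40, 50.
print(X)
print(y)

# Recurrent layers expect [samples, time steps, features].
X_lstm = X.reshape(X.shape[0], X.shape[1], 1)
print(X_lstm.shape)
```

Either layout is a valid way to feed a window to an LSTM; treating the window as several time steps lets the recurrent connections process the values in order.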

# Make Predictions and Assess Model
# (flight_model is the trained network; it is built and fit in the complete listing below)
train_forecast = flight_model.predict(train_input)
test_forecast = flight_model.predict(test_input)
train_forecast = scaler_toolbox.inverse_transform(train_forecast)
train_target = scaler_toolbox.inverse_transform([train_target])
test_forecast = scaler_toolbox.inverse_transform(test_forecast)
test_target = scaler_toolbox.inverse_transform([test_target])
train_evaluation = np.sqrt(mean_squared_error(train_target[0], train_forecast[:,0]))
print('Training Evaluation: %.2f RMSE' % (train_evaluation))
test_evaluation = np.sqrt(mean_squared_error(test_target[0], test_forecast[:,0]))
print('Testing Evaluation: %.2f RMSE' % (test_evaluation))
# Visualizing Original Data and Forecasts (each forecast is plotted at the index of the value it predicts)
plt.figure(figsize=(8,4))
plt.plot(scaler_toolbox.inverse_transform(normalized_passenger_data), label='Original Passenger Data')
plt.plot(np.arange(history_length, history_length + len(train_forecast)), train_forecast[:, 0], label='Training Forecast')
plt.plot(np.arange(partition_size + history_length, partition_size + history_length + len(test_forecast)), test_forecast[:, 0], label='Testing Forecast')
plt.legend()
plt.show()

# Import Required Modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Fetch Passenger Data
passenger_data_url = "international-airline-passengers.csv"
passenger_dataframe = pd.read_csv(passenger_data_url, usecols=[1], engine='python')

# Convert to Numpy Array and Normalize
passenger_array = passenger_dataframe.values.astype('float32')
scaler_toolbox = MinMaxScaler(feature_range=(0, 1))
normalized_passenger_data = scaler_toolbox.fit_transform(passenger_array)

# Divide into Training and Test Segments
partition_size = int(len(normalized_passenger_data) * 0.67)
remainder_size = len(normalized_passenger_data) - partition_size
train_partition, test_partition = normalized_passenger_data[0:partition_size,:], normalized_passenger_data[partition_size:len(normalized_passenger_data),:]

def organize_data(sequence_data, history_length=1):
    input_data, target_data = [], []
    for idx in range(len(sequence_data)-history_length-1):
        fragment = sequence_data[idx:(idx+history_length), 0]
        input_data.append(fragment)
        target_data.append(sequence_data[idx + history_length, 0])
    return np.array(input_data), np.array(target_data)

history_length = 1
train_input, train_target = organize_data(train_partition, history_length)
test_input, test_target = organize_data(test_partition, history_length)

train_input = np.reshape(train_input, (train_input.shape[0], 1, train_input.shape[1]))
test_input = np.reshape(test_input, (test_input.shape[0], 1, test_input.shape[1]))

# Build and Train LSTM Network
flight_model = Sequential()
flight_model.add(LSTM(4, input_shape=(1, history_length)))
flight_model.add(Dense(1))
flight_model.compile(loss='mean_squared_error', optimizer='adam')
flight_model.fit(train_input, train_target, epochs=100, batch_size=1, verbose=2)

# Make Predictions and Assess Model
train_forecast = flight_model.predict(train_input)
test_forecast = flight_model.predict(test_input)

train_forecast = scaler_toolbox.inverse_transform(train_forecast)
train_target = scaler_toolbox.inverse_transform([train_target])
test_forecast = scaler_toolbox.inverse_transform(test_forecast)
test_target = scaler_toolbox.inverse_transform([test_target])

train_evaluation = np.sqrt(mean_squared_error(train_target[0], train_forecast[:,0]))
print('Training Evaluation: %.2f RMSE' % (train_evaluation))
test_evaluation = np.sqrt(mean_squared_error(test_target[0], test_forecast[:,0]))
print('Testing Evaluation: %.2f RMSE' % (test_evaluation))

# Visualizing Original Data and Forecasts (each forecast is plotted at the index of the value it predicts)
plt.figure(figsize=(8,4))
plt.plot(scaler_toolbox.inverse_transform(normalized_passenger_data), label='Original Passenger Data')
plt.plot(np.arange(history_length, history_length + len(train_forecast)), train_forecast[:, 0], label='Training Forecast')
plt.plot(np.arange(partition_size + history_length, partition_size + history_length + len(test_forecast)), test_forecast[:, 0], label='Testing Forecast')
plt.legend()
plt.show()


Code explanation

Here’s the explanation of the code:

Lines 1–8: Importing necessary libraries. These include libraries for numerical operations (numpy), data manipulation (pandas), plotting (matplotlib), building the LSTM model (keras), data preprocessing (sklearn’s MinMaxScaler), and model evaluation (sklearn’s mean_squared_error).
Line 11: We define the location of the dataset, which here is the name of a local CSV file.
Line 12: Next, we load the dataset from the URL into a pandas DataFrame. Only the second column (indexed as 1) is used, which contains the number of airline passengers.
Lines 15–17: Here, we convert the DataFrame to a numpy array and set the data type to float32. The data is then normalized to fall within the range of 0 and 1 using MinMaxScaler. This is a common preprocessing step for neural networks.
Lines 20–22: We divide the dataset into two parts: one for training and the other for testing. We put 67% of the data in the training set and keep the remaining 33% for testing.
Lines 24–30: Next, we create a function that converts the time series into input/target pairs for training the LSTM model. For each position in the series, the function takes history_length consecutive passenger counts as the input and the count that immediately follows as the target, and appends them to two separate lists. With history_length set to 1, each input is the passenger count at time "t" and its matching label is the passenger count at time "t + 1".
Lines 32–37: The training and testing data are transformed using the function defined above. The data is then reshaped to the format expected by the LSTM layer, which is [samples, time steps, features].
Lines 40–43: After that, we define and compile the LSTM model. The model is kept simple: an LSTM layer with 4 units followed by a Dense layer with a single output. We use the mean squared error as the loss function and the adam optimizer to minimize it.
Line 44: We then train the model on the training data for 100 epochs with a batch size of 1.
Lines 47–48: Predictions are made on the training and testing data using the trained model.
Lines 50–53: The predictions and actual values are transformed back to their original scale by applying the inverse transformation of the MinMaxScaler. This is done because the model was trained on normalized data, so the predictions are also on the same scale.
Lines 55–58: Here, we calculate and print the Root Mean Squared Error (RMSE) of the training and testing predictions.
Lines 61–66: Finally, we plot the original data and the predictions. The plot includes the original data, the predictions on the training data, and the predictions on the testing data. This helps visualize the performance of the model.
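The scaling and evaluation steps described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the helper names `min_max_scale`, `min_max_invert`, and `rmse` and the toy numbers are hypothetical, and the sketch mirrors what MinMaxScaler and mean_squared_error compute for a single column rather than calling them.

```python
import numpy as np

def min_max_scale(values):
    """Scale values linearly to the range [0, 1]."""
    low, high = values.min(), values.max()
    return (values - low) / (high - low), low, high

def min_max_invert(scaled, low, high):
    """Map scaled values back to the original units."""
    return scaled * (high - low) + low

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy passenger counts and hypothetical model outputs on the scaled data.
actual = np.array([100.0, 150.0, 200.0, 300.0])
scaled_actual, low, high = min_max_scale(actual)
scaled_forecast = scaled_actual + np.array([0.05, -0.05, 0.05, -0.05])

# Invert the scaling so the error is expressed in passengers, not in [0, 1] units.
forecast = min_max_invert(scaled_forecast, low, high)
print("RMSE on scaled data:", rmse(scaled_actual, scaled_forecast))
print("RMSE in original units:", rmse(actual, forecast))
```

The same forecast error looks small on the normalized scale and much larger in original units, which is why the walkthrough inverts the scaling before reporting RMSE.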

Results

The plot shows the original time series data of the number of international airline passengers over time, along with the predictions made by the LSTM model. In the visual, the blue line shows the initial data, and the orange line displays the predictions made on the training data. The green line represents the predictions on the testing data. The plot allows us to visually assess how well the LSTM model captures the underlying patterns and trends in the time series data. It helps us understand the model's performance and how closely the predicted values align with the actual data.

Conclusion

LSTM networks are an effective tool for predicting time series data. They can capture patterns over time and are straightforward to implement with modern machine learning libraries such as Keras. With appropriate preprocessing and tuning, LSTM networks can perform well on a diverse range of time series prediction tasks.
