Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture primarily used for tasks involving sequential data.
This Answer will focus on using LSTM for time series prediction, a common sequence prediction problem.
LSTM networks are a type of RNN designed to remember long-term dependencies in sequence data. The core idea behind an LSTM network is a system of gates that controls the information entering and leaving the network's memory cells. These gates determine which parts of a sequence should be retained or discarded, which improves the network's performance on sequence prediction problems.
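To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step. It assumes the standard formulation: an input gate, a forget gate, a candidate cell state, and an output gate. The helper name `lstm_step`, the stacked weight layout, and the random weights are illustrative assumptions. The weights are untrained placeholders, and this is not how Keras stores its weights internally.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch).

    x_t: input at time t, shape (input_dim,)
    h_prev, c_prev: previous hidden state and cell state, shape (units,)
    W: input weights, shape (input_dim, 4 * units)
    U: recurrent weights, shape (units, 4 * units)
    b: bias, shape (4 * units,)
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b            # pre-activations for all four blocks
    i = sigmoid(z[:units])                  # input gate: how much new information to write
    f = sigmoid(z[units:2 * units])         # forget gate: how much old memory to keep
    g = np.tanh(z[2 * units:3 * units])     # candidate values for the memory cell
    o = sigmoid(z[3 * units:])              # output gate: how much memory to expose
    c_t = f * c_prev + i * g                # updated memory cell
    h_t = o * np.tanh(c_t)                  # new hidden state (the layer's output)
    return h_t, c_t

# Placeholder weights: 1 input feature and 4 units
rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in [0.1, 0.2, 0.3]:                   # feed a short toy sequence one step at a time
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape, c.shape)
```

The forget gate multiplies the old cell state, and the input gate scales the new candidate. Together they decide what the cell keeps from step to step.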
Note: To learn about Long Short-Term Memory (LSTM) in more detail, refer to this Answer.
Time series prediction involves predicting future values based on previously observed values. It is widely used in weather forecasting, stock market prediction, and sales forecasting. LSTM networks are well suited to this task because they can learn long-term dependencies between observations in a sequence.
Let's consider a simple example of time series prediction using LSTM in Python. We'll use the Keras library to build and train our LSTM model. The dataset used in this example is the international airline passengers dataset, which shows the total number of airline passengers every month from 1949 to 1960.
Let's see how to implement it!
First, we need to import the necessary libraries for our task.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
```
We load the dataset from a local CSV file and plot it. The commented-out line shows a URL for an online copy of the CSV, which `pd.read_csv` can also read directly.
```python
# Load the dataset
# passenger_data_url = "https://raw.githubusercontent.com/quratulaincodes/SSR/main/international-airline-passengers.csv"
passenger_data_url = "international-airline-passengers.csv"
passenger_dataframe = pd.read_csv(passenger_data_url, usecols=[1], engine='python')

# Plot the original dataset
plt.figure(figsize=(8,4))
plt.plot(passenger_dataframe, label='Original data')
plt.legend()
plt.show()
```
This code produces a plot of the series. The x-axis shows time as a month index (starting at 0), and the y-axis shows the passenger count. Visualizing the data this way helps us understand its patterns and trends before modeling.
In the context of time series forecasting, such plots are useful to observe seasonality (repeating patterns over time), trend (overall direction of the data up or down over time), and noise (random variation in the data).
For instance, a consistent increase in the number of passengers over the years is a trend, and this dataset shows a clear upward trend. Consistent peaks and valleys at the same times each year indicate seasonality. The passenger data also shows this yearly pattern, and the swings grow larger as the totals rise.
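A common way to separate a trend from yearly seasonality is a centered 12-month moving average. This is the 2x12 moving average used in classical decomposition. The sketch below applies it to a synthetic monthly series with a known trend and yearly cycle, so you can see each component come back out. The data is made up and is not the airline dataset. For the real data, you would pass the passenger counts in place of `series`.

```python
import numpy as np

# Synthetic monthly series (made-up, NOT the airline data):
# a linear upward trend plus a yearly (12-month) cycle
months = np.arange(48)
series = 100 + 2.0 * months + 20 * np.sin(2 * np.pi * months / 12)

# Centered 2x12 moving average: 13 weights, with half weight on the two ends.
# It averages over exactly one full year, so the yearly cycle cancels out.
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = np.convolve(series, weights, mode="valid")   # aligned with months 6..41

# Removing the trend leaves the seasonal pattern (plus noise, in real data)
seasonal = series[6:-6] - trend
```

Because the window covers a whole year, each complete cycle of the seasonal component averages to zero and only the trend survives. The centered weighting also avoids the lag a trailing average would introduce.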
Next, we preprocess the dataset by normalizing it and splitting it into training and testing sets.
```python
# Convert to Numpy Array and Normalize
passenger_array = passenger_dataframe.values.astype('float32')
scaler_toolbox = MinMaxScaler(feature_range=(0, 1))
normalized_passenger_data = scaler_toolbox.fit_transform(passenger_array)

# Divide into Training and Test Segments
partition_size = int(len(normalized_passenger_data) * 0.67)
remainder_size = len(normalized_passenger_data) - partition_size
train_partition, test_partition = normalized_passenger_data[0:partition_size,:], normalized_passenger_data[partition_size:len(normalized_passenger_data),:]
```
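With `feature_range=(0, 1)`, `MinMaxScaler` applies x' = (x - min) / (max - min). The code above fits the scaler on the full series, so the test period's maximum also shapes how the training data is scaled. A commonly recommended alternative is to split first and learn the scaling range from the training portion only. The following plain-NumPy sketch shows that variant, along with the inverse transform that maps predictions back to the original units. The helper names `minmax_fit`, `minmax_transform`, and `minmax_inverse` are hypothetical, and the values are made up.

```python
import numpy as np

def minmax_fit(train):
    """Hypothetical helper: learn the scaling range from the training data only."""
    return train.min(), train.max()

def minmax_transform(x, lo, hi):
    # x' = (x - min) / (max - min), the formula MinMaxScaler uses for range (0, 1)
    return (x - lo) / (hi - lo)

def minmax_inverse(x_scaled, lo, hi):
    # Undo the scaling: x = x' * (max - min) + min
    return x_scaled * (hi - lo) + lo

# Made-up monthly values, split chronologically (first 67% for training, no shuffling)
series = np.array([10, 12, 15, 14, 13, 17, 20, 21, 19], dtype="float32")
split = int(len(series) * 0.67)
train, test = series[:split], series[split:]

lo, hi = minmax_fit(train)
train_scaled = minmax_transform(train, lo, hi)   # spans exactly [0, 1]
test_scaled = minmax_transform(test, lo, hi)     # can exceed 1 if the test period is higher
restored = minmax_inverse(test_scaled, lo, hi)   # back to the original units
```

Test values that fall outside [0, 1] are expected when the series keeps growing, as the passenger counts do.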
We then convert the series into input/target pairs and reshape the inputs into the three-dimensional format the LSTM layer expects.
```python
def organize_data(sequence_data, history_length=1):
    # Build (input window, next value) pairs from an array of shape (n, 1)
    input_data, target_data = [], []
    for idx in range(len(sequence_data) - history_length):
        fragment = sequence_data[idx:(idx + history_length), 0]
        input_data.append(fragment)
        target_data.append(sequence_data[idx + history_length, 0])
    return np.array(input_data), np.array(target_data)

history_length = 1
train_input, train_target = organize_data(train_partition, history_length)
test_input, test_target = organize_data(test_partition, history_length)

# Reshape inputs to [samples, time steps, features]
train_input = np.reshape(train_input, (train_input.shape[0], 1, train_input.shape[1]))
test_input = np.reshape(test_input, (test_input.shape[0], 1, test_input.shape[1]))
```
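To see what this windowing and reshaping produce, here is a small self-contained sketch on a toy 1-D array with made-up values. The helper `make_windows` is hypothetical. It uses the same sliding-window idea with a window of 3 so the shapes are easy to read. It also shows an alternative layout, `(samples, history_length, 1)`, which treats each lagged value as a separate time step rather than a separate feature. The code in this Answer uses `(samples, 1, history_length)`.

```python
import numpy as np

def make_windows(values, history_length=1):
    """Turn a 1-D series into (inputs, targets) pairs.

    Each input is `history_length` consecutive values, and its target is the
    value that immediately follows them.
    """
    inputs, targets = [], []
    for idx in range(len(values) - history_length):
        inputs.append(values[idx:idx + history_length])
        targets.append(values[idx + history_length])
    return np.array(inputs), np.array(targets)

series = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
X, y = make_windows(series, history_length=3)
# X -> [[10, 20, 30], [20, 30, 40], [30, 40, 50]], y -> [40, 50, 60]

# Layout used in this Answer: 1 time step with history_length features
X_as_features = X.reshape(X.shape[0], 1, X.shape[1])   # shape (3, 1, 3)
# Alternative layout: history_length time steps with 1 feature each
X_as_steps = X.reshape(X.shape[0], X.shape[1], 1)      # shape (3, 3, 1)
```

With `history_length = 1`, as in this Answer, the two layouts have the same shape, `(samples, 1, 1)`. They only differ once the window is longer.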
We build the LSTM model and train it using our training data.
```python
# Build and Train LSTM Network
flight_model = Sequential()
flight_model.add(LSTM(4, input_shape=(1, history_length)))
flight_model.add(Dense(1))
flight_model.compile(loss='mean_squared_error', optimizer='adam')
flight_model.fit(train_input, train_target, epochs=100, batch_size=1, verbose=2)
```
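As a quick check on model size, the number of trainable weights in an LSTM layer follows from its four gate blocks (input, forget, cell candidate, output). Each block has an input weight matrix, a recurrent weight matrix, and a bias vector, which gives 4 × (units × (input_dim + units) + units) parameters. The sketch below evaluates this for the model above: `LSTM(4)` on `history_length = 1` input feature, followed by `Dense(1)`. The helper functions are hypothetical, not Keras APIs. You can compare their results with the counts reported by `flight_model.summary()`.

```python
def lstm_param_count(units, input_dim):
    # 4 blocks (input, forget, cell candidate, output), each with:
    #   input weights (input_dim x units) + recurrent weights (units x units) + bias (units)
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # weight matrix (input_dim x units) + bias (units)
    return input_dim * units + units

history_length = 1
lstm_params = lstm_param_count(units=4, input_dim=history_length)
dense_params = dense_param_count(input_dim=4, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 96 5 101
```

With only about a hundred weights, this model is very small, which keeps training fast even with `batch_size=1`.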
Finally, we make predictions using our model and evaluate its performance.
```python
# Make Predictions and Assess Model
train_forecast = flight_model.predict(train_input)
test_forecast = flight_model.predict(test_input)

# Undo the normalization so errors are measured in passengers
train_forecast = scaler_toolbox.inverse_transform(train_forecast)
train_target = scaler_toolbox.inverse_transform([train_target])
test_forecast = scaler_toolbox.inverse_transform(test_forecast)
test_target = scaler_toolbox.inverse_transform([test_target])

train_evaluation = np.sqrt(mean_squared_error(train_target[0], train_forecast[:,0]))
print('Training Evaluation: %.2f RMSE' % (train_evaluation))
test_evaluation = np.sqrt(mean_squared_error(test_target[0], test_forecast[:,0]))
print('Testing Evaluation: %.2f RMSE' % (test_evaluation))

# Visualizing Original Data and Forecasts
# Each forecast is plotted at the index of the month it predicts
plt.figure(figsize=(8,4))
plt.plot(scaler_toolbox.inverse_transform(normalized_passenger_data), label='Original Passenger Data')
plt.plot(range(history_length, history_length + len(train_forecast)), train_forecast, label='Training Forecast')
plt.plot(range(partition_size + history_length, partition_size + history_length + len(test_forecast)), test_forecast, label='Testing Forecast')
plt.legend()
plt.show()
```
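RMSE is the square root of the mean of the squared differences between actual and predicted values, so after the inverse transform it is expressed in the same units as the data. The following plain-Python sketch computes it by hand on a few made-up numbers. The helper `rmse` is hypothetical, but it should agree with the `np.sqrt(mean_squared_error(...))` calls above for the same inputs.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 129.0]
print(round(rmse(actual, predicted), 3))  # 2.121
```

The errors here are -2, 2, -3, and 1. Their squares average to 4.5, and the square root of 4.5 is about 2.121. Because the errors are squared, a few large misses raise RMSE more than many small ones.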
Below is the complete executable code for time series prediction using an LSTM model:
```python
# Import Required Modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Fetch Passenger Data
passenger_data_url = "international-airline-passengers.csv"
passenger_dataframe = pd.read_csv(passenger_data_url, usecols=[1], engine='python')

# Convert to Numpy Array and Normalize
passenger_array = passenger_dataframe.values.astype('float32')
scaler_toolbox = MinMaxScaler(feature_range=(0, 1))
normalized_passenger_data = scaler_toolbox.fit_transform(passenger_array)

# Divide into Training and Test Segments
partition_size = int(len(normalized_passenger_data) * 0.67)
remainder_size = len(normalized_passenger_data) - partition_size
train_partition, test_partition = normalized_passenger_data[0:partition_size,:], normalized_passenger_data[partition_size:len(normalized_passenger_data),:]

def organize_data(sequence_data, history_length=1):
    input_data, target_data = [], []
    for idx in range(len(sequence_data) - history_length):
        fragment = sequence_data[idx:(idx + history_length), 0]
        input_data.append(fragment)
        target_data.append(sequence_data[idx + history_length, 0])
    return np.array(input_data), np.array(target_data)

history_length = 1
train_input, train_target = organize_data(train_partition, history_length)
test_input, test_target = organize_data(test_partition, history_length)

train_input = np.reshape(train_input, (train_input.shape[0], 1, train_input.shape[1]))
test_input = np.reshape(test_input, (test_input.shape[0], 1, test_input.shape[1]))

# Build and Train LSTM Network
flight_model = Sequential()
flight_model.add(LSTM(4, input_shape=(1, history_length)))
flight_model.add(Dense(1))
flight_model.compile(loss='mean_squared_error', optimizer='adam')
flight_model.fit(train_input, train_target, epochs=100, batch_size=1, verbose=2)

# Make Predictions and Assess Model
train_forecast = flight_model.predict(train_input)
test_forecast = flight_model.predict(test_input)

train_forecast = scaler_toolbox.inverse_transform(train_forecast)
train_target = scaler_toolbox.inverse_transform([train_target])
test_forecast = scaler_toolbox.inverse_transform(test_forecast)
test_target = scaler_toolbox.inverse_transform([test_target])

train_evaluation = np.sqrt(mean_squared_error(train_target[0], train_forecast[:,0]))
print('Training Evaluation: %.2f RMSE' % (train_evaluation))
test_evaluation = np.sqrt(mean_squared_error(test_target[0], test_forecast[:,0]))
print('Testing Evaluation: %.2f RMSE' % (test_evaluation))

# Visualizing Original Data and Forecasts
plt.figure(figsize=(8,4))
plt.plot(scaler_toolbox.inverse_transform(normalized_passenger_data), label='Original Passenger Data')
plt.plot(range(history_length, history_length + len(train_forecast)), train_forecast, label='Training Forecast')
plt.plot(range(partition_size + history_length, partition_size + history_length + len(test_forecast)), test_forecast, label='Testing Forecast')
plt.legend()
plt.show()
```
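Running the code produces two outputs. The first is the plot of the original data and forecasts. The second is the text output, which contains the per-epoch training log and the training and testing RMSE values.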
Here’s the explanation of the code:
Lines 1–8: We import the necessary libraries: `numpy` for numerical operations, `pandas` for data manipulation, `matplotlib` for plotting, and `keras` for building the LSTM model. From scikit-learn, we import `MinMaxScaler` for data preprocessing and `mean_squared_error` for model evaluation.
Line 11: We define the path to the dataset file.
Line 12: Next, we load the dataset from that file into a `pandas` DataFrame. We use only the second column (index 1), which contains the number of airline passengers.
Lines 15–17: Here, we convert the DataFrame to a `numpy` array with the data type `float32`. We then normalize the data to the range 0 to 1 using `MinMaxScaler`, a common preprocessing step for neural networks.
Lines 20–22: We divide the dataset into two parts: the first 67% of the observations for training and the remaining 33% for testing. The split preserves chronological order, so the model is evaluated on months that come after the ones it was trained on.
Lines 24–30: Next up, we define a function that converts the time series into a format suited for training the LSTM model. At each position in the series, it takes the passenger count at time "t" as the input and the count at the next time step, "t + 1", as the target, and appends each to its own list. With a larger `history_length`, the input is instead the previous `history_length` counts. The result is a set of input sequences (the passenger count at time "t") paired with their labels (the passenger count at time "t + 1").
Lines 32–37: We transform the training and testing data using the function defined above. We then reshape the inputs into the format the LSTM layer expects, [samples, time steps, features]. Here each sample has 1 time step and `history_length` features.
Lines 40–43: Next, we define and compile the model. It keeps things simple: an LSTM layer with 4 units that receives inputs of shape (1, `history_length`), followed by a Dense layer with a single output for the predicted passenger count. For training, we use mean squared error as the loss function and the adam optimizer.
Line 44: We then train the model on the training data for 100 epochs (full passes over the training set). We use a batch size of 1, so the weights are updated after every sample.
Lines 47–48: Predictions are made on the training and testing data using the trained model.
Lines 50–53: We convert the predictions and actual values back to their original scale using the inverse transformation of the `MinMaxScaler`. This is necessary because the model was trained on normalized data, so its predictions are on the normalized scale as well. Converting back lets us measure errors in passengers.
Lines 55–58: Here, we calculate and print the Root Mean Squared Error (RMSE) of the training and testing predictions.
Lines 61–66: Finally, we plot the original data and the predictions. The plot includes the original data, the predictions on the training data, and the predictions on the testing data. This helps visualize the performance of the model.
The plot shows the original monthly series of international airline passengers along with the predictions made by the LSTM model. The blue line shows the original data, the orange line shows the predictions on the training data, and the green line shows the predictions on the testing data. Comparing the lines shows how closely the predicted values follow the actual data and how well the model captures the trend and seasonal pattern.
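A useful reference point for the RMSE values is a naive persistence forecast, which predicts that next month's value equals this month's. This baseline is not part of the original code. Here it serves as an added sanity check: if the LSTM's test RMSE is not clearly lower than the persistence RMSE on the same test months, the model is doing little more than repeating the last observed value. Below is a minimal sketch on made-up numbers, with a hypothetical helper named `persistence_rmse`.

```python
import math

def persistence_rmse(series):
    """RMSE of the naive forecast y_hat[t + 1] = y[t] over a 1-D sequence."""
    actual = series[1:]      # values to be predicted
    predicted = series[:-1]  # each prediction is simply the previous value
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up monthly counts, for illustration only
values = [100.0, 104.0, 101.0, 107.0, 110.0]
print(persistence_rmse(values))
```

For a fair comparison, you would apply the same idea to the unscaled test portion of the passenger data, so both RMSE values are measured in passengers over the same months.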
LSTM networks are an effective tool for time series prediction. They can capture patterns over time, and libraries such as Keras make them straightforward to implement. With appropriate preprocessing and tuning (for example, of the window length, the number of units, and the number of training epochs), they can perform well on a wide range of time series prediction tasks.