Mastering Stock Price Prediction with LSTM Model in Python

Long Short-Term Memory (LSTM) is a type of recurrent neural network that is widely used for sequence modeling. It is especially useful for time series data, such as stock prices or weather measurements, because it can capture long-term dependencies and remember information from earlier time steps.

Python offers several libraries for implementing LSTM models, including Keras and TensorFlow. In this article, we will use Keras to build and train an LSTM model for predicting stock prices.

First, we need to import the necessary libraries, including pandas, numpy, scikit-learn, Keras, matplotlib, and yfinance. We will also set the timezone and date range for our data and download stock data using the Yahoo Finance API.

Next, we will preprocess the data by removing missing values, scaling the data using the MinMaxScaler, and splitting the data into training and testing sets. We will also convert the data into sequences of a specified length using a custom function.

After preprocessing the data, we will reshape it for use in the LSTM model. We will then train the model using Keras, make predictions on the test data, and calculate the root mean squared error (RMSE) between the actual and predicted prices.

Finally, we will use the trained model to predict stock prices for the next 60 days and plot the actual and predicted stock prices.

In conclusion, LSTM is a powerful tool for modeling time series data, and Python provides several libraries for implementing LSTM models. By following the steps outlined in this article, you can build and train an LSTM model using Keras and make predictions on time series data.

Let's begin coding.

First, import the libraries required to run the program. pandas, NumPy, and Matplotlib are used for data manipulation and visualization, scikit-learn's MinMaxScaler is used to scale the data, and Keras is used to build the LSTM model. The yfinance library is used to download stock price data, and pytz is used to set the timezone for the date range.

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from datetime import datetime as dt
    from keras.models import Sequential
    from keras.layers import Dense, LSTM
    import matplotlib.pyplot as plt
    import yfinance as yf
    import pytz

Now let’s set the timezone and date range for our analysis: we set the timezone to Asia/Kolkata and define the period for which the stock price data will be downloaded.

    tz = pytz.timezone("Asia/Kolkata")
    start = tz.localize(dt(2010,9,1))
    end = tz.localize(dt(2022,9,1))
    

Now, let's download the stock data: this code block downloads the stock price data for TCS.NS within the specified date range.

    tickers = "TCS.NS".split(",")
    data = yf.download(tickers, start, end)

The code block below preprocesses the downloaded stock price data. Missing values are removed, and the closing prices are extracted and reshaped into a 2D array. The MinMaxScaler is then used to scale the data between 0 and 1.

    data = data.dropna() 
    close_price = data['Close'].values.reshape(-1, 1) 
    scaler = MinMaxScaler(feature_range=(0, 1)) 
    close_price = scaler.fit_transform(close_price)
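
To make the scaling step concrete, here is a small standalone illustration (not part of the stock pipeline, reusing the numpy and MinMaxScaler imports from above) of how MinMaxScaler maps values into the 0–1 range and how inverse_transform recovers the original scale, which we rely on later when converting predictions back into prices:

    # Toy illustration only -- not part of the stock pipeline
    toy = np.array([[100.0], [150.0], [200.0]])
    toy_scaler = MinMaxScaler(feature_range=(0, 1))
    toy_scaled = toy_scaler.fit_transform(toy)                # [[0.0], [0.5], [1.0]]
    toy_restored = toy_scaler.inverse_transform(toy_scaled)   # back to [[100.], [150.], [200.]]
    print(toy_scaled.ravel(), toy_restored.ravel())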

Split the data into training and testing sets: the code below splits the scaled data into training and testing sets. The first 80% of the data is used for training, while the remaining 20% is used for testing.

    train_size = int(len(close_price) * 0.8)
    train_data = close_price[:train_size, :]
    test_data = close_price[train_size:, :]

Convert the data into sequences: the code block below converts the training and testing data into sequences of length 60. Each sequence consists of 60 consecutive closing price values, and the next closing price is used as the output variable. The create_sequences() function takes the data and sequence length as input and returns two arrays, X and y: X contains the input sequences, and y contains the corresponding output values.

    def create_sequences(data, seq_length):
        X, y = [], []
        for i in range(len(data)-seq_length-1):
            X.append(data[i:(i+seq_length), 0])
            y.append(data[i+seq_length, 0])
        return np.array(X), np.array(y)
    
    seq_length = 60 
    train_X, train_y = create_sequences(train_data, seq_length)
    test_X, test_y = create_sequences(test_data, seq_length)
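
As a quick, optional sanity check, you can print the array shapes. The exact sample counts depend on how many trading days yfinance returns, but each input sample should be a window of 60 scaled prices with a single target value:

    # Optional sanity check of the sequence shapes
    print(train_X.shape, train_y.shape)  # (n_train_samples, 60) and (n_train_samples,)
    print(test_X.shape, test_y.shape)    # (n_test_samples, 60) and (n_test_samples,)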
    

The train_X and test_X data are reshaped into 3D arrays with dimensions (number of samples, sequence length, number of features). In this case, we only have one feature, which is the closing stock price of TCS.NS. Therefore, the input shape for the LSTM model is (sequence length, 1).

    # Reshape the data for LSTM
    train_X = np.reshape(train_X, (train_X.shape[0], train_X.shape[1], 1))
    test_X = np.reshape(test_X, (test_X.shape[0], test_X.shape[1], 1))
    

Next, we define the architecture of the LSTM model using the Keras library. The model has two LSTM layers, each with 50 units. The first LSTM layer returns sequences, while the second LSTM layer returns a single output. Finally, we add a fully connected layer with one neuron to make a single prediction.

    # Define the LSTM model architecture
    model = Sequential()
    model.add(LSTM(units=50, return_sequences=True, input_shape=(train_X.shape[1], 1)))
    model.add(LSTM(units=50))
    model.add(Dense(units=1))
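
Optionally, you can inspect the resulting architecture with model.summary(), which prints each layer's output shape and parameter count:

    # Optional: print the layer-by-layer structure of the model
    model.summary()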
    

We compile the model using the adam optimizer and mean squared error as the loss function.

    model.compile(optimizer='adam', loss='mean_squared_error')
    

We train the model using the fit function. We pass the train_X and train_y data as inputs, and specify the number of epochs and batch size.

    model.fit(train_X, train_y, epochs=50, batch_size=32)
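
As an optional variation that is not part of the original walkthrough, you could hold out a slice of the training data for validation and stop training once the validation loss stops improving, for example:

    # Optional variation: monitor validation loss and stop early if it plateaus
    from keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    model.fit(train_X, train_y, epochs=50, batch_size=32,
              validation_split=0.1, callbacks=[early_stop])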
    

After the model is trained, we make predictions on the test data using the predict function. We then inverse transform the predicted data to obtain the original scale of the closing stock prices.

    # Make predictions on test data
    test_pred = model.predict(test_X)
    test_pred = scaler.inverse_transform(test_pred) # Inverse scaling
    

We calculate the root mean squared error (RMSE) between the actual and predicted test prices. Since test_data holds scaled values while test_pred has already been inverse transformed, we first bring the actual values back to the original price scale.

    # Calculate the root mean squared error on the original price scale
    actual = scaler.inverse_transform(test_data[seq_length+1:])
    rmse = np.sqrt(np.mean((actual - test_pred) ** 2))
    print("Root Mean Squared Error:", rmse)
    

Finally, we make predictions for the closing stock prices for the next 60 days using the predict function. We concatenate the last seq_length days of the (already scaled) test data with zeros to create a sequence of length seq_length + future_days, build overlapping sequences of length seq_length from it, and feed those into the LSTM model to obtain the predicted closing stock prices for the next future_days.

    # Predict stock prices for the next 60 days
    future_days = 60
    # test_data is already scaled, so no further scaling is needed here
    future_X = np.concatenate([test_data[-seq_length:], np.zeros(future_days).reshape(-1, 1)])
    future_X_seq = np.array([future_X[i:i+seq_length, 0] for i in range(len(future_X)-seq_length)])
    future_X_seq = np.reshape(future_X_seq, (future_X_seq.shape[0], future_X_seq.shape[1], 1))
    future_y = model.predict(future_X_seq)
    future_y = scaler.inverse_transform(future_y)
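
One caveat about the zero-padding approach above: after the first step, the input windows contain placeholder zeros rather than real prices, so the later predictions are driven partly by those placeholders. A common alternative, not used in this article, is recursive forecasting, where each prediction is fed back into the window. A minimal sketch, reusing the variables defined above:

    # Alternative sketch (not the article's method): recursive one-step forecasting
    window = list(test_data[-seq_length:, 0])        # last 60 scaled closing prices
    recursive_preds = []
    for _ in range(future_days):
        x = np.array(window[-seq_length:]).reshape(1, seq_length, 1)
        next_scaled = float(model.predict(x, verbose=0)[0, 0])
        recursive_preds.append(next_scaled)
        window.append(next_scaled)                   # feed the prediction back in
    recursive_preds = scaler.inverse_transform(np.array(recursive_preds).reshape(-1, 1))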

The code below prints the test-set predictions for the last 60 days and plots the predicted stock prices, together with the predicted movement for the next future_days.

The first for loop prints the dates and predicted stock prices for the last 60 days of the test period. data.index is the date index of the downloaded data, and test_pred holds the predictions obtained from the LSTM model.

Then, a plot is created using matplotlib. The ax1 object plots the test-set predictions (test_pred) using the plot() function, and the scatter() function adds markers on the predicted prices.

The ax2 object, created with twinx(), plots the predicted movement for the next future_days using the plot() function, again with markers added by scatter().

The set_xlabel() and set_ylabel() functions set the x-axis and y-axis labels, set_xlim() and set_ylim() set the axis limits, and xaxis.set_major_locator() limits the number of x-tick labels displayed. Finally, plt.title() and plt.legend() set the title and legend of the plot, and plt.show() displays it.

    print("Test Predictions for Next 60 Days:")
    for i in range(-60, 0):
        print(data.index[i], test_pred[i])
    # Plot the predicted stock prices for the test period and the next 60 days
    fig, ax1 = plt.subplots(figsize=(12,6))
    ax1.plot(data.index[train_size+seq_length+1:], test_pred, label="Predicted data", color="orange")
    ax1.scatter(data.index[train_size+seq_length+1:], test_pred, marker=".", color="orange")
    ax1.set_xlabel("Date")
    ax1.set_ylabel("Stock Price")
    ax1.set_xlim(data.index[-180], data.index[-1])
    ax1.set_ylim(bottom=min(np.min(actual), np.min(test_pred)) - 10, 
                 top=max(np.max(actual), np.max(test_pred)) + 10)
    ax1.xaxis.set_major_locator(plt.MaxNLocator(6))
    ax1.grid(True)
    
    ax2 = ax1.twinx()
    ax2.plot(data.index[-future_days:], future_y, label=f"Predicted movement for next {future_days} days", color="green")
    ax2.scatter(data.index[-future_days:], future_y, marker=".", color="green")
    ax2.set_ylim(bottom=np.min(future_y) - 10, top=np.max(future_y) + 10)
    ax2.set_ylabel("Stock Price")
    ax2.grid(True)
    
    plt.title("Stock Prices Movement")
    plt.legend()
    plt.show()

Results

    Test Predictions for the Last 60 Days:

    2022-06-06 00:00:00 [3319.2917]
    2022-06-07 00:00:00 [3346.2803]
    2022-06-08 00:00:00 [3348.6687]
    2022-06-09 00:00:00 [3302.6406]
    2022-06-10 00:00:00 [3302.7617]
    2022-06-13 00:00:00 [3327.6484]
    2022-06-14 00:00:00 [3300.0955]
    2022-06-15 00:00:00 [3192.8208]
    2022-06-16 00:00:00 [3135.5945]
    2022-06-17 00:00:00 [3128.055]
    2022-06-20 00:00:00 [3096.9382]
    2022-06-21 00:00:00 [3048.1904]
    2022-06-22 00:00:00 [3043.7888]
    2022-06-23 00:00:00 [3114.6045]
    2022-06-24 00:00:00 [3159.8232]
    2022-06-27 00:00:00 [3219.1038]
    2022-06-28 00:00:00 [3231.1204]
    2022-06-29 00:00:00 [3236.3918]
    2022-06-30 00:00:00 [3238.13]
    2022-07-01 00:00:00 [3220.8428]
    2022-07-04 00:00:00 [3196.7869]
    2022-07-05 00:00:00 [3217.625]
    2022-07-06 00:00:00 [3186.115]
    2022-07-07 00:00:00 [3152.8262]
    2022-07-08 00:00:00 [3170.1455]
    2022-07-11 00:00:00 [3204.1228]
    2022-07-12 00:00:00 [3204.7126]
    2022-07-13 00:00:00 [3103.1963]
    2022-07-14 00:00:00 [3027.834]
    2022-07-15 00:00:00 [2981.922]
    2022-07-18 00:00:00 [2949.711]
    2022-07-19 00:00:00 [2940.6736]
    2022-07-20 00:00:00 [2986.6475]
    2022-07-21 00:00:00 [3020.6865]
    2022-07-22 00:00:00 [3083.151]
    2022-07-25 00:00:00 [3116.8857]
    2022-07-26 00:00:00 [3116.4822]
    2022-07-27 00:00:00 [3105.0222]
    2022-07-28 00:00:00 [3067.1245]
    2022-07-29 00:00:00 [3093.199]
    2022-08-01 00:00:00 [3160.8618]
    2022-08-02 00:00:00 [3217.6785]
    2022-08-03 00:00:00 [3232.413]
    2022-08-04 00:00:00 [3223.6597]
    2022-08-05 00:00:00 [3244.051]
    2022-08-08 00:00:00 [3266.4446]
    2022-08-10 00:00:00 [3280.1973]
    2022-08-11 00:00:00 [3288.5432]
    2022-08-12 00:00:00 [3277.922]
    2022-08-16 00:00:00 [3311.6838]
    2022-08-17 00:00:00 [3319.697]
    2022-08-18 00:00:00 [3311.1047]
    2022-08-19 00:00:00 [3309.882]
    2022-08-22 00:00:00 [3299.0095]
    2022-08-23 00:00:00 [3296.1455]
    2022-08-24 00:00:00 [3278.012]
    2022-08-25 00:00:00 [3225.7368]
    2022-08-26 00:00:00 [3183.5935]
    2022-08-29 00:00:00 [3150.0742]
    2022-08-30 00:00:00 [3144.557]

Data Visualization of Stock Forecast

Stock Price Forecast of TCS