Predicting Stock Prices with AdaBoost Algorithm: A Comprehensive Guide Using Python


AdaBoost, short for Adaptive Boosting, is a popular machine learning algorithm for classification and regression tasks. It belongs to the family of ensemble learning algorithms, which combine multiple base models to achieve better performance than any individual model. AdaBoost is best known for building a strong classifier out of weak ones: it weights each weak classifier's prediction by that classifier's accuracy, so more accurate classifiers have more say in the final result.

How does AdaBoost Algorithm work?

AdaBoost trains its weak classifiers one after another rather than all at once. A weak classifier is one whose accuracy is only slightly better than random guessing. Every data point starts with the same weight. After each round, AdaBoost increases the weights of the data points that the latest weak classifier misclassified, so the next classifier focuses on them. The process repeats for a set number of rounds, and the final model combines all the weak classifiers, with each one's vote weighted by its accuracy.
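The rounds described above can be sketched in a few lines of plain Python. This is a minimal illustration of the classic two-class AdaBoost update, not the scikit-learn implementation; the toy data, the threshold "stump" used as the weak classifier, and the helper names (fit_stump, stump_predict, adaboost_fit, adaboost_predict) are all assumptions made for this example.

```python
import math

def stump_predict(x, threshold, polarity):
    """A one-split weak classifier: predict `polarity` left of the threshold."""
    return polarity if x <= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n          # every point starts with the same weight
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # the stump's say in the vote
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points grow in weight, correctly classified ones shrink.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; labels are +1 and -1."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption for illustration): no single threshold separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions)
```

On this toy data a single stump misclassifies three points, while the weighted vote of three stumps reproduces every label.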

Example of using AdaBoost Algorithm for stock price prediction

In this article, we will explore how to use the AdaBoost algorithm to predict the stock price of RELIANCE.NS, the ticker of Reliance Industries Limited, an Indian multinational conglomerate. We will use the yfinance Python package to retrieve the historical stock data from Yahoo Finance. We will then train AdaBoost on all but the last 60 trading days and use it to predict the price for those 60 held-out days.

We start by importing the required libraries and defining the start and end dates in India Standard Time. We then retrieve the historical stock data from Yahoo Finance using the yfinance package. We split the dataset chronologically into train and test sets, with the last 60 rows (trading days) used as the test set. Finally, we convert the data into arrays for training and testing: the feature is the row's position in the dataset, and the target is the adjusted closing price.

import pytz
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from datetime import datetime as dt
from sklearn.tree import DecisionTreeRegressor

tz = pytz.timezone("Asia/Kolkata")
start = tz.localize(dt(2001,8,1))
#end = tz.localize(dt.today())
end = tz.localize(dt(2022,8,1))

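# Download a single ticker. auto_adjust=False keeps the 'Adj Close' column,
# which recent yfinance releases otherwise drop by default.
ticker = "RELIANCE.NS"
df = yf.download(ticker, start=start, end=end, auto_adjust=False)

# Some yfinance versions return two-level columns (field, ticker); flatten them.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)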

train_data = df[:len(df)-60]
test_data = df[len(df)-60:]

X_train = np.array(range(0,len(train_data))).reshape(-1, 1)
y_train = train_data['Adj Close'].values

X_test = np.array(range(len(train_data),len(df))).reshape(-1, 1)
y_test = test_data['Adj Close'].values

We then use AdaBoostRegressor from the scikit-learn package to train the model with a DecisionTreeRegressor as the base estimator. We set the number of estimators to 300 and the maximum depth of the decision tree to 4.

regr = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=42)
regr.fit(X_train, y_train)

AdaBoost is an ensemble learning technique that combines multiple weak learners to create a strong learner. During training it keeps a weight for every sample in the training set. In regression, the samples that the current learner predicts poorly receive higher weights, so that the next weak learner concentrates on them.

In our case, we use the AdaBoostRegressor class from the scikit-learn package, which trains a regression model with the AdaBoost.R2 variant of the algorithm. We use DecisionTreeRegressor as the base estimator. The number of estimators is set to 300, which means that up to 300 decision trees are trained sequentially. The maximum depth of each decision tree is set to 4, which keeps each tree simple and helps prevent overfitting of the training data.
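To get a feel for what the fitted ensemble looks like, the sketch below trains the same kind of model on a synthetic random walk and inspects it. The synthetic prices, the 440/60 split, and the smaller n_estimators=50 are assumptions chosen only to keep the example fast and offline; they are not the article's data or settings.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic random-walk "prices" (an assumption, not real market data).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))

# Same feature construction as in the article: the row position is the only feature.
X_train = np.arange(440).reshape(-1, 1)
y_train = prices[:440]
X_test = np.arange(440, 500).reshape(-1, 1)

regr = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                         n_estimators=50, random_state=42)
regr.fit(X_train, y_train)

# Boosting can stop early, so the ensemble may hold fewer trees than requested.
print("trees fitted:", len(regr.estimators_))
print("first five tree weights:", regr.estimator_weights_[:5])

# Every test position lies beyond the largest training position, so each tree
# sends all of them to the same leaf and returns one value for the whole range.
forecast = regr.predict(X_test)
print("distinct forecast values:", np.unique(forecast).size)
```

Because decision trees cannot extrapolate past the range of the feature they were trained on, a model that uses only the row position produces a flat forecast for dates after the training period. That is worth keeping in mind when reading the forecast line in the plot.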

Once the model is trained, we use it to predict the stock prices for the 60 held-out days, which we store in the next_days DataFrame. We then plot the actual stock prices for the last 300 training days and the 60 test days along with the predicted prices.

Finally, we print the predicted stock price for the last day of the test data using the tail function on the next_days DataFrame. Comparing this value, and the forecast line in the plot, with the actual test prices gives us an idea of how well the model has performed.

# Hold the forecasts in a DataFrame aligned with the test dates
next_days = pd.DataFrame(index=test_data.index)
next_days['Adj Close'] = regr.predict(X_test)

# Plot the last 300 training days, the 60 held-out days and the forecast
plt.figure(figsize=(10,5))
plt.plot(train_data.index[-300:], train_data['Adj Close'].tail(300), label='Train')
plt.plot(test_data.index, test_data['Adj Close'], label='Test')
plt.plot(next_days.index, next_days['Adj Close'], label='Forecast')
plt.legend(loc='best')
plt.title('AdaBoost Algorithm')
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

print(next_days.tail(1))

Figure: AdaBoost algorithm forecast of the RELIANCE.NS stock price.
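A single printed value is hard to judge on its own, so it can help to summarise the forecast error over the whole test period. The sketch below is one possible way to do that with two common error measures; the helper names (mae, rmse) and the four sample numbers are hypothetical and chosen only to show the calculation.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the miss, in price units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalises large misses more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical prices and forecasts, for illustration only.
actual = [102.0, 101.0, 104.0, 103.0]
forecast = [100.0, 100.0, 100.0, 100.0]

print("MAE :", mae(actual, forecast))
print("RMSE:", rmse(actual, forecast))
```

With the variables from this article, the same helpers could be called as mae(y_test, regr.predict(X_test)). scikit-learn also ships mean_absolute_error and mean_squared_error in sklearn.metrics for the same purpose.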