Time series analysis is a core part of data science, used to understand patterns and trends in data that change over time. Seasonality is one of its most significant challenges, because it can obscure the underlying patterns in the data. The seasonal autoregressive integrated moving average (SARIMA) model, also known as the seasonal ARIMA model, is a statistical method for analyzing and forecasting time series that exhibit seasonality. In this article, we explore the SARIMA model and its application to trend analysis using Python.
Time series data is a collection of observations taken at regular intervals over time. It can be used to study patterns and trends in data, such as stock prices, weather data, and economic indicators. The SARIMA model is used to identify and model the underlying patterns and trends in the data.
The SARIMA model is based on four key components: the autoregressive (AR) component, the integrated (I) component, the moving average (MA) component, and the seasonal (S) component. The AR component models the relationship between the current value of the time series and its past values. The I component handles the trend by differencing the time series (subtracting the previous value from each value) until it is stationary. The MA component models the relationship between the current value of the time series and its past forecast errors. Finally, the S component applies the same three ideas at the seasonal lag to model the seasonality in the data.
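To make the differencing idea concrete, here is a minimal sketch in plain Python. The series, its period `m`, and the `difference` helper are all made up for illustration; they are not part of the stock analysis later in the article.

```python
# Illustrative only: a synthetic series with a linear trend plus a
# repeating pattern of period m (all values are invented for this sketch).
m = 4
pattern = [5.0, -2.0, -4.0, 1.0]  # hypothetical seasonal effects
series = [2.0 * t + pattern[t % m] for t in range(24)]


def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Seasonal differencing (lag m) cancels the repeating pattern and turns
# the linear trend into a constant equal to slope * m.
seasonal_diff = difference(series, lag=m)

# A further regular difference (lag 1) removes that remaining constant.
fully_differenced = difference(seasonal_diff, lag=1)

print(seasonal_diff[:3])
print(fully_differenced[:3])
```

In SARIMA terms, the lag-1 difference corresponds to d and the lag-m difference to D. Real data is noisy, so the differenced series would fluctuate around a constant rather than equal it exactly.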
The SARIMA model is typically denoted as SARIMA(p,d,q)(P,D,Q)m, where p, d, and q are the orders of the non-seasonal AR, I, and MA components, P, D, and Q are the orders of the seasonal AR, I, and MA components, and m is the number of observations per seasonal cycle (for example, 12 for monthly data with a yearly pattern). Fitting the model involves two tasks. The orders are chosen by comparing candidate models, usually with an information criterion such as the AIC. For a given set of orders, the coefficients are then estimated, typically by maximum likelihood, so that the model fits the observed data as closely as possible.
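For readers who prefer a formula, the notation above can be written compactly with the backshift operator B, defined by B y_t = y_{t-1}. The symbols below (the coefficient names, the constant c, and the error term) follow common textbook conventions and are not taken from the original article; sign conventions for the MA terms vary between references.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t
  = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t
\\[6pt]
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\Phi_P(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}
\\[6pt]
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q, \qquad
\Theta_Q(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}
```

Here (1-B)^d is the regular differencing, (1-B^m)^D is the seasonal differencing, and \varepsilon_t is the error term at time t.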
One of the key benefits of the SARIMA model is its ability to capture seasonal patterns in the data. This is particularly useful for forecasting series with a recurring seasonal component, such as sales data or monthly economic indicators. When such a pattern is present, modeling it lets SARIMA produce more accurate forecasts than methods that ignore seasonality.
Another benefit of the SARIMA model is its flexibility. It can be used to model a wide range of time series data, from daily stock prices to monthly economic indicators. The SARIMA model can also be extended to include exogenous variables, such as weather data or demographic information, to improve the accuracy of the forecasts.
The SARIMA model is a powerful tool for analyzing and forecasting time series data with seasonality. By incorporating the autoregressive, integrated, moving average, and seasonal components, the SARIMA model can capture the underlying patterns and trends in the data. With its flexibility and ability to capture seasonality, the SARIMA model is widely used in fields such as finance, economics, and meteorology.
Let’s Code and Analyze Historical Stock Data
The code below analyzes five years of historical stock data for Apple Inc. (ticker “AAPL”), from 2018-03-14 to 2023-03-14, using time series analysis. The analysis is conducted in several steps.
Step 1: Importing necessary libraries. In the first few lines, the necessary libraries for data analysis and visualization are imported, including pandas, matplotlib, statsmodels, pmdarima, and yfinance.
Step 2: Getting historical stock data. The next step is to retrieve the historical stock data for the company “AAPL” for the specified time period (from 2018-03-14 to 2023-03-14) using the yfinance library. The data is stored in the “stock_data” dataframe.
Step 3: Renaming columns. The column names in the “stock_data” dataframe are renamed by replacing spaces with underscores.
Step 4: Plotting data. The “Close” column of the “stock_data” dataframe is plotted using the matplotlib library to visualize the stock price trends over the specified time period.
Step 5: Checking for stationarity. The ADF (Augmented Dickey-Fuller) test is used to check the stationarity of the time series data. The ADF test is a statistical hypothesis test whose null hypothesis is that a unit root is present in the series. A unit root indicates non-stationarity, which means that statistical properties such as the mean, variance, and autocorrelation of the series are not constant over time. The test therefore tells us whether the data needs to be differenced (i.e., made stationary) before modeling. In this code, the should_diff() method of pmdarima's ADFTest class performs the test and returns the p-value together with a boolean indicating whether the data should be differenced.
Step 6: Decomposing data. The time series data is decomposed into its various components (trend, seasonality, and residual) using the seasonal_decompose() function from the statsmodels library. This function decomposes the time series data into its components using either additive or multiplicative models.
Step 7: Plotting decomposed data. The decomposed data is plotted using the matplotlib library. The decomposed data includes the observed data, trend, seasonality, and residual components. These plots can be used to visualize the individual components of the time series data and understand how they contribute to the overall trend.
Step 8: Fitting SARIMA model. Finally, a Seasonal Autoregressive Integrated Moving Average (SARIMA) model is fitted to the time series data using the auto_arima() function from the pmdarima library. SARIMA models are commonly used for time series analysis as they can capture both the trend and seasonality in the data. By default, the auto_arima() function searches over candidate orders stepwise and keeps the model with the lowest AIC (Akaike Information Criterion) score.
Benefits of using the SARIMA model: The SARIMA model is a powerful tool for analyzing time series data as it can capture both the trend and seasonality in the data. It is often applied to financial series, although daily stock prices may contain little seasonality for it to capture, as the results at the end of this article illustrate. Additionally, the auto_arima() function used in this code selects the SARIMA orders with the lowest AIC score, making it easy to fit the model to the data without manual tuning of the parameters.
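To illustrate what AIC-based selection means, the sketch below computes the AIC for a few candidate models and picks the lowest. The log-likelihoods and parameter counts are invented for illustration only; they are not output from auto_arima() or from any fit on the AAPL data.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (label, maximized log-likelihood, parameter count).
# These numbers are made up for this sketch.
candidates = [
    ("ARIMA(0,1,1)", -2500.0, 3),
    ("ARIMA(1,1,1)", -2499.5, 4),
    ("SARIMA(1,1,1)(1,0,1)[12]", -2498.9, 6),
]

scores = {label: aic(ll, k) for label, ll, k in candidates}
best = min(scores, key=scores.get)

for label, score in scores.items():
    print(f"{label}: AIC = {score:.1f}")
print("Selected:", best)
```

The larger models fit slightly better (higher log-likelihood), but the penalty of 2 per extra parameter outweighs the gain, so the simplest candidate is selected in this made-up comparison.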
Python Code
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima.arima import ADFTest
from pmdarima.arima import auto_arima
import yfinance as yf
# Get historical stock data for 5 years of "AAPL"
ticker = "AAPL"
start_date = "2018-03-14"
end_date = "2023-03-14"
stock_data = yf.download(ticker, start=start_date, end=end_date)
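# Rename columns (newer yfinance releases may return MultiIndex columns, so flatten them first)
if isinstance(stock_data.columns, pd.MultiIndex):
    stock_data.columns = stock_data.columns.get_level_values(0)
stock_data.columns = [col.replace(' ', '_') for col in stock_data.columns]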
# Plot data
stock_data['Close'].plot(figsize=(10,5))
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('AAPL Stock Data')
# Check for stationarity
adf_test = ADFTest(alpha=0.05)
adf_result = adf_test.should_diff(stock_data['Close'].values)
print("ADF Test Result")
print(adf_result)
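# Decompose data
# period=252 assumes roughly 252 trading days per year; period=1 would leave no seasonal component to extract
decomposition = seasonal_decompose(stock_data['Close'], model='additive', period=252)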
# Plot decomposed data
fig, axes = plt.subplots(ncols=1, nrows=4, sharex=True, figsize=(10,8))
decomposition.observed.plot(ax=axes[0], legend=False)
axes[0].set_ylabel('Observed')
decomposition.trend.plot(ax=axes[1], legend=False)
axes[1].set_ylabel('Trend')
decomposition.seasonal.plot(ax=axes[2], legend=False)
axes[2].set_ylabel('Seasonal')
decomposition.resid.plot(ax=axes[3], legend=False)
axes[3].set_ylabel('Residual')
plt.tight_layout()
plt.show()
# Fit SARIMA model (m=12 means a seasonal cycle of 12 observations, i.e. 12 trading days for this daily data)
model = auto_arima(stock_data['Close'], seasonal=True, m=12, suppress_warnings=True)
print(model.summary())
Results
The ADF test returns a p-value greater than the 0.05 significance level (the alpha passed to ADFTest), so we cannot reject the null hypothesis that the time series has a unit root, i.e., that it is non-stationary. This suggests that we need to difference the time series in order to make it stationary.
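The same decision can be phrased as comparing a test statistic with a critical value. The sketch below is a simplified illustration, not what pmdarima runs: it implements the plain (non-augmented) Dickey-Fuller regression in pure Python on synthetic data. The function name, the simulated random walk, and the approximate large-sample critical value of -2.86 are assumptions of this example.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error.

    This is the plain Dickey-Fuller regression; the augmented (ADF) version
    adds lagged differences as extra regressors.
    """
    x = y[:-1]                                   # lagged levels y[t-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    n = len(x)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    gamma = sxy / sxx                            # OLS slope
    alpha = dy_mean - gamma * x_mean             # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)     # standard error of the slope
    return gamma / std_err


random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(500)]

# Random walk: each value is the previous value plus a shock (has a unit root).
walk = []
level = 0.0
for shock in shocks:
    level += shock
    walk.append(level)

# Differencing the walk recovers the shocks, which are stationary.
walk_diff = [b - a for a, b in zip(walk[:-1], walk[1:])]

CRITICAL_5PCT = -2.86  # approximate large-sample 5% value (constant, no trend)

for name, data in [("level", walk), ("first difference", walk_diff)]:
    stat = dickey_fuller_stat(data)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

A statistic more negative than the critical value corresponds to a p-value below the significance level. The differenced series should fall well below the threshold, which is the behavior that motivates differencing a non-stationary price series before fitting the model.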
The SARIMA model summary shows that the selected model is a SARIMA(0,1,1) model, meaning that a first-order difference of the time series was used to make it stationary. The model includes an intercept term and a moving average term with a coefficient of -0.0590. The Ljung-Box test for autocorrelation at lag 1 shows that the model residuals do not exhibit significant autocorrelation, and the Jarque-Bera test for normality shows that the residuals are not normally distributed. The heteroskedasticity test in the summary indicates that the variance of the residuals is not constant over time.
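To show what two of these diagnostics measure, here is a minimal pure-Python sketch of the textbook Ljung-Box statistic at lag 1 and the Jarque-Bera statistic. It is an illustration under stated assumptions: the function names and the heavy-tailed residuals are made up, and the exact computation behind the statsmodels summary may differ in detail.

```python
import math
import random


def ljung_box_lag1(resid):
    """Ljung-Box Q statistic at lag 1 and its p-value (chi-square, 1 df)."""
    n = len(resid)
    mean = sum(resid) / n
    centered = [r - mean for r in resid]
    denom = sum(c * c for c in centered)
    # Lag-1 sample autocorrelation of the residuals
    r1 = sum(a * b for a, b in zip(centered[:-1], centered[1:])) / denom
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    p_value = math.erfc(math.sqrt(q / 2.0))  # chi-square survival, 1 df
    return q, p_value


def jarque_bera(resid):
    """Jarque-Bera statistic and its p-value (chi-square, 2 df)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)            # chi-square survival, 2 df
    return jb, p_value


random.seed(1)
# Hypothetical residuals: mostly small shocks with occasional large ones,
# which produces heavy tails (made-up data, not the AAPL model residuals).
residuals = [
    random.gauss(0.0, 5.0 if random.random() < 0.1 else 1.0)
    for _ in range(1000)
]

q_stat, q_p = ljung_box_lag1(residuals)
jb_stat, jb_p = jarque_bera(residuals)
print(f"Ljung-Box (lag 1): Q = {q_stat:.3f}, p = {q_p:.3f}")
print(f"Jarque-Bera: JB = {jb_stat:.1f}, p = {jb_p:.3g}")
```

A large Ljung-Box p-value means no evidence of leftover autocorrelation, while a small Jarque-Bera p-value means the residuals depart from normality, for example through heavy tails like the ones simulated here.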
In terms of AAPL, these results suggest that the volatility of the stock price changes over time, as indicated by the heteroskedasticity. The SARIMA(0,1,1) model implies that each day's price change depends only weakly on the previous day's forecast error (the moving average coefficient of -0.0590 is close to zero), and not on any seasonal pattern at the tested period of 12 observations. In other words, the fitted model behaves much like a random walk with drift. However, the non-normality and non-constant variance of the residuals suggest that the model may not fully capture the underlying dynamics of the stock price. It is important to note that these results are based on a limited time period and may not be generalizable to other time periods.
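To see what such a model does when forecasting, the sketch below writes out the one-step-ahead recursion for an ARIMA(0,1,1) model with an intercept. The moving average coefficient is the one reported above; the intercept value, the price list, and the function name are hypothetical, since the article does not state the fitted intercept.

```python
THETA = -0.0590   # MA(1) coefficient reported in the model summary
INTERCEPT = 0.1   # hypothetical: the fitted intercept is not given in the article


def one_step_forecasts(prices, theta=THETA, intercept=INTERCEPT):
    """One-step-ahead forecasts; forecasts[t] predicts prices[t + 1].

    Recursion: forecast = last price + intercept + theta * last forecast error.
    The very first forecast error is assumed to be zero.
    """
    forecasts = []
    error = 0.0
    for t, price in enumerate(prices):
        if t > 0:
            # forecasts[-1] was the prediction made for time t
            error = price - forecasts[-1]
        forecasts.append(price + intercept + theta * error)
    return forecasts


# Made-up closing prices, for illustration only
prices = [100.0, 102.0, 101.0, 103.5]
for price, forecast in zip(prices, one_step_forecasts(prices)):
    print(f"last price {price:.2f} -> next-day forecast {forecast:.4f}")
```

Because the coefficient is small, each forecast stays close to the last observed price plus the drift term, which is why the model adds little beyond a random walk with drift.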