Time Series Forecasting: Moving Average, Exponential Smoothing and SARIMA

Javed Afroz
6 min read · Jul 10, 2023

In my previous article, we went through different techniques for analyzing and cleaning data to prepare it for forecasting. That work left us with a clean dataset that is free from missing values, outliers, and other anomalies.

Now, we will continue working with the same dataset that was prepared in the previous article and explore further steps in the forecasting process.

Train Test Split

Let’s split the data into train and test sets. We will use 85% of the data as train data and the remaining 15% as test data.
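A minimal sketch of such a split is shown here. The synthetic monthly series and the helper name `train_test_split_ts` are assumptions for illustration, not the prepared dataset from the previous article. The split is chronological (no shuffling), so the test set always lies after the train set in time.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the prepared dataset: 60 months of synthetic values.
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
data = pd.Series(np.linspace(100.0, 159.0, 60), index=idx, name="power")

def train_test_split_ts(series, train_frac=0.85):
    """Split a time-ordered series chronologically, without shuffling."""
    split_at = int(len(series) * train_frac)
    return series.iloc[:split_at], series.iloc[split_at:]

train, test = train_test_split_ts(data)
print(len(train), len(test))
```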

With this, we have successfully split the data into train and test subsets. We will use these subsets in the forecasting methods below.

Forecasting Method 1: Simple Moving Average

The Simple Moving Average (SMA) is a commonly used method for time series forecasting. It is a statistical technique that calculates the average value of a series over a specified number of periods or time intervals. The SMA method assumes that past values are representative of future patterns and uses this assumption to forecast future values.
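One way to turn this idea into a forecast is sketched here. It assumes the forecast for every test period is simply the mean of the last `window` training values; the helper `sma_forecast` and the small sample series are hypothetical and only illustrate the technique.

```python
import pandas as pd

def sma_forecast(train, horizon_index, window=6):
    """Forecast every future period as the mean of the last `window` training values."""
    last_mean = train.iloc[-window:].mean()
    return pd.Series(last_mean, index=horizon_index, name="sma_forecast")

# Hypothetical monthly data standing in for the article's dataset.
idx = pd.date_range("2020-01-01", periods=12, freq="MS")
series = pd.Series(
    [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17], index=idx, dtype=float
)
train, test = series.iloc[:9], series.iloc[9:]

# A 6-month window, producing one flat value for each test period.
forecast = sma_forecast(train, test.index, window=6)
print(forecast)
```

Because the same average is repeated over the whole horizon, the forecast is a flat line, which is why SMA struggles with trending or seasonal data.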

In the above code I have taken the moving average window of 6 months. Below is the plot of test, train and forecasted data sets.

Now, let’s check the error to evaluate the accuracy.
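The exact metric is not stated in the text, so the sketch here is an assumption: it computes RMSE and MAPE, and treats "accuracy" as 100 minus MAPE, which is a common convention. The function names and sample numbers are illustrative only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if `actual` contains zeros."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Illustrative numbers, not results from the article.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

error_pct = mape(actual, predicted)
accuracy_pct = 100 - error_pct  # assumed definition of "accuracy"
print(round(rmse(actual, predicted), 2), round(error_pct, 2), round(accuracy_pct, 2))
```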

The model is less than 50% accurate, which means it is not a good model for this data set. The SMA method is simple and intuitive, but it has limitations. It may not capture complex patterns or changes in the underlying data. It also gives equal weight to all data points within the window, which may not be suitable for all time series. More advanced methods like exponential smoothing, ARIMA, or machine learning algorithms can overcome these limitations and provide more accurate forecasts.

Forecasting Method 2: Simple Exponential Smoothing

Simple Exponential Smoothing (SES) is a basic time series forecasting method that assigns exponentially decreasing weights to past observations.

Consider the code below. Here we import SimpleExpSmoothing from the statsmodels package to create the model.

Above I have used a smoothing level of 0.001 to get the best accuracy. Below is the plot of test, train and forecasted data sets.

Now, let’s check the error to evaluate the accuracy.

The model is roughly 53.2% accurate, which is certainly better than before but still not a good model.

Forecasting Method 3: Seasonal Autoregressive Integrated Moving Average (SARIMA)

SARIMA, which stands for Seasonal Autoregressive Integrated Moving Average, is a time series forecasting model that combines the ARIMA model with seasonality. It is an extension of the ARIMA model and is designed to handle time series data that exhibit seasonal patterns.

ARIMA models are effective for forecasting stationary time series data (data with a constant mean and variance). However, many real-world time series exhibit seasonality, where patterns repeat at regular intervals, such as daily, monthly, or yearly. SARIMA addresses this by incorporating seasonal differencing and seasonal components into the ARIMA model.

Confirm Stationarity

To confirm the stationarity of a time series, there are two commonly used methods: the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. These tests examine different aspects of stationarity and can be used to complement each other in assessing the stationarity of a dataset.

1. Augmented Dickey-Fuller (ADF) Test: The ADF test takes the presence of a unit root (non-stationarity) as its null hypothesis. If the p-value is below the chosen significance level, the null hypothesis is rejected and the series is considered stationary.
ADF Test

In the above code, dataFinal is the data that we prepared in the last article. The code gives a p-value of 0.016, which is less than the significance level of 0.05, so according to this test the series is stationary.

2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: The KPSS test takes stationarity as its null hypothesis. If the p-value is above the significance level, we fail to reject the null hypothesis and the series is considered stationary.

KPSS Test

The code gives a p-value of 0.1, which is more than the significance level of 0.05. This means that the null hypothesis of stationarity cannot be rejected, so according to this test the series is stationary.

Now let’s apply the SARIMA method to get the model parameters, which will be used for fitting.

Below is the output, which shows the multiple steps done to evaluate the model and provide its parameters.

Let’s check model summary.

Now, with the model parameters in hand, let’s fit the model and predict the test data.

The forecast is now ready. Let’s plot it to check how it looks against the train and test data sets.

The above looks pretty good and better than previous models. Let’s check the error and accuracy of the model.

The model shows an accuracy of roughly 98.5%, the best of all the models. Therefore, this is our final model, which can be used for forecasting wind power generation.

Conclusion

Our analysis of the wind power data set involved testing three different models: Simple Moving Average, Simple Exponential Smoothing, and SARIMA. The results revealed significant variations in accuracy among the models.

The Simple Moving Average method, utilizing a moving window of 6 months, yielded an accuracy of less than 50%. This approach may not be suitable for capturing the complex patterns and fluctuations present in the wind power data set.

The Simple Exponential Smoothing method, on the other hand, performed slightly better with an accuracy of roughly 53.2%. While an improvement over the Simple Moving Average, it still fell short in accurately forecasting the wind power values.

Finally, the SARIMA model demonstrated the highest accuracy, reaching an impressive 98.5%. This method incorporates the seasonal, autoregressive, and moving average components, allowing it to capture the inherent patterns and dependencies within the time series data.

It is important to note that other models, such as Artificial Neural Network (ANN) or Recurrent Neural Network (RNN) based LSTM models, may offer even higher accuracy depending on the specific use case and characteristics of the data set. These models are known for their ability to capture complex relationships and temporal dependencies, making them particularly suitable for time series analysis.

In future analyses, it would be worthwhile to explore the potential of ANN or LSTM models on the wind power data set, as they may provide further improvements in accuracy. However, it is essential to carefully evaluate the specific requirements and characteristics of the data before selecting the most appropriate modeling approach.

Javed Afroz

Javed is a solution architect with 15 years of experience in diverse technology domains.