Effective machine learning model combination based on selective ensemble strategy for time series forecasting
Introduction
Data mining and knowledge discovery are becoming increasingly popular as a result of advancements in information processing and data storage technologies. Time series forecasting, a data mining field that focuses on the modeling and analysis of temporal records, has also received much research attention in recent years. Many real-world applications also depend greatly on accurate and efficient time series forecasting [12], [22], [41].
Many time series forecasting methods have been developed to date, mainly falling under two categories: statistical models and machine learning models. The development of statistical methods depends heavily on traditional statistical theories [23], which impose strong hypotheses on the sample distribution and the paradigms of linear models, and thus have only a limited impact on non-stationary and non-linear time series. Owing to these limitations, related studies have recommended machine learning techniques for handling highly volatile time series with strong non-linear properties. Given that machine learning models can extract non-linear patterns adaptively and require few prior assumptions about the sample distribution, they are particularly reliable for time series with unstable properties. Related studies have shown that machine learning models are considerably superior to traditional statistical methods [6]. Nevertheless, the application of machine learning models may still face challenges. For instance, selecting the wrong model can lead to insufficient modeling and local optima. Moreover, the best-trained model may not be the best for predicting future data because temporal patterns can vary over time, and obtaining the ideal model structure can take considerable effort.
In this situation, ensemble forecasting, or combining the predictions of multiple models, presents a suitable option to overcome the limitations in using individual models. Ensemble forecasting improves the overall performance by balancing the biases introduced by several different models [7], and also streamlines the model building, model selection, and forecasting processes as a whole [16]. Related studies have also demonstrated an increasing preference for ensemble forecasting over single models in recent years [37], [40].
Effective model selection and model combination are crucial to the success of ensemble forecasting. Accordingly, scholars have focused on enhancing the performance of the ensemble model from these two aspects. However, the ensemble model still faces a number of obstacles that need to be addressed.
Many scholars agree that the ensemble model can perform better provided that its components offer diverse and accurate predictions. To enhance component diversity, heterogeneous ensemble strategies, which combine different forecasting models, have been widely considered [11], [18]. Apart from guaranteeing component diversity, selecting candidate models with high accuracy is equally important, and quantitative performance evaluation criteria are thus crucial.
Information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are adopted for model selection by considering both fitting accuracy and model complexity. Based on a weighted combination of these criteria and other accuracy measures, some scholars have also proposed their own hybrid information criteria [49]. However, the model chosen by these criteria may not have the highest out-of-sample accuracy. Most traditional information criteria also require the fitting error to follow a Gaussian distribution, which is difficult to satisfy in practice.
Cross-validation evaluation methods, such as hold-out and K-fold, have been widely used in selecting suitable machine learning models given their ability to accurately reflect the generalizability of candidate models without requiring a probabilistic assumption. In the hold-out method, a portion at the end of a time series is chosen as an independent validation set for model evaluation, whereas the time series observed before this validation set is used for model generation. Given that the training time series comes before the validation time series, the natural dependency of the time series (i.e., the future depends on the past) is respected [3], [8]. However, by ignoring the cross-use of data, the hold-out method can result in an unbalanced model evaluation. In other words, this method uses only one part of the samples to compute the error, and the properties of neither the remaining time series nor the unobserved future data can be well captured by this error alone [4]. By contrast, the K-fold method provides a more robust error estimation by using every part of the samples in evaluating the model and averaging all individual error measures to obtain the final estimate for the candidate model. However, due to the issue of data leakage, or modeling from hindsight (i.e., past data are used for evaluating the model, whereas future data are used for training it), the K-fold method might be ineffective for time series samples. When left unaddressed, this problem may cause the produced model to absorb future patterns, thereby reducing its capacity to generalize to samples that have not yet been observed.
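The contrast between the two evaluation schemes can be illustrated with a minimal sketch (the series, split ratio, and lengths below are illustrative choices, not taken from the paper):

```python
import numpy as np

# Toy monthly series; values and length are illustrative only.
rng = np.random.default_rng(0)
y = np.sin(np.arange(120) / 6.0) + 0.1 * rng.normal(size=120)

# Temporal hold-out: the final 20% of observations form the validation
# set, so every training point precedes every validation point and the
# natural dependency of the series is respected.
split = int(len(y) * 0.8)
train, valid = y[:split], y[split:]

# A shuffled K-fold split would instead place future observations in the
# training folds ("modeling from hindsight"), letting the model absorb
# future patterns and inflating the estimated accuracy.
```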
As a special kind of cross-validation method, time series cross-validation (TSCV) has recently been employed in quantitatively evaluating models based on time series data [4], [35]. This data-driven model selection approach makes no probabilistic assumptions about the samples, and it solves the problem of time series leakage by applying an effective sequential partition and evaluation technique. Most current TSCV research has focused on simple data preprocessing or single-model evaluation. The potential of extending TSCV to ensemble forecasting warrants further research.
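A minimal rolling-origin TSCV sketch follows; the fold count, block size, and the naive last-value baseline are assumptions made for illustration, not the paper's settings:

```python
import numpy as np

def tscv_splits(n, n_splits=4, test_size=5):
    """Yield (train_idx, test_idx) pairs for rolling-origin evaluation:
    each fold trains on all data observed before its validation block,
    so no future observation ever leaks into training."""
    for k in range(n_splits):
        test_end = n - (n_splits - 1 - k) * test_size
        yield (np.arange(test_end - test_size),
               np.arange(test_end - test_size, test_end))

# A candidate model's TSCV score is the error averaged over all folds;
# a last-value (naive) forecast serves as the candidate here.
y = np.sin(np.arange(60) / 5.0)
errors = []
for tr, te in tscv_splits(len(y)):
    naive = np.repeat(y[tr[-1]], len(te))
    errors.append(np.mean(np.abs(y[te] - naive)))
score = float(np.mean(errors))
```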
The linear combination paradigm, which focuses on weighting the contributions of component models, has become the dominant technique in model combination, and scholars have mostly relied on a simple averaging technique that gives each model the same weight [37]. Given that different models contribute differently to the ensemble performance, several scholars suggest that the weights assigned to them should be unequally distributed; typically, local performance or heuristic algorithms are used to determine the uneven weights [26]. In addition to linear model combination, recent studies have also considered the meta-learning paradigm, which exhibits a non-linear combination structure. This paradigm feeds the metadata from the component forecasts into a second-stage non-linear model and then generates the final ensemble forecasting results through a stacking strategy [27].
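The two linear weighting schemes can be sketched as follows (the forecast matrix and validation errors are hypothetical numbers chosen only to illustrate the arithmetic):

```python
import numpy as np

# Column m holds model m's forecasts over a 3-step horizon (made-up values).
F = np.array([[10.2,  9.8, 10.5],
              [11.0, 10.6, 11.3],
              [11.8, 11.5, 12.1]])

# Simple averaging: every model receives the same weight 1/M.
equal = F.mean(axis=1)

# Unequal weighting, e.g. proportional to inverse validation error
# (one common heuristic; the error values here are hypothetical).
val_err = np.array([0.8, 0.5, 1.1])
w = (1.0 / val_err) / (1.0 / val_err).sum()
weighted = F @ w
```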
Although model combination has attracted much research interest, only a few studies have examined the possibility of performance degradation arising from a potential disconnect between model combination and model selection. Because the creation and combination of components in ensemble forecasting are often conducted separately, the component models are typically selected based on their local performance instead of the performance of the ensemble model, which is thought to matter more. The component models are also joined together without further selection because the model selection is completed before the model combination. As mentioned above, the component models are constructed based on their local accuracy, which may lead to the issue of excessive ensemble because some of the selected models may not significantly improve the performance of the ensemble model. The risk of excessive ensemble is particularly high when the ensemble model aggregates a large number of members, e.g., bootstrap aggregation (bagging) [20]. To address this problem, some scholars have tried to concurrently select multiple component models based on their contributions to ensemble performance [45]. However, relevant studies frequently ignore the impact of model weights and seldom integrate weight optimization into the sub-model set selection procedure, thereby leading to a local optimum.
To close the aforementioned gaps, the selective ensemble strategy has emerged as a promising technique for addressing the risks inherent in existing ensemble forecasting approaches by integrating the model selection and model weight optimization procedures. Evidence shows that the selective ensemble strategy can deliver effective forecasting performance with an efficient ensemble structure [24], [43], yet research on and applications of this strategy in time series forecasting remain in their infancy.
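As a rough illustration of the selective-ensemble idea (this is NOT the paper's SSA; it is a generic sketch in which non-negative combination weights are fitted on validation forecasts and near-zero models are pruned, so that selection and weighting happen jointly):

```python
import numpy as np
from scipy.optimize import nnls

# Four hypothetical candidate models: two accurate, two noisy.
rng = np.random.default_rng(1)
y_val = rng.normal(size=50)
F_val = np.column_stack([y_val + rng.normal(scale=s, size=50)
                         for s in (0.1, 0.3, 1.5, 2.0)])

# Fit non-negative weights against the validation targets, normalize,
# and prune models whose weight is negligible.
w, _ = nnls(F_val, y_val)
w = w / w.sum()
keep = w > 0.05        # the surviving sub-model set
```

Models eliminated here contribute nothing to the combination, so selection and weighting are decided by the same criterion rather than in two disconnected stages.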
Section snippets
Main work and contribution
To improve time series forecasting performance, this study develops a novel technique called selective machine learning ensemble (SMLE), which comprises the TSCV strategy and soft selection algorithm (SSA). This technique offers several contributions to research as highlighted below:
- (a) By rationally taking into account the optimization of ensemble model performance, the SMLE technique achieves a high-quality integration of model selection and model combination, in which the choice of component
Machine learning models for time series forecasting
Time series forecasting has been performed in previous studies using several machine learning techniques, including support vector regression (SVR), artificial neural network (ANN), and tree-based models.
SVR is a type of statistical learning regression model that nonlinearly maps the original input space to a higher-dimensional feature space to produce a maximum margin hyperplane. SVR models have been widely used in time series forecasting due to their stable convergence and quick training. For
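A minimal SVR forecasting sketch using lagged inputs; the lag order, kernel, and hyper-parameters are illustrative defaults, not the paper's configuration:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic series standing in for real temporal records.
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) / 8.0) + 0.05 * rng.normal(size=200)

l = 12                                       # autoregressive window length
X = np.array([y[i:i + l] for i in range(len(y) - l)])
t = y[l:]                                    # target: the next observation

# Fit on all but the final input/target pair, then forecast that point.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:-1], t[:-1])
pred = model.predict(X[-1:])
```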
Ensemble forecasting paradigm
The objective of ensemble forecasting is to provide an h-step-ahead time series prediction by utilizing M component models and a window of l past observations, given a univariate time series with previous values up to time t (i.e., [y1, y2, …, yt]). Ensemble forecasting can be formulated as Eq. (1), where xt = [yt−l, yt−l+1, …, yt] denotes an autoregressive vector comprising the l past observations of the future values [yt+1, …, yt+h], wm represents the weight assigned to the mth model, and fm represents the mth component forecasting model.
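From the definitions in this snippet, Eq. (1) can be written as follows; the convex-weight constraint is a common convention in linear model combination and is an assumption here, not stated in the snippet:

```latex
\hat{y}_{t+1:t+h} = \sum_{m=1}^{M} w_m \, f_m(\mathbf{x}_t),
\qquad \mathbf{x}_t = [\, y_{t-l}, y_{t-l+1}, \ldots, y_t \,],
\qquad w_m \ge 0, \quad \sum_{m=1}^{M} w_m = 1
```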
Empirical experiment
The modeling and forecasting effectiveness of SMLE is empirically tested in this section using the classical NN3 time series dataset, which has been widely used in the literature to evaluate time series forecasting methods and ensemble models. The experiment compares the results of the proposed SMLE with those of single forecasting, ensemble forecasting, and other advanced techniques from the literature.
Model discussion
This section analyzes the computational complexity of SMLE and tests the effectiveness of TSCV and SSA to demonstrate the superiority of the proposed technique.
Extended case study: Crude oil price forecasting
The proposed SMLE is subjected to an extended case study to further evaluate its capacity to address real-world time series forecasting problems. The case study uses a monthly time series of WTI crude oil spot prices. The following subsections discuss the experimental process, analyze the forecasting results, and compare the performance of the proposed SMLE with some popular benchmarks.
Conclusions and future research
Many scholars employ ensemble techniques to boost the performance of their time series forecasting models. To achieve greater time series forecasting efficiency, this study proposes SMLE, which integrates TSCV (to assess the ensemble model performance) and SSA (to select and weight the models based on the optimized TSCV-based ensemble accuracy) to selectively combine various machine learning models, such as SVR, FNN, RF, and GBDT. An NN3 time series forecasting experiment and a WTI crude oil
CRediT authorship contribution statement
Sheng-Xiang Lv: Conceptualization, Methodology, Software, Investigation, Writing – original draft. Lu Peng: Methodology, Investigation, Validation. Huanling Hu: Software, Visualization, Validation. Lin Wang: Conceptualization, Methodology, Investigation, Writing – review & editing, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is partially supported by the National Natural Science Foundation of China (72001046; 71771095) and the National Social Science Foundation of China (20&ZD126).
References (49)
- et al., A manufacturing quality prediction model based on AdaBoost-LSTM with rough knowledge, Comput. Ind. Eng. (2021)
- et al., Cross-validation aggregation for combining autoregressive neural network forecasts, Int. J. Forecast. (2016)
- et al., A note on the validity of cross-validation for evaluating autoregressive time series prediction, Comput. Stat. Data Anal. (2018)
- et al., Using Random forest and Gradient boosting trees to improve wave forecast at a specific location, Appl. Ocean Res. (2020)
- et al., A new hybrid method for predicting univariate and multivariate time series based on pattern forecasting, Inf. Sci. (2022)
- et al., A novel method for time series prediction based on error decomposition and nonlinear combination of forecasters, Neurocomputing (2021)
- et al., Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks, Appl. Energy (2020)
- et al., A hybrid optimized error correction system for time series forecasting, Appl. Soft Comput. (2020)
- et al., An optimized twin support vector regression algorithm enhanced by ensemble empirical mode decomposition and gated recurrent unit, Inf. Sci. (2022)
- et al., Electrical load forecasting: A deep learning approach based on K-nearest neighbors, Appl. Soft Comput. (2021)
- Bayesian optimization based dynamic ensemble for time series forecasting, Inf. Sci.
- A backpropagation learning algorithm with graph regularization for feedforward neural networks, Inf. Sci.
- Deep learning for load forecasting with smart meter data: Online adaptive recurrent neural network, Appl. Energy
- A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities, Inf. Fusion
- Forecasting Nickel futures price based on the empirical wavelet transform and gradient boosting decision trees, Appl. Soft Comput.
- Machine-Learning based methods in short-term load forecasting, The Electricity Journal
- Operational framework for recent advances in backtracking search optimisation algorithm: A systematic review and performance evaluation, Appl. Math. Comput.
- Bagging weak predictors, Int. J. Forecast.
- A brief history of forecasting competitions, Int. J. Forecast.
- A developed hybrid forecasting system for energy consumption structure forecasting based on fuzzy time series and information granularity, Energy
- Holt-Winters smoothing enhanced by fruit fly optimization algorithm to forecast monthly electricity consumption, Energy
- Probabilistic wind power forecasting using selective ensemble of finite mixture Gaussian process regression models, Renewable Energy
- Using support vector regression and K-nearest neighbors for short-term traffic flow prediction based on maximal information coefficient, Inf. Sci.
- A hybrid neural network model for short-term wind speed forecasting based on decomposition, multi-learner ensemble, and adaptive multiple error corrections, Renewable Energy
Cited by (33)
- A combination prediction model based on Theil coefficient and induced continuous aggregation operator for the prediction of Shanghai composite index, Expert Systems with Applications (2024)
- A novel hierarchical feature selection with local shuffling and models reweighting for stock price forecasting, Expert Systems with Applications (2024)
- A wind speed forecasting system for the construction of a smart grid with two-stage data processing based on improved ELM and deep learning strategies, Expert Systems with Applications (2024)
- Grid search with a weighted error function: Hyper-parameter optimization for financial time series forecasting, Applied Soft Computing (2024)
- Progressive neural network for multi-horizon time series forecasting, Information Sciences (2024)
- International oil shocks and the volatility forecasting of Chinese stock market based on machine learning combination models, North American Journal of Economics and Finance (2024)