Effective machine learning model combination based on selective ensemble strategy for time series forecasting
Introduction
Data mining and knowledge discovery are becoming increasingly popular as a result of advancements in information processing and data storage technologies. Time series forecasting, a data mining field that focuses on the modeling and analysis of temporal records, has also received much research attention in recent years. Many real-world applications also depend greatly on accurate and efficient time series forecasting [12], [22], [41].
Many time series forecasting methods have been developed to date, mainly falling under two categories: statistical models and machine learning models. The development of statistical methods depends heavily on traditional statistical theories [23], which impose strong hypotheses on the sample distribution and the paradigms of linear models, and thus have only a limited impact on non-stationary and non-linear time series. Owing to these limitations, related studies have recommended machine learning techniques for handling highly volatile time series with strong non-linear properties. Given that machine learning models can extract non-linear patterns adaptively and require few prior assumptions about the sample distribution, they are particularly reliable for time series with unstable properties. Related studies have shown that machine learning models are considerably superior to traditional statistical methods [6]. Nevertheless, the application of machine learning models may still face challenges. For instance, selecting the wrong model can lead to insufficient modeling and local optima. Moreover, the best-trained model may not be the best for predicting future data because temporal patterns can vary over time, and obtaining the ideal model structure can take considerable effort.
In this situation, ensemble forecasting, or combining the predictions of multiple models, presents a suitable option to overcome the limitations in using individual models. Ensemble forecasting improves the overall performance by balancing the biases introduced by several different models [7], and also streamlines the model building, model selection, and forecasting processes as a whole [16]. Related studies have also demonstrated an increasing preference for ensemble forecasting over single models in recent years [37], [40].
Effective model selection and model combination are crucial to the success of ensemble forecasting. Accordingly, scholars have focused on enhancing the performance of the ensemble model from these two aspects. However, the ensemble model still faces a number of obstacles that need to be addressed.
Many scholars agree that the ensemble model can perform better provided that its components offer diverse and accurate predictions. To enhance component diversity, heterogeneous ensemble strategies, which combine different forecasting models, have been widely considered [11], [18]. Apart from guaranteeing component diversity, selecting candidate models with high accuracy is equally important, and quantitative performance evaluation criteria are thus crucial.
Information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are adopted for model selection by considering both fitting accuracy and model complexity. Based on a weighted combination of these criteria and other accuracy measures, some scholars have also proposed their own hybrid information criteria [49]. However, the model chosen by these criteria may not have the highest out-of-sample accuracy. Most traditional information criteria also require the fitting error to follow a Gaussian distribution, which is difficult to satisfy in practice.
Cross-validation evaluation methods, such as hold-out and K-fold, have been widely used in selecting suitable machine learning models given their ability to accurately reflect the generalizability of candidate models without requiring a probabilistic assumption. In the hold-out method, a portion at the end of a time series is chosen as an independent validation set for model evaluation, whereas the time series observed before this validation set is used for model generation. Given that the training time series comes before the validation time series, the natural dependency of the time series (i.e., the future depends on the past) is respected [3], [8]. However, by ignoring the cross-use of data, the hold-out method can result in an unbalanced model evaluation. In other words, this method uses only one part of the samples to compute the error, and the properties of neither the remaining time series nor the unobserved future data can be well captured by this error alone [4]. By contrast, the K-fold method provides a more robust error estimation by using every part of the samples in evaluating the model and averaging all individual error measures to obtain the final estimate for the candidate model. However, due to the issue of data leakage, or modeling from hindsight (i.e., past data are used for evaluating the model, whereas future data are used for training it), the K-fold method might be ineffective for time series samples. When left unaddressed, this problem may cause the produced model to absorb future patterns, thereby reducing its capacity to generalize to samples that have not yet been observed.
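The contrast between the two evaluation schemes can be illustrated with a minimal sketch (the series, split ratio, and lengths below are illustrative choices, not taken from the paper):

```python
import numpy as np

# Toy monthly series; values and length are illustrative only.
rng = np.random.default_rng(0)
y = np.sin(np.arange(120) / 6.0) + 0.1 * rng.normal(size=120)

# Temporal hold-out: the final 20% of observations form the validation
# set, so every training point precedes every validation point and the
# natural dependency of the series is respected.
split = int(len(y) * 0.8)
train, valid = y[:split], y[split:]

# A shuffled K-fold split would instead place future observations in the
# training folds ("modeling from hindsight"), letting the model absorb
# future patterns and inflating the estimated accuracy.
```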
As a special kind of cross-validation method, time series cross-validation (TSCV) has recently been employed in quantitatively evaluating models based on time series data [4], [35]. This data-driven model selection approach makes no probabilistic assumptions about the samples, and it solves the problem of time series leakage by applying an effective sequential partition and evaluation technique. Most current TSCV research has focused on simple data preprocessing or single-model evaluation. The potential of extending TSCV to ensemble forecasting warrants further research.
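A minimal rolling-origin TSCV sketch follows; the fold count, block size, and the naive last-value baseline are assumptions made for illustration, not the paper's settings:

```python
import numpy as np

def tscv_splits(n, n_splits=4, test_size=5):
    """Yield (train_idx, test_idx) pairs for rolling-origin evaluation:
    each fold trains on all data observed before its validation block,
    so no future observation ever leaks into training."""
    for k in range(n_splits):
        test_end = n - (n_splits - 1 - k) * test_size
        yield (np.arange(test_end - test_size),
               np.arange(test_end - test_size, test_end))

# A candidate model's TSCV score is the error averaged over all folds;
# a last-value (naive) forecast serves as the candidate here.
y = np.sin(np.arange(60) / 5.0)
errors = []
for tr, te in tscv_splits(len(y)):
    naive = np.repeat(y[tr[-1]], len(te))
    errors.append(np.mean(np.abs(y[te] - naive)))
score = float(np.mean(errors))
```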
The linear combination paradigm, which focuses on weighting the contributions of component models, has become the dominant technique in model combination, and scholars have mostly relied on a simple averaging technique that gives each model the same weight [37]. Given that different models contribute differently to the ensemble performance, several scholars suggest that the weights assigned to them should be unequally distributed; typically, local performance or heuristic algorithms are used to determine the uneven weights [26]. In addition to linear model combination, recent studies have also considered the meta-learning paradigm, which exhibits a non-linear combination structure. This paradigm feeds the metadata from the component forecasts into a second-stage non-linear model and then generates the final ensemble forecasting results through a stacking strategy [27].
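The two linear weighting schemes can be sketched as follows (the forecast matrix and validation errors are hypothetical numbers chosen only to illustrate the arithmetic):

```python
import numpy as np

# Column m holds model m's forecasts over a 3-step horizon (made-up values).
F = np.array([[10.2,  9.8, 10.5],
              [11.0, 10.6, 11.3],
              [11.8, 11.5, 12.1]])

# Simple averaging: every model receives the same weight 1/M.
equal = F.mean(axis=1)

# Unequal weighting, e.g. proportional to inverse validation error
# (one common heuristic; the error values here are hypothetical).
val_err = np.array([0.8, 0.5, 1.1])
w = (1.0 / val_err) / (1.0 / val_err).sum()
weighted = F @ w
```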
Although model combination has attracted much research interest, only a few studies have examined the possibility of performance degradation arising from a potential disconnect between model combination and model selection. Because the creation and combination of components in ensemble forecasting are often conducted separately, the component models are typically selected based on their local performance instead of the performance of the ensemble model, which is thought to matter more. The component models are also joined together without further selection because the model selection is completed before the model combination. As mentioned above, the component models are constructed based on their local accuracy, which may lead to the issue of excessive ensemble because some of the selected models may not significantly improve the performance of the ensemble model. The risk of excessive ensemble is particularly high when the ensemble model aggregates a large number of members, e.g., bootstrap aggregation (bagging) [20]. To address this problem, some scholars have tried to concurrently select multiple component models based on their contributions to ensemble performance [45]. However, relevant studies frequently ignore the impact of model weights and seldom integrate weight optimization into the sub-model set selection procedure, thereby leading to a local optimum.
To close the aforementioned gaps, the selective ensemble strategy has emerged as a promising technique for addressing the risks inherent in existing ensemble forecasting approaches by integrating the model selection and model weight optimization procedures. Evidence shows that the selective ensemble strategy can deliver effective forecasting performance with an efficient ensemble structure [24], [43], yet research on and applications of this strategy in time series forecasting remain in their infancy.
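As a rough illustration of the selective-ensemble idea (this is NOT the paper's SSA; it is a generic sketch in which non-negative combination weights are fitted on validation forecasts and near-zero models are pruned, so that selection and weighting happen jointly):

```python
import numpy as np
from scipy.optimize import nnls

# Four hypothetical candidate models: two accurate, two noisy.
rng = np.random.default_rng(1)
y_val = rng.normal(size=50)
F_val = np.column_stack([y_val + rng.normal(scale=s, size=50)
                         for s in (0.1, 0.3, 1.5, 2.0)])

# Fit non-negative weights against the validation targets, normalize,
# and prune models whose weight is negligible.
w, _ = nnls(F_val, y_val)
w = w / w.sum()
keep = w > 0.05        # the surviving sub-model set
```

Models eliminated here contribute nothing to the combination, so selection and weighting are decided by the same criterion rather than in two disconnected stages.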
Section snippets
Main work and contribution
To improve time series forecasting performance, this study develops a novel technique called selective machine learning ensemble (SMLE), which comprises the TSCV strategy and soft selection algorithm (SSA). This technique offers several contributions to research as highlighted below:
- (a) By rationally taking into account the optimization of ensemble model performance, the SMLE technique achieves a high-quality integration of model selection and model combination, in which the choice of component
Machine learning models for time series forecasting
Time series forecasting has been performed in previous studies using several machine learning techniques, including support vector regression (SVR), artificial neural network (ANN), and tree-based models.
SVR is a type of statistical learning regression model that nonlinearly maps the original input space to a higher-dimensional feature space to produce a maximum margin hyperplane. SVR models have been widely used in time series forecasting due to their stable convergence and quick training. For
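A minimal SVR forecasting sketch using lagged inputs; the lag order, kernel, and hyper-parameters are illustrative defaults, not the paper's configuration:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic series standing in for real temporal records.
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) / 8.0) + 0.05 * rng.normal(size=200)

l = 12                                       # autoregressive window length
X = np.array([y[i:i + l] for i in range(len(y) - l)])
t = y[l:]                                    # target: the next observation

# Fit on all but the final input/target pair, then forecast that point.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:-1], t[:-1])
pred = model.predict(X[-1:])
```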
Ensemble forecasting paradigm
The objective of ensemble forecasting is to provide an h-step-ahead time series prediction by utilizing M component models and a window of l past observations, given a univariate time series with previous values up to time t (i.e., [y1, y2, …, yt]). Ensemble forecasting can be formulated as Eq. (1), where xt = [yt−l, yt−l+1, …, yt] denotes an autoregressive vector comprising the l past observations of the future values [yt+1, …, yt+h], wm represents the weight assigned to the mth model, and fm represents the mth component forecasting model.
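From the definitions in this snippet, Eq. (1) can be written as follows; the convex-weight constraint is a common convention in linear model combination and is an assumption here, not stated in the snippet:

```latex
\hat{y}_{t+1:t+h} = \sum_{m=1}^{M} w_m \, f_m(\mathbf{x}_t),
\qquad \mathbf{x}_t = [\, y_{t-l}, y_{t-l+1}, \ldots, y_t \,],
\qquad w_m \ge 0, \quad \sum_{m=1}^{M} w_m = 1
```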
Empirical experiment
The modeling and forecasting effectiveness of SMLE is empirically tested in this section using the classical NN3 time series dataset, which has been widely used in the literature to evaluate time series forecasting methods and ensemble models. The experiment compares the results of the proposed SMLE with those of single forecasting, ensemble forecasting, and other advanced techniques from the literature.
Model discussion
This section analyzes the computational complexity of SMLE and tests the effectiveness of TSCV and SSA to demonstrate the superiority of the proposed technique.
Extended case study: Crude oil price forecasting
The proposed SMLE is subjected to an extended case study to further evaluate its capacity to address real-world time series forecasting problems. The case study uses a monthly time series of WTI crude oil spot prices. The following subsections discuss the experimental process, analyze the forecasting results, and compare the performance of the proposed SMLE with some popular benchmarks.
Conclusions and future research
Many scholars employ ensemble techniques to boost the performance of their time series forecasting models. To achieve greater time series forecasting efficiency, this study proposes SMLE, which integrates TSCV (to assess the ensemble model performance) and SSA (to select and weight the models based on the optimized TSCV-based ensemble accuracy) to selectively combine various machine learning models, such as SVR, FNN, RF, and GBDT. An NN3 time series forecasting experiment and a WTI crude oil
CRediT authorship contribution statement
Sheng-Xiang Lv: Conceptualization, Methodology, Software, Investigation, Writing – original draft. Lu Peng: Methodology, Investigation, Validation. Huanling Hu: Software, Visualization, Validation. Lin Wang: Conceptualization, Methodology, Investigation, Writing – review & editing, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is partially supported by the National Natural Science Foundation of China (72001046; 71771095) and the National Social Science Foundation of China (20&ZD126).
References (49)
- et al., A manufacturing quality prediction model based on AdaBoost-LSTM with rough knowledge, Comput. Ind. Eng. (2021)
- et al., Cross-validation aggregation for combining autoregressive neural network forecasts, Int. J. Forecast. (2016)
- et al., A note on the validity of cross-validation for evaluating autoregressive time series prediction, Comput. Stat. Data Anal. (2018)
- et al., Using Random forest and Gradient boosting trees to improve wave forecast at a specific location, Appl. Ocean Res. (2020)
- et al., A new hybrid method for predicting univariate and multivariate time series based on pattern forecasting, Inf. Sci. (2022)
- et al., A novel method for time series prediction based on error decomposition and nonlinear combination of forecasters, Neurocomputing (2021)
- et al., Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks, Appl. Energy (2020)
- et al., A hybrid optimized error correction system for time series forecasting, Appl. Soft Comput. (2020)
- et al., An optimized twin support vector regression algorithm enhanced by ensemble empirical mode decomposition and gated recurrent unit, Inf. Sci. (2022)
- et al., Electrical load forecasting: A deep learning approach based on K-nearest neighbors, Appl. Soft Comput. (2021)
- Bayesian optimization based dynamic ensemble for time series forecasting, Inf. Sci.
- A backpropagation learning algorithm with graph regularization for feedforward neural networks, Inf. Sci.
- Deep learning for load forecasting with smart meter data: Online adaptive recurrent neural network, Appl. Energy
- A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities, Inf. Fusion
- Forecasting Nickel futures price based on the empirical wavelet transform and gradient boosting decision trees, Appl. Soft Comput.
- Machine-Learning based methods in short-term load forecasting, The Electricity Journal
- Operational framework for recent advances in backtracking search optimisation algorithm: A systematic review and performance evaluation, Appl. Math. Comput.
- Bagging weak predictors, Int. J. Forecast.
- A brief history of forecasting competitions, Int. J. Forecast.
- A developed hybrid forecasting system for energy consumption structure forecasting based on fuzzy time series and information granularity, Energy
- Holt-Winters smoothing enhanced by fruit fly optimization algorithm to forecast monthly electricity consumption, Energy
- Probabilistic wind power forecasting using selective ensemble of finite mixture Gaussian process regression models, Renewable Energy
- Using support vector regression and K-nearest neighbors for short-term traffic flow prediction based on maximal information coefficient, Inf. Sci.
- A hybrid neural network model for short-term wind speed forecasting based on decomposition, multi-learner ensemble, and adaptive multiple error corrections, Renewable Energy
Cited by (33)
- A combination prediction model based on Theil coefficient and induced continuous aggregation operator for the prediction of Shanghai composite index, Expert Systems with Applications (2024)
- A novel hierarchical feature selection with local shuffling and models reweighting for stock price forecasting, Expert Systems with Applications (2024)
- A wind speed forecasting system for the construction of a smart grid with two-stage data processing based on improved ELM and deep learning strategies, Expert Systems with Applications (2024)
- Grid search with a weighted error function: Hyper-parameter optimization for financial time series forecasting, Applied Soft Computing (2024)
- Progressive neural network for multi-horizon time series forecasting, Information Sciences (2024)
- International oil shocks and the volatility forecasting of Chinese stock market based on machine learning combination models, North American Journal of Economics and Finance (2024)