Abstract
Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data, because observations are not independent. Several studies have analyzed how different performance estimation methods compare for approximating the true loss incurred by a given forecasting model. However, these studies do not address how the estimators behave for model selection: the ability to select the best solution among a set of alternatives. This paper addresses this issue. The goal of this work is to compare a set of estimation methods for model selection in time series forecasting tasks. This objective is split into two main questions: (i) how often does a given estimation method select the best possible model; and (ii) what is the performance loss when the best model is not selected. Experiments were carried out using a case study that contains 3111 time series. The accuracy of the estimators at selecting the best solution is low, although significantly better than random selection. Moreover, the overall forecasting performance loss associated with the model selection process ranges from 0.28% to 0.58%. Yet, no considerable differences between the approaches were found. Besides, the sample size of the time series is an important factor in the relative performance of the estimators.
Data Availability
All experiments and data are publicly available (cf. footnote 1).
Notes
The model_selection module of the scikit-learn Python library implements this method as TimeSeriesSplit.
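For readers unfamiliar with this splitter, a minimal sketch of its behavior follows (the 12-point toy series is an illustration of ours, not data from the paper):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# A toy series of 12 observations (values stand in for lagged features).
X = np.arange(12).reshape(-1, 1)

# TimeSeriesSplit produces growing-window splits: each test fold
# comes strictly after its training fold, preserving temporal order.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

With 12 observations and 3 splits, the training window grows from 3 to 9 points while each test fold covers the next 3 points, so no test observation ever precedes its training data.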
Funding
The work of L. Torgo was undertaken, in part, thanks to funding from the Canada Research Chairs program. The work of Carlos Soares was partially funded by: project ConnectedHealth (no. 46858), supported by the Competitiveness and Internationalisation Operational Programme (POCI) and the Lisbon Regional Operational Programme (LISBOA 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF); the project Safe Cities - Inovação para Construir Cidades Seguras, with the reference POCI-01-0247-FEDER-041435, co-funded by the ERDF through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020), under the PORTUGAL 2020 Partnership Agreement; project NextGenAI - Center for Responsible AI (2022-C05i0102-02), supported by IAPMEI; and FCT plurianual funding for 2020–2023 of LIACC (UIDB/00027/2020_UIDP/00027/202
Author information
Authors and Affiliations
Contributions
All authors contributed to writing and research.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Consent to participate
Not applicable
Consent for publication
Not applicable
Ethics approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Cerqueira, V., Torgo, L. & Soares, C. Model Selection for Time Series Forecasting: An Empirical Analysis of Multiple Estimators. Neural Process Lett 55, 10073–10091 (2023). https://doi.org/10.1007/s11063-023-11239-8