Abstract
Validating time series models has always been challenging. Autocorrelation in the data makes it difficult to apply conventional cross-validation techniques, such as k-fold cross-validation, to time series models. In this paper, two weighted k-fold time series split cross-validation techniques are proposed for this purpose. The proposed techniques were validated using the opening price data of a cryptocurrency. Mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were the metrics selected to validate the proposed techniques. Both techniques were found to give robust results; however, the Exponential weighted K-fold time series split cross validation (EWKCV) technique was seen to perform better than the Generally weighted K-fold time series split cross validation (GWKCV) technique. Using the results of the proposed techniques alongside those of a simple train-test split is seen to give better results for time series models.
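The idea of a weighted k-fold time series split can be illustrated with a minimal sketch. The abstract does not state the weighting formulas, so everything specific here is an assumption made for illustration only: the expanding-window split, the smoothing parameter `alpha`, the normalised exponential weights in `exponential_weights`, and the naive last-value forecast standing in for a fitted model. It is not the authors' EWKCV or GWKCV definition.

```python
import numpy as np


def expanding_splits(n_obs, n_folds):
    """Yield (train_idx, test_idx) pairs; each test block follows its training data."""
    fold_size = n_obs // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover observations.
        test_end = train_end + fold_size if k < n_folds else n_obs
        yield np.arange(train_end), np.arange(train_end, test_end)


def fold_errors(y_true, y_pred):
    """Return the three metrics named in the abstract for one fold."""
    err = y_true - y_pred
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),
    }


def exponential_weights(n_folds, alpha=0.5):
    """Hypothetical weights: later folds count more; weights sum to 1."""
    raw = alpha * (1 - alpha) ** np.arange(n_folds - 1, -1, -1)
    return raw / raw.sum()


def weighted_cv(y, n_folds=5, alpha=0.5):
    """Combine per-fold errors into one weighted score per metric."""
    y = np.asarray(y, dtype=float)
    weights = exponential_weights(n_folds, alpha)
    scores = {"MSE": 0.0, "MAE": 0.0, "MAPE": 0.0}
    for w, (train, test) in zip(weights, expanding_splits(len(y), n_folds)):
        # Placeholder model (assumption): forecast the last training value.
        pred = np.full(len(test), y[train[-1]])
        for name, value in fold_errors(y[test], pred).items():
            scores[name] += w * value
    return scores


if __name__ == "__main__":
    # First twelve opening prices from the Annexure table.
    opens = [465.864, 456.86, 424.103, 394.673, 408.085, 399.1,
             402.092, 435.751, 423.156, 411.429, 403.556, 399.471]
    print(weighted_cv(opens, n_folds=3))
```

Because every test block lies strictly after its training block, no future observation leaks into training. Replacing the placeholder forecast with a fitted model and swapping `exponential_weights` for another scheme would leave the rest of the loop unchanged.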



Data Availability
The data considered for the analysis in this paper were taken from the Kaggle website (https://www.kaggle.com/datasets/varpit94/bitcoin-data-updated-till-26jun2021). The first forty of the 2832 observations are also shown in the Annexure of this paper for quick reference.
Code Availability
Submitted.
Funding
None.
Author information
Authors and Affiliations
Contributions
Vamsikrishna A carried out the analysis and wrote the draft. Gijo EV guided the work and carried out multiple detailed reviews of the manuscript to refine it.
Corresponding author
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Annexure: First Forty Observations of 2832 Observations
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
17-09-2014 | 465.864 | 468.174 | 452.422 | 457.334 | 457.334 | 21,056,800 |
18-09-2014 | 456.86 | 456.86 | 413.104 | 424.44 | 424.44 | 34,483,200 |
19-09-2014 | 424.103 | 427.835 | 384.532 | 394.796 | 394.796 | 37,919,700 |
20-09-2014 | 394.673 | 423.296 | 389.883 | 408.904 | 408.904 | 36,863,600 |
21-09-2014 | 408.085 | 412.426 | 393.181 | 398.821 | 398.821 | 26,580,100 |
22-09-2014 | 399.1 | 406.916 | 397.13 | 402.152 | 402.152 | 24,127,600 |
23-09-2014 | 402.092 | 441.557 | 396.197 | 435.791 | 435.791 | 45,099,500 |
24-09-2014 | 435.751 | 436.112 | 421.132 | 423.205 | 423.205 | 30,627,700 |
25-09-2014 | 423.156 | 423.52 | 409.468 | 411.574 | 411.574 | 26,814,400 |
26-09-2014 | 411.429 | 414.938 | 400.009 | 404.425 | 404.425 | 21,460,800 |
27-09-2014 | 403.556 | 406.623 | 397.372 | 399.52 | 399.52 | 15,029,300 |
28-09-2014 | 399.471 | 401.017 | 374.332 | 377.181 | 377.181 | 23,613,300 |
29-09-2014 | 376.928 | 385.211 | 372.24 | 375.467 | 375.467 | 32,497,700 |
30-09-2014 | 376.088 | 390.977 | 373.443 | 386.944 | 386.944 | 34,707,300 |
01-10-2014 | 387.427 | 391.379 | 380.78 | 383.615 | 383.615 | 26,229,400 |
02-10-2014 | 383.988 | 385.497 | 372.946 | 375.072 | 375.072 | 21,777,700 |
03-10-2014 | 375.181 | 377.695 | 357.859 | 359.512 | 359.512 | 30,901,200 |
04-10-2014 | 359.892 | 364.487 | 325.886 | 328.866 | 328.866 | 47,236,500 |
05-10-2014 | 328.916 | 341.801 | 289.296 | 320.51 | 320.51 | 83,308,096 |
06-10-2014 | 320.389 | 345.134 | 302.56 | 330.079 | 330.079 | 79,011,800 |
07-10-2014 | 330.584 | 339.247 | 320.482 | 336.187 | 336.187 | 49,199,900 |
08-10-2014 | 336.116 | 354.364 | 327.188 | 352.94 | 352.94 | 54,736,300 |
09-10-2014 | 352.748 | 382.726 | 347.687 | 365.026 | 365.026 | 83,641,104 |
10-10-2014 | 364.687 | 375.067 | 352.963 | 361.562 | 361.562 | 43,665,700 |
11-10-2014 | 361.362 | 367.191 | 355.951 | 362.299 | 362.299 | 13,345,200 |
12-10-2014 | 362.606 | 379.433 | 356.144 | 378.549 | 378.549 | 17,552,800 |
13-10-2014 | 377.921 | 397.226 | 368.897 | 390.414 | 390.414 | 35,221,400 |
14-10-2014 | 391.692 | 411.698 | 391.324 | 400.87 | 400.87 | 38,491,500 |
15-10-2014 | 400.955 | 402.227 | 388.766 | 394.773 | 394.773 | 25,267,100 |
16-10-2014 | 394.518 | 398.807 | 373.07 | 382.556 | 382.556 | 26,990,000 |
17-10-2014 | 382.756 | 385.478 | 375.389 | 383.758 | 383.758 | 13,600,700 |
18-10-2014 | 383.976 | 395.158 | 378.971 | 391.442 | 391.442 | 11,416,800 |
19-10-2014 | 391.254 | 393.939 | 386.457 | 389.546 | 389.546 | 5,914,570 |
20-10-2014 | 389.231 | 390.084 | 378.252 | 382.845 | 382.845 | 16,419,000 |
21-10-2014 | 382.421 | 392.646 | 380.834 | 386.475 | 386.475 | 14,188,900 |
22-10-2014 | 386.118 | 388.576 | 382.249 | 383.158 | 383.158 | 11,641,300 |
23-10-2014 | 382.962 | 385.048 | 356.447 | 358.417 | 358.417 | 26,456,900 |
24-10-2014 | 358.591 | 364.345 | 353.305 | 358.345 | 358.345 | 15,585,700 |
25-10-2014 | 358.611 | 359.861 | 342.877 | 347.271 | 347.271 | 18,127,500 |
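For readers who want to work with the rows above directly, the following is a small sketch of how one row might be parsed. It assumes the pipe-delimited layout shown in this Annexure, day-first dates (DD-MM-YYYY, as the consecutive rows imply), and thousands separators in the Volume column. The helper name `parse_row` is hypothetical.

```python
from datetime import datetime


def parse_row(line):
    """Parse one pipe-delimited Annexure row into a dict of typed values."""
    # Drop the trailing pipe, then split into the seven columns.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    date, open_, high, low, close, adj_close, volume = cells
    return {
        "Date": datetime.strptime(date, "%d-%m-%Y").date(),  # day-first format
        "Open": float(open_),
        "High": float(high),
        "Low": float(low),
        "Close": float(close),
        "Adj Close": float(adj_close),
        "Volume": int(volume.replace(",", "")),  # remove thousands separators
    }


row = parse_row(
    "17-09-2014 | 465.864 | 468.174 | 452.422 | 457.334 | 457.334 | 21,056,800 |"
)
```

The `Open` field is the series used for validation in this paper, so collecting `parse_row(line)["Open"]` over all rows would give the input series in date order.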
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vamsikrishna, A., Gijo, E.V. New Techniques to Perform Cross-Validation for Time Series Models. Oper. Res. Forum 5, 51 (2024). https://doi.org/10.1007/s43069-024-00334-8
DOI: https://doi.org/10.1007/s43069-024-00334-8