Abstract
Cross validation is widely used to assess the performance of prediction models on unseen data. Leave-k-out and m-fold are among the most popular cross validation criteria, and they have complementary strengths and limitations. Leave-k-out (with leave-1-out being the most common special case) is exhaustive and more reliable but computationally prohibitive when \(k > 2\), whereas m-fold is much more tractable at the cost of uncertain performance due to non-exhaustive random sampling. We propose a new cross validation criterion, leave-worst-k-out, which attempts to combine the strengths and avoid the limitations of leave-k-out and m-fold. The leave-worst-k-out criterion is defined as the largest validation error over the \(C_n^k\) possible ways to partition n data points into a subset of \((n-k)\) points for training a prediction model and the remaining k points for validation. In contrast, the leave-k-out criterion takes the average of the \(C_n^k\) validation errors from the aforementioned partitions, and m-fold samples m random (but non-independent) such validation errors. We prove that, for the special case of the multiple linear regression model under the \({\mathcal {L}}_1\) norm, the leave-worst-k-out criterion can be computed by solving a mixed integer linear program. We also present a random sampling algorithm for approximately computing the criterion for general prediction models under general norms. Results of two computational experiments suggested that the leave-worst-k-out criterion clearly outperformed leave-k-out and m-fold in assessing the generalizability of prediction models; moreover, leave-worst-k-out can be approximately computed using the random sampling algorithm almost as efficiently as leave-1-out and m-fold, and the effectiveness of the approximated criterion may be as high as, or even higher than, that of the exactly computed criterion.
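To illustrate the criterion described above, the following is a minimal Python sketch of approximating leave-worst-k-out by random sampling of partitions. It assumes ordinary least-squares regression as the prediction model and an L1 validation error; the function name, the number of sampled partitions, and the synthetic data are illustrative assumptions, not the paper's exact algorithm or implementation.

```python
# Sketch: approximate the leave-worst-k-out (LWKO) criterion by randomly
# sampling partitions of n points into (n-k) training and k validation points,
# and taking the largest observed validation error.
import numpy as np

def lwko_random_sampling(X, y, k, n_samples=1000, rng=None):
    """Approximate LWKO for an OLS model with L1 validation error (illustrative)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    worst = -np.inf
    for _ in range(n_samples):
        held_out = rng.choice(n, size=k, replace=False)      # k validation points
        train = np.setdiff1d(np.arange(n), held_out)         # remaining n-k points
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        err = np.abs(y[held_out] - X[held_out] @ beta).sum()  # L1 validation error
        worst = max(worst, err)                               # keep the worst partition
    return worst

# Example usage on synthetic data.
gen = np.random.default_rng(0)
X = np.column_stack([np.ones(50), gen.normal(size=(50, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + gen.normal(scale=0.1, size=50)
print(lwko_random_sampling(X, y, k=3, n_samples=500, rng=1))
```

Replacing the maximum with the average over sampled partitions would recover an m-fold-like estimate, which makes the contrast between the criteria easy to see in code.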


Acknowledgements
This work was partially supported by the National Science Foundation under the LEAP HI and GOALI programs (Grant Number 1830478) and the EAGER program (Grant Number 1842097), as well as by the Plant Sciences Institute at Iowa State University. This manuscript was greatly improved thanks to constructive and insightful feedback from the Associate Editor and an anonymous reviewer. The author is grateful to Dr. Qing Li and Lijie Liu for suggesting the CoEPrA data source and to Dr. Guiping Hu and Dr. Dan Nettleton for inspiring conversations about the proposed LWKO criterion.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, L. The leave-worst-k-out criterion for cross validation. Optim Lett 17, 545–560 (2023). https://doi.org/10.1007/s11590-022-01894-6