Abstract
This paper addresses missing values in machine learning, specifically when training neural networks. We aim to improve training by reducing the biases that can arise when a network is trained on artificially imputed datasets. To do so, we take into account the between-imputation variance observed across multiple imputations. We propose two new imputation frameworks, S-HOT and M-HOT, that allow neural networks to be trained on completed data in a less biased way, yielding models that generalize better and therefore give better inference results. We perform extensive comparative experiments and statistically assess the results on both benchmark and real-world datasets. We show that our frameworks compete with, and even outperform, existing imputation frameworks, and that each is useful in different settings. We make our entire code publicly accessible to facilitate reproduction of our experimental results.
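As a rough illustration of the between-imputation variance referred to in the abstract, the sketch below computes, for each cell of a dataset, the sample variance across several imputed copies. It is not the S-HOT or M-HOT framework, whose details the abstract does not give. The function name `between_imputation_variance`, the array layout, and the toy values are assumptions made for this example only.

```python
import numpy as np


def between_imputation_variance(imputations):
    """Per-cell sample variance across m imputed copies of one dataset.

    `imputations` is assumed to have shape (m, n_samples, n_features),
    where each slice along axis 0 is one completed (imputed) dataset.
    """
    stacked = np.asarray(imputations, dtype=float)
    if stacked.ndim != 3 or stacked.shape[0] < 2:
        raise ValueError("expected at least two imputations of a 2-D dataset")
    # ddof=1 applies the 1/(m-1) normalisation of a sample variance.
    return stacked.var(axis=0, ddof=1)


# Toy data (made up): the cell at row 0, column 1 was missing and has been
# imputed three times with different values; all other cells were observed.
imputed = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 3.0], [3.0, 4.0]],
    [[1.0, 4.0], [3.0, 4.0]],
])
variance = between_imputation_variance(imputed)
```

Observed cells are identical in every copy, so their variance is zero. Only imputed cells carry a non-zero value, which is the uncertainty that a single imputation would hide from the model.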
Acknowledgments
This research is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No 875171, project QUALITOP (Monitoring multidimensional aspects of QUAlity of Life after cancer ImmunoTherapy - an Open smart digital Platform for personalized prevention and patient management).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ranvier, T., Elghazel, H., Coquery, E., Benabdeslem, K. (2023). Accounting for Imputation Uncertainty During Neural Network Training. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_24
DOI: https://doi.org/10.1007/978-3-031-39831-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39830-8
Online ISBN: 978-3-031-39831-5
eBook Packages: Computer Science (R0)