Accounting for Imputation Uncertainty During Neural Network Training

Ranvier, Thomas; Elghazel, Haytham; Coquery, Emmanuel; Benabdeslem, Khalid

doi:10.1007/978-3-031-39831-5_24

ORCID (Thomas Ranvier): orcid.org/0000-0001-9250-9530

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14148)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

In this paper we address missing values in a machine learning context, more specifically when training a neural network. We focus on improving neural network training by reducing the potential biases that can arise when training on artificially imputed datasets. We do so by taking into account the between-imputation variance observed across multiple imputations. We propose two new imputation frameworks, S-HOT and M-HOT, that can be used to train neural networks on completed data in a less biased way, leading to models that generalize better and therefore give better inference results. We perform extensive comparative experiments and statistically assess the results on both benchmark and real-world datasets. We show that our frameworks compete with, and even outperform, existing imputation frameworks, each being useful in different settings. We make our entire code publicly available to facilitate the reproduction of our experimental results.
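The abstract does not describe how S-HOT or M-HOT work internally, so the sketch below does not reproduce either framework. It only illustrates the quantity the abstract refers to: the between-imputation variance across m completed copies of the same dataset. It assumes NumPy and applies Rubin's between-imputation variance (unbiased, m - 1 divisor) cell by cell, which is an illustrative interpretation; the function name between_imputation_variance is hypothetical.

```python
import numpy as np


def between_imputation_variance(imputations):
    """Hypothetical helper: per-cell mean and between-imputation variance.

    imputations: sequence of m completed datasets (m >= 2), each of shape
    (n_samples, n_features). Observed cells are identical in every copy,
    so their variance is 0; only imputed cells can carry variance.
    """
    stacked = np.asarray(imputations, dtype=float)  # shape (m, n, p)
    m = stacked.shape[0]
    if m < 2:
        raise ValueError("need at least two imputations")
    # Pooled estimate of each cell: the average over the m imputations.
    mean = stacked.mean(axis=0)
    # Between-imputation variance with the unbiased (m - 1) divisor,
    # as in Rubin's rules, computed independently for every cell.
    var_between = stacked.var(axis=0, ddof=1)
    return mean, var_between


# Usage: a 2x2 dataset with one missing cell, imputed three times.
observed = np.array([[5.0, np.nan], [7.0, 8.0]])
copies = []
for value in (1.0, 2.0, 3.0):
    completed = observed.copy()
    completed[0, 1] = value  # stand-in for a stochastic imputer's draw
    copies.append(completed)

mean, var_b = between_imputation_variance(copies)
print(mean[0, 1], var_b[0, 1])  # 2.0 1.0
```

Cells with a high between-imputation variance are those where the imputations disagree most; a training procedure that accounts for imputation uncertainty would treat them as less reliable than observed cells, rather than trusting a single imputed value.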

Acknowledgments

This research is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No 875171, project QUALITOP (Monitoring multidimensional aspects of QUAlity of Life after cancer ImmunoTherapy - an Open smart digital Platform for personalized prevention and patient management).

Author information

Authors and Affiliations

Univ Lyon, UCBL, CNRS, INSA Lyon, LIRIS, UMR5205, 43 bd du 11 Novembre 1918, 69622, Villeurbanne, France
Thomas Ranvier, Haytham Elghazel, Emmanuel Coquery & Khalid Benabdeslem

Corresponding author

Correspondence to Thomas Ranvier.

Editor information

Editors and Affiliations

Poznań University of Technology, Poznan, Poland
Robert Wrembel
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Johann Gamper
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Ranvier, T., Elghazel, H., Coquery, E., Benabdeslem, K. (2023). Accounting for Imputation Uncertainty During Neural Network Training. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_24

DOI: https://doi.org/10.1007/978-3-031-39831-5_24
Published: 10 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39830-8
Online ISBN: 978-3-031-39831-5
eBook Packages: Computer Science, Computer Science (R0)