Accounting for Imputation Uncertainty During Neural Network Training

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2023)

Abstract

This paper addresses missing values in a machine learning context, specifically when training a neural network. We aim to improve neural network training by reducing the biases that can arise when training on artificially imputed datasets. We do so by taking into account the between-imputation variance, that is, the variance observed across multiple imputations of the same dataset. We propose two new imputation frameworks, S-HOT and M-HOT, that can be used to train neural networks on completed data in a less biased way, yielding models that generalize better and therefore produce better inference results. We perform extensive comparative experiments and statistically assess the results on both benchmark and real-world datasets. We show that our frameworks compete with and even outperform existing imputation frameworks, and that each is useful in different settings. We make our entire code publicly accessible to facilitate reproduction of our experimental results.
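
To make the notion of variance across multiple imputations concrete, the following minimal Python sketch computes a per-cell variance over several completed copies of a toy dataset. It is an illustration only and is not the S-HOT or M-HOT framework, whose details are not given in the abstract. The function name `between_imputation_variance`, the toy data, and the random-draw imputer are all assumptions introduced for this example; any multiple-imputation method could supply the completed copies.

```python
import numpy as np


def between_imputation_variance(imputations):
    """Per-cell variance across m completed copies of one dataset.

    imputations: array-like of shape (m, n_samples, n_features).
    Returns an (n_samples, n_features) array (unbiased estimator, ddof=1).
    """
    stacked = np.asarray(imputations, dtype=float)
    return stacked.var(axis=0, ddof=1)


# Toy data (hypothetical): 4 samples, 2 features, NaN marks a missing cell.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])
mask = np.isnan(X)

# Stand-in imputer (an assumption, not a method from the paper): draw each
# missing value from a normal fitted to the observed values of its column.
rng = np.random.default_rng(0)
col_mean = np.nanmean(X, axis=0)
col_std = np.nanstd(X, axis=0)
m = 5  # number of imputations
draws = rng.normal(col_mean, col_std, size=(m,) + X.shape)

# Keep observed values, use the random draws only where data was missing.
imputations = np.where(mask, draws, X)

variance = between_imputation_variance(imputations)
print(variance)
```

Observed cells are identical in every completed copy, so their variance is zero, while imputed cells carry a positive variance that reflects how much the imputations disagree. A training procedure that accounts for imputation uncertainty could use such a quantity to treat uncertain cells differently from observed ones.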

Notes

  1. https://github.com/ThomasRanvier/Accounting_for_Imputation_Uncertainty_During_Neural_Network_Training

  2. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html

  3. https://archive.ics.uci.edu

  4. https://rioultf.users.greyc.fr/uci/files/pima-indians-diabetes

  5. https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

Acknowledgments

This research is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No 875171, project QUALITOP (Monitoring multidimensional aspects of QUAlity of Life after cancer ImmunoTherapy - an Open smart digital Platform for personalized prevention and patient management).

Author information

Corresponding author

Correspondence to Thomas Ranvier.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ranvier, T., Elghazel, H., Coquery, E., Benabdeslem, K. (2023). Accounting for Imputation Uncertainty During Neural Network Training. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_24

  • DOI: https://doi.org/10.1007/978-3-031-39831-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39830-8

  • Online ISBN: 978-3-031-39831-5

  • eBook Packages: Computer Science, Computer Science (R0)
