Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation

  • Conference paper
Computational Science – ICCS 2023 (ICCS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14073)

Abstract

Missing data, the absence of values in a dataset, can be a critical issue in domains such as healthcare. A common solution is imputation, where the missing values are replaced by estimates. Most imputation methods are suitable for the Missing Completely At Random (MCAR) and Missing At Random (MAR) mechanisms but produce biased results for Missing Not At Random (MNAR) values. An effective approach to mitigate this bias is the delta-adjustment method. This method assumes the imputation is performed under the MAR mechanism and adjusts the imputed values to become valid under MNAR assumptions by applying a correction factor. This adjustment is usually defined manually by a domain expert, which often makes the method infeasible. In this work, we propose an automatic procedure to find an approximate delta-adjustment value for every feature of the dataset, which we call the Automatic Delta-Adjustment Method. The proposed procedure is validated in an experimental setup comprising 10 healthcare datasets injected with MNAR values. The results of seven state-of-the-art imputation methods are compared with and without the adjustment, and applying the correction provides a significantly lower imputation error for all methods.
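The generic delta-adjustment idea described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure proposed in the paper: the MAR imputation is replaced by a simple column-mean imputation, the correction is assumed to be an additive shift, and the per-feature delta values are chosen by hand. The function names `impute_mean` and `delta_adjust` are hypothetical.

```python
import numpy as np


def impute_mean(X):
    """Stand-in MAR imputation (assumption): fill NaNs with the column mean."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def delta_adjust(X, X_imputed, deltas):
    """Shift only the imputed entries of each feature by that feature's delta."""
    mask = np.isnan(np.asarray(X, dtype=float))  # True where a value was missing
    return X_imputed + mask * np.asarray(deltas, dtype=float)


# Toy data with one missing value per feature.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 20.0],
])

X_mar = impute_mean(X)
# Hand-picked deltas (assumption): one additive correction per feature.
X_mnar = delta_adjust(X, X_mar, deltas=[1.5, -5.0])
```

Observed values are left untouched; only the entries that were imputed are shifted. Choosing the deltas is the step that normally requires a domain expert and that the paper aims to automate.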

Notes

  1. PCA is a feature extraction technique often used for dimensionality reduction. It computes the principal components of the data and returns the first n of them, where n is a user-defined parameter [1].

  2. https://github.com/jsyoon0823/GAIN

  3. https://github.com/ricardodcpereira/ADAM

  4. https://archive.ics.uci.edu/ml
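As a rough illustration of the PCA description in note 1, the sketch below computes the principal components with a singular value decomposition and keeps the first n. It is a generic textbook formulation, not code from the paper; the function name `pca_first_components` and the toy data are assumptions made for this example.

```python
import numpy as np


def pca_first_components(X, n):
    """Return the data projected onto the first n principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n]
    scores = X_centered @ components.T
    return scores, components


# Toy data lying on a straight line, so one component captures everything.
X_line = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
scores, components = pca_first_components(X_line, n=1)
```

The sign of each component is arbitrary, so only the magnitudes of the projected scores are meaningful when comparing runs.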

References

  1. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisc. Rev.: Comput. Stat. 2(4), 433–459 (2010)

  2. Austin, P.C., White, I.R., Lee, D.S., van Buuren, S.: Missing data in clinical research: a tutorial on multiple imputation. Can. J. Cardiol. 37(9), 1322–1331 (2020)

  3. Beaulieu-Jones, B.K., Lavage, D.R., Snyder, J.W., Moore, J.H., Pendergrass, S.A., Bauer, C.R.: Characterizing and managing missing structured data in electronic health records: data analysis. JMIR Med. Inf. 6(1), e11 (2018)

  4. Van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–68 (2010)

  5. Carreras, G., et al.: Missing not at random in end of life care studies: multiple imputation and sensitivity analysis on data from the action study. BMC Med. Res. Methodol. 21(1), 1–12 (2021)

  6. García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)

  7. Gondara, L., Wang, K.: Recovering loss to followup information using denoising autoencoders. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 1936–1945 (2017)

  8. Gondara, L., Wang, K.: MIDA: multiple imputation using denoising autoencoders. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 260–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_21

  9. Leacy, F.P., Floyd, S., Yates, T.A., White, I.R.: Analyses of sensitivity to the missing-at-random assumption using multiple imputation with delta adjustment: application to a tuberculosis/HIV prevalence survey with incomplete HIV-status data. Am. J. Epidemiol. 185(4), 304–315 (2017)

  10. Leurent, B., Gomes, M., Faria, R., Morris, S., Grieve, R., Carpenter, J.R.: Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: a tutorial. Pharmacoeconomics 36(8), 889–901 (2018)

  11. Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. John Wiley & Sons, New York (2019)

  12. Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010)

  13. McCoy, J.T., Kroon, S., Auret, L.: Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51(21), 141–146 (2018)

  14. Peek, N., Rodrigues, P.P.: Three controversies in health data science. Int. J. Data Sci. Anal. 6(3), 261–269 (2018). https://doi.org/10.1007/s41060-018-0109-y

  15. Pereira, R.C., Abreu, P.H., Rodrigues, P.P.: Partial multiple imputation with variational autoencoders: tackling not at randomness in healthcare data. IEEE J. Biomed. Health Inf. 26(8), 4218–4227 (2022)

  16. Pereira, R.C., Santos, M.S., Rodrigues, P.P., Abreu, P.H.: Reviewing autoencoders for missing data imputation: technical trends, applications and outcomes. J. Artif. Intell. Res. 69, 1255–1285 (2020)

  17. Qiu, Y.L., Zheng, H., Gevaert, O.: Genomic data imputation with variational auto-encoders. GigaScience 9(8) (2020)

  18. Rezvan, P.H., Lee, K.J., Simpson, J.A.: Sensitivity analysis within multiple imputation framework using delta-adjustment: application to longitudinal study of Australian children. Longitudinal Life Course Stud. 9(3), 259–278 (2018)

  19. Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)

  20. Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys, vol. 81. John Wiley & Sons, New York (2004)

  21. Santos, M.S., Abreu, P.H., García-Laencina, P.J., Simão, A., Carvalho, A.: A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inf. 58, 49–59 (2015)

  22. Santos, M.S., Pereira, R.C., Costa, A.F., Soares, J.P., Santos, J., Abreu, P.H.: Generating synthetic missing data: a review by missing mechanism. IEEE Access 7, 11651–11667 (2019)

  23. Tan, P.T., Cro, S., Van Vogt, E., Szigeti, M., Cornelius, V.R.: A review of the use of controlled multiple imputation in randomised controlled trials with missing outcome data. BMC Med. Res. Methodol. 21(1), 1–17 (2021)

  24. Twala, B.: An empirical comparison of techniques for handling incomplete data using decision trees. Appl. Artif. Intell. 23(5), 373–405 (2009)

  25. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)

  26. White, I.R., Royston, P., Wood, A.M.: Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30(4), 377–399 (2011)

  27. Xia, J., et al.: Adjusted weight voting algorithm for random forests in handling missing values. Pattern Recogn. 69, 52–60 (2017)

  28. Yoon, J., Jordon, J., Schaar, M.: GAIN: missing data imputation using generative adversarial nets. In: International Conference on Machine Learning, pp. 5689–5698. PMLR (2018)

Acknowledgements

This work is supported in part by the FCT - Foundation for Science and Technology, I.P., Research Grant SFRH/BD/149018/2019. This work is also funded by the FCT - Foundation for Science and Technology, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit - UIDB/00326/2020 or project code UIDP/00326/2020.

Author information

Corresponding author

Correspondence to Ricardo Cardoso Pereira.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Pereira, R.C., Rodrigues, P.P., Figueiredo, M.A.T., Abreu, P.H. (2023). Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham. https://doi.org/10.1007/978-3-031-35995-8_34

  • DOI: https://doi.org/10.1007/978-3-031-35995-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35994-1

  • Online ISBN: 978-3-031-35995-8

  • eBook Packages: Computer Science, Computer Science (R0)
