Abstract
Missing data, that is, the absence of values in a dataset, can be a critical issue in domains such as healthcare. A common solution is imputation, in which the missing values are replaced by estimates. Most imputation methods are suitable for the Missing Completely At Random (MCAR) and Missing At Random (MAR) mechanisms but produce biased results for Missing Not At Random (MNAR) values. An effective approach to mitigate this bias is the delta-adjustment method, which assumes that the imputation is performed under the MAR mechanism and then applies a correction factor to the imputed values so that they become valid under MNAR assumptions. This adjustment is usually defined manually by a domain expert, which often makes the method infeasible. In this work, we propose an automatic procedure to find an approximate delta-adjustment value for every feature of the dataset, which we call the Automatic Delta-Adjustment Method. The proposed procedure is validated in an experimental setup comprising 10 healthcare datasets injected with MNAR values. The results of seven state-of-the-art imputation methods are compared with and without the adjustment, and applying the correction yields a significantly lower imputation error for all methods.
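The delta-adjustment idea summarised in the abstract can be illustrated with a minimal sketch. It assumes an additive, per-feature offset applied only to the imputed entries, uses column-mean imputation as a stand-in for a MAR imputer, and takes hand-picked offsets; the function name `delta_adjust` and the toy values are hypothetical, and the sketch does not reproduce the automatic procedure proposed in the paper.

```python
import numpy as np


def delta_adjust(X_imputed, missing_mask, deltas):
    """Shift the imputed entries of each feature by that feature's delta.

    X_imputed    : (n_samples, n_features) array already imputed under MAR
    missing_mask : boolean array, True where a value was originally missing
    deltas       : (n_features,) per-feature offsets (hypothetical values)
    """
    X_imputed = np.asarray(X_imputed, dtype=float)
    missing_mask = np.asarray(missing_mask, dtype=bool)
    deltas = np.asarray(deltas, dtype=float)
    # Broadcasting adds deltas[j] to column j only where the mask is True,
    # so observed values are left untouched.
    return X_imputed + missing_mask * deltas


# Toy data: two features, NaN marks a missing value.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [np.nan, 30.0],
              [3.0, 20.0]])
mask = np.isnan(X)

# Stand-in MAR imputation: column means of the observed values.
col_means = np.nanmean(X, axis=0)
X_mar = np.where(mask, col_means, X)

# Hand-picked offsets, purely illustrative (normally set by a domain expert).
deltas = np.array([1.5, -4.0])
X_mnar = delta_adjust(X_mar, mask, deltas)
```

Only the two originally missing cells change between `X_mar` and `X_mnar`; choosing a suitable value for each entry of `deltas` is the step that the paper aims to automate.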
Notes
- 1. PCA is a feature extraction technique often used for dimensionality reduction. It computes the principal components of the data and returns the first n, where n is a user-defined parameter [1].
References
Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisc. Rev.: Comput. Stat. 2(4), 433–459 (2010)
Austin, P.C., White, I.R., Lee, D.S., van Buuren, S.: Missing data in clinical research: a tutorial on multiple imputation. Can. J. Cardiol. 37(9), 1322–1331 (2020)
Beaulieu-Jones, B.K., Lavage, D.R., Snyder, J.W., Moore, J.H., Pendergrass, S.A., Bauer, C.R.: Characterizing and managing missing structured data in electronic health records: data analysis. JMIR Med. Inf. 6(1), e11 (2018)
Van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–68 (2010)
Carreras, G., et al.: Missing not at random in end of life care studies: multiple imputation and sensitivity analysis on data from the action study. BMC Med. Res. Methodol. 21(1), 1–12 (2021)
García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)
Gondara, L., Wang, K.: Recovering loss to followup information using denoising autoencoders. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 1936–1945 (2017)
Gondara, L., Wang, K.: MIDA: multiple imputation using denoising autoencoders. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 260–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_21
Leacy, F.P., Floyd, S., Yates, T.A., White, I.R.: Analyses of sensitivity to the missing-at-random assumption using multiple imputation with delta adjustment: application to a tuberculosis/HIV prevalence survey with incomplete HIV-status data. Am. J. Epidemiol. 185(4), 304–315 (2017)
Leurent, B., Gomes, M., Faria, R., Morris, S., Grieve, R., Carpenter, J.R.: Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: a tutorial. Pharmacoeconomics 36(8), 889–901 (2018)
Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. John Wiley & Sons, New York (2019)
Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010)
McCoy, J.T., Kroon, S., Auret, L.: Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51(21), 141–146 (2018)
Peek, N., Rodrigues, P.P.: Three controversies in health data science. Int. J. Data Sci. Anal. 6(3), 261–269 (2018). https://doi.org/10.1007/s41060-018-0109-y
Pereira, R.C., Abreu, P.H., Rodrigues, P.P.: Partial multiple imputation with variational autoencoders: tackling not at randomness in healthcare data. IEEE J. Biomed. Health Inf. 26(8), 4218–4227 (2022)
Pereira, R.C., Santos, M.S., Rodrigues, P.P., Abreu, P.H.: Reviewing autoencoders for missing data imputation: technical trends, applications and outcomes. J. Artif. Intell. Res. 69, 1255–1285 (2020)
Qiu, Y.L., Zheng, H., Gevaert, O.: Genomic data imputation with variational auto-encoders. GigaScience 9(8) (2020)
Rezvan, P.H., Lee, K.J., Simpson, J.A.: Sensitivity analysis within multiple imputation framework using delta-adjustment: application to longitudinal study of Australian children. Longitudinal Life Course Stud. 9(3), 259–278 (2018)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys, vol. 81. John Wiley & Sons, New York (2004)
Santos, M.S., Abreu, P.H., García-Laencina, P.J., Simão, A., Carvalho, A.: A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inf. 58, 49–59 (2015)
Santos, M.S., Pereira, R.C., Costa, A.F., Soares, J.P., Santos, J., Abreu, P.H.: Generating synthetic missing data: a review by missing mechanism. IEEE Access 7, 11651–11667 (2019)
Tan, P.T., Cro, S., Van Vogt, E., Szigeti, M., Cornelius, V.R.: A review of the use of controlled multiple imputation in randomised controlled trials with missing outcome data. BMC Med. Res. Methodol. 21(1), 1–17 (2021)
Twala, B.: An empirical comparison of techniques for handling incomplete data using decision trees. Appl. Artif. Intell. 23(5), 373–405 (2009)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
White, I.R., Royston, P., Wood, A.M.: Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30(4), 377–399 (2011)
Xia, J., et al.: Adjusted weight voting algorithm for random forests in handling missing values. Pattern Recogn. 69, 52–60 (2017)
Yoon, J., Jordon, J., van der Schaar, M.: GAIN: missing data imputation using generative adversarial nets. In: International Conference on Machine Learning, pp. 5689–5698. PMLR (2018)
Acknowledgements
This work is supported in part by the FCT - Foundation for Science and Technology, I.P., Research Grant SFRH/BD/149018/2019. This work is also funded by the FCT - Foundation for Science and Technology, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit - UIDB/00326/2020 or project code UIDP/00326/2020.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pereira, R.C., Rodrigues, P.P., Figueiredo, M.A.T., Abreu, P.H. (2023). Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham. https://doi.org/10.1007/978-3-031-35995-8_34
DOI: https://doi.org/10.1007/978-3-031-35995-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35994-1
Online ISBN: 978-3-031-35995-8
eBook Packages: Computer Science, Computer Science (R0)