Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation

Pereira, Ricardo Cardoso; Rodrigues, Pedro Pereira; Figueiredo, Mário A. T.; Abreu, Pedro Henriques

doi:10.1007/978-3-031-35995-8_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14073)

Included in the following conference series:

International Conference on Computational Science

Abstract

Missing data can be described by the absence of values in a dataset, which can be a critical issue in domains such as healthcare. A common solution for this problem is imputation, where the missing values are replaced by estimations. Most imputation methods are suitable for the Missing Completely At Random (MCAR) and Missing At Random (MAR) mechanisms but produce biased results for Missing Not At Random (MNAR) values. An effective approach to mitigate this bias effect is to use the delta-adjustment method. This method assumes the imputation is performed for the MAR mechanism and adjusts the imputed values to become valid under MNAR assumptions by applying a correction factor. Such adjustment is usually defined manually by a domain expert, which often makes this method unfeasible. In this work, we propose an automatic procedure to find an approximate delta adjustment value for every feature of the dataset, which we call Automatic Delta-Adjustment Method. The proposed procedure is validated in an experimental setup comprising 10 datasets of the healthcare domain injected with MNAR values. The results from seven state-of-the-art imputation methods are compared with and without the adjustment, and applying the correction provides a significantly lower imputation error for all methods.
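
The basic delta-adjustment idea described above can be illustrated with a minimal sketch. It assumes column-mean imputation as a stand-in for any MAR imputer, an additive shift as the correction, and hand-picked delta values; the function names mean_impute and delta_adjust and the toy data are hypothetical. It does not reproduce the automatic procedure proposed in the chapter, which selects the delta values without a domain expert.

```python
import numpy as np

def mean_impute(X):
    """Fill NaNs with the column mean of the observed values.

    Stand-in for any imputer that assumes MAR (an assumption of this sketch).
    Returns the imputed matrix and the boolean missingness mask.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                     # True where a value is missing
    col_means = np.nanmean(X, axis=0)      # per-feature mean over observed entries
    return np.where(mask, col_means, X), mask

def delta_adjust(X_imputed, mask, delta):
    """Shift only the imputed entries of each feature by that feature's delta."""
    delta = np.asarray(delta, dtype=float)
    return X_imputed + mask * delta        # broadcasting: one delta per column

# Toy data with one missing value per feature.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [np.nan, 30.0],
              [3.0, 20.0]])

# Hypothetical, hand-picked deltas (one per feature); in the manual method
# these would come from a domain expert.
deltas = np.array([0.5, -2.0])

X_mar, mask = mean_impute(X)               # imputation under MAR
X_mnar = delta_adjust(X_mar, mask, deltas) # adjusted toward MNAR assumptions
```

Observed values are left untouched; only the imputed entries move, each by the delta of its own feature.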

Notes

1. PCA is a feature extraction technique often used for dimensionality reduction. It computes the principal components of the data and returns the first n, where n is a user-defined parameter [1].
2. https://github.com/jsyoon0823/GAIN
3. https://github.com/ricardodcpereira/ADAM
4. https://archive.ics.uci.edu/ml

References

1. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisc. Rev.: Comput. Stat. 2(4), 433–459 (2010)
2. Austin, P.C., White, I.R., Lee, D.S., van Buuren, S.: Missing data in clinical research: a tutorial on multiple imputation. Can. J. Cardiol. 37(9), 1322–1331 (2020)
3. Beaulieu-Jones, B.K., Lavage, D.R., Snyder, J.W., Moore, J.H., Pendergrass, S.A., Bauer, C.R.: Characterizing and managing missing structured data in electronic health records: data analysis. JMIR Med. Inf. 6(1), e11 (2018)
4. Van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–68 (2010)
5. Carreras, G., et al.: Missing not at random in end of life care studies: multiple imputation and sensitivity analysis on data from the action study. BMC Med. Res. Methodol. 21(1), 1–12 (2021)
6. García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)
7. Gondara, L., Wang, K.: Recovering loss to followup information using denoising autoencoders. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 1936–1945 (2017)
8. Gondara, L., Wang, K.: MIDA: multiple imputation using denoising autoencoders. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 260–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_21
9. Leacy, F.P., Floyd, S., Yates, T.A., White, I.R.: Analyses of sensitivity to the missing-at-random assumption using multiple imputation with delta adjustment: application to a tuberculosis/HIV prevalence survey with incomplete HIV-status data. Am. J. Epidemiol. 185(4), 304–315 (2017)
10. Leurent, B., Gomes, M., Faria, R., Morris, S., Grieve, R., Carpenter, J.R.: Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: a tutorial. Pharmacoeconomics 36(8), 889–901 (2018)
11. Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. John Wiley & Sons, New York (2019)
12. Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010)
13. McCoy, J.T., Kroon, S., Auret, L.: Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51(21), 141–146 (2018)
14. Peek, N., Rodrigues, P.P.: Three controversies in health data science. Int. J. Data Sci. Anal. 6(3), 261–269 (2018). https://doi.org/10.1007/s41060-018-0109-y
15. Pereira, R.C., Abreu, P.H., Rodrigues, P.P.: Partial multiple imputation with variational autoencoders: tackling not at randomness in healthcare data. IEEE J. Biomed. Health Inf. 26(8), 4218–4227 (2022)
16. Pereira, R.C., Santos, M.S., Rodrigues, P.P., Abreu, P.H.: Reviewing autoencoders for missing data imputation: technical trends, applications and outcomes. J. Artif. Intell. Res. 69, 1255–1285 (2020)
17. Qiu, Y.L., Zheng, H., Gevaert, O.: Genomic data imputation with variational auto-encoders. GigaScience 9(8) (2020)
18. Rezvan, P.H., Lee, K.J., Simpson, J.A.: Sensitivity analysis within multiple imputation framework using delta-adjustment: application to longitudinal study of australian children. Longitudinal Life Course Stud. 9(3), 259–278 (2018)
19. Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
20. Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys, vol. 81. John Wiley & Sons, New York (2004)
21. Santos, M.S., Abreu, P.H., García-Laencina, P.J., Simão, A., Carvalho, A.: A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inf. 58, 49–59 (2015)
22. Santos, M.S., Pereira, R.C., Costa, A.F., Soares, J.P., Santos, J., Abreu, P.H.: Generating synthetic missing data: a review by missing mechanism. IEEE Access 7, 11651–11667 (2019)
23. Tan, P.T., Cro, S., Van Vogt, E., Szigeti, M., Cornelius, V.R.: A review of the use of controlled multiple imputation in randomised controlled trials with missing outcome data. BMC Med. Res. Methodol. 21(1), 1–17 (2021)
24. Twala, B.: An empirical comparison of techniques for handling incomplete data using decision trees. Appl. Artif. Intell. 23(5), 373–405 (2009)
25. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine learning, pp. 1096–1103 (2008)
26. White, I.R., Royston, P., Wood, A.M.: Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30(4), 377–399 (2011)
27. Xia, J., et al.: Adjusted weight voting algorithm for random forests in handling missing values. Pattern Recogn. 69, 52–60 (2017)
28. Yoon, J., Jordon, J., Schaar, M.: Gain: missing data imputation using generative adversarial nets. In: International Conference on Machine Learning, pp. 5689–5698. PMLR (2018)

Acknowledgements

This work is supported in part by the FCT - Foundation for Science and Technology, I.P., Research Grant SFRH/BD/149018/2019. This work is also funded by the FCT - Foundation for Science and Technology, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit - UIDB/00326/2020 or project code UIDP/00326/2020.

Author information

Authors and Affiliations

Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, 3030-290, Coimbra, Portugal
Ricardo Cardoso Pereira & Pedro Henriques Abreu
Center for Health Technology and Services Research, Faculty of Medicine (MEDCIDS), University of Porto, 4200-319, Porto, Portugal
Pedro Pereira Rodrigues
Instituto Superior Técnico, University of Lisbon, 1049-001, Lisbon, Portugal
Mário A. T. Figueiredo
Instituto de Telecomunicações, 1049-001, Lisbon, Portugal
Mário A. T. Figueiredo

Corresponding author

Correspondence to Ricardo Cardoso Pereira.

Editor information

Editors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Jiří Mikyška
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

About this paper

Cite this paper

Pereira, R.C., Rodrigues, P.P., Figueiredo, M.A.T., Abreu, P.H. (2023). Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham. https://doi.org/10.1007/978-3-031-35995-8_34

DOI: https://doi.org/10.1007/978-3-031-35995-8_34
Published: 26 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35994-1
Online ISBN: 978-3-031-35995-8
eBook Packages: Computer Science, Computer Science (R0)