Evaluating Imputation Methods for Missing Data in a MCI Dataset

Batanero, Alba Gómez-Valadés; Zamorano, Mariano Rincón; Tomás, Rafael Martínez; Martín, Juan Guerrero

doi:10.1007/978-3-031-06242-1_44

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13258)

Included in the following conference series:

International Work-Conference on the Interplay Between Natural and Artificial Computation

Abstract

Missing data is a recurrent problem in experimental studies, particularly in clinical and sociodemographic longitudinal studies, owing to dropout and to the refusal of some subjects to answer or perform certain tests. Different strategies have been designed to deal with missing values, but incorrect treatment of missing data can bias the database in one or more parameters, compromising the viability of the database and of future studies. To address this, various imputation techniques have been developed over the last decades. However, there are no regulations or clear guidelines for dealing with these situations. In this study, we analyze and impute a real, incomplete database for the early detection of mild cognitive impairment (MCI), in which the loss of values in 3 main variables is strongly correlated with years of education. The imputation follows two strategies: first, assuming that those subjects would have scored poorly had they taken the test, which involves defining a ceiling score; and second, multiple imputation by fully conditional specification. To determine whether the imputation introduced any bias in mean or variance, the original database was compared with the imputed databases. Using a p-value threshold of 0.1, the database imputed by multiple imputation best preserved the information of the original database, making it the more appropriate imputation method for this MCI database.
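The two strategies named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the data are synthetic, `test_a` and `test_b` are hypothetical scores, the fixed "bad" score is taken to be the lowest observed value, and the chained-equations routine is a deliberately simplified two-variable version of fully conditional specification that uses plain linear regression with Gaussian noise. The abstract does not name the statistical tests behind the p-value threshold, so the sketch only reports means and variances.

```python
import random
import statistics as st


def fit_line(x, y):
    """Least-squares fit y ~ intercept + slope * x; also returns residual SD."""
    mx, my = st.fmean(x), st.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(x) - 2)) ** 0.5


def impute_fixed(values, score):
    """Strategy 1: give every missing entry the same predefined score."""
    return [score if v is None else v for v in values]


def impute_chained(a, b, n_iter=10, rng=None):
    """Strategy 2 (simplified): cycle over both variables, redrawing each
    missing entry from a regression on the other variable plus noise."""
    if rng is None:
        rng = random.Random(0)
    cols = [list(a), list(b)]
    miss = [[v is None for v in col] for col in cols]
    # Initialise missing entries with the observed mean of their column.
    for col, flags in zip(cols, miss):
        mean = st.fmean(v for v in col if v is not None)
        col[:] = [mean if f else v for v, f in zip(col, flags)]
    for _ in range(n_iter):
        for target, other in ((0, 1), (1, 0)):
            # Fit on rows where the target variable was actually observed.
            obs = [i for i, f in enumerate(miss[target]) if not f]
            b0, b1, sd = fit_line([cols[other][i] for i in obs],
                                  [cols[target][i] for i in obs])
            for i, f in enumerate(miss[target]):
                if f:
                    cols[target][i] = b0 + b1 * cols[other][i] + rng.gauss(0, sd)
    return cols[0], cols[1]


# Synthetic, hypothetical data: two correlated test scores.
rng = random.Random(42)
n = 200
test_a = [rng.gauss(50, 10) for _ in range(n)]
test_b = [0.8 * x + rng.gauss(10, 6) for x in test_a]
# Hypothetical missingness: test_b is lost more often when test_a is low.
b_obs = [None if (x < 45 and rng.random() < 0.6) else y
         for x, y in zip(test_a, test_b)]
a_obs = [None if rng.random() < 0.1 else x for x in test_a]

observed_b = [v for v in b_obs if v is not None]
fixed_b = impute_fixed(b_obs, min(observed_b))  # assumed "bad" score

# m = 5 imputed datasets; average the per-dataset estimates.
m = 5
mi_b = [impute_chained(a_obs, b_obs, rng=random.Random(seed))[1]
        for seed in range(m)]
mi_mean = st.fmean(st.fmean(col) for col in mi_b)
mi_var = st.fmean(st.variance(col) for col in mi_b)

print(f"observed    : mean={st.fmean(observed_b):.2f} var={st.variance(observed_b):.2f}")
print(f"fixed score : mean={st.fmean(fixed_b):.2f} var={st.variance(fixed_b):.2f}")
print(f"chained (MI): mean={mi_mean:.2f} var={mi_var:.2f}")
```

In a real analysis, the mean and variance comparison would be backed by formal tests against the chosen p-value threshold, and the conditional models would cover all incomplete variables and relevant covariates such as years of education.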

Author information

Authors and Affiliations

Universidad Nacional de Educación a Distancia, 28040, Madrid, Spain
Alba Gómez-Valadés Batanero, Mariano Rincón Zamorano, Rafael Martínez Tomás & Juan Guerrero Martín

Corresponding author

Correspondence to Alba Gómez-Valadés Batanero.

Editor information

Editors and Affiliations

Universidad Politécnica de Cartagena, Cartagena, Spain
José Manuel Ferrández Vicente
Universidad Nacional de Educación a Distancia, Madrid, Spain
José Ramón Álvarez-Sánchez
Universidad Nacional de Educación a Distancia, Madrid, Spain
Félix de la Paz López
Ohio State University, Columbus, OH, USA
Hojjat Adeli

About this paper

Cite this paper

Batanero, A.GV., Zamorano, M.R., Tomás, R.M., Martín, J.G. (2022). Evaluating Imputation Methods for Missing Data in a MCI Dataset. In: Ferrández Vicente, J.M., Álvarez-Sánchez, J.R., de la Paz López, F., Adeli, H. (eds) Artificial Intelligence in Neuroscience: Affective Analysis and Health Applications. IWINAC 2022. Lecture Notes in Computer Science, vol 13258. Springer, Cham. https://doi.org/10.1007/978-3-031-06242-1_44

DOI: https://doi.org/10.1007/978-3-031-06242-1_44
Published: 24 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06241-4
Online ISBN: 978-3-031-06242-1
eBook Packages: Computer Science, Computer Science (R0)
