Abstract
One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets: it leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this problem, but these techniques are applied without regard to the characteristics that cause the missingness. In this paper, we investigated the causes of missing data in a medical dataset and proposed a multiple imputation technique to solve the problem. A 5-fold-iteration multiple imputation was employed, and all (100%) of the missing values in the dataset were regenerated. The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show an improvement in the accuracy obtained on the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets across different classifiers.
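The pipeline summarised above (multiple imputation followed by ELM validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the synthetic data, the 10% completely-at-random missingness, the reading of "5-fold-iteration" as five imputed datasets, and the helper names `impute_once`, `elm_fit` and `elm_predict` are all assumptions made for the example. The imputer is a simple chained stochastic regression and the ELM is a single random hidden layer whose output weights are solved with a pseudoinverse.

```python
import numpy as np

def impute_once(X, rng, n_sweeps=5):
    """Return one stochastic chained-regression imputation of the NaNs in X."""
    X = X.copy()
    missing = np.isnan(X)
    # Initialise every missing cell with the mean of its column.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            # Regress column j on all other columns using the observed rows.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            resid = X[~rows, j] - A[~rows] @ coef
            # Add residual noise so that repeated imputations differ.
            noise = rng.normal(0.0, resid.std(), rows.sum())
            X[rows, j] = A[rows] @ coef + noise
    return X

def elm_fit(X, y, n_hidden, rng):
    """Fit an ELM: random hidden weights, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    T = np.eye(int(y.max()) + 1)[y]      # one-hot class targets
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Synthetic stand-in for a medical dataset (assumption, not the paper's data).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 4))
X_full[:, 1] += 0.8 * X_full[:, 0]       # make the features correlated
y = (X_full[:, 0] + X_full[:, 1] > 0).astype(int)

# Remove 10% of the cells completely at random (assumed mechanism).
mask = rng.random(X_full.shape) < 0.1
X_missing = X_full.copy()
X_missing[mask] = np.nan

# Five imputed datasets, each validated with its own ELM.
imputed_sets = [impute_once(X_missing, rng) for _ in range(5)]
accuracies = []
for Xi in imputed_sets:
    model = elm_fit(Xi[:150], y[:150], n_hidden=20, rng=rng)
    accuracies.append(float(np.mean(elm_predict(model, Xi[150:]) == y[150:])))

print("accuracy per imputed dataset:", accuracies)
```

Each imputed dataset keeps the observed values untouched and fills only the missing cells, so differences in accuracy across the five runs reflect imputation uncertainty rather than changes to the recorded data.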
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Alade, O.A., Sallehuddin, R., Radzi, N.H.M., Selamat, A. (2020). Missing Data Characteristics and the Choice of Imputation Technique: An Empirical Study. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_9
DOI: https://doi.org/10.1007/978-3-030-33582-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33581-6
Online ISBN: 978-3-030-33582-3
eBook Packages: Intelligent Technologies and Robotics (R0)