Missing Data Characteristics and the Choice of Imputation Technique: An Empirical Study

  • Conference paper
Emerging Trends in Intelligent Computing and Informatics (IRICT 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1073)

  • 1656 Accesses

Abstract

One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets: it leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this problem, but they are applied without regard to the characteristics that cause the missingness. In this paper, we investigate the causes of missing data in a medical dataset and propose a multiple imputation technique to address the problem. A 5-fold-iteration multiple imputation was employed, and all of the missing values in the dataset were regenerated (100%). The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show an improvement in accuracy on the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets across different classifiers.
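
As a rough illustration of what generating five imputed datasets can look like, the following Python sketch fills each missing entry with a random draw from the observed values of the same column and then pools a simple estimate across the five completed copies. This is not the authors' procedure: the donor-draw rule, the function names `multiple_impute` and `pooled_column_mean`, and the toy data are all assumptions made for illustration, and the abstract does not specify the imputation model or the dataset layout.

```python
import random
from statistics import mean


def multiple_impute(rows, m=5, seed=0):
    """Return m completed copies of rows.

    Each None is replaced by a random draw from the observed values of the
    same column (a simple hot-deck rule, assumed here for illustration).
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # Observed (non-missing) values per column form the donor pool.
    donors = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    completed = []
    for _ in range(m):
        filled = [
            [v if v is not None else rng.choice(donors[j]) for j, v in enumerate(r)]
            for r in rows
        ]
        completed.append(filled)
    return completed


def pooled_column_mean(completed, col):
    """Pool a point estimate by averaging the per-dataset column means."""
    return mean(mean(r[col] for r in data) for data in completed)


# Hypothetical toy data: two numeric features, None marks a missing entry.
data = [[5.1, 120.0], [None, 135.0], [6.3, None], [5.8, 110.0], [None, 128.0]]
imputed = multiple_impute(data, m=5, seed=42)
print(len(imputed), pooled_column_mean(imputed, 0))
```

In a fuller pipeline, each of the five completed datasets would be passed to a classifier (an ELM in the paper) and the resulting accuracies compared or averaged. A model-based method such as chained equations would normally replace the donor draw used here.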

Author information

Corresponding author

Correspondence to Oyekale Abel Alade.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Alade, O.A., Sallehuddin, R., Radzi, N.H.M., Selamat, A. (2020). Missing Data Characteristics and the Choice of Imputation Technique: An Empirical Study. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_9
