Abstract

The paper addresses the problem of decision making when choosing appropriate ways to reduce the uncertainty of input information caused by absent or unavailable values in mixed data sets. Approaches to handling missing data and to evaluating their performance are discussed, and a generalized strategy for managing data with missing values is proposed. The study is based on real pregnancy-related records of 186 patients from 12 to 42 weeks of gestation. Three missing data techniques (complete ignoring, case deletion, and random forest (RF) missing data imputation) were applied to medical data of various types under a missing completely at random assumption, in order to solve a classification task and to soften the negative impact of input information uncertainty. The efficiency of each approach to dealing with missingness was evaluated. The results demonstrated that case deletion and ignoring missing values were less suitable for handling mixed types of missing data and suggested RF imputation as a useful approach for imputing complex pregnancy-related data sets with missing data.
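To make the cost of case deletion under a missing completely at random (MCAR) assumption more concrete, the following is a minimal sketch and not the authors' pipeline. It masks cells of a synthetic mixed (numeric and categorical) table independently at a fixed rate and then applies case deletion. Only the record count of 186 comes from the abstract; the number of features (10), the missingness rate (10%), the synthetic values, and all function names are assumptions made for illustration. RF imputation itself is not implemented here.

```python
import random


def mask_mcar(rows, rate, rng):
    """Return a copy of rows where each cell is set to None
    independently with probability `rate` (an MCAR mechanism)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def case_deletion(rows):
    """Case (listwise) deletion: keep only records without missing cells."""
    return [row for row in rows if all(v is not None for v in row)]


def expected_complete_fraction(rate, n_features):
    """Under independent MCAR masking, a record stays complete
    with probability (1 - rate) ** n_features."""
    return (1.0 - rate) ** n_features


rng = random.Random(0)
n_records = 186    # from the abstract
n_features = 10    # hypothetical; the abstract does not give this number
rate = 0.10        # hypothetical missingness rate

# Synthetic mixed-type data: even columns numeric, odd columns categorical.
rows = [
    [rng.gauss(0, 1) if j % 2 == 0 else rng.choice(["a", "b", "c"])
     for j in range(n_features)]
    for _ in range(n_records)
]

masked = mask_mcar(rows, rate, rng)
complete = case_deletion(masked)

print(len(complete), "of", len(masked), "records survive case deletion")
print("expected surviving fraction:",
      round(expected_complete_fraction(rate, n_features), 3))
```

Under these assumed settings the expected surviving fraction is 0.9^10, roughly 0.35, so case deletion would discard about two thirds of the records even though only one cell in ten is missing. This is one plausible reason why deletion can compare poorly with imputation on small clinical data sets.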



Author information

Corresponding author

Correspondence to Inna Skarga-Bandurova.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Skarga-Bandurova, I., Biloborodova, T., Dyachenko, Y. (2018). Strategy to Managing Mixed Datasets with Missing Items. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_50

  • DOI: https://doi.org/10.1007/978-3-319-91476-3_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91475-6

  • Online ISBN: 978-3-319-91476-3

  • eBook Packages: Computer Science, Computer Science (R0)
