Abstract
This paper addresses the problem of decision making and of choosing appropriate ways to reduce the uncertainty in input information caused by absent or unavailable values in mixed data sets. Approaches to handling missing data and to evaluating their performance are discussed, and a generalized strategy for managing data with missing values is proposed. The study is based on real pregnancy-related records of 186 patients from 12 to 42 weeks of gestation. Three missing data techniques (complete ignoring, case deletion, and random forest (RF) missing data imputation) were applied to medical data of various types, under a missing completely at random assumption, to solve a classification task and soften the negative impact of input information uncertainty. The efficiency of each approach to dealing with missingness was evaluated. The results showed that case deletion and ignoring missing values were less suitable for handling mixed types of missing data, and suggested RF imputation as a useful approach for imputing complex pregnancy-related data sets with missing data.
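To make the compared strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny made-up table of mixed records (the column meanings and all values are hypothetical) and shows missing-completely-at-random masking, case deletion, and a simple median/mode fill. The median/mode fill is a deliberately simpler stand-in for the RF imputation named in the abstract, used only so the example stays self-contained; all function names are illustrative.

```python
import random
from statistics import median, mode


def mask_mcar(rows, rate, seed=0):
    """Blank each cell independently with probability `rate` (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def case_deletion(rows):
    """Keep only rows without any missing cell (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row)]


def simple_impute(rows):
    """Fill numeric columns with the median and other columns with the mode.

    This is a stand-in for RF imputation, not RF imputation itself.
    A column with no observed values is left as None.
    """
    fills = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        if not observed:
            fills.append(None)
        elif all(isinstance(v, (int, float)) for v in observed):
            fills.append(median(observed))
        else:
            fills.append(mode(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Hypothetical mixed records: [gestation week, numeric reading, category].
records = [
    [12, 140.0, "normal"],
    [20, 150.0, "normal"],
    [28, None, "abnormal"],
    [36, 135.0, None],
    [42, 145.0, "normal"],
]

complete_cases = case_deletion(records)
imputed = simple_impute(records)
```

In this toy table case deletion discards two of the five records, while imputation keeps all five, which illustrates why deletion can be costly on a small data set.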
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Skarga-Bandurova, I., Biloborodova, T., Dyachenko, Y. (2018). Strategy to Managing Mixed Datasets with Missing Items. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_50
DOI: https://doi.org/10.1007/978-3-319-91476-3_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science, Computer Science (R0)