Strategy to Managing Mixed Datasets with Missing Items

Skarga-Bandurova, Inna; Biloborodova, Tetiana; Dyachenko, Yuriy

doi:10.1007/978-3-319-91476-3_50

Part of the book series: Communications in Computer and Information Science (CCIS, volume 854)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

The paper addresses decision making and the choice of appropriate ways to reduce input information uncertainty caused by the absence or unavailability of some values in mixed data sets. Approaches to handling missing data and evaluating their performance are discussed, and a generalized strategy for managing data with missing values is proposed. The study is based on real pregnancy-related records of 186 patients from 12 to 42 weeks of gestation. Three missing data techniques (complete ignoring, case deletion, and random forest (RF) missing data imputation) were applied to medical data of various types, under a missing completely at random assumption, to solve a classification task and soften the negative impact of input information uncertainty. The efficiency of each approach to dealing with missingness was evaluated. The results showed that case deletion and ignoring missing values were less suitable for handling mixed types of missing data, and suggested RF imputation as a useful approach for imputing complex pregnancy-related data sets with missing data.
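
The missing completely at random (MCAR) assumption and the case deletion technique named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the records are synthetic, the field names (`gestation_weeks`, `blood_group`, `smoker`), the 10% missingness rate, and the random seeds are assumptions chosen only for illustration, and RF imputation is not shown. Only the record count (186) and the 12 to 42 week range are taken from the abstract.

```python
import random


def make_records(n, seed=0):
    """Build n synthetic mixed-type records (hypothetical fields, not the study data)."""
    rng = random.Random(seed)
    return [
        {
            "gestation_weeks": rng.randint(12, 42),            # numeric
            "blood_group": rng.choice(["A", "B", "AB", "O"]),  # categorical
            "smoker": rng.choice([True, False]),               # binary
        }
        for _ in range(n)
    ]


def apply_mcar(records, rate, seed=1):
    """Blank each cell independently with probability `rate`.

    Missingness depends on neither observed nor unobserved values,
    which is what the MCAR assumption states.
    """
    rng = random.Random(seed)
    return [
        {key: (None if rng.random() < rate else value) for key, value in row.items()}
        for row in records
    ]


def case_deletion(records):
    """Keep only rows without any missing cell (listwise deletion)."""
    return [row for row in records if all(v is not None for v in row.values())]


def missing_fraction(records):
    """Share of all cells that are missing."""
    cells = [v for row in records for v in row.values()]
    return sum(v is None for v in cells) / len(cells)


records = make_records(186)
incomplete = apply_mcar(records, rate=0.10)  # assumed rate, for illustration only
complete_cases = case_deletion(incomplete)

print(f"cells missing: {missing_fraction(incomplete):.1%}")
print(f"rows kept after case deletion: {len(complete_cases)} of {len(incomplete)}")
```

With three fields and an independent 10% chance of losing each cell, a row survives case deletion with probability 0.9^3, or about 73%, so roughly a quarter of the rows are expected to be discarded even though only a tenth of the cells are missing. That loss of otherwise usable data is one common reason imputation is considered as an alternative.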

Author information

Authors and Affiliations

Volodymyr Dahl East Ukrainian National University, Severodonetsk, Ukraine
Inna Skarga-Bandurova, Tetiana Biloborodova & Yuriy Dyachenko

Authors

Inna Skarga-Bandurova
Tetiana Biloborodova
Yuriy Dyachenko

Corresponding author

Correspondence to Inna Skarga-Bandurova.

Editor information

Editors and Affiliations

Universidad de Cádiz, Cádiz, Cadiz, Spain
Jesús Medina
Universidad de Málaga, Málaga, Málaga, Spain
Manuel Ojeda-Aciego
Universidad de Granada, Granada, Spain
José Luis Verdegay
Universidad de Granada, Granada, Spain
David A. Pelta
Universidad de Málaga, Málaga, Málaga, Spain
Inma P. Cabrera
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, New York, USA
Ronald R. Yager

About this paper

Cite this paper

Skarga-Bandurova, I., Biloborodova, T., Dyachenko, Y. (2018). Strategy to Managing Mixed Datasets with Missing Items. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_50

DOI: https://doi.org/10.1007/978-3-319-91476-3_50
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science, Computer Science (R0)
