Abstract
Missing value imputation is a problem often encountered when working with medical and biometric data sets. Before such data sets can be used, missing values have to be eliminated, which can be done by imputing estimated values. However, imputation should neither bias the data nor alter the class balance. This paper presents an innovative approach to imputing missing values in training data for classification. The method uses the k-NN classifier on separate features to impute missing values. Unlike conventional methods, in which only complete vectors can be used in the imputation process, this approach allows data from incomplete vectors to be used to impute other incomplete vectors. The paper also describes a test protocol in which the Cross Validation with a Set Substitution method is used as an evaluation tool for scoring missing value imputation methods.
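The abstract does not give the algorithm's details, so the following is only a minimal sketch of one way a per-feature nearest-neighbour imputation could work, not the authors' procedure. It assumes numeric features, absolute difference as the distance on a single feature, and the median as the way to combine donor values. The function name `impute_per_feature_knn` and the parameter `k` are hypothetical. The point it illustrates is that a donor row only needs the compared feature and the target feature to be present, so incomplete rows can still contribute.

```python
import math
from statistics import median


def impute_per_feature_knn(rows, k=3):
    """Fill NaN cells using neighbours found one feature at a time.

    Assumption (not taken from the paper): for each missing cell (i, j),
    every other observed feature f of row i selects its k nearest donor
    rows by |value difference| on f alone, and the donors' values of
    feature j are pooled and reduced with the median.
    """
    n_features = len(rows[0])
    result = [list(r) for r in rows]  # copy, so donors are read from the original data
    for i, row in enumerate(rows):
        for j in range(n_features):
            if not math.isnan(row[j]):
                continue  # cell is observed, nothing to impute
            candidates = []
            for f in range(n_features):
                if f == j or math.isnan(row[f]):
                    continue
                # A donor needs only features f and j; its other cells may be missing.
                donors = [
                    r for idx, r in enumerate(rows)
                    if idx != i and not math.isnan(r[f]) and not math.isnan(r[j])
                ]
                donors.sort(key=lambda r: abs(r[f] - row[f]))
                candidates.extend(r[j] for r in donors[:k])
            if candidates:  # leave the cell as NaN when no donor exists
                result[i][j] = median(candidates)
    return result


nan = float("nan")
data = [
    [1.0, 10.0, nan],
    [1.1, nan, 5.0],
    [5.0, 50.0, 9.0],
    [nan, 11.0, 5.2],
]
filled = impute_per_feature_knn(data, k=1)
for row in filled:
    print(row)
```

In this toy data set only one row is complete, yet every missing cell can be filled because incomplete rows act as donors for each other. A method restricted to complete vectors would have a single donor for all three gaps.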
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Orczyk, T., Doroz, R., Porwik, P. (2021). Missing Value Imputation Method Using Separate Features Nearest Neighbors Algorithm. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_12
DOI: https://doi.org/10.1007/978-3-030-77967-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2
eBook Packages: Computer Science (R0)