Missing Value Imputation Method Using Separate Features Nearest Neighbors Algorithm

Orczyk, Tomasz; Doroz, Rafał; Porwik, Piotr

doi:10.1007/978-3-030-77967-2_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12744)

Included in the following conference series:

International Conference on Computational Science

Abstract

Missing value imputation is a problem often encountered when working with medical and biometric data sets. Before such data sets can be used, missing values have to be eliminated, which can be done by imputing estimated values. However, imputation should neither bias the data nor alter the class balance. This paper presents an innovative approach to imputing missing values in training data for classification. The method applies the k-NN classifier to separate features to impute missing values. Unlike conventional methods, in which only complete vectors can be used in the imputation process, this approach allows data from incomplete vectors to be used to impute other incomplete vectors. The paper also describes a test protocol in which the Cross Validation with a Set Substitution method is used as an evaluation tool for scoring missing value imputation methods.
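
The abstract gives no algorithmic detail, so the following is only a rough sketch of one way nearest neighbours found on separate features could fill a gap while still drawing on incomplete vectors. It is not the authors' algorithm. The function name `impute_per_feature_knn`, the parameter `k`, the absolute-difference distance, the averaging of donor values, and the column-mean fallback are all assumptions made for illustration, and class labels are ignored here.

```python
import numpy as np


def impute_per_feature_knn(X, k=3):
    """Fill NaNs using neighbours searched on one feature at a time.

    Illustrative only: donor rows need just two observed features
    (the matching feature and the target feature), so rows that are
    incomplete elsewhere can still contribute.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    n_cols = X.shape[1]

    # Visit every missing cell (row i, column j) of the original data.
    for i, j in zip(*np.where(~observed)):
        donors = []
        for f in range(n_cols):
            # Only features that row i actually has can be used for matching.
            if f == j or not observed[i, f]:
                continue
            # Candidates must have both the matching feature f and target j.
            cand = np.where(observed[:, f] & observed[:, j])[0]
            if cand.size == 0:
                continue
            # 1-D distance on feature f alone (assumed metric).
            dist = np.abs(X[cand, f] - X[i, f])
            nearest = cand[np.argsort(dist, kind="stable")[:k]]
            donors.extend(X[nearest, j])
        if donors:
            # Assumed aggregation: plain mean of all donor values.
            filled[i, j] = np.mean(donors)
        elif observed[:, j].any():
            # Assumed fallback when no donor exists: observed column mean.
            filled[i, j] = X[observed[:, j], j].mean()
    return filled
```

Because donors are always read from the original array `X` rather than from `filled`, the result does not depend on the order in which missing cells are visited. A conventional complete-case k-NN imputer would have no donors at all on a data set in which every row has at least one gap, whereas this sketch can still produce estimates there.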

Author information

Authors and Affiliations

Faculty of Science and Technology, University of Silesia in Katowice, Bedzinska 39, 41-200 Sosnowiec, Poland
Tomasz Orczyk, Rafał Doroz & Piotr Porwik

Corresponding author

Correspondence to Tomasz Orczyk.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

About this paper

Cite this paper

Orczyk, T., Doroz, R., Porwik, P. (2021). Missing Value Imputation Method Using Separate Features Nearest Neighbors Algorithm. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_12

DOI: https://doi.org/10.1007/978-3-030-77967-2_12
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2
eBook Packages: Computer Science, Computer Science (R0)
