Missing Value Imputation Method Using Separate Features Nearest Neighbors Algorithm

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Abstract

Missing value imputation is a problem often encountered when working with medical and biometric data sets. Before such data sets can be used, missing values have to be eliminated, which can be done by imputing estimated values. However, imputation should neither bias the data nor alter the class balance. This paper presents an innovative approach to imputing missing values in training data for classification. The method applies the k-NN classifier to separate features to impute missing values. Unlike conventional methods, in which only complete vectors can be used in the imputation process, this approach allows data from incomplete vectors to be used to impute other incomplete vectors. The paper also describes a test protocol in which the Cross Validation with a Set Substitution method is used as an evaluation tool for scoring missing value imputation methods.
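
The abstract does not spell out the algorithm, so the following is only a rough sketch of one way a per-feature nearest-neighbor imputation could work, not the authors' implementation. It assumes numeric features, missing values encoded as NaN, absolute difference as the one-dimensional distance, and a plain mean over the values donated by the neighbors. The function name `impute_per_feature_knn`, the parameter `k`, and the toy array `demo` are hypothetical. Class labels, which the paper's method may take into account, are ignored here.

```python
import numpy as np


def impute_per_feature_knn(X, k=3):
    """Fill NaNs using 1-D nearest-neighbor lookups on each observed feature.

    Assumption (not taken from the paper): neighbors are ranked by absolute
    difference on a single feature, and donated values are averaged.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i in range(X.shape[0]):
        for j in np.flatnonzero(np.isnan(X[i])):        # each missing cell in row i
            votes = []
            for f in np.flatnonzero(~np.isnan(X[i])):   # each feature observed in row i
                # A donor only needs feature f (to measure distance) and
                # feature j (to donate a value); it may be incomplete elsewhere.
                donors = np.flatnonzero(~np.isnan(X[:, f]) & ~np.isnan(X[:, j]))
                if donors.size == 0:
                    continue
                dist = np.abs(X[donors, f] - X[i, f])
                nearest = donors[np.argsort(dist, kind="stable")[:k]]
                votes.extend(X[nearest, j])
            if votes:
                filled[i, j] = float(np.mean(votes))
    return filled


# Toy data: only the third row is complete.
demo = np.array([
    [1.0, 10.0, np.nan],
    [1.1, np.nan, 5.0],
    [5.0, 50.0, 9.0],
    [np.nan, 11.0, 5.2],
])
result = impute_per_feature_knn(demo, k=1)
print(result)
```

In this toy array, a method restricted to complete vectors would have a single donor (the third row) for every missing cell. Working one feature at a time lets the other incomplete rows act as donors, which is the property the abstract highlights.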

Author information

Corresponding author

Correspondence to Tomasz Orczyk.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Orczyk, T., Doroz, R., Porwik, P. (2021). Missing Value Imputation Method Using Separate Features Nearest Neighbors Algorithm. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_12

  • DOI: https://doi.org/10.1007/978-3-030-77967-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77966-5

  • Online ISBN: 978-3-030-77967-2

  • eBook Packages: Computer Science, Computer Science (R0)
