A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)

Abstract

Missing values are common in real-world datasets, and many methods have been proposed to handle them. Among these methods, KNN imputation attracts considerable attention because it is simple to implement, easy to understand, and relatively accurate. However, it ignores the influence of correlations between attributes on the similarity of records. In this paper, we take these correlations into account when selecting the nearest neighbors, and impute the incomplete records successively according to the number of missing values in each record. During imputation, the correlation coefficients are first calculated from the complete records and then updated using the union of complete and imputed records. As more of the data is used, the estimated correlations between attributes become more accurate, which makes the selected nearest neighbors more appropriate. Experimental results demonstrate that the improved method is more effective in missing value imputation.
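The procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the distance formula, so several details here are assumptions rather than the authors' method: the function name `correlation_knn_impute` is hypothetical, the distance is a Euclidean distance in which each observed attribute is weighted by the absolute Pearson correlation with the attribute being imputed, the imputed value is the plain mean of the k nearest neighbors, and the dataset is assumed to contain at least two complete records.

```python
import numpy as np


def correlation_knn_impute(X, k=3):
    """Hypothetical sketch: impute NaNs record by record, fewest missing first."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_missing = missing.sum(axis=1)

    # The pool of usable records starts with the complete ones.
    pool = list(np.flatnonzero(n_missing == 0))
    if len(pool) < 2:
        raise ValueError("need at least two complete records")

    # Visit incomplete records in order of increasing number of missing values.
    order = [i for i in np.argsort(n_missing, kind="stable") if n_missing[i] > 0]

    for i in order:
        donors = filled[pool]
        # Re-estimate correlations from complete plus already-imputed records.
        # Constant columns yield NaN correlations, which are treated as 0 here.
        with np.errstate(invalid="ignore", divide="ignore"):
            corr = np.atleast_2d(np.nan_to_num(np.corrcoef(donors, rowvar=False)))

        obs = ~missing[i]
        for j in np.flatnonzero(missing[i]):
            # Assumed weighting: |correlation| between each observed attribute
            # and the attribute j that is being imputed.
            w = np.abs(corr[j, obs])
            if w.sum() == 0:
                w = np.ones_like(w)  # fall back to unweighted distance
            diff = donors[:, obs] - X[i, obs]
            dist = np.sqrt((w * diff ** 2).sum(axis=1))
            nearest = np.argsort(dist, kind="stable")[:k]
            filled[i, j] = donors[nearest, j].mean()

        # The imputed record joins the pool used for later records.
        pool.append(i)

    return filled


if __name__ == "__main__":
    nan = float("nan")
    data = [
        [1, 2, 5],
        [2, 4, 1],
        [3, 6, 4],
        [4, 8, 2],
        [2, nan, 1],      # one missing value: imputed first
        [nan, nan, 5],    # two missing values: imputed second
    ]
    print(correlation_knn_impute(data, k=1))
```

In this sketch the record with one missing value is imputed before the record with two, and it is added to the donor pool before the correlations are recomputed, which mirrors the successive, record-by-record updating described above.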

Author information

Corresponding author

Correspondence to Liyong Zhang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Lai, X., Zhang, L. (2020). A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_38
