Abstract
Missing values are common in real-world datasets, and many methods have been proposed to handle them. Among these methods, KNN imputation has attracted considerable attention because it is simple to implement, easy to understand, and relatively accurate. However, it ignores how correlations between attributes affect the similarity of records. In this paper, we take these correlations into account when selecting the nearest neighbors, and we impute the incomplete records successively, ordered by the number of missing values in each record. During imputation, the correlation coefficients are first calculated from the complete records and then updated using the union of the complete records and the records already imputed. Because more of the data is used, the estimated correlations between attributes become more accurate, which makes the selected nearest neighbors more appropriate. Experimental results demonstrate that the improved method imputes missing values more effectively.
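The procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact distance formula or estimator, so the following is an assumption rather than the authors' method: the function name `correlation_knn_impute`, the use of absolute Pearson correlations as attribute weights in a Euclidean distance, and the unweighted mean of the k neighbors are all hypothetical choices made for illustration.

```python
import numpy as np


def correlation_knn_impute(X, k=3):
    """Hypothetical sketch: fill NaNs record by record, fewest missing values first."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_missing = missing.sum(axis=1)

    # The reference pool starts with the complete records only.
    pool = [i for i in range(len(X)) if n_missing[i] == 0]
    if len(pool) < 2:
        raise ValueError("need at least two complete records to estimate correlations")

    # Incomplete records are processed in ascending order of missing count.
    order = np.argsort(n_missing, kind="stable")
    incomplete = [i for i in order if n_missing[i] > 0]

    for i in incomplete:
        ref = filled[pool]
        # Correlations are re-estimated from complete plus already-imputed records.
        corr = np.nan_to_num(np.corrcoef(ref, rowvar=False))
        obs = ~missing[i]
        for j in np.flatnonzero(missing[i]):
            # Assumed weighting: |r| between the target attribute and each observed one.
            w = np.abs(corr[j, obs])
            if w.sum() == 0:
                w = np.ones_like(w)  # fall back to an unweighted distance
            diff = ref[:, obs] - X[i, obs]
            dist = np.sqrt((w * diff ** 2).sum(axis=1))
            nearest = np.argsort(dist, kind="stable")[:k]
            # Assumed estimator: plain mean of the neighbors' values.
            filled[i, j] = ref[nearest, j].mean()
        # The imputed record joins the pool for later records.
        pool.append(i)
    return filled
```

Calling `correlation_knn_impute(X, k=3)` on a two-dimensional array that marks missing entries with `np.nan` returns a copy with those entries filled and the observed entries unchanged. Because each imputed record is appended to the pool, records with many missing values are matched against a larger reference set than records processed earlier.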
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., Lai, X., Zhang, L. (2020). A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_38
DOI: https://doi.org/10.1007/978-3-030-29516-5_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29515-8
Online ISBN: 978-3-030-29516-5
eBook Packages: Intelligent Technologies and Robotics (R0)