A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors

Liu, Xin; Lai, Xiaochen; Zhang, Liyong

doi:10.1007/978-3-030-29516-5_38

Conference paper
First Online: 24 August 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)

Abstract

Missing values are common in real-world datasets, and many methods have been proposed to handle them. Among these methods, KNN imputation attracts considerable attention because it is simple to implement, easy to understand, and relatively accurate. However, it ignores the influence of correlations between attributes on the similarity of records. In this paper, we take these correlations into account when selecting the nearest neighbors, and we impute the incomplete records successively according to the number of missing values in each record. During imputation, the correlation coefficients are first calculated from the complete records and then updated with the union of the complete records and the imputed records. As data utilization improves, the estimated correlations between attributes become more accurate, which makes the selected nearest neighbors more appropriate. Experimental results demonstrate that the improved method is more effective in missing value imputation.
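The abstract describes the procedure only at a high level. The following Python sketch (using NumPy) shows one way the hierarchical, correlation-weighted KNN idea could work. Several details are assumptions that the abstract does not state:

- The distance weights each observed attribute's squared difference by the absolute Pearson correlation between that attribute and the attribute being imputed.
- Each missing value is filled with the mean of its K nearest neighbors' values.
- Correlations are re-estimated once per tier, where a tier is the set of records with the same number of missing values.

The function names `abs_correlation` and `hierarchical_corr_knn_impute` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def abs_correlation(pool: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlations between the columns of a complete array."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(pool, rowvar=False)
    # Constant columns give NaN correlations; treat them as uncorrelated (weight 0).
    return np.nan_to_num(np.abs(corr), nan=0.0)


def hierarchical_corr_knn_impute(x, k: int = 3) -> np.ndarray:
    """Impute NaNs tier by tier, starting with records that miss the fewest values.

    Illustrative sketch only: the |correlation| distance weighting and the mean
    aggregation are assumptions, not the published algorithm.
    """
    x = np.array(x, dtype=float)  # copy, so the caller's array is left untouched
    n_missing = np.isnan(x).sum(axis=1)
    pool = x[n_missing == 0]  # reference set starts as the complete records
    if len(pool) < 2:
        raise ValueError("need at least two complete records")

    for level in sorted(set(n_missing[n_missing > 0].tolist())):
        # Correlations are re-estimated from complete + already-imputed records.
        corr = abs_correlation(pool)
        tier = np.flatnonzero(n_missing == level)
        for i in tier:
            observed = ~np.isnan(x[i])
            ref = x[i, observed]
            for j in np.flatnonzero(~observed):
                # Weight each observed attribute by |corr| with target attribute j.
                weights = corr[observed, j]
                sq_diff = (pool[:, observed] - ref) ** 2
                dist = np.sqrt(sq_diff @ weights)
                nearest = np.argsort(dist)[:k]
                x[i, j] = pool[nearest, j].mean()
        # The imputed tier joins the reference pool for the next, sparser tier.
        pool = np.vstack([pool, x[tier]])
    return x


if __name__ == "__main__":
    data = np.array([
        [1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 6.0, 9.0],
        [4.0, 8.0, 12.0],
        [2.0, np.nan, 6.0],     # one missing value: imputed in the first tier
        [np.nan, np.nan, 9.0],  # two missing values: imputed in the second tier
    ])
    print(hierarchical_corr_knn_impute(data, k=1)[4:])
    # rows 4 and 5 become [2, 4, 6] and [3, 6, 9]
```

Records with the fewest missing values are processed first. As a result, each later tier can draw neighbors from, and re-estimate correlations on, a larger reference pool than the tier before it.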

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, 116600, China
Xin Liu & Xiaochen Lai
School of Control Science and Engineering, Dalian University of Technology, Dalian, 116024, China
Liyong Zhang
Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, 116620, China
Xiaochen Lai

Corresponding author

Correspondence to Liyong Zhang.

Editor information

Editors and Affiliations

School of Computing, Computer Science Research Institute, Ulster University, Newtownabbey, UK
Yaxin Bi
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Liu, X., Lai, X., Zhang, L. (2020). A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_38

DOI: https://doi.org/10.1007/978-3-030-29516-5_38
Published: 24 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29515-8
Online ISBN: 978-3-030-29516-5
eBook Packages: Intelligent Technologies and Robotics (R0)
