Abstract
Missing data imputation is a current and challenging issue in machine learning and data mining, because missing values in a dataset can introduce bias that degrades the quality of the learned patterns or the classification performance. To deal with this issue, this paper proposes a Grey-Based K-NN Iteration Imputation method, called GBKII, for imputing missing values. GBKII is an instance-based imputation method, which is referred to as a non-parametric regression method in statistics. It also handles categorical attributes efficiently. We experimentally evaluate our approach and demonstrate that GBKII is much more efficient than the k-NN and mean-substitution methods.
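The abstract names the ingredients of GBKII (a grey-based similarity, k nearest neighbours, and iteration) but does not give the algorithm itself. The following is only a minimal sketch of how those ingredients might fit together, not the authors' method. It assumes Deng's grey relational grade with distinguishing coefficient 0.5 as the similarity, a column-mean initial fill, numeric attributes on comparable scales, at least two columns, and at least one observed value per column. The names `grey_grades` and `impute`, and the parameters `k`, `max_iter`, and `tol`, are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def grey_grades(ref: List[float], others: List[List[float]],
                zeta: float = 0.5) -> List[float]:
    """Grey relational grade of each row in `others` with respect to `ref`.

    Assumption: Deng's coefficient (d_min + zeta*d_max) / (d + zeta*d_max),
    averaged over attributes. Higher grade means more similar.
    """
    deltas = [[abs(a - b) for a, b in zip(ref, row)] for row in others]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:  # every candidate equals the reference row
        return [1.0] * len(others)
    return [mean([(d_min + zeta * d_max) / (d + zeta * d_max) for d in row])
            for row in deltas]


def impute(data: List[List[Optional[float]]], k: int = 3,
           max_iter: int = 10, tol: float = 1e-6) -> List[List[float]]:
    """Iteratively fill None cells from the k most grey-similar rows."""
    n_cols = len(data[0])
    missing = [(i, j) for i, row in enumerate(data)
               for j, v in enumerate(row) if v is None]
    # Assumed initial guess: mean of the observed values in each column.
    col_means = [mean([row[j] for row in data if row[j] is not None])
                 for j in range(n_cols)]
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in data]
    for _ in range(max_iter):
        change = 0.0
        for i, j in missing:
            # Compare rows on every attribute except the one being imputed.
            ref = [v for c, v in enumerate(filled[i]) if c != j]
            cand = [r for r in range(len(filled)) if r != i]
            others = [[v for c, v in enumerate(filled[r]) if c != j]
                      for r in cand]
            grades = grey_grades(ref, others)
            nearest = sorted(zip(grades, cand), reverse=True)[:k]
            new_value = mean([filled[r][j] for _, r in nearest])
            change = max(change, abs(new_value - filled[i][j]))
            filled[i][j] = new_value
        if change < tol:  # stop once no imputed value moves noticeably
            break
    return filled
```

Calling `impute(rows, k=2)` on a list of rows with `None` marking missing cells returns a new, fully filled list and leaves the input untouched. Categorical attributes, which the abstract says GBKII supports, are outside the scope of this sketch.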
This work is partially supported by Australian Research Council Discovery Projects (DP0449535, DP0559536 and DP0667060), a China NSF major research Program (60496327), China NSF grants (60463003, 10661003), an Overseas Outstanding Talent Research Program of Chinese Academy of Sciences (06S3011S01), an Overseas-Returning High-level Talent Research Program of China Hunan-Resource Ministry, and a Guangxi Postgraduate Educational Innovation Plan (2006106020812M35).
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Zhang, C., Zhu, X., Zhang, J., Qin, Y., Zhang, S. (2007). GBKII: An Imputation Method for Missing Values. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_122
DOI: https://doi.org/10.1007/978-3-540-71701-0_122
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science (R0)