Abstract
Missing data imputation is a current and challenging issue in machine learning and data mining, because missing values in a dataset can introduce bias that degrades the quality of the learned patterns or the classification performance. To deal with this issue, this paper proposes a Grey-Based K-NN Iteration Imputation method, called GBKII, for imputing missing values. GBKII is an instance-based imputation method, known in statistics as a non-parametric regression method. It also handles categorical attributes efficiently. We experimentally evaluate our approach and demonstrate that GBKII is much more efficient than the k-NN and mean-substitution methods.
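To make the idea concrete, the following is a minimal sketch of a grey-relational k-NN iterative imputation, not the authors' implementation. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the function names `grey_relational_grades` and `grey_knn_impute`, the min-max scaling, the column-mean initialisation, the grade-weighted average, the stopping rule, and the distinguishing coefficient `zeta=0.5` (a conventional choice in grey relational analysis). The sketch handles numeric attributes only and assumes every column has at least one observed value.

```python
import numpy as np

def grey_relational_grades(X, ref, zeta=0.5):
    """Grey relational grade of each row of X with respect to the row `ref`."""
    delta = np.abs(X - ref)                  # per-attribute absolute differences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:                         # every row equals the reference
        return np.ones(len(X))
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    return coeff.mean(axis=1)                # average coefficient over attributes

def grey_knn_impute(data, k=3, max_iter=10, tol=1e-4):
    """Hypothetical iterative imputation: NaN marks a missing value."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # Assumption: min-max scale each column so differences are comparable.
    lo, hi = np.nanmin(data, axis=0), np.nanmax(data, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    X = (data - lo) / span
    # Assumption: start from column means, then refine iteratively.
    X = np.where(missing, np.nanmean(X, axis=0), X)
    rows = np.arange(len(X))
    for _ in range(max_iter):
        prev = X.copy()
        for i in np.flatnonzero(missing.any(axis=1)):
            others = np.delete(rows, i)
            grades = grey_relational_grades(prev[others], prev[i], )
            order = np.argsort(grades)[::-1][:k]   # k most related rows
            top, weights = others[order], grades[order]
            for j in np.flatnonzero(missing[i]):
                # Grade-weighted average of the neighbours' current values.
                X[i, j] = np.average(prev[top, j], weights=weights)
        if np.max(np.abs(X - prev)) < tol:   # stop once estimates stabilise
            break
    return X * span + lo                     # undo the scaling
```

Calling `grey_knn_impute([[1, 2], [2, 4], [3, float("nan")], [4, 8], [5, 10]], k=2)` fills the single missing entry from the two rows most related to it and leaves the observed entries unchanged. Categorical attributes, which the abstract says GBKII supports, would need a separate similarity measure and a voting rule that this sketch does not attempt.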
This work is partially supported by Australian Research Council Discovery Projects (DP0449535, DP0559536 and DP0667060), a China NSF major research Program (60496327), China NSF grants (60463003, 10661003), an Overseas Outstanding Talent Research Program of Chinese Academy of Sciences (06S3011S01), an Overseas-Returning High-level Talent Research Program of China Hunan-Resource Ministry, and a Guangxi Postgraduate Educational Innovation Plan (2006106020812M35).
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Zhang, C., Zhu, X., Zhang, J., Qin, Y., Zhang, S. (2007). GBKII: An Imputation Method for Missing Values. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_122
DOI: https://doi.org/10.1007/978-3-540-71701-0_122
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)