Abstract
Missing data imputation is a key issue in data pre-processing in the data mining field. Although there are many methods for missing value imputation, almost every one of them has limitations and is designed for either numeric attributes or categorical attributes, but not both. This paper introduces a new concept, the Incomplete Set Mixed Feature Vector (ISMFV), and builds on it to present IMIC, a new missing value Imputation method for Mixed numeric and categorical attributes based on Incomplete data hierarchical Clustering. The effectiveness of the new method is evaluated through comparison experiments on three real data sets from the UCI repository.
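The abstract does not define ISMFV or the distance measure IMIC uses, so the following is not the authors' method. It is a minimal sketch of the general idea of clustering-based imputation for mixed attributes, under these assumptions: `None` marks a missing value; records are compared with a Gower-style distance computed only over attributes observed in both records; clustering is average-linkage agglomerative clustering down to a chosen number of clusters `k`; and each gap is filled with the cluster mean (numeric) or mode (categorical), falling back to the global value. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical sketch: generic hierarchical-clustering imputation, NOT the IMIC/ISMFV method.

def attr_ranges(data, types):
    """Range of each numeric attribute over observed values (used to scale distances)."""
    ranges = []
    for j, t in enumerate(types):
        vals = [row[j] for row in data if row[j] is not None]
        if t == "num" and vals:
            ranges.append((max(vals) - min(vals)) or 1.0)
        else:
            ranges.append(1.0)
    return ranges

def distance(a, b, types, ranges):
    """Assumed Gower-style distance, averaged over attributes observed in both records."""
    parts = []
    for j, t in enumerate(types):
        if a[j] is None or b[j] is None:
            continue  # skip attributes missing in either record
        if t == "num":
            parts.append(abs(a[j] - b[j]) / ranges[j])
        else:
            parts.append(0.0 if a[j] == b[j] else 1.0)
    return sum(parts) / len(parts) if parts else 1.0  # no overlap: treat as maximally distant

def cluster(data, types, k):
    """Average-linkage agglomerative clustering down to k clusters of row indices."""
    ranges = attr_ranges(data, types)
    d = [[distance(x, y, types, ranges) for y in data] for x in data]
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                link = mean(d[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        _, p, q = best
        merged = clusters.pop(q)  # q > p, so index p is unaffected
        clusters[p].extend(merged)
    return clusters

def impute(data, types, k):
    """Fill each missing value with its cluster's mean (numeric) or mode (categorical)."""
    result = [list(row) for row in data]  # leave the input untouched

    def fill(rows, j):
        vals = [data[i][j] for i in rows if data[i][j] is not None]
        if not vals:
            return None
        if types[j] == "num":
            return mean(vals)
        return Counter(vals).most_common(1)[0][0]

    everyone = range(len(data))
    for members in cluster(data, types, k):
        for i in members:
            for j in range(len(types)):
                if data[i][j] is None:
                    v = fill(members, j)
                    result[i][j] = v if v is not None else fill(everyone, j)
    return result

if __name__ == "__main__":
    data = [
        [1.0, "red", 10.0], [1.2, None, 11.0], [0.9, "red", None],
        [8.0, "blue", 50.0], [None, "blue", 52.0], [8.4, "blue", 49.0],
    ]
    for row in impute(data, ["num", "cat", "num"], k=2):
        print(row)
```

On this toy data the two groups separate cleanly, so row 1 receives "red", row 2 receives 10.5 and row 4 receives 8.2 from their own clusters. Because the all-pairs distance matrix and naive merging are quadratic to cubic in the number of records, this sketch suits only small examples.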
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feng, X., Wu, S., Liu, Y. (2011). Imputing Missing Values for Mixed Numeric and Categorical Attributes Based on Incomplete Data Hierarchical Clustering. In: Xiong, H., Lee, W.B. (eds) Knowledge Science, Engineering and Management. KSEM 2011. Lecture Notes in Computer Science, vol 7091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_37
DOI: https://doi.org/10.1007/978-3-642-25975-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25974-6
Online ISBN: 978-3-642-25975-3
eBook Packages: Computer Science, Computer Science (R0)