Imputing Missing Values for Mixed Numeric and Categorical Attributes Based on Incomplete Data Hierarchical Clustering

Feng, Xiaodong; Wu, Sen; Liu, Yanchi

doi:10.1007/978-3-642-25975-3_37

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7091)

Abstract

Missing data imputation is a key issue in data pre-processing for data mining. Although many methods for missing value imputation exist, almost every one has limitations and is designed for either numeric or categorical attributes, not both. This paper presents IMIC, a new missing value Imputation method for Mixed numeric and categorical attributes based on Incomplete data hierarchical clustering, built on a newly introduced concept, the Incomplete Set Mixed Feature Vector (ISMFV). The effectiveness of the new method is evaluated through comparison experiments on 3 real data sets from UCI.
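To make the general idea concrete, the following is a minimal sketch of cluster-based imputation for mixed numeric and categorical records. It is not the IMIC algorithm and does not reproduce ISMFV, neither of which is specified in the abstract. The range-normalized distance over jointly observed attributes, the average-linkage merging, the mean/mode fill rule, and all function names are assumptions chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def column_ranges(rows, numeric_cols):
    """Range (max - min) of the observed values in each numeric column."""
    ranges = {}
    for j in numeric_cols:
        vals = [r[j] for r in rows if r[j] is not None]
        ranges[j] = (max(vals) - min(vals)) if vals else 0.0
    return ranges


def distance(a, b, numeric_cols, ranges):
    """Mean per-attribute distance over attributes observed in both records."""
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip attributes missing in either record
        if j in numeric_cols:
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    # Assumption: records with no jointly observed attribute are maximally far.
    return total / used if used else 1.0


def cluster(rows, numeric_cols, k):
    """Average-linkage agglomerative clustering on incomplete records."""
    ranges = column_ranges(rows, numeric_cols)
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(distance(rows[i], rows[j], numeric_cols, ranges)
                   for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] += clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


def fill_value(values, numeric):
    """Mean for numeric attributes, most frequent value for categorical ones."""
    if not values:
        return None
    if numeric:
        return sum(values) / len(values)
    return Counter(values).most_common(1)[0][0]


def impute(rows, numeric_cols, k):
    """Fill each missing value from its cluster, falling back to the column."""
    filled = [list(r) for r in rows]
    for members in cluster(rows, numeric_cols, k):
        for j in range(len(rows[0])):
            local = [rows[i][j] for i in members if rows[i][j] is not None]
            everything = [r[j] for r in rows if r[j] is not None]
            value = fill_value(local or everything, j in numeric_cols)
            for i in members:
                if filled[i][j] is None:
                    filled[i][j] = value
    return filled


# Toy data: column 0 is numeric, column 1 is categorical, None marks missing.
rows = [[1.0, "a"], [1.2, None], [None, "a"],
        [9.0, "b"], [9.4, "b"], [None, "b"]]
filled = impute(rows, numeric_cols={0}, k=2)
```

On this toy data the two groups separate cleanly, so each missing entry is filled from its own cluster: a mean of the observed numeric values, or the most frequent category. Clustering directly on incomplete records avoids discarding rows before imputation, which is the motivation the abstract gives for clustering incomplete data.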

Author information

Authors and Affiliations

School of Economics and Management, University of Science and Technology Beijing, Beijing, 100083, P.R. China
Xiaodong Feng, Sen Wu & Yanchi Liu

Editor information

Editors and Affiliations

Management Science and Information Systems Department, Rutgers, the State University of New Jersey, 1, Washington Park, 07102, Newark, NJ, USA
Hui Xiong
Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China
W. B. Lee

About this paper

Cite this paper

Feng, X., Wu, S., Liu, Y. (2011). Imputing Missing Values for Mixed Numeric and Categorical Attributes Based on Incomplete Data Hierarchical Clustering. In: Xiong, H., Lee, W.B. (eds) Knowledge Science, Engineering and Management. KSEM 2011. Lecture Notes in Computer Science, vol 7091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_37

DOI: https://doi.org/10.1007/978-3-642-25975-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25974-6
Online ISBN: 978-3-642-25975-3
eBook Packages: Computer Science, Computer Science (R0)
