Imputing Missing Values for Mixed Numeric and Categorical Attributes Based on Incomplete Data Hierarchical Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7091)

Abstract

Missing data imputation is a key issue in data pre-processing for data mining. Although many imputation methods exist, almost every one has limitations and is designed for either numeric or categorical attributes, not both. This paper presents IMIC, a new missing value Imputation method for Mixed numeric and categorical attributes based on Incomplete data hierarchical clustering, built on a newly introduced concept, the Incomplete Set Mixed Feature Vector (ISMFV). The method is evaluated in comparison experiments on three real data sets from the UCI repository.
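The abstract does not spell out how ISMFV or the hierarchical clustering step works, so the following is not IMIC itself. It is only a minimal sketch of the general idea behind cluster-based imputation for mixed attributes, assuming records have already been grouped into a cluster by some other means: a missing numeric value is filled with the cluster mean and a missing categorical value with the cluster mode. The function name `impute_within_cluster` and the dict-based record layout are hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean


def impute_within_cluster(rows, numeric_cols, categorical_cols):
    """Fill None entries using values observed in the same cluster.

    Illustrative assumption, not the paper's method: numeric columns
    use the cluster mean, categorical columns use the cluster mode.
    The input rows are left unmodified; a filled copy is returned.
    """
    filled = [dict(r) for r in rows]

    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # nothing to learn from; leave the gaps in place
        fill = mean(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill

    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue
        fill = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = fill

    return filled


# A toy cluster with one missing numeric and one missing categorical value.
cluster = [
    {"age": 30, "color": "red"},
    {"age": None, "color": "red"},
    {"age": 40, "color": None},
    {"age": 50, "color": "blue"},
]
completed = impute_within_cluster(cluster, ["age"], ["color"])
```

Handling the two attribute types separately is the simplest way to cover mixed data; a method such as the one the paper proposes would additionally have to form the clusters from incomplete records in the first place, which this sketch takes as given.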

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feng, X., Wu, S., Liu, Y. (2011). Imputing Missing Values for Mixed Numeric and Categorical Attributes Based on Incomplete Data Hierarchical Clustering. In: Xiong, H., Lee, W.B. (eds) Knowledge Science, Engineering and Management. KSEM 2011. Lecture Notes in Computer Science, vol 7091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25975-3_37

  • DOI: https://doi.org/10.1007/978-3-642-25975-3_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25974-6

  • Online ISBN: 978-3-642-25975-3

  • eBook Packages: Computer Science, Computer Science (R0)