
Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories

  • Conference paper
Transactions on Rough Sets IV

Part of the book series: Lecture Notes in Computer Science (TRS, volume 3700)

Abstract

Missing data, common in many fields of study, introduce inaccuracy into analysis and evaluation. Earlier methods for handling missing data (e.g., deleting cases with incomplete information or substituting missing values with estimated mean scores), though simple to implement, are problematic because they may produce biased data models. Fortunately, recent advances in theoretical and computational statistics have led to more flexible techniques for dealing with missing data. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to incomplete data. Our experiments show that hybridizing fuzzy set and rough set theories in missing data imputation yields the best performance among our four algorithms: crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means imputation.
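To make the idea of clustering-based imputation concrete, here is a minimal sketch of fuzzy K-means (fuzzy c-means) imputation in plain Python. It is an illustration under stated assumptions, not the authors' exact algorithm. The sketch assumes the following:

- Distances to centroids use only the attributes observed in a row.
- Each centroid coordinate is updated from observed values only.
- Each missing value is filled with the membership-weighted average of the centroid coordinates.
- Centroids are seeded farthest-first from the fully observed rows.

The function names `fuzzy_kmeans_impute`, `_partial_sq_dist` and `_memberships` are hypothetical. Crisp K-means imputation would replace the fuzzy memberships with a hard 0/1 assignment to the nearest centroid. The rough and rough-fuzzy variants are not shown.

```python
def _partial_sq_dist(row, centroid):
    """Squared Euclidean distance over the attributes observed in `row` (None = missing)."""
    return sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None)


def _memberships(dists, m):
    """Standard fuzzy c-means memberships of one row, given squared distances to each centroid."""
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:  # the row coincides with one or more centroids on its observed attributes
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(dists))]
    p = 1.0 / (m - 1.0)  # exponent 2/(m-1) on distances equals 1/(m-1) on squared distances
    return [1.0 / sum((dj / dl) ** p for dl in dists) for dj in dists]


def fuzzy_kmeans_impute(data, k, m=2.0, max_iter=100, tol=1e-6):
    """Return a copy of `data` with each None replaced by a membership-weighted
    average of the k centroids (hypothetical sketch; assumptions noted inline)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    complete = [r for r in data if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k fully observed rows to seed centroids")
    # Assumption: deterministic farthest-first seeding over the complete rows.
    centroids = [list(complete[0])]
    while len(centroids) < k:
        far = max(complete, key=lambda r: min(_partial_sq_dist(r, c) for c in centroids))
        centroids.append(list(far))

    n_attr = len(data[0])
    for _ in range(max_iter):
        u = [_memberships([_partial_sq_dist(r, c) for c in centroids], m) for r in data]
        new_centroids = []
        for j in range(k):
            cj = []
            for a in range(n_attr):
                # Assumption: only rows that observe attribute a update coordinate a.
                num = sum(u[i][j] ** m * r[a] for i, r in enumerate(data) if r[a] is not None)
                den = sum(u[i][j] ** m for i, r in enumerate(data) if r[a] is not None)
                cj.append(num / den if den > 0 else centroids[j][a])
            new_centroids.append(cj)
        shift = max(abs(x - y) for old, new in zip(centroids, new_centroids)
                    for x, y in zip(old, new))
        centroids = new_centroids
        if shift < tol:
            break

    # Assumption: impute with memberships computed against the final centroids.
    imputed = []
    for r in data:
        u_r = _memberships([_partial_sq_dist(r, c) for c in centroids], m)
        imputed.append([x if x is not None else sum(uj * c[a] for uj, c in zip(u_r, centroids))
                        for a, x in enumerate(r)])
    return imputed
```

With two well-separated groups, for example `[[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, None], [8.0, 9.0], [8.2, 8.8], [7.8, 9.2], [None, 9.0]]` and `k=2`, the missing entries are filled with values close to their own group's centroid (about 2.0 and 8.0). Observed entries are left unchanged. Because memberships are soft, a row lying between clusters receives a blend of centroid values rather than a single cluster's value. This is the tolerance to uncertainty that motivates the fuzzy variant.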

This work was supported, in part, by a grant from NSF (EIA-0091530), a cooperative agreement with USDA FCIC/RMA (2IE08310228), and an NSF EPSCoR Grant (EPS-0091900).




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, D., Deogun, J., Spaulding, W., Shuart, B. (2005). Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574798_3


  • DOI: https://doi.org/10.1007/11574798_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29830-4

  • Online ISBN: 978-3-540-32016-6

  • eBook Packages: Computer Science, Computer Science (R0)
