
Feature Selection Based on the Rough Set Theory and Expectation-Maximization Clustering Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5306)

Abstract

We study Rough Set theory as a method of feature selection based on tolerant classes, which extend the equivalence classes of classical rough sets. Determining the initial tolerant classes is a challenging and important task for accurate feature selection and classification. In this paper, the Expectation-Maximization (EM) clustering algorithm is applied to determine which objects are similar. On a number of benchmark datasets from the UCI repository, the proposed method selects fewer features with higher or equal accuracy compared with two existing methods: Fuzzy-Rough Feature Selection and Tolerance-based Feature Selection.
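The abstract does not spell out how EM clusters become tolerant classes. The Python sketch below fills that gap with stated assumptions and is not the authors' procedure. Each attribute is clustered separately by a one-dimensional Gaussian mixture fitted with EM. Two objects are treated as similar on an attribute subset when they fall in the same cluster on at least a fraction `tau` of those attributes. This relation is reflexive and symmetric but not necessarily transitive, so it is a tolerance relation. Features are then chosen by a QuickReduct-style greedy search on the rough set dependency degree, i.e. the fraction of objects whose tolerant class lies entirely inside their decision class. All function names and the threshold `tau` are hypothetical.

```python
import math
import statistics


def em_1d(values, k=2, iters=50):
    """Fit a 1-D Gaussian mixture to one attribute with EM; return each object's hard cluster label."""
    n = len(values)
    ordered = sorted(values)
    # Deterministic initialisation: component means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(statistics.pvariance(values), 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities P(component j | x), computed in the log domain for stability.
        resp = []
        for x in values:
            logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, values)) / nj, 1e-6)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def similar(labels, x, y, attrs, tau):
    """Assumed tolerance relation: x and y share an EM cluster on at least a fraction tau of attrs."""
    if not attrs:
        return True
    agree = sum(labels[a][x] == labels[a][y] for a in attrs)
    return agree / len(attrs) >= tau


def dependency(labels, decisions, attrs, tau):
    """Rough set dependency degree: share of objects whose tolerant class has a single decision."""
    n = len(decisions)
    pos = 0
    for x in range(n):
        # x lies in the positive region if every object tolerant to x has x's decision.
        if all(decisions[y] == decisions[x]
               for y in range(n) if similar(labels, x, y, attrs, tau)):
            pos += 1
    return pos / n


def quick_reduct(labels, decisions, tau):
    """Hypothetical QuickReduct-style greedy forward selection over attribute indices."""
    all_attrs = list(range(len(labels)))
    target = dependency(labels, decisions, all_attrs, tau)
    reduct = []
    best = dependency(labels, decisions, reduct, tau)
    while best < target and len(reduct) < len(all_attrs):
        # Add the attribute whose inclusion yields the highest dependency degree.
        cand = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(labels, decisions, reduct + [a], tau))
        reduct.append(cand)
        best = dependency(labels, decisions, reduct, tau)
    return reduct
```

A threshold below 1 makes the dependency degree non-monotone in the attribute set. For that reason the search stops once the reduct matches or exceeds the dependency of the full attribute set, rather than requiring exact equality. Setting `tau = 1` collapses the relation back to an equivalence relation on cluster labels.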




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fazayeli, F., Wang, L., Mandziuk, J. (2008). Feature Selection Based on the Rough Set Theory and Expectation-Maximization Clustering Algorithm. In: Chan, CC., Grzymala-Busse, J.W., Ziarko, W.P. (eds) Rough Sets and Current Trends in Computing. RSCTC 2008. Lecture Notes in Computer Science, vol 5306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88425-5_28


  • DOI: https://doi.org/10.1007/978-3-540-88425-5_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88423-1

  • Online ISBN: 978-3-540-88425-5

  • eBook Packages: Computer Science, Computer Science (R0)
