Feature Selection Based on the Rough Set Theory and Expectation-Maximization Clustering Algorithm

Fazayeli, Farideh; Wang, Lipo; Mandziuk, Jacek

doi:10.1007/978-3-540-88425-5_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5306)

Abstract

We study Rough Set theory as a method of feature selection based on tolerance classes, which extend the existing equivalence classes. Determining the initial tolerance classes is a challenging and important task for accurate feature selection and classification. In this paper, the Expectation-Maximization clustering algorithm is applied to determine similar objects. On a number of benchmarks from the UCI repository, this method generates fewer features with either higher or equal accuracy compared with two existing methods, namely Fuzzy Rough Feature Selection and Tolerance-based Feature Selection.
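The abstract does not spell out how the clusters are turned into tolerance classes, so the following is only a rough sketch of the general idea under loudly stated assumptions: each feature is clustered separately with a small one-dimensional Gaussian-mixture EM, two objects are treated as similar on a feature when they fall into the same cluster, and features are then picked greedily by the standard rough-set dependency degree. The function names (`em_1d`, `dependency`, `select_features`), the choice of two clusters, the initialisation, and the greedy search are all hypothetical illustrations, not the authors' procedure.

```python
import math

def em_1d(values, k=2, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM and return hard cluster labels."""
    lo, hi = min(values), max(values)
    # Assumption: initial means are spread evenly over the data range.
    means = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    mean_all = sum(values) / len(values)
    var_all = max(sum((v - mean_all) ** 2 for v in values) / len(values), var_floor)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj,
                var_floor)
            weights[j] = nj / len(values)
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def dependency(labels_by_feature, decisions, subset):
    """Dependency degree: share of objects whose similarity class has a single decision."""
    blocks = {}
    for i in range(len(decisions)):
        key = tuple(labels_by_feature[f][i] for f in subset)
        blocks.setdefault(key, []).append(i)
    positive = sum(len(b) for b in blocks.values()
                   if len({decisions[i] for i in b}) == 1)
    return positive / len(decisions)

def select_features(columns, decisions, k=2):
    """Greedily add features until the dependency of the full feature set is reached."""
    labels = [em_1d(col, k) for col in columns]
    all_feats = list(range(len(columns)))
    target = dependency(labels, decisions, all_feats)
    chosen = []
    while dependency(labels, decisions, chosen) < target:
        best = max((f for f in all_feats if f not in chosen),
                   key=lambda f: dependency(labels, decisions, chosen + [f]))
        chosen.append(best)
    return chosen, target

# Toy data (made up for illustration): feature 0 separates the classes, feature 1 does not.
toy_columns = [[1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
               [1.0, 5.0, 1.1, 5.1, 0.9, 4.9]]
toy_decisions = ["a", "a", "a", "b", "b", "b"]
print(select_features(toy_columns, toy_decisions))
```

Because objects in the same cluster are grouped by identical labels, this sketch actually builds equivalence classes from the clustering; a genuine tolerance relation would also allow overlapping similarity classes, which is left out here for brevity.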

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
Farideh Fazayeli & Lipo Wang
Faculty of Mathematics and Information Science, Warsaw University of Technology, Plac Politechniki 1, 00-661, Warsaw, Poland
Jacek Mandziuk

Editor information

Editors and Affiliations

Department of Computer Science, The University of Akron, OH 44325-4003, Akron, USA
Chien-Chung Chan
Department of Electrical Engineering and Computer Science, University of Kansas, KS 66045, Lawrence, USA
Jerzy W. Grzymala-Busse
Department of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
Wojciech P. Ziarko

About this paper

Cite this paper

Fazayeli, F., Wang, L., Mandziuk, J. (2008). Feature Selection Based on the Rough Set Theory and Expectation-Maximization Clustering Algorithm. In: Chan, CC., Grzymala-Busse, J.W., Ziarko, W.P. (eds) Rough Sets and Current Trends in Computing. RSCTC 2008. Lecture Notes in Computer Science, vol 5306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88425-5_28

DOI: https://doi.org/10.1007/978-3-540-88425-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88423-1
Online ISBN: 978-3-540-88425-5
eBook Packages: Computer Science, Computer Science (R0)