Abstract
We study Rough Set theory as a method of feature selection based on tolerant classes, which extend the conventional equivalence classes. Determining the initial tolerant classes is a challenging and important task for accurate feature selection and classification. In this paper, the Expectation-Maximization (EM) clustering algorithm is applied to determine similar objects. On a number of benchmarks from the UCI repository, this method selects fewer features with the same or higher accuracy than two existing methods, Fuzzy Rough Feature Selection and Tolerance-based Feature Selection.
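The idea of clustering objects first and then selecting features by rough-set dependency can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so everything below is an assumption made for illustration. Each feature is clustered independently with a one-dimensional Gaussian-mixture EM, the hard cluster labels stand in for classes of similar objects, and features are added greedily until the dependency degree of the full feature set is reached. The names `em_1d`, `dependency`, and `select_features`, the choice of `k=2` clusters, and the toy data are all hypothetical.

```python
import math

def em_1d(values, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture by EM; return one hard label per value."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): means spread evenly over the range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max((hi - lo) ** 2 / (4 * k * k), 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj, 1e-6)
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def dependency(labels, subset, decisions):
    """Fraction of objects whose class under `subset` contains a single decision value."""
    groups = {}
    for i, d in enumerate(decisions):
        key = tuple(labels[f][i] for f in subset)
        groups.setdefault(key, []).append(d)
    positive = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return positive / len(decisions)

def select_features(columns, decisions, k=2):
    """Greedy forward selection until the full-set dependency degree is matched."""
    labels = [em_1d(col, k) for col in columns]
    all_feats = list(range(len(columns)))
    target = dependency(labels, all_feats, decisions)
    chosen = []
    while dependency(labels, chosen, decisions) < target:
        best = max((f for f in all_feats if f not in chosen),
                   key=lambda f: dependency(labels, chosen + [f], decisions))
        chosen.append(best)
    return chosen

# Toy data (made up): feature 0 separates the two decision classes, feature 1 does not.
columns = [
    [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2],
    [1.0, 5.0, 1.1, 5.1, 0.9, 4.9, 1.2, 5.2],
]
decisions = [0, 0, 0, 0, 1, 1, 1, 1]
selected = select_features(columns, decisions)
print(selected)
```

Hard cluster labels produce a partition, which is stricter than a tolerance relation; a version closer to tolerant classes might instead treat two objects as similar when their cluster responsibilities are close, at the cost of overlapping classes.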
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fazayeli, F., Wang, L., Mandziuk, J. (2008). Feature Selection Based on the Rough Set Theory and Expectation-Maximization Clustering Algorithm. In: Chan, CC., Grzymala-Busse, J.W., Ziarko, W.P. (eds) Rough Sets and Current Trends in Computing. RSCTC 2008. Lecture Notes in Computer Science, vol 5306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88425-5_28
DOI: https://doi.org/10.1007/978-3-540-88425-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88423-1
Online ISBN: 978-3-540-88425-5
eBook Packages: Computer Science, Computer Science (R0)