Abstract
Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes while keeping the important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are called reducts, and the intersection of all reducts is called the core. However, RSFS handles discrete attributes only, so a dataset with real-valued attributes must be discretized before RSFS is applied. The discretization controls the core of the resulting discrete dataset, and the core may critically affect the classification performance of the reducts. This paper (i) defines core-generating discretization, a type of discretization method; (ii) analyzes its properties; (iii) models it using constraint satisfaction; (iv) defines core-generating approximate minimum entropy (C-GAME) discretization; (v) models C-GAME using constraint satisfaction; and (vi) evaluates the performance of C-GAME as a pre-processor of RSFS using ten datasets from the UCI Machine Learning Repository.
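The relationship between discretization, reducts and the core can be illustrated on a toy example. The following is a minimal sketch, not the C-GAME method itself: it assumes a reduct is a minimal attribute subset that preserves the positive region of the full attribute set, finds all reducts by brute force, and takes their intersection as the core. The function names, the four-object dataset and the cut points are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def positive_region_size(rows, labels, attrs):
    """Count objects whose equivalence class under `attrs` carries one label."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    return sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)

def reducts(rows, labels):
    """Brute-force search for all minimal subsets preserving the positive region."""
    all_attrs = tuple(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_attrs)
    found = []
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a superset of a reduct is not minimal
            if positive_region_size(rows, labels, subset) == target:
                found.append(subset)
    return found

def core(rows, labels):
    """The core is the intersection of all reducts."""
    return set.intersection(*(set(r) for r in reducts(rows, labels)))

def discretize(rows, cuts):
    """Replace each real value by the index of its interval; cuts[a] lists the cut points of attribute a."""
    return [tuple(sum(v >= c for c in cuts[a]) for a, v in enumerate(row))
            for row in rows]

# Hypothetical toy dataset: two real attributes, binary decision.
data = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.9), (0.8, 0.8)]
labels = [0, 1, 1, 1]

# Three hypothetical discretizations of the same data.
print(core(discretize(data, [[0.5], [0.5]]), labels))    # {0, 1}
print(core(discretize(data, [[0.15], [0.5]]), labels))   # {0}
print(core(discretize(data, [[0.15], [0.15]]), labels))  # set()
```

The same real-valued data yields a core of two attributes, one attribute, or no attributes depending only on where the cuts are placed, which is the sense in which discretization controls the core. The brute-force search is exponential in the number of attributes and is suitable only for small illustrations.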
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tian, D., Zeng, Xj., Keane, J. (2011). Core-Generating Discretization for Rough Set Feature Selection. In: Peters, J.F., Skowron, A., Chan, CC., Grzymala-Busse, J.W., Ziarko, W.P. (eds) Transactions on Rough Sets XIII. Lecture Notes in Computer Science, vol 6499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18302-7_9
DOI: https://doi.org/10.1007/978-3-642-18302-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18301-0
Online ISBN: 978-3-642-18302-7
eBook Packages: Computer Science (R0)