Abstract
Rough sets are widely used in feature subset selection and attribute reduction. Most existing algorithms use the dependency function to evaluate the quality of a feature subset. This paper discusses the disadvantages of dependency and presents the problem that arises in forward greedy search algorithms based on it. We introduce the consistency measure to deal with these problems and analyze the relationship between dependency and consistency. It is shown that the consistency measure reflects not only the size of the decision positive region, as dependency does, but also the sample distribution in the boundary region. It can therefore describe the distinguishing power of an attribute set more finely. Based on consistency, we redefine the redundancy and the reduct of a decision system, and construct a forward greedy search algorithm to find reducts. In addition, we employ cross validation to test the selected features and remove overfitting features from a reduct. Experimental results on UCI data show that the proposed algorithm is effective and efficient.
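The contrast between the two measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the definitions here are assumptions. Dependency is taken as the fraction of samples lying in equivalence classes with a single decision label (the positive region), and consistency as the fraction of samples carrying the majority label of their equivalence class. The function names, the stopping threshold `eps`, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

def _blocks(rows, labels, attrs):
    """Group decision labels by the values the samples take on `attrs`."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)
    return blocks.values()

def dependency(rows, labels, attrs):
    """Assumed definition: share of samples in single-label blocks."""
    pure = sum(len(b) for b in _blocks(rows, labels, attrs) if len(set(b)) == 1)
    return pure / len(rows)

def consistency(rows, labels, attrs):
    """Assumed definition: share of samples with their block's majority label."""
    majority = sum(max(Counter(b).values()) for b in _blocks(rows, labels, attrs))
    return majority / len(rows)

def forward_greedy(rows, labels, measure, eps=1e-12):
    """Add the attribute with the largest gain until no attribute improves `measure`."""
    n_attrs = len(rows[0])
    selected, best = [], measure(rows, labels, [])
    while len(selected) < n_attrs:
        candidates = [a for a in range(n_attrs) if a not in selected]
        scores = {a: measure(rows, labels, selected + [a]) for a in candidates}
        top = max(candidates, key=lambda a: scores[a])
        if scores[top] - best <= eps:  # no candidate improves the measure
            break
        selected.append(top)
        best = scores[top]
    return selected

# Hypothetical toy decision table: two condition attributes, one decision label.
rows = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 0), (1, 1)]
labels = ["y", "y", "n", "n", "n", "y"]

print(forward_greedy(rows, labels, dependency))
print(forward_greedy(rows, labels, consistency))
```

On this toy table neither attribute alone produces a non-empty positive region, so the dependency-based search cannot rank the candidates and stops with an empty subset. Consistency still changes with the label distribution inside the mixed classes, so the search can pick attribute 0 first and then attribute 1.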
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Hu, Q., Zhao, H., Xie, Z., Yu, D. (2007). Consistency Based Attribute Reduction. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_12
DOI: https://doi.org/10.1007/978-3-540-71701-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)