Abstract
As a feature selection method, support vector machine-recursive feature elimination (SVM-RFE) can remove irrelevant features but does not take redundant features into consideration. This paper shows why the method cannot remove redundant features and presents an improved technique. The correlation coefficient is introduced to measure redundancy in the subset selected by SVM-RFE. Features that have a high correlation coefficient with some important feature are removed. Experimental results show that the subsets selected by SVM-RFE do contain several strongly redundant features, with correlation coefficients as high as 0.99. The proposed method not only reduces the number of features but also maintains classification accuracy.
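The redundancy-removal step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pearson` and `drop_redundant`, the toy data, and the 0.9 threshold are all assumptions made for illustration, and the importance ranking is taken as given (in the paper it would come from SVM-RFE).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant feature has no defined correlation; treat it as 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_redundant(columns, ranking, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with any
    more important feature that has already been kept.

    columns   -- dict mapping feature name to its list of values
    ranking   -- feature names ordered from most to least important
                 (assumed to come from a ranker such as SVM-RFE)
    threshold -- hypothetical cut-off on |r|; the abstract does not give one
    """
    kept = []
    for name in ranking:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: f2 is nearly a scaled copy of f1, f3 is only weakly related.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranking = ["f1", "f2", "f3"]  # assumed importance order
selected = drop_redundant(columns, ranking, threshold=0.9)
print(selected)
```

Walking the ranking from most to least important means that, within a correlated group, the highest-ranked feature is the one that survives, which matches the abstract's description of removing features that correlate strongly with "some important feature".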
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, ZX., Hu, QH., Yu, DR. (2006). Improved Feature Selection Algorithm Based on SVM and Correlation. In: Wang, J., Yi, Z., Zurada, J.M., Lu, BL., Yin, H. (eds) Advances in Neural Networks - ISNN 2006. ISNN 2006. Lecture Notes in Computer Science, vol 3971. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11759966_204
DOI: https://doi.org/10.1007/11759966_204
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34439-1
Online ISBN: 978-3-540-34440-7
eBook Packages: Computer Science, Computer Science (R0)