Abstract
This paper investigates the use of support vector machines (SVMs) to classify e-mail as spam or non-spam, with feature selection performed by a genetic algorithm (GA). The GA selects the features most favorable to the SVM classifier; the combined approach is named GA-SVM. A scaling factor measures the relevance of each feature to the classification task and is estimated by the GA. A heavy-bias operator is introduced into the GA to promote sparsity in the scaling factors. Feature selection is then performed by eliminating irrelevant features, namely those whose scaling factor is zero. Experimental results on the UCI Spam database show that, compared with the original SVM classifier, GA-SVM uses fewer support vectors while achieving better classification results.
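The selection mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the heavy-bias operator or the fitness function, so the sketch assumes the operator is a mutation that sets a scaling factor to zero with some probability, and it leaves the fitness as a caller-supplied function. In GA-SVM the fitness would presumably come from training an SVM on the scaled features; the toy fitness below is only a stand-in so that the example runs with the standard library alone. All function names and parameter values are hypothetical.

```python
import random


def heavy_bias_mutate(scales, p_zero, p_perturb, rng):
    """Assumed form of a heavy-bias mutation: each scaling factor is set to
    zero with probability p_zero, otherwise perturbed with probability
    p_perturb and clipped to [0, 1]."""
    out = []
    for s in scales:
        r = rng.random()
        if r < p_zero:
            out.append(0.0)  # push the chromosome toward sparsity
        elif r < p_zero + p_perturb:
            out.append(min(1.0, max(0.0, s + rng.gauss(0.0, 0.1))))
        else:
            out.append(s)
    return out


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def selected_features(scales):
    """Indices of the features kept, i.e. those with a nonzero scaling factor."""
    return [i for i, s in enumerate(scales) if s != 0.0]


def scale_sample(x, scales):
    """Scale the kept features of one sample and drop the eliminated ones."""
    return [x[i] * scales[i] for i in selected_features(scales)]


def evolve(fitness, n_features, pop_size=20, generations=30, seed=0):
    """Small elitist GA over scaling-factor vectors (illustrative settings)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = [
            heavy_bias_mutate(
                crossover(rng.choice(elite), rng.choice(elite), rng),
                p_zero=0.05, p_perturb=0.2, rng=rng,
            )
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy stand-in fitness: rewards weight on features 0 and 1, penalizes
    # weight on the rest. A real run would score an SVM trained on the
    # scaled features instead.
    def toy_fitness(scales):
        return scales[0] + scales[1] - sum(scales[2:])

    best = evolve(toy_fitness, n_features=6)
    print("scaling factors:", best)
    print("selected features:", selected_features(best))
```

Because eliminated features have a scaling factor of exactly zero, `scale_sample` drops them before the classifier ever sees them, so the classifier is trained and evaluated only on the surviving dimensions.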
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Hb., Yu, Y., Liu, Z. (2005). SVM Classifier Incorporating Feature Selection Using GA for Spam Detection. In: Yang, L.T., Amamiya, M., Liu, Z., Guo, M., Rammig, F.J. (eds) Embedded and Ubiquitous Computing – EUC 2005. EUC 2005. Lecture Notes in Computer Science, vol 3824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596356_113
DOI: https://doi.org/10.1007/11596356_113
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30807-2
Online ISBN: 978-3-540-32295-5
eBook Packages: Computer Science (R0)