Abstract
Email spam detection systems suffer from high dimensionality in the feature selection process and from low classification accuracy. In machine learning, feature selection (FS), treated as a global optimization problem, removes irrelevant and redundant data and yields a set of acceptable results with high accuracy. This paper presents a feature selection algorithm based on particle swarm optimization (PSO), which reduces dimensionality and improves the accuracy of spam email classification. PSO is a computational model that follows the social behavior of bird flocking or fish schooling. The proposed PSO-based feature selection algorithm searches the feature space for the best feature subsets. The evolution of the selected features is guided by a fitness function. Classifier performance and the length of the selected feature vector used as classifier input are considered for performance evaluation on the Ling-Spam and SpamAssassin databases. Experimental results show that the PSO-based feature selection algorithm generates excellent feature selection results with a minimal set of selected features, which leads to high accuracy of spam email classification with a Multi-Layer Perceptron (MLP) classifier.
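The abstract does not give the particle update rule or the exact form of the fitness function, so the following is only a minimal sketch of how a binary PSO search over feature subsets could look. Everything in it is an assumption made for illustration: the sigmoid-based binary position update, the weighted-sum fitness with a hypothetical weight `alpha`, the parameter defaults, and the function names. A simple nearest-centroid scorer stands in for the MLP classifier so that the example stays self-contained.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Stand-in for the MLP (assumption): nearest-centroid training accuracy
    computed on the features whose mask bit is 1."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            cents,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, cents[c])),
        )
        correct += pred == t
    return correct / len(y)


def fitness(X, y, mask, alpha=0.9):
    """Hypothetical fitness: reward accuracy and a short feature vector.
    alpha is an assumed trade-off weight, not a value from the paper."""
    acc = centroid_accuracy(X, y, mask)
    return alpha * acc + (1 - alpha) * (1 - sum(mask) / len(mask))


def binary_pso(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO sketch: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability of bit = 1.
                prob = 1 / (1 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `binary_pso(X, y)` on a list of numeric feature vectors `X` and labels `y` returns the best mask found and its fitness. In a setting closer to the paper, the stand-in scorer would be replaced by the validation accuracy of an MLP trained on the selected features.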
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Behjat, A.R., Mustapha, A., Nezamabadi-pour, H., Sulaiman, M.N., Mustapha, N. (2013). A PSO-Based Feature Subset Selection for Application of Spam /Non-spam Detection. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_16
DOI: https://doi.org/10.1007/978-3-642-40567-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science, Computer Science (R0)