A PSO-Based Feature Subset Selection for Application of Spam /Non-spam Detection

  • Conference paper
Soft Computing Applications and Intelligent Systems (M-CAIT 2013)

Abstract

Email spam detection systems face two difficulties: high dimensionality in the feature selection process and low accuracy in classifying spam email. In machine learning, feature selection (FS), treated as a global optimization problem, removes irrelevant and redundant data and yields a feature set that supports high classification accuracy. This paper presents a feature selection algorithm based on particle swarm optimization (PSO) that reduces dimensionality and improves the accuracy of spam email classification. PSO is a computational model that follows the social behavior of bird flocking and fish schooling. The proposed PSO-based algorithm searches the feature space for the best feature subsets, and the evolution of the selected features is guided by a fitness function. Performance is evaluated on the Ling-Spam and SpamAssassin datasets in terms of classifier performance and the length of the selected feature vector used as classifier input. Experimental results show that the PSO-based feature selection algorithm produces excellent feature selection results: a minimal set of selected features that yields high spam email classification accuracy with a Multi-Layer Perceptron (MLP) classifier.
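
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a sigmoid-based binary PSO, a fitness that is a weighted sum of accuracy and subset compactness (the weight `alpha` is an assumption), and a nearest-centroid rule standing in for the MLP classifier so the example needs only the standard library. All function names and the synthetic data are hypothetical.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Stand-in for the paper's MLP: training accuracy of a nearest-centroid
    rule that only sees the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(X)


def fitness(X, y, mask, alpha=0.9):
    """Assumed fitness: reward accuracy and, with weight 1 - alpha, short masks."""
    compactness = 1.0 - sum(mask) / len(mask)
    return alpha * nearest_centroid_accuracy(X, y, mask) + (1 - alpha) * compactness


def binary_pso(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                # Velocity: inertia plus pulls toward personal and global bests.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp to keep sigmoid responsive
                # Sigmoid turns the velocity into the probability of selecting feature j.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def make_toy_data(n_samples=60, n_features=8, seed=1):
    """Synthetic two-class data: only features 0 and 1 carry class information."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = binary_pso(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 3))
```

In a setting closer to the paper, `nearest_centroid_accuracy` would be replaced by the validation accuracy of an MLP trained on the masked features, and `X`, `y` would come from a vectorized corpus such as Ling-Spam or SpamAssassin. Raising `alpha` favors accuracy over subset size; lowering it pushes the swarm toward shorter feature vectors.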

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Behjat, A.R., Mustapha, A., Nezamabadi-pour, H., Sulaiman, M.N., Mustapha, N. (2013). A PSO-Based Feature Subset Selection for Application of Spam /Non-spam Detection. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_16

  • DOI: https://doi.org/10.1007/978-3-642-40567-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40566-2

  • Online ISBN: 978-3-642-40567-9

  • eBook Packages: Computer Science, Computer Science (R0)
