Improved email spam detection model based on support vector machines

  • Original Article
Neural Computing and Applications

Abstract

Email has become an extremely popular means of communication and has been reported to be among the cheapest and fastest available. Despite these benefits, its use is plagued by a large volume of unsolicited and sometimes fraudulent messages, which must be promptly detected and isolated by what is commonly called a spam detection system. Spam detection is needed to protect email users and to prevent the many kinds of abuse to which email has recently been subjected. Because unsolicited email adapts quickly through the use of mailing tools, existing spam detection tools have often been of limited effectiveness and are sometimes rendered ineffective, hence the need for tools that achieve better detection accuracy. Several spam detection models have been proposed and tested in the literature, but the reported accuracies indicate that more work is needed. In this work, a support vector machine-based model is proposed for spam detection, with particular attention paid to searching for the optimal parameters. Experimental results show that the proposed model outperformed all earlier models evaluated on the same popular dataset. Accuracies of 95.87% and 94.06% were obtained on the training and testing sets, respectively. The 94.06% testing accuracy is a relative improvement of 3.11% (2.84 percentage points) over the best previously reported model, which achieved 91.22% on the same dataset.
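
The 3.11% figure quoted in the abstract is a relative gain rather than a difference in percentage points. The short sketch below reproduces the arithmetic from the two testing accuracies given in the abstract. The function name `accuracy_gain` is a hypothetical helper introduced only for illustration and is not taken from the article.

```python
def accuracy_gain(new_acc, baseline_acc):
    """Return (absolute gain in percentage points, relative gain in percent).

    Both arguments are accuracies expressed in percent, e.g. 94.06.
    """
    absolute = new_acc - baseline_acc              # difference in percentage points
    relative = 100.0 * absolute / baseline_acc     # gain relative to the baseline
    return absolute, relative


# Testing accuracies quoted in the abstract (proposed model vs. best prior model).
absolute, relative = accuracy_gain(94.06, 91.22)
print(f"absolute gain: {absolute:.2f} percentage points")  # absolute gain: 2.84 percentage points
print(f"relative gain: {relative:.2f}%")                   # relative gain: 3.11%
```

Reporting both numbers side by side avoids ambiguity when comparing classifiers whose baseline accuracies differ.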

Acknowledgement

The author would like to acknowledge the University of Dammam, Dammam, Kingdom of Saudi Arabia for some of the facilities utilized during the course of this research.

Author information

Corresponding author

Correspondence to Sunday Olusanya Olatunji.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

About this article

Cite this article

Olatunji, S.O. Improved email spam detection model based on support vector machines. Neural Comput & Applic 31, 691–699 (2019). https://doi.org/10.1007/s00521-017-3100-y

  • DOI: https://doi.org/10.1007/s00521-017-3100-y
