Abstract
Email is now one of the most widely used means of communication, and it has been reported to be among the cheapest and fastest. Despite these benefits, its usage has been bedeviled by a large volume of unsolicited and sometimes fraudulent messages, which must be promptly detected and isolated by what is commonly called a spam detection system. Spam detection is needed to protect email users and to prevent the abuses to which email has recently been subjected. Unfortunately, because unsolicited emails adapt through the use of mailing tools, spam detection tools have often had limited effectiveness and are sometimes rendered ineffective, hence the need for tools that achieve better detection accuracy. Several spam detection models have been proposed and tested in the literature, but the reported accuracies indicate that more work is still needed in this direction. In this work, a support vector machine-based model is proposed for spam detection, with particular attention paid to searching for the optimal parameters to achieve better performance. Experimental results show that the proposed model outperformed all the earlier proposed models on the same popular dataset employed in this work. Accuracies of 95.87% and 94.06% were obtained for the training and testing sets, respectively. The 94.06% testing accuracy represents a relative improvement of 3.11% (2.84 percentage points) over the best reported model in the literature, which had an accuracy of 91.22% on the same dataset.
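The improvement figure quoted in the abstract can be checked with a short calculation. The sketch below assumes that the 3.11% value is a relative gain over the 91.22% baseline, which is a reading inferred from the arithmetic and not stated by the author. The function and variable names are illustrative only.

```python
def absolute_gain(new_acc: float, baseline_acc: float) -> float:
    """Difference between two accuracies, in percentage points."""
    return new_acc - baseline_acc


def relative_gain(new_acc: float, baseline_acc: float) -> float:
    """Gain of new_acc over baseline_acc, as a percentage of the baseline."""
    return (new_acc - baseline_acc) / baseline_acc * 100.0


# Accuracies (in percent) as reported in the abstract.
proposed_test_acc = 94.06
best_prior_acc = 91.22

abs_gain = absolute_gain(proposed_test_acc, best_prior_acc)
rel_gain = relative_gain(proposed_test_acc, best_prior_acc)

print(f"absolute gain: {abs_gain:.2f} percentage points")
print(f"relative gain: {rel_gain:.2f}%")
```

Under this reading, the absolute difference between the two testing accuracies is 2.84 percentage points, and dividing it by the 91.22% baseline gives the 3.11% figure.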
Acknowledgement
The author would like to acknowledge the University of Dammam, Dammam, Kingdom of Saudi Arabia for some of the facilities utilized during the course of this research.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
About this article
Cite this article
Olatunji, S.O. Improved email spam detection model based on support vector machines. Neural Comput & Applic 31, 691–699 (2019). https://doi.org/10.1007/s00521-017-3100-y
DOI: https://doi.org/10.1007/s00521-017-3100-y