ABSTRACT
Email has become a necessity for official communication in the current generation. As Internet use grows, so does the risk of being drawn into its darker side. The major concern is spam, which is growing exponentially and claims new victims among users every day. This paper proposes a hybrid machine learning classification model for spam classification on the Spambase dataset. The model uses four classification algorithms: Ensemble Learning, Decision Tree, Random Forest, and Support Vector Machine (SVM). It works in two phases. The first phase classifies the Spambase dataset into two classes, spam and ham, with a Decision Tree. The second phase refines the classification produced by the first phase with four machine learning algorithms: Decision Tree, Random Forest, Support Vector Machine (SVM), and Ensemble Learning. The experiments show promising results, with higher accuracy in the second phase than in the first.
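The two-phase design can be sketched as follows. This is a hypothetical illustration, not the authors' code. The abstract does not say how phase one's output reaches phase two, so the sketch assumes one plausible reading: the phase-one Decision Tree's spam probability is appended as an extra feature for the phase-two models (a stacking-style design). It also assumes scikit-learn, uses a synthetic stand-in sized like Spambase (4,601 messages, 57 features) instead of the real data, and treats "Ensemble Learning" as hard majority voting over the other three phase-two models. All hyperparameters are illustrative choices.

```python
# Hypothetical two-phase "hybrid" spam classifier sketch (not the paper's implementation).
# Assumptions: scikit-learn; synthetic data shaped like Spambase; phase-one spam
# probability passed to phase two as an extra feature; ensemble = hard majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 4601 rows, 57 features, label 1 = spam, 0 = ham (roughly 40% spam).
X, y = make_classification(n_samples=4601, n_features=57, n_informative=20,
                           weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Phase one: a Decision Tree separates spam from ham.
phase_one = DecisionTreeClassifier(max_depth=8, random_state=0)
# Out-of-fold probabilities on the training set keep phase two from
# training on predictions the tree made for rows it had already seen.
p1_train = cross_val_predict(phase_one, X_train, y_train, cv=5,
                             method="predict_proba")[:, 1]
phase_one.fit(X_train, y_train)
p1_test = phase_one.predict_proba(X_test)[:, 1]
phase_one_acc = accuracy_score(y_test, phase_one.predict(X_test))

# Phase two input (assumed design): original features plus the phase-one spam probability.
X2_train = np.column_stack([X_train, p1_train])
X2_test = np.column_stack([X_test, p1_test])

phase_two_models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
# Ensemble learning, assumed here to be a hard majority vote over the three models above.
phase_two_models["ensemble"] = VotingClassifier(
    estimators=list(phase_two_models.items()), voting="hard")

results = {"phase_one_decision_tree": phase_one_acc}
for name, model in phase_two_models.items():
    model.fit(X2_train, y_train)
    results[f"phase_two_{name}"] = accuracy_score(y_test, model.predict(X2_test))

for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Using out-of-fold predictions for the training rows is a deliberate choice: if phase two were trained on the tree's in-sample predictions, which are nearly perfect, it would learn to over-trust the phase-one feature. Accuracies on this synthetic data say nothing about the paper's reported results on Spambase.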
Index Terms
- Machine Learning Techniques for Classification of Spambase Dataset: A Hybrid Approach