DOI: 10.1145/3386164.3389089
research-article

Machine Learning Techniques for Classification of Spambase Dataset: A Hybrid Approach

Published: 06 June 2020

ABSTRACT

Email has become a necessity for official communication in the current generation. As Internet use keeps growing, so does the risk of falling victim to its darker side. The major concern is spam, which is growing exponentially, and users fall victim to it on a daily basis. This paper proposes a hybrid machine learning model for spam classification on the Spambase dataset. The model uses four classification algorithms: Ensemble Learning, Decision Tree, Random Forest, and Support Vector Machine (SVM). It works in two phases. The first phase uses a Decision Tree to classify the Spambase dataset into two classes, spam and ham. The second phase improves on the output of phase one using four machine learning algorithms: Decision Tree, Random Forest, SVM, and Ensemble Learning. The experiments show promising results, with improved accuracy in the second phase.
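The abstract does not say how the second phase consumes the first phase's output. The sketch below assumes one plausible reading: stacking. In that reading, the phase-one Decision Tree's spam/ham prediction is appended as an extra feature, and the four second-phase classifiers are trained on the augmented data. Several other choices are also assumptions. It uses scikit-learn and substitutes AdaBoost for the unspecified "Ensemble Learning" method. It runs on synthetic data from `make_classification`, shaped like Spambase (4,601 emails, 57 features), rather than on the real dataset. The function name `two_phase_spam_classifier` and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def two_phase_spam_classifier(X, y, seed=0):
    """Hypothetical two-phase (stacked) pipeline: a decision tree's spam/ham
    prediction is appended as an extra feature for four phase-two models.
    Returns test-set accuracy for phase one and each phase-two model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # Phase 1: a decision tree labels each email as spam (1) or ham (0).
    phase1 = DecisionTreeClassifier(random_state=seed)
    # Out-of-fold predictions on the training split, so phase-two models
    # never train on phase-one labels fitted to the same rows.
    p1_tr = cross_val_predict(phase1, X_tr, y_tr, cv=5)
    p1_te = phase1.fit(X_tr, y_tr).predict(X_te)
    results = {"phase1_decision_tree": accuracy_score(y_te, p1_te)}

    # Phase 2: reclassify using the original features plus the phase-1 label.
    X2_tr = np.column_stack([X_tr, p1_tr])
    X2_te = np.column_stack([X_te, p1_te])
    phase2 = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=200,
                                                random_state=seed),
        # SVMs are scale-sensitive, so standardize features first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # Assumption: AdaBoost stands in for the unspecified ensemble method.
        "ensemble_adaboost": AdaBoostClassifier(random_state=seed),
    }
    for name, model in phase2.items():
        model.fit(X2_tr, y_tr)
        results[name] = accuracy_score(y_te, model.predict(X2_te))
    return results


if __name__ == "__main__":
    # Synthetic stand-in shaped like Spambase (NOT the real data):
    # 4,601 rows, 57 features, roughly 40% positive ("spam") class.
    X, y = make_classification(n_samples=4601, n_features=57,
                               n_informative=20, weights=[0.6],
                               random_state=0)
    for name, acc in two_phase_spam_classifier(X, y).items():
        print(f"{name}: {acc:.3f}")
```

The sketch uses out-of-fold predictions from `cross_val_predict` on the training split. Fitting phase one on the full training set and feeding its own predictions forward would leak labels and overstate the phase-two gains. To try it on the actual dataset, replace the synthetic `X, y` with the 57 Spambase attributes and the spam/ham label.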


      Published in

        ACM Other conferences
        ISCSIC 2019: Proceedings of the 2019 3rd International Symposium on Computer Science and Intelligent Control
        September 2019
        397 pages
        ISBN:9781450376617
        DOI:10.1145/3386164

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        • ISCSIC 2019 paper acceptance rate: 77 of 152 submissions (51%)
        • Overall acceptance rate: 192 of 401 submissions (48%)
