DOI: 10.1145/3607947.3607979
research-article

Semantic Analysis and Classification of Emails through Informative Selection of Features and Ensemble AI Model

Published: 28 September 2023

Abstract

The development of the internet has given rise to new forms of communication, such as email, and has radically changed the way in which individuals communicate with one another. Email is now a crucial part of the communication network and has been adopted by a variety of commercial enterprises, such as retail outlets. In this paper, we build a spam-detection methodology based on sentiment analysis of the email body. The proposed hybrid model proceeds in three steps: preprocessing the data, extracting features, and classifying the data. To examine the sentiment and sequential aspects of texts, we use word embeddings and a bidirectional LSTM (BiLSTM) network. A convolution layer extracts higher-level text features for the BiLSTM network, which frequently shortens the training period. Our model performs better than previous versions, with an accuracy of 97–98%. In addition, we show that our model outperforms not only some well-known machine learning classifiers but also state-of-the-art methods for identifying spam messages. The results of the proposed ensemble model are examined in terms of recall, accuracy, and precision.
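The abstract reports results in terms of accuracy, precision, and recall. As a minimal sketch of how these three measures are computed for a binary spam/ham classifier, the Python snippet below uses the standard definitions. The function name `evaluate`, the `Scores` container, the label strings, and the six example labels are all hypothetical and chosen for illustration; they are not taken from the paper or its data.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision and recall, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot evaluate an empty label list")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Scores(correct / len(pairs), precision, recall)


# Hypothetical labels for six emails, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
scores = evaluate(y_true, y_pred)
print(scores)
```

Accuracy alone can be misleading when spam and legitimate mail are unevenly represented, which is presumably why precision and recall are reported alongside it.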

Published In

ACM Other conferences
IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing
August 2023
783 pages
ISBN: 9798400700224
DOI: 10.1145/3607947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bidirectional LSTM
  2. CNN
  3. Dataset
  4. GRU
  5. Gaussian Naive Bayes
  6. KNN
  7. LSTM
  8. SVM
  9. Word-Embeddings

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IC3 2023

Article Metrics

  • Total Citations: 0
  • Total Downloads: 45
  • Downloads (Last 12 months): 21
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Feb 2025
