research-article

Semantic Analysis and Classification of Emails through Informative Selection of Features and Ensemble AI Model

Authors: Shivangi Sachan, Khushbu Doulani, Mainak Adhikari

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing

Pages 181 - 187

https://doi.org/10.1145/3607947.3607979

Published: 28 September 2023

Abstract

The development of the internet has given rise to new forms of communication, such as email, and has radically changed the way in which individuals communicate with one another. Email is now a crucial part of the communication network and has been adopted by a variety of commercial enterprises, such as retail outlets. In this paper, we build a unique spam-detection methodology based on sentiment analysis of the email body. The proposed hybrid model is implemented as a pipeline of data preprocessing, feature extraction, and classification. To examine the emotive and sequential aspects of texts, we use word embeddings and a bidirectional LSTM (BiLSTM) network. The model frequently shortens the training period and uses a convolution layer to extract higher-level text features for the BiLSTM network. Our model performs better than previous versions, with an accuracy of 97–98%. In addition, we show that our model outperforms not only some well-known machine learning classifiers but also state-of-the-art methods for identifying spam communications. The results of the proposed ensemble model are examined in terms of recall, accuracy, and precision.
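The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The function name `classification_metrics` and the toy labels below are hypothetical and are not taken from the paper; the sketch assumes a binary task in which label 1 means spam, and it uses the standard definitions of the measures.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a binary classification task.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating `positive` as the spam class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    return {
        # Share of all emails that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of emails flagged as spam that really are spam.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real spam emails that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data (assumed for illustration): 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the classifier finds 3 of the 4 spam emails and raises 2 false alarms, so accuracy, precision, and recall differ from one another. This is why a spam filter is usually judged on all three rather than on accuracy alone.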

Published In

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing

August 2023

783 pages

ISBN: 9798400700224

DOI: 10.1145/3607947

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IC3 2023

IC3 2023: 2023 Fifteenth International Conference on Contemporary Computing

August 3 - 5, 2023

Noida, India
