Spam Email Categorization with NLP and Using Federated Deep Learning

Ul Haq, Ikram; Black, Paul; Gondal, Iqbal; Kamruzzaman, Joarder; Watters, Paul; Kayes, A. S. M.

doi:10.1007/978-3-031-22137-8_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13726)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Email is among the most popular and efficient communication methods, which makes it vulnerable to misuse. Federated learning (FL) is a decentralized approach to machine learning (ML) in which a central server coordinates clients that collaboratively train a shared model. This paper proposes the Federated Phishing Filtering (FPF) technique, based on federated learning, natural language processing (NLP), and deep learning. FL fuses ML models trained at multiple sites so that the sites learn collectively. This approach improves ML performance by drawing on the large collective training data sets available across the corporate client base, resulting in higher phishing email detection accuracy. FPF preserves email privacy by extracting features locally on client email servers, so the contents of emails do not need to be transmitted across the network or stored on third-party servers. We apply FL and NLP to phishing email detection. The technique provides four training modes that perform FL without sharing email content. Our research categorizes emails as benign, spam, or phishing. Empirical evaluations on publicly available datasets show that our Federated Deep Learning model improves accuracy.
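
The two ideas in the abstract, local feature extraction on the client and server-side fusion of client models, can be illustrated with a minimal sketch. The abstract does not state which features or which aggregation rule FPF uses, so everything here is an assumption made for illustration: `extract_features` is a hypothetical hashed bag-of-words extractor, and `federated_average` is a sample-size-weighted mean in the style of federated averaging (FedAvg), not necessarily the authors' method.

```python
import zlib
from typing import List, Sequence


def extract_features(text: str, dim: int = 16) -> List[float]:
    """Hash lowercase tokens into a fixed-size count vector.

    Assumed to run on the client mail server, so the raw text never leaves it.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Fuse client model weights into one vector, weighted by sample count.

    Only weight vectors and sample counts reach the server (an assumption
    of this sketch); email content is never part of the exchange.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must hold training samples")
    merged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            merged[i] += value * size / total
    return merged


# Hypothetical round: two clients report locally trained weights.
features = extract_features("Verify your account now", dim=8)
global_weights = federated_average([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```

In this toy round the second client holds three times as many samples as the first, so its weights dominate the fused model. A real deployment would repeat the exchange over many rounds and aggregate every layer of a deep network rather than a single flat vector.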

Acknowledgment

This research is an Industry Co-Funded Project, sponsored by Oceania Cyber Security and was conducted in the Internet Commerce Security Lab (ICSL).

Author information

Authors and Affiliations

School of Science, Engineering and Information Technology, ICSL, Federation University, Melbourne, Australia
Ikram Ul Haq, Paul Black, Iqbal Gondal & Joarder Kamruzzaman
Department of Computer Science and Information Technology, La Trobe University, Ballarat, Australia
Paul Watters & A. S. M. Kayes

Corresponding author

Correspondence to Ikram Ul Haq.

Editor information

Editors and Affiliations

The University of Adelaide, Adelaide, SA, Australia
Weitong Chen
The University of New South Wales, Sydney, NSW, Australia
Lina Yao
Macquarie University, Sydney, NSW, Australia
Taotao Cai
Griffith University, Brisbane, QLD, Australia
Shirui Pan
Microsoft, Beijing, China
Tao Shen
The University of Queensland, Brisbane, QLD, Australia
Xue Li

Cite this paper

Ul Haq, I., Black, P., Gondal, I., Kamruzzaman, J., Watters, P., Kayes, A.S.M. (2022). Spam Email Categorization with NLP and Using Federated Deep Learning. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13726. Springer, Cham. https://doi.org/10.1007/978-3-031-22137-8_2

DOI: https://doi.org/10.1007/978-3-031-22137-8_2
Published: 24 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22136-1
Online ISBN: 978-3-031-22137-8
eBook Packages: Computer Science (R0)