Abstract
Email is the most popular and efficient communication method, which also makes it vulnerable to misuse. Federated learning (FL) provides a decentralized machine learning (ML) approach in which a central server coordinates clients that collaboratively train a shared ML model. This paper proposes the Federated Phishing Filtering (FPF) technique, based on federated learning, natural language processing (NLP), and deep learning. In FL, the models trained by ML algorithms at multiple sites are fused for collective learning. This approach improves ML performance by drawing on the large collective training data held across the corporate client base, resulting in higher phishing email detection accuracy. FPF preserves email privacy by performing feature extraction locally on client email servers, so email contents do not need to be transmitted across the network or stored on third-party servers. We apply FL and NLP to email phishing detection, and the technique provides four training modes that perform FL without sharing email content. Our research categorizes emails as benign, spam, or phishing. Empirical evaluations on publicly available datasets show that our Federated Deep Learning model improves accuracy.
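The abstract does not say how the client models are fused. The sketch below is a minimal illustration under stated assumptions: it uses federated averaging (FedAvg), a common FL aggregation rule that the abstract does not confirm, and it swaps in a hashed bag-of-words feature extractor and a linear softmax classifier for the paper's NLP pipeline and deep learning model. It shows the three roles the abstract describes: features are extracted on each client's email server, each client trains on its own emails, and the server averages only model weights. All names (`extract_features`, `local_train`, `fed_avg`, `DIM`) and the toy emails are hypothetical.

```python
import math
import re
import zlib
from typing import List, Tuple

LABELS = ["benign", "spam", "phishing"]  # the three categories named in the abstract
DIM = 64  # hypothetical size of the hashed feature space

# (weights per class, bias per class) for a linear softmax classifier
Model = Tuple[List[List[float]], List[float]]


def extract_features(text: str, dim: int = DIM) -> List[float]:
    """Runs on the client: hashed bag-of-words counts, so raw text never leaves."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across processes, unlike Python's salted hash()
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec


def new_model(dim: int = DIM) -> Model:
    return [[0.0] * dim for _ in LABELS], [0.0] * len(LABELS)


def predict_proba(model: Model, x: List[float]) -> List[float]:
    W, b = model
    z = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_k for w, b_k in zip(W, b)]
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def local_train(model: Model, data, lr: float = 0.1, epochs: int = 5) -> Model:
    """Client-side SGD on a copy of the global model using local emails only."""
    W = [row[:] for row in model[0]]
    b = model[1][:]
    for _ in range(epochs):
        for x, y in data:
            p = predict_proba((W, b), x)
            for k in range(len(LABELS)):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient w.r.t. logit k
                for j, x_j in enumerate(x):
                    W[k][j] -= lr * g * x_j
                b[k] -= lr * g
    return W, b


def fed_avg(updates: List[Tuple[Model, int]]) -> Model:
    """Server side: average client models weighted by their local sample counts."""
    total = sum(n for _, n in updates)
    (W0, b0), _ = updates[0]
    W = [[0.0] * len(W0[0]) for _ in W0]
    b = [0.0] * len(b0)
    for (Wc, bc), n in updates:
        frac = n / total
        for k in range(len(W)):
            for j in range(len(W[k])):
                W[k][j] += frac * Wc[k][j]
            b[k] += frac * bc[k]
    return W, b


# Usage: two hypothetical clients, labels index into LABELS.
clients = {
    "client_a": [("Team meeting moved to 3pm tomorrow", 0),
                 ("Cheap pills buy now limited offer", 1)],
    "client_b": [("Verify your account password at this link urgently", 2),
                 ("Quarterly report attached for review", 0)],
}

global_model = new_model()
for _ in range(10):  # federated rounds
    updates = []
    for emails in clients.values():
        local = [(extract_features(text), y) for text, y in emails]  # computed on the client
        updates.append((local_train(global_model, local), len(local)))
    global_model = fed_avg(updates)  # only weights and counts reach the server

probs = predict_proba(global_model, extract_features("Please verify your password urgently"))
print(dict(zip(LABELS, (round(p, 3) for p in probs))))
```

Weighting each client by its sample count lets sites with more email contribute proportionally more to the shared model, which is how FedAvg is usually defined; the paper's four training modes are not modeled here.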
Acknowledgment
This research is an Industry Co-Funded Project sponsored by Oceania Cyber Security and was conducted in the Internet Commerce Security Lab (ICSL).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ul Haq, I., Black, P., Gondal, I., Kamruzzaman, J., Watters, P., Kayes, A.S.M. (2022). Spam Email Categorization with NLP and Using Federated Deep Learning. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13726. Springer, Cham. https://doi.org/10.1007/978-3-031-22137-8_2
DOI: https://doi.org/10.1007/978-3-031-22137-8_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22136-1
Online ISBN: 978-3-031-22137-8
eBook Packages: Computer Science, Computer Science (R0)