Spam Email Categorization with NLP and Using Federated Deep Learning

Advanced Data Mining and Applications (ADMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13726)

Abstract

Email is the most popular and efficient communication method, which makes it vulnerable to misuse. Federated learning (FL) is a decentralized approach to machine learning (ML) in which a central server coordinates clients that collaboratively train a shared ML model. This paper proposes a Federated Phishing Filtering (FPF) technique based on federated learning, natural language processing (NLP), and deep learning. FL fuses ML models trained at multiple sites into a collectively learned model. This approach improves ML performance by drawing on large collective training data sets across the corporate client base, resulting in higher phishing email detection accuracy. The FPF technique preserves email privacy by performing local feature extraction on client email servers, so the contents of emails do not need to be transmitted across the network or stored on third-party servers. We apply FL and NLP to email phishing detection. The technique provides four training modes that perform FL without sharing email content. Our research categorizes emails as benign, spam, or phishing. Empirical evaluations with publicly available datasets show that our Federated Deep Learning model improves accuracy.
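
The general pattern the abstract describes (local feature extraction, local training, and server-side fusion of the trained models) can be illustrated with a minimal sketch. It is not the authors' FPF implementation: the abstract does not specify the four training modes, the network architecture, or the fusion rule, so the sketch assumes a sample-weighted average of client weights (federated averaging), a bag-of-words feature extractor, and a binary logistic-regression model standing in for the deep model and the three-class labels. All function names and the vocabulary are hypothetical.

```python
import math
from typing import List, Tuple

Weights = List[float]
Sample = Tuple[List[float], int]  # (feature vector, label: 1 = phishing, 0 = not)


def extract_features(text: str, vocabulary: List[str]) -> List[float]:
    """Client-side step: turn raw email text into term counts.

    Only these counts feed local training; the raw text never leaves the client.
    """
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]


def local_train(weights: Weights, samples: List[Sample],
                lr: float = 0.1, epochs: int = 5) -> Weights:
    """Client-side step: a few epochs of logistic-regression SGD.

    Assumption: a linear model stands in for the paper's deep model.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Server-side step: fuse client models, weighting each by its sample count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(updates[0][0])
    fused = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("clients must share one model shape")
        for i, wi in enumerate(weights):
            fused[i] += wi * n / total
    return fused


def run_round(global_weights: Weights,
              client_data: List[List[Sample]]) -> Weights:
    """One federated round: each client trains locally, the server averages."""
    updates = [(local_train(global_weights, data), len(data))
               for data in client_data]
    return federated_average(updates)
```

In this sketch a client would call `extract_features` on its own mail, pass the resulting samples to `local_train`, and return only the weight vector and sample count to the server, which calls `federated_average`. Model weights are the only thing exchanged, which mirrors the privacy property the abstract claims, although weights alone can still leak information without further protections.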

Acknowledgment

This research is an Industry Co-Funded Project, sponsored by Oceania Cyber Security and was conducted in the Internet Commerce Security Lab (ICSL).

Author information

Corresponding author

Correspondence to Ikram Ul Haq.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ul Haq, I., Black, P., Gondal, I., Kamruzzaman, J., Watters, P., Kayes, A.S.M. (2022). Spam Email Categorization with NLP and Using Federated Deep Learning. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13726. Springer, Cham. https://doi.org/10.1007/978-3-031-22137-8_2

  • DOI: https://doi.org/10.1007/978-3-031-22137-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22136-1

  • Online ISBN: 978-3-031-22137-8

  • eBook Packages: Computer Science, Computer Science (R0)
