DOI: 10.1145/3603287.3651193
research-article

Towards Improving Phishing Detection System Using Human in the Loop Deep Learning Model

Published: 27 April 2024

ABSTRACT

Phishing attacks are cyberattacks that deceive victims into revealing sensitive information or downloading malware. They serve as a gateway to other malware attacks, including ransomware, and cause millions of dollars in losses for individuals and organizations every year. Their frequency continues to rise as attackers develop new techniques to bypass detection systems. One example is the browser-in-the-browser (BiTB) attack, which hides malicious links within seemingly legitimate web pages and is difficult for humans to detect. Relying solely on a fixed detection system therefore leaves users vulnerable, and there is a critical need for a system that can improve continuously over time. This paper proposes enhancing a detection system by incorporating human feedback. To achieve this, we design a human-in-the-loop active deep learning system that uses human feedback to enhance the model's performance. We use PhishTransformer as our initial model. We gather new data for testing, accessed through our browser extension, and then collect new data for each version of the model. The initial model is retrained three times with the new data, and the model is saved after each iteration; each saved version is retested on the test data before the next version is trained. Each model version is evaluated on accuracy, loss, precision, recall, and F1 score. From the base model to the Version 3 model, performance improves by around 5% on all metrics.
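The per-version evaluation described in the abstract can be illustrated with a minimal Python sketch. It computes accuracy, precision, recall, and F1 score for two model versions on the same test set, treating phishing as the positive class (label 1). The names `Metrics` and `evaluate`, and the toy labels and predictions, are assumptions made for illustration only; they are not the paper's code, data, or results. Loss is omitted because it depends on predicted probabilities, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Binary classification metrics; label 1 = phishing, 0 = legitimate."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no positive predictions/labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical test labels and predictions, NOT taken from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    predictions = {
        "base": [1, 0, 1, 0, 1, 0, 0, 0],
        "version 3": [1, 1, 1, 0, 1, 0, 0, 0],
    }
    for name, y_pred in predictions.items():
        print(name, evaluate(y_true, y_pred))
```

Scoring every saved version against the same held-out test set, as in this sketch, keeps the comparison between versions like for like.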


Published in

ACM SE '24: Proceedings of the 2024 ACM Southeast Conference
April 2024
337 pages
ISBN: 9798400702372
DOI: 10.1145/3603287

Copyright © 2024 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

ACM SE '24 Paper Acceptance Rate: 44 of 137 submissions, 32%
Overall Acceptance Rate: 178 of 377 submissions, 47%