Towards Improving Phishing Detection System Using Human in the Loop Deep Learning Model

Authors:
Sultan Asiri, The University of Alabama, Tuscaloosa, Alabama, USA (ORCID: 0000-0002-7405-7646)
Yang Xiao, The University of Alabama, Tuscaloosa, Alabama, USA (ORCID: 0000-0001-8549-6794)
Saleh Alzahrani, The University of Alabama, Tuscaloosa, Alabama, USA (ORCID: 0000-0001-8380-2487)

ACM SE '24: Proceedings of the 2024 ACM Southeast Conference, April 2024, Pages 77–85. https://doi.org/10.1145/3603287.3651193

Published: 27 April 2024

ABSTRACT

Phishing attacks are cyberattacks that deceive victims into revealing sensitive information or downloading malware. They serve as a gateway to various malware attacks, including ransomware, and cause millions of dollars in losses for individuals and organizations annually. The frequency of phishing attacks continues to rise, with attackers constantly developing new techniques to bypass detection systems. One example is malicious links hidden within seemingly legitimate web pages, as in browser-in-the-browser (BiTB) attacks, which are difficult for humans to detect. Relying solely on a fixed detection system therefore leaves users vulnerable to phishing attacks, and there is a critical need for a system that can continuously improve over time. This paper proposes enhancing a detection system by incorporating human feedback. To achieve this, we design a human-in-the-loop deep learning active system that uses human feedback to enhance the model's performance. We use PhishTransformer as our initial model. We then gather new data for testing and access it through our browser extension. Subsequently, we collect new data for each version of the model. The initial model is retrained three times with the new data, and the model is saved after each iteration. After each retraining, we retest the model using the test data and then train the next version. Each model version is evaluated on the following metrics: accuracy, loss, precision, recall, and F1 score. Our model shows an improvement of around 5% across all metrics from the base model to the Version 3 model.
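As a rough illustration of how the label-based evaluation metrics named in the abstract could be computed for one model version, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `evaluate`, the `Metrics` container, and the convention that label 1 means "phishing" and 0 means "legitimate" are all assumptions made for this example. Loss is omitted because it requires predicted probabilities rather than hard labels.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    """Hypothetical container for one model version's scores."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumes 1 = phishing (positive class) and 0 = legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the paper).
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(evaluate(truth, preds))
```

Running the same function on the predictions of the base model and of each retrained version, against the same held-out test labels, would give the kind of version-to-version comparison the abstract describes.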

Index Terms

Towards Improving Phishing Detection System Using Human in the Loop Deep Learning Model
1. Computing methodologies
  1. Machine learning
2. Security and privacy


Published in
ACM SE '24: Proceedings of the 2024 ACM Southeast Conference
April 2024
337 pages
ISBN: 9798400702372
DOI: 10.1145/3603287
Organizing Chair: Dan Lo
Program Chair: Eric Gamess
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Active Learning
Browsers in the Browser (BiTB)
Deep Learning
Detection Systems
Phishing Attacks
Real-time
Tiny Uniform Resource Locators
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
ACM SE '24 Paper Acceptance Rate: 44 of 137 submissions, 32%
Overall Acceptance Rate: 178 of 377 submissions, 47%