ABSTRACT
Phishing attacks are cyber attacks that deceive victims into revealing sensitive information or downloading malware. They serve as a gateway to other malware attacks, including ransomware, and cause millions of dollars in losses for individuals and organizations every year. Their frequency continues to rise, and attackers constantly develop new techniques to bypass detection systems. One example is the browser-in-the-browser (BiTB) attack, which hides malicious links within seemingly legitimate web pages and is difficult for humans to detect. Relying solely on a fixed detection system therefore leaves users vulnerable, which creates a critical need for a system that can improve continuously over time. This paper proposes enhancing a detection system by incorporating human feedback. To achieve this, we design a human-in-the-loop active deep learning system that uses human feedback to improve the model's performance. We use PhishTransformer as our initial model. We gather new data for testing and access it through our browser extension, and we collect new data for each version of the model. The initial model is retrained three times with the new data, and the model is saved after each iteration. After each retraining, we retest the model on the test data and then train the next version. Each model version is evaluated on accuracy, loss, precision, recall, and F1 score. Our model improves by around 5% on all metrics from the base model to the Version 3 model.
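The retrain-and-retest cycle described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: `TokenModel` is a toy stand-in for PhishTransformer, and `run_feedback_rounds`, the feedback batches, and the URL tokenization are all hypothetical names and choices made for illustration. The sketch only shows the shape of the loop (evaluate the base model, retrain on each batch of human-labeled data, re-evaluate on a fixed test set) and how accuracy, precision, recall, and F1 follow from the confusion counts. Loss is omitted because the toy model has none.

```python
import re
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (url, label): 1 = phishing, 0 = legitimate


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def tokenize(url: str) -> set:
    """Split a URL into lowercase alphanumeric tokens (an illustrative choice)."""
    return {t for t in re.split(r"[^a-z0-9]+", url.lower()) if t}


class TokenModel:
    """Toy stand-in for the detector: flags a URL that contains a learned token."""

    def __init__(self, tokens: Sequence[str] = ()) -> None:
        self.tokens = set(tokens)

    def predict(self, url: str) -> int:
        return int(bool(self.tokens & tokenize(url)))

    def retrain(self, feedback: Sequence[Sample]) -> "TokenModel":
        """Return a new version that adds tokens seen only in phishing feedback."""
        phishing, legitimate = set(), set()
        for url, label in feedback:
            (phishing if label == 1 else legitimate).update(tokenize(url))
        return TokenModel(self.tokens | (phishing - legitimate))


def run_feedback_rounds(
    model: TokenModel,
    feedback_rounds: Sequence[Sequence[Sample]],
    test_set: Sequence[Sample],
) -> List[Dict[str, float]]:
    """Evaluate the base model, then retrain and re-evaluate once per round."""
    urls = [u for u, _ in test_set]
    labels = [y for _, y in test_set]
    history = [classification_metrics(labels, [model.predict(u) for u in urls])]
    for feedback in feedback_rounds:
        model = model.retrain(feedback)  # a real system would also save this version
        history.append(classification_metrics(labels, [model.predict(u) for u in urls]))
    return history
```

Calling `run_feedback_rounds` with three feedback batches returns four metric dictionaries, one for the base model and one for each retrained version, which mirrors the base-to-Version-3 comparison reported above.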
Index Terms
- Towards Improving Phishing Detection System Using Human in the Loop Deep Learning Model