Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling

Bu, Seok-Jun; Cho, Sung-Bae

doi:10.1007/978-3-031-42519-6_13

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 748)

Abstract

Phishing attacks remain a significant threat to internet security, and phishing URLs are among the most prevalent attack vectors. Detecting these URLs is challenging because attackers constantly evolve their tactics. Few-shot learning has emerged as a promising approach for learning from limited data, which makes it well suited to phishing URL detection. In this paper, we propose DPN, a prototypical network disentangled by triplet sampling, which learns disentangled URL prototypes to improve the accuracy of phishing detection with limited data. The key idea is to capture the underlying structure and characteristics of URLs in the learned prototypes. The method samples triplets of anchor, positive, and negative URLs to train the network, which encourages the embedding space to separate phishing URLs from benign ones. To evaluate the proposed method, we collected and assessed a real-world dataset of one million URLs and additionally used two benchmark URL datasets. Our method outperforms state-of-the-art models, achieving accuracies of 98.0% on a 2-way 50-shot task and 98.32% on a 2-way 5000-shot task. Moreover, the experiments highlight the advantages of a disentangled representation of URLs: t-SNE visualizations reveal distinct, well-separated URL prototypes.
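
The two ingredients named in the abstract, triplet sampling and class prototypes, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `embed` function is an assumed toy stand-in (hashed character bigrams) for the learned encoder, which this preview does not describe. The function names, the margin value, and the example URLs are all hypothetical. Because the toy embedding is fixed, the triplet loss is only evaluated here; in a trained model it would be minimized with respect to the encoder's parameters.

```python
import math
import random


def embed(url, dim=16):
    """Toy encoder (assumption): L2-normalised hashed character-bigram counts."""
    vec = [0.0] * dim
    for a, b in zip(url, url[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sample_triplet(data, rng):
    """Draw (anchor, positive, negative) from a dict mapping label -> URLs.

    Anchor and positive share a class; the negative comes from another class.
    Each class needs at least two URLs.
    """
    pos_label, neg_label = rng.sample(sorted(data), 2)
    anchor, positive = rng.sample(data[pos_label], 2)
    negative = rng.choice(data[neg_label])
    return anchor, positive, negative


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the negative is farther than the positive by `margin`."""
    d_ap = sq_dist(embed(anchor), embed(positive))
    d_an = sq_dist(embed(anchor), embed(negative))
    return max(0.0, d_ap - d_an + margin)


def prototypes(support):
    """Class prototype = mean embedding of that class's support URLs."""
    protos = {}
    for label, urls in support.items():
        embs = [embed(u) for u in urls]
        protos[label] = [sum(col) / len(embs) for col in zip(*embs)]
    return protos


def classify(url, protos):
    """Assign the label of the nearest prototype in embedding space."""
    e = embed(url)
    return min(protos, key=lambda label: sq_dist(e, protos[label]))


if __name__ == "__main__":
    # Made-up URLs for a 2-way 2-shot episode.
    support = {
        "benign": ["https://example.com/login", "https://example.org/docs"],
        "phishing": ["http://examp1e-login.xyz/verify",
                     "http://secure-update.example.top/a"],
    }
    rng = random.Random(0)
    a, p, n = sample_triplet(support, rng)
    print("triplet loss:", triplet_loss(a, p, n))
    print("prediction:", classify("http://example-login.top/verify",
                                  prototypes(support)))
```

In an N-way K-shot task, `support` holds K URLs for each of N classes, so the 2-way 50-shot setting in the abstract corresponds to two classes with 50 support URLs each.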

Acknowledgements

This work was supported by the Yonsei Fellow Program funded by Lee Youn Jae, by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub), and by the Air Force Defense Research Sciences Program funded by the Air Force Office of Scientific Research.

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, Seoul, 03722, Republic of Korea
Seok-Jun Bu & Sung-Bae Cho

Corresponding author

Correspondence to Sung-Bae Cho.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Deusto, Bilbao, Spain
Pablo García Bringas
School of Industrial, Computer and Aerospace Engineering, University of Leon, León, Spain
Hilde Pérez García
Department of Mechanical Engineering, University of La Rioja, Logroño, Spain
Francisco Javier Martínez de Pisón
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Francisco Martínez Álvarez
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Alicia Troncoso Lora
Applied Computational Intelligence, University of Burgos, Burgos, Burgos, Spain
Álvaro Herrero
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
José Luis Calvo Rolle
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
Héctor Quintián
Faculty of Science, University of Salamanca, Salamanca, Spain
Emilio Corchado

Cite this paper

Bu, S.-J., Cho, S.-B. (2023). Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling. In: García Bringas, P., et al. (eds.) International Joint Conference 16th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2023) 14th International Conference on EUropean Transnational Education (ICEUTE 2023). CISIS ICEUTE 2023. Lecture Notes in Networks and Systems, vol 748. Springer, Cham. https://doi.org/10.1007/978-3-031-42519-6_13

DOI: https://doi.org/10.1007/978-3-031-42519-6_13
Published: 27 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42518-9
Online ISBN: 978-3-031-42519-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
