Abstract
Phishing attacks remain a significant threat to internet security, and phishing URLs are among their most prevalent forms. Detecting these URLs is challenging because attackers constantly evolve their tactics. Few-shot learning has emerged as a promising approach for learning from limited data, which makes it well suited to phishing URL detection. In this paper, we propose a prototypical network disentangled by triplet sampling (DPN) that learns disentangled URL prototypes to improve the accuracy of phishing detection with limited data. The key idea is to capture the underlying structure and characteristics of URLs in the learned prototypes. The method samples triplets of anchor, positive, and negative URLs to train the network, which encourages an embedding space in which phishing and benign URLs are more separable. To evaluate the proposed method, we collected and assessed a real-world dataset of one million URLs and additionally used two benchmark URL datasets. Our method outperforms state-of-the-art models, achieving accuracies of 98.0% in a 2-way 50-shot task and 98.32% in a 2-way 5000-shot task. Moreover, the experiments highlight the advantages of a disentangled representation of URLs: t-SNE visualizations reveal distinct and well-separated URL prototypes.
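The two ingredients named in the abstract, triplet sampling and class prototypes, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder, the distance, or the loss, so the squared Euclidean distance, the hinge-style triplet loss, the margin value, and every function name below are assumptions chosen for illustration. Embeddings are taken as given NumPy arrays rather than produced by a trained network.

```python
import numpy as np

def sample_triplet(labels, rng):
    """Return (anchor, positive, negative) indices.

    Assumes every class has at least two samples, so a positive
    distinct from the anchor always exists.
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    anchor = int(rng.integers(len(labels)))
    same = idx[(labels == labels[anchor]) & (idx != anchor)]
    diff = idx[labels != labels[anchor]]
    return anchor, int(rng.choice(same)), int(rng.choice(diff))

def triplet_loss(a, p, n, margin=0.2):
    """Hinge triplet loss on squared Euclidean distances (margin is an assumption)."""
    d_pos = np.sum((a - p) ** 2)
    d_neg = np.sum((a - n) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

def prototypes(embeddings, labels):
    """Class prototype = mean embedding of that class's support samples."""
    labels = np.asarray(labels)
    return {int(c): embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(query, protos):
    """Assign the query to the class of the nearest prototype."""
    return min(protos, key=lambda c: np.sum((query - protos[c]) ** 2))

# Toy 2-D embeddings standing in for encoded URLs (0 = benign, 1 = phishing).
support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
support_labels = [0, 0, 1, 1]

rng = np.random.default_rng(0)
a, p, n = sample_triplet(support_labels, rng)
loss = triplet_loss(support[a], support[p], support[n])
protos = prototypes(support, support_labels)
predicted = classify(np.array([4.0, 4.0]), protos)
```

In a trained model the triplet loss would be minimized over the encoder's parameters so that same-class URLs move closer together than different-class URLs, and the prototypes would then be computed from the few labeled support URLs available in each few-shot task.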
Acknowledgements
This work was supported by the Yonsei Fellow Program funded by Lee Youn Jae, an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub), and the Air Force Defense Research Sciences Program funded by the Air Force Office of Scientific Research.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bu, SJ., Cho, SB. (2023). Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling. In: García Bringas, P., et al. International Joint Conference 16th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2023) 14th International Conference on EUropean Transnational Education (ICEUTE 2023). CISIS ICEUTE 2023 2023. Lecture Notes in Networks and Systems, vol 748. Springer, Cham. https://doi.org/10.1007/978-3-031-42519-6_13
DOI: https://doi.org/10.1007/978-3-031-42519-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42518-9
Online ISBN: 978-3-031-42519-6
eBook Packages: Intelligent Technologies and Robotics (R0)