
Abstract

Phishing attacks continue to pose a significant threat to internet security, and phishing URLs are among the most prevalent attack vectors. Detecting these URLs is challenging because attackers constantly evolve their tactics. Few-shot learning has emerged as a promising approach for learning from limited data, which makes it well suited to phishing URL detection. In this paper, we propose a prototypical network disentangled by triplet sampling (DPN) that learns disentangled URL prototypes to improve the accuracy of phishing detection with limited data. The key idea is to capture the underlying structure and characteristics of URLs so that phishing URLs can be detected effectively. The method samples triplets of anchor, positive, and negative URLs to train the network, which encourages an embedding space in which phishing and benign URLs are more separable. To evaluate the proposed method, we collected and assessed a real-world dataset of one million URLs and additionally used two benchmark URL datasets. Our method outperforms state-of-the-art models, achieving accuracies of 98.0% in a 2-way 50-shot task and 98.32% in a 2-way 5000-shot task. Moreover, the experiments highlight the advantages of a disentangled representation of URLs: t-SNE visualizations reveal distinct, well-separated URL prototypes.
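The two ingredients named in the abstract, class prototypes and triplet sampling, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the URL encoder, the distance function, or the margin, so the sketch assumes URLs have already been embedded as small numeric vectors, uses the squared Euclidean distance and mean-of-support prototypes of standard prototypical networks, and uses the common hinge form of the triplet loss with a hypothetical margin of 1.0. All function names and the toy embeddings are illustrative.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def prototype(embeddings):
    """Class prototype: the per-dimension mean of the support embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss; margin=1.0 is an assumed value, not from the paper."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def sample_triplet(support, rng):
    """Draw an anchor and a positive from one class and a negative from another."""
    labels = sorted(support)
    anchor_label = rng.choice(labels)
    negative_label = rng.choice([l for l in labels if l != anchor_label])
    anchor, positive = rng.sample(support[anchor_label], 2)
    negative = rng.choice(support[negative_label])
    return anchor, positive, negative


# Toy 2-way, 4-shot support set of made-up 2-D embeddings (assumption).
support = {
    "benign": [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]],
    "phishing": [[4.0, 4.0], [4.0, 6.0], [6.0, 4.0], [6.0, 6.0]],
}
prototypes = {label: prototype(vecs) for label, vecs in support.items()}
predicted = classify([1.5, 0.5], prototypes)

# A well-separated triplet incurs no loss; a negative that sits closer to the
# anchor than the positive does is penalized.
loss_separated = triplet_loss([0.0, 0.0], [0.0, 2.0], [4.0, 4.0])
loss_violating = triplet_loss([0.0, 0.0], [2.0, 2.0], [1.0, 1.0])

anchor, positive, negative = sample_triplet(support, random.Random(0))
```

In a trained model the loss would be backpropagated through a learned URL encoder so that embeddings of the same class move together and those of different classes move apart; here the vectors are fixed, so the sketch shows only how the quantities are computed.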

Acknowledgements

This work was supported by the Yonsei Fellow Program funded by Lee Youn Jae, an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub), and the Air Force Defense Research Sciences Program funded by the Air Force Office of Scientific Research.

Author information

Corresponding author

Correspondence to Sung-Bae Cho.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bu, SJ., Cho, SB. (2023). Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling. In: García Bringas, P., et al. International Joint Conference 16th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2023) 14th International Conference on EUropean Transnational Education (ICEUTE 2023). CISIS ICEUTE 2023 2023. Lecture Notes in Networks and Systems, vol 748. Springer, Cham. https://doi.org/10.1007/978-3-031-42519-6_13
