Effective Malicious URL Detection by Using Generative Adversarial Networks

Geng, Jinbu; Li, Shuhao; Liu, Zhicheng; Cheng, Zhenyu; Fan, Li

doi:10.1007/978-3-031-09917-5_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13362)

Included in the following conference series:

International Conference on Web Engineering

Abstract

Malicious URLs, a.k.a. malicious websites, pose a great threat to Web security. In particular, concept drift caused by variants of malicious URLs degrades the performance of existing detection methods, which are based on the available attack patterns. In this paper, we conduct an extensive measurement study of realistic URLs and find that the hierarchical semantics feature is suitable for identifying malicious URLs. Therefore, we propose URLGAN, a deep neural network model equipped with hierarchical semantics features, to distinguish between malicious and normal URLs. First, we embed the entire URL into a hierarchical semantics structure. Second, hierarchical semantics features are extracted from this structure through BERT. Then, the extracted features are combined with similar but slightly different features generated by the generator, enabling the condition discriminator to extract the essential difference between normal and malicious URLs. Notably, the features generated by the generator enhance the robustness of the system in detecting malicious URL variants. Extensive experiments on a public dataset and on our own data collected from specific targets demonstrate that our method outperforms other methods and protects specific targets from malicious URLs.
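
As a rough illustration of the first step only (viewing a URL as a hierarchy of components before any feature extraction), the following Python sketch splits a URL with the standard library. The abstract does not specify the structure URLGAN actually uses, so the function name `url_hierarchy`, the chosen levels (scheme, host labels, path segments, query parameters), and the example URL are all assumptions made for illustration, not the authors' method.

```python
from urllib.parse import urlsplit, parse_qsl


def url_hierarchy(url):
    """Split a URL into nested components (hypothetical layout, not URLGAN's)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        # Protocol, e.g. "http" or "https".
        "scheme": parts.scheme,
        # Domain labels from left to right, e.g. ["login", "example", "com"].
        "host_labels": host.split(".") if host else [],
        # Non-empty path segments in order.
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        # (key, value) pairs; blank values are kept because they may be informative.
        "query_params": parse_qsl(parts.query, keep_blank_values=True),
    }


# Illustrative URL only; it is not taken from the paper's datasets.
example = url_hierarchy("http://login.example.com/account/verify.php?id=42&token=")
```

Each level could then be tokenized separately before being passed to a language model such as BERT; how the paper combines the levels is not stated in the abstract.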

This work is supported by the National Key Research and Development Program of China (Grant No. 2018YFB0804704), and the National Natural Science Foundation of China (Grant No. U1736218).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Jinbu Geng, Shuhao Li, Zhicheng Liu, Zhenyu Cheng & Li Fan
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Jinbu Geng & Shuhao Li
National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China
Zhicheng Liu

Corresponding author

Correspondence to Shuhao Li.

Editor information

Editors and Affiliations

Polytechnic University of Bari, Bari, Italy
Tommaso Di Noia
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea (Republic of)
In-Young Ko
Johannes Kepler University Linz, Linz, Austria
Markus Schedl
Polytechnic University of Bari, Bari, Italy
Carmelo Ardito

About this paper

Cite this paper

Geng, J., Li, S., Liu, Z., Cheng, Z., Fan, L. (2022). Effective Malicious URL Detection by Using Generative Adversarial Networks. In: Di Noia, T., Ko, IY., Schedl, M., Ardito, C. (eds) Web Engineering. ICWE 2022. Lecture Notes in Computer Science, vol 13362. Springer, Cham. https://doi.org/10.1007/978-3-031-09917-5_23

DOI: https://doi.org/10.1007/978-3-031-09917-5_23
Published: 01 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09916-8
Online ISBN: 978-3-031-09917-5
eBook Packages: Computer Science, Computer Science (R0)
