Content-aware malicious webpage detection using convolutional neural network

Chang, Yen-Jen; Tsai, Kun-Lin; Jiang, Wei-Cheng; Liu, Meng-Kun

doi:10.1007/s11042-023-15559-8

Published: 14 June 2023

Multimedia Tools and Applications, Volume 83, pages 8145–8163 (2024)

Abstract

Malicious websites often install malware on user devices to gather user information, disrupt device operation, violate user privacy, or harm company interests. Many commercial tools are available to protect devices from malicious webpages; however, the current versions of these tools may become ineffective as soon as a new generation of malware is released. In this study, a content-aware malicious webpage detection (CAMD) method was developed. CAMD determines whether a webpage is malicious by applying a novel webpage contextual visualization process, which retrieves the critical code of a webpage, transforms that code into a one-dimensional grayscale image, and applies convolutional neural networks (CNNs) to detect malicious webpages. To verify the feasibility of the proposed CAMD method, 50,000 normal and 50,000 malicious webpages obtained from the VirusTotal website were used. The results indicated that CAMD achieved an accuracy of more than 98%.
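The visualization pipeline summarized above (retrieve code, convert it to a one-dimensional grayscale image, feed it to a CNN) can be illustrated with a minimal sketch. The abstract does not define what counts as "critical code", how long the image is, or what the network looks like, so everything below is an assumption: the sketch treats the body of every `<script>` element as the critical code, maps each UTF-8 byte to a pixel intensity in [0, 1], truncates or zero-pads to a hypothetical fixed length of 1024 pixels, and shows one 1-D convolution, ReLU, and max-pooling step of the kind a CNN layer computes. All names (`ScriptExtractor`, `extract_scripts`, `to_grayscale_1d`, `conv1d_relu`, `max_pool1d`) are hypothetical and are not the authors' implementation.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects the body of each <script> element.

    Assumption: <script> bodies stand in for the paper's 'critical code'."""

    def __init__(self):
        super().__init__()
        self._buffer = None  # text pieces while inside a <script>, else None
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_scripts(html):
    """Return all <script> bodies of a page, joined with newlines."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.scripts)


def to_grayscale_1d(code, length=1024):
    """Map each UTF-8 byte (0-255) of the code to an intensity in [0, 1].

    The vector is truncated or zero-padded to a fixed (hypothetical) length
    so that every page yields a CNN input of the same size."""
    data = code.encode("utf-8")[:length]
    pixels = [b / 255.0 for b in data]
    pixels.extend([0.0] * (length - len(pixels)))
    return pixels


def conv1d_relu(signal, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as CNN layers compute it) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out


def max_pool1d(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


if __name__ == "__main__":
    page = "<html><body><script>eval(atob('ZG9jdW1lbnQ='));</script></body></html>"
    image = to_grayscale_1d(extract_scripts(page), length=64)
    # Hypothetical hand-set kernel; a trained CNN learns many kernels instead.
    edge_kernel = [1.0, -1.0]
    features = max_pool1d(conv1d_relu(image, edge_kernel), size=2)
    print(len(image), len(features))
```

A real detector would stack several learned convolution and pooling layers and end with a classifier that outputs a benign/malicious score; the single fixed kernel here only shows how a 1-D grayscale image is processed.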

Similar content being viewed by others

Malicious Webpage Classification

Malicious Webpage Classification Using Deep Learning Technique

Scope of Visual-Based Similarity Approach Using Convolutional Neural Network on Phishing Website Detection

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Chung Hsing University, Taichung City, Taiwan
Yen-Jen Chang & Meng-Kun Liu
Department of Electrical Engineering, Tunghai University, Taichung City, Taiwan
Kun-Lin Tsai & Wei-Cheng Jiang

Corresponding author

Correspondence to Wei-Cheng Jiang (ORCID: 0000-0003-4432-8801).

Ethics declarations

Conflict of Interests

This study was funded by the National Science and Technology Council, Taiwan, under Grants MOST 110-2634-F-005-006- and 110-2221-E-029-027-. The authors have reported no other potential conflict of interest relevant to this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chang, YJ., Tsai, KL., Jiang, WC. et al. Content-aware malicious webpage detection using convolutional neural network. Multimed Tools Appl 83, 8145–8163 (2024). https://doi.org/10.1007/s11042-023-15559-8

Received: 31 December 2022
Revised: 07 March 2023
Accepted: 15 April 2023
Published: 14 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15559-8
