Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network

Wang, Huan-huan; Yu, Long; Tian, Sheng-wei; Peng, Yong-fang; Pei, Xin-jun

doi:10.1007/s10489-019-01433-4

Published: 21 February 2019

Applied Intelligence, volume 49, pages 3016–3026 (2019)

Huan-huan Wang¹,
Long Yu² (ORCID: orcid.org/0000-0001-9041-0801),
Sheng-wei Tian¹,³,
Yong-fang Peng¹ &
Xin-jun Pei³

Abstract

This paper proposes a bidirectional LSTM algorithm (CBIR) based on a convolutional neural network and an independent recurrent neural network. The algorithm extracts a “texture fingerprint” feature that expresses the content similarity of the URL binary files of malicious webpages, uses the word vector tool word2vec to train URL word vector features, and extracts URL static vocabulary features. The “texture fingerprint” feature, the URL word vector feature and the URL static vocabulary feature are merged, and malicious webpages are then analyzed and detected with the CBIR model. Experimental results show that, compared with other methods, the proposed CBIR algorithm improves the accuracy of malicious webpage detection.
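
The feature merging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature definitions, so the five lexical features, the byte-to-grey-level layout, and the names `lexical_features`, `byte_texture` and `fuse` are all hypothetical. The word vector is taken as a ready-made input (in the paper it comes from word2vec), and the CBIR network itself is not modelled here.

```python
from typing import List
from urllib.parse import urlparse


def lexical_features(url: str) -> List[float]:
    """Hypothetical static lexical features of a URL (assumed, not from the paper)."""
    host = urlparse(url).netloc
    return [
        float(len(url)),                              # total URL length
        float(len(host)),                             # host length
        float(url.count(".")),                        # number of dots
        float(sum(ch.isdigit() for ch in url)),       # number of digits
        float(sum(not ch.isalnum() for ch in url)),   # non-alphanumeric characters
    ]


def byte_texture(data: bytes, width: int = 8, rows: int = 4) -> List[List[int]]:
    """Map raw bytes to a fixed rows x width grid of grey levels (0-255).

    Input longer than rows * width is truncated; shorter input is zero-padded.
    This layout is an assumption standing in for the "texture fingerprint".
    """
    size = width * rows
    clipped = data[:size]
    padded = clipped + bytes(size - len(clipped))
    return [list(padded[i:i + width]) for i in range(0, size, width)]


def fuse(url: str, page_bytes: bytes, word_vector: List[float],
         width: int = 8, rows: int = 4) -> List[float]:
    """Concatenate texture, word-vector and lexical features into one vector."""
    texture = byte_texture(page_bytes, width, rows)
    flat_texture = [pixel / 255.0 for row in texture for pixel in row]
    return flat_texture + list(word_vector) + lexical_features(url)


if __name__ == "__main__":
    vec = fuse("http://example.com/a1", b"\xff\x00", [0.5, -0.5], width=2, rows=1)
    print(len(vec), vec)
```

Simple concatenation is used here only because the abstract says the three features are "merged"; a merged vector of this kind would then be passed to a classifier such as the CBIR model.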

Acknowledgements

We thank all the participants in our study who provided useful and detailed feedback. We also thank our tutors and our team for their support of this research.

This work is partially supported by the Science and Technology Talent Training Project of Xinjiang Uygur Autonomous Region (QN2016YX0051), the Scientific Research Innovation Project of Education Innovation Plan for Graduate Students in Xinjiang Uygur Autonomous Region (XJGRI2017007), the CERNET Next Generation Internet Technology Innovation Project (NGII20170420), and the Tianshan Youth Program (2017Q011).

Author information

Authors and Affiliations

School of Software, Xinjiang University, No.499, Xibei Road, Saybagh District, Urumqi, Xinjiang, 830008, People’s Republic of China
Huan-huan Wang, Sheng-wei Tian & Yong-fang Peng
Network Center, Xinjiang University, No.666, Shengli Road, Tianshan District, Urumqi, Xinjiang, 830046, People’s Republic of China
Long Yu
School of Information Science and Engineering, Xinjiang University, No.666, Shengli Road, Tianshan District, Urumqi, Xinjiang, 830046, People’s Republic of China
Sheng-wei Tian & Xin-jun Pei

Corresponding author

Correspondence to Long Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Hh., Yu, L., Tian, Sw. et al. Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network. Appl Intell 49, 3016–3026 (2019). https://doi.org/10.1007/s10489-019-01433-4

Published: 21 February 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10489-019-01433-4
