Abstract
This paper proposes CBIR, a bidirectional LSTM algorithm based on a convolutional neural network and an independent recurrent neural network. The algorithm extracts a “texture fingerprint” feature that expresses the content similarity of the URL binary files of malicious webpages, uses the word vector tool word2vec to train URL word vector features, and extracts URL static vocabulary features. The “texture fingerprint” feature, the URL word vector feature and the URL static vocabulary feature are merged, and malicious webpages are analyzed and detected with the CBIR model. Experimental results show that the proposed CBIR algorithm improves the accuracy of malicious webpage detection compared with other methods.
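The feature fusion step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the “texture fingerprint” is computed, which static vocabulary features are used, or how the merge is done. Here the texture feature is stood in for by a grayscale intensity histogram over raw bytes, the static features are a small hand-picked set, the word vector is passed in as a plain list (in practice it would come from a trained word2vec model), and the merge is simple concatenation. The function names `lexical_features`, `tokenize_url`, `texture_histogram` and `fuse` are hypothetical.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> list[float]:
    """Hand-picked static URL features (illustrative, not the paper's list)."""
    host = urlparse(url).netloc
    is_ip = re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}(:\d+)?", host) is not None
    return [
        float(len(url)),                             # total URL length
        float(url.count(".")),                       # number of dots
        float(sum(ch.isdigit() for ch in url)),      # number of digits
        float(sum(not ch.isalnum() for ch in url)),  # number of special characters
        1.0 if is_ip else 0.0,                       # host is a raw IPv4 address
    ]


def tokenize_url(url: str) -> list[str]:
    """Split a URL into word-like tokens that a word2vec model could be trained on."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


def texture_histogram(data: bytes, bins: int = 8) -> list[float]:
    """Treat each byte as a grayscale pixel (0-255) and return a normalized
    intensity histogram. This is a stand-in for the texture fingerprint."""
    counts = [0] * bins
    for b in data:
        counts[b * bins // 256] += 1
    total = len(data) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def fuse(texture: list[float], word_vec: list[float], lexical: list[float]) -> list[float]:
    """Merge the three feature groups by concatenation (assumed merge strategy)."""
    return list(texture) + list(word_vec) + list(lexical)
```

A fused vector for one sample could then be built as `fuse(texture_histogram(raw_bytes), url_vector, lexical_features(url))` and passed to a classifier. Concatenation keeps the three feature groups separable by index, which makes it easy to ablate one group at a time.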
Acknowledgements
We would like to thank all the participants in our study who provided useful and detailed feedback. We also thank our tutors and team for their support of this research.
This work is partially supported by the Science and Technology Talent Training Project of Xinjiang Uygur Autonomous Region (QN2016YX0051), the Scientific Research Innovation Project of Education Innovation Plan for Graduate Students in Xinjiang Uygur Autonomous Region (XJGRI2017007), the Cernet Next Generation Internet Technology Innovation Project (NGII20170420), and the Tianshan Youth Program (2017Q011).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, Hh., Yu, L., Tian, Sw. et al. Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network. Appl Intell 49, 3016–3026 (2019). https://doi.org/10.1007/s10489-019-01433-4
DOI: https://doi.org/10.1007/s10489-019-01433-4