Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network

Applied Intelligence

Abstract

This paper proposes CBIR, a bidirectional LSTM algorithm built on a convolutional neural network and an independent recurrent neural network. The algorithm extracts a “texture fingerprint” feature that expresses the content similarity of the URL binary files of malicious webpages, uses the word vector tool word2vec to train URL word vector features, and extracts URL static vocabulary features. The “texture fingerprint” feature, the URL word vector feature and the URL static vocabulary feature are merged, and the CBIR model analyzes and detects malicious webpages from the merged features. Experimental results show that the proposed CBIR algorithm detects malicious webpages more accurately than the other methods it was compared with.
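The feature-merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the preview does not say how the “texture fingerprint” is computed, so the byte-to-grayscale layout and the four-bin intensity histogram below are crude stand-ins, the five static features are illustrative choices, and the word vector is a placeholder for what a trained word2vec model would supply. All function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def byte_image(url: str, width: int = 8) -> list[list[int]]:
    """Lay the URL's UTF-8 bytes out as rows of grayscale values (0-255).

    Assumption: a simple zero-padded row layout; the paper's exact
    binary-to-image mapping is not given in the abstract.
    """
    data = url.encode("utf-8")
    padded = data + bytes((-len(data)) % width)  # pad the last row with zeros
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def intensity_histogram(image: list[list[int]], bins: int = 4) -> list[float]:
    """Normalized histogram of pixel intensities, a stand-in texture descriptor."""
    counts = [0] * bins
    bin_size = 256 // bins
    total = 0
    for row in image:
        for value in row:
            counts[min(value // bin_size, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def static_features(url: str) -> list[float]:
    """A few illustrative static features of the URL string (assumed, not from the paper)."""
    parsed = urlparse(url)
    return [
        float(len(url)),                           # total length
        float(url.count(".")),                     # number of dots
        float(sum(ch.isdigit() for ch in url)),    # number of digits
        float(len(parsed.path)),                   # path length
        1.0 if parsed.scheme == "https" else 0.0,  # uses HTTPS
    ]


def fuse(*parts: list[float]) -> list[float]:
    """Merge feature groups by concatenation into one input vector."""
    fused: list[float] = []
    for part in parts:
        fused.extend(part)
    return fused


url = "http://a.b/c1"
texture = intensity_histogram(byte_image(url))
word_vector = [0.0, 0.0, 0.0, 0.0]  # placeholder for a trained word2vec embedding
fused = fuse(texture, word_vector, static_features(url))
```

Concatenation is only one way to merge the three feature groups; the abstract does not state which merging scheme the CBIR model uses, and a real pipeline would pass the fused vector to the classifier rather than stop here.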

Acknowledgements

We thank all the participants in our study who provided useful and detailed feedback. We also thank our tutors and our team for their support of this research.

This work is partially supported by the Science and Technology Talent Training Project of Xinjiang Uygur Autonomous Region (QN2016YX0051), the Scientific Research Innovation Project of Education Innovation Plan for Graduate Students in Xinjiang Uygur Autonomous Region (XJGRI2017007), the Cernet Next Generation Internet Technology Innovation Project (NGII20170420), and the Tianshan Youth Program (2017Q011).

Author information

Corresponding author

Correspondence to Long Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Hh., Yu, L., Tian, Sw. et al. Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network. Appl Intell 49, 3016–3026 (2019). https://doi.org/10.1007/s10489-019-01433-4

  • DOI: https://doi.org/10.1007/s10489-019-01433-4
