A deep learner model for multi-language webshell detection

  • Regular contribution
International Journal of Information Security

Abstract

Webshell attacks are becoming more prevalent every year. Webshells are malicious scripts injected into web servers to obtain persistent remote access through simple HTTP requests issued from a web browser. Through webshells, attackers can remotely access confidential data and execute system commands. Threat actors use webshells as an initial foothold to compromise the network infrastructure and cause severe damage. The impact of webshell attacks ranges from basic malicious actions, such as exposing sensitive data and uploading more dangerous malware, to causing denial of service and compromising external networks, which puts the whole infrastructure at risk. Webshell attacks are hazardous because they can persist for a long time without being noticed by inexperienced administrators and ordinary malware scanners. In the literature, several machine learning-based models have been proposed for the detection of PHP webshells. In this paper, we propose a simple deep learner model and evaluate its ability to detect multi-language webshells. The aim is to highlight existing challenges in the detection of webshell attacks and outline the way forward. By analyzing source file scripts, the proposed model is designed to distinguish webshells from benign files. Due to the absence of benchmark datasets for webshell detection, we collected a reasonably sized dataset for the validation process. We compared the performance of the proposed model with recent state-of-the-art systems. We also experimented with source-code-based and opcode-based PHP detection models and examined the impact of near-duplicates in datasets. Experimental results showed that: (1) the proposed deep learner outperforms all the experimented systems for the four tested languages (PHP, JSP, ASP and ASPX) with more than 98.27% accuracy, (2) source-code-based detection models are more effective than opcode-based detection models for PHP webshells, (3) the presence of near-duplicates leads to higher but biased performance of webshell detection models and (4) more attention should be paid to the detection of webshells that use advanced coding tricks such as letter slicing and code splitting.
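To make the notion of near-duplicates concrete, the following is a minimal sketch of one common way to flag them: comparing the sets of identifier tokens of two files with Jaccard similarity. This excerpt does not describe the procedure actually used in the study, so the tokenizer, the function names, the 0.8 threshold and the sample files are all illustrative assumptions.

```python
import re
from itertools import combinations

# Assumption: identifiers alone are a good enough fingerprint for this sketch.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def token_set(source):
    """Return the set of identifier-like tokens found in a source file."""
    return set(IDENTIFIER.findall(source))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(files, threshold=0.8):
    """List pairs of file names whose token sets are at least `threshold` similar.

    `files` maps a file name to its source text. The threshold is a
    hypothetical default, not a value taken from the paper.
    """
    sets = {name: token_set(src) for name, src in files.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


# Hypothetical, harmless sample files used only to exercise the functions.
FILES = {
    "a.php": "<?php function add($x, $y) { return $x + $y; } echo add(1, 2);",
    "b.php": "<?php function add($x, $y) { return $x + $y; } echo add(3, 4);",
    "c.php": "<?php $items = array('a', 'b'); foreach ($items as $item) { print $item; }",
}

if __name__ == "__main__":
    print(near_duplicate_pairs(FILES))
```

If near-duplicate pairs end up split between training and test sets, a model can score well by recognizing files it has effectively already seen, which is one way the biased performance mentioned in the abstract can arise.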

Data Availability Statement

The datasets generated and analyzed during the current study are publicly available in the Mendeley repository at https://dx.doi.org/10.17632/wt8m6bcwbr.2.

Notes

  1. DAws Advanced Shell available at: https://github.com/dotcppfile/DAws.

  2. GitHub link: https://github.com.

  3. Note that VLD captures opcode arrays with additional information and parameters; opcodes are described in arrays as capital letters, and this enables their distinction from other information and parameters.
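The observation in note 3 suggests a simple filter: keep only the capitalized tokens of a dump. The sketch below illustrates the idea on a hypothetical, simplified excerpt written in the style of a VLD dump; real VLD output has more columns, and the regular expression and function name are assumptions rather than the authors' implementation.

```python
import re
from typing import List

# Hypothetical, simplified excerpt in the style of a VLD opcode dump.
SAMPLE_DUMP = """\
line  #  op            operands
   3  0  ASSIGN        !0, 'id'
   4  1  INIT_FCALL    'system'
   4  2  SEND_VAR      !0
   4  3  DO_ICALL
   5  4  RETURN        1
"""

# Assumption: an opcode is a run of at least two capital letters or underscores
# that starts with a capital letter.
OPCODE = re.compile(r"\b[A-Z][A-Z_]+\b")


def extract_opcodes(dump: str) -> List[str]:
    """Return the opcode names of a dump, in order of appearance."""
    return OPCODE.findall(dump)


if __name__ == "__main__":
    print(" ".join(extract_opcodes(SAMPLE_DUMP)))
```

A filter this simple would also pick up fully capitalized string operands, so a more careful version would restrict matching to the opcode column.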

Author information

Corresponding author

Correspondence to Abdelhakim Hannousse.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any study with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

1.1 List of sources used for dataset collection

Sources used for collecting Webshells

  1. https://github.com/tennc/webshell

  2. https://github.com/JohnTroony/php-webshells

  3. https://github.com/xl7dev/webshell

  4. https://github.com/tutorial0/webshell

  5. https://github.com/bartblaze/PHP-backdoors

  6. https://github.com/BlackArch/webshells

  7. https://github.com/nikicat/web-malware-collection

  8. https://github.com/fuzzdb-project/fuzzdb

  9. https://github.com/lcatro/PHP-webshell-Bypass-WAF

  10. https://github.com/linuxsec/indoxploit-shell

  11. https://github.com/b374k/b374k

  12. https://github.com/LuciferoO/webshell-collector

  13. https://github.com/malwares/webshell

  14. https://github.com/tanjiti/webshell-Sample

  15. https://github.com/JoyChou93/webshell

  16. https://github.com/webshellpub/awsome-webshell

  17. https://github.com/xypiie/webshell

  18. https://github.com/leett1/Programe/

  19. https://github.com/lhlsec/webshell

  20. https://github.com/ysrc/webshell-sample

  21. https://github.com/feihong-cs/JspMaster-Deprecated

  22. https://github.com/threedr3am/JSP-Webshells

Sources used for collecting normal files

  1. https://github.com/WordPress/WordPress

  2. https://github.com/yiisoft/yii2

  3. https://github.com/johnshen/PHPcms

  4. https://github.com/joomla/joomla-cms

  5. https://github.com/laravel/laravel

  6. https://github.com/learnstartup/4tweb

  7. https://github.com/phpmyadmin/phpmyadmin

  8. https://github.com/rainrocka/xinhu

  9. https://github.com/octobercms/october

  10. https://github.com/alkacon/opencms-core

  11. https://github.com/craftcms/cms

  12. https://github.com/croogo/croogo

  13. https://github.com/doorgets/CMS

  14. https://github.com/smarty-php/smarty

  15. https://github.com/source-trace/phpcms

  16. https://github.com/symfony/symfony

  17. https://github.com/typecho/typecho

  18. https://github.com/leett1/Programe/

  19. https://github.com/rpeterclark/aspunit

  20. https://github.com/dluxem/LiberumASP

  21. https://github.com/aspLite/aspLite

  22. https://github.com/coldstone/easyasp

  23. https://github.com/amasad/sane

  24. https://github.com/sextondb/ClassicASPUnit

  25. https://github.com/ASP-Ajaxed/asp-ajaxed

  26. https://www.codewithc.com

  27. https://www.kashipara.com

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hannousse, A., Nait-Hamoud, M.C. & Yahiouche, S. A deep learner model for multi-language webshell detection. Int. J. Inf. Secur. 22, 47–61 (2023). https://doi.org/10.1007/s10207-022-00615-5

  • DOI: https://doi.org/10.1007/s10207-022-00615-5

Keywords