Abstract
There is a growing tendency for cybercriminals to abuse legitimate tools installed on the target computers for cyberattacks. In particular, the use of PowerShell provided by Microsoft has been increasing every year and has become a threat. In previous studies, a method to detect malicious PowerShell commands using character-level deep learning was proposed. The proposed method combines traditional natural language processing and character-level convolutional neural networks. This method, however, requires time for dynamic analysis. This paper proposes a method to classify unknown PowerShell without dynamic analysis. Our method uses feature vectors extracted from malicious and benign PowerShell scripts using word-level language models for classification. The datasets were generated from benign and malicious PowerShell scripts obtained from Hybrid Analysis, and benign PowerShell scripts obtained from GitHub, which are imbalanced. The experimental result shows that the combination of the LSI and XGBoost produces the highest detection rate. The maximum accuracy achieves approximately 0.95 on the imbalanced dataset. Furthermore, over 50% of unknown malicious PowerShell scripts could be detected in time series analysis without dynamic analysis.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
GitHub. https://github.co.jp/
Hybrid Analysis. https://www.hybrid-analysis.com/
Powerdrive. https://github.com/denisugarte/PowerDrive
Virus Total. https://www.virustotal.com/
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324
Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). https://doi.org/10.1145/1961189.1961199
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, pp. 785–794 (2016). https://doi.org/10.1145/2939672.2939785
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990). https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
Hendler, D., Kels, S., Rubin, A.: Detecting malicious PowerShell commands using deep neural networks. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, AsiaCCS 2018, Incheon, Republic of Korea, June 04–08, 2018, pp. 187–197 (2018). https://doi.org/10.1145/3196494.3196511
Ito, R., Mimura, M.: Detecting unknown malware from ASCII strings with natural language processing techniques. In: 2019 14th Asia Joint Conference on Information Security (AsiaJCIS), pp. 1–8 (2019)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014, pp. 1188–1196 (2014). http://proceedings.mlr.press/v32/le14.html
McAfee: McAfee labs threats report August 2019 (August 2019). https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-aug-2019.pdf
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, Workshop Track Proceedings (2013). http://arxiv.org/abs/1301.3781
Mimura, M., Suga, Y.: Filtering malicious JavaScript code with doc2vec on an imbalanced dataset. In: 2019 14th Asia Joint Conference on Information Security (AsiaJCIS), pp. 24–31 (2019)
Mimura, M., Miura, H.: Detecting unseen malicious VBA macros with NLP techniques. JIP 27, 555–563 (2019). https://doi.org/10.2197/ipsjjip.27.555
Mimura, M., Ohminami, T.: Towards efficient detection of malicious VBA macros with LSI. In: Attrapadung, N., Yagi, T. (eds.) IWSEC 2019. LNCS, vol. 11689, pp. 168–185. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26834-3_10
Mimura, M., Ohminami, T.: Using LSI to detect unknown malicious VBA macros. J. Inf. Process. 28 (2020)
Miura, H., Mimura, M., Tanaka, H.: Macros finder: do you remember LOVELETTER? In: Su, C., Kikuchi, H. (eds.) ISPEC 2018. LNCS, vol. 11125, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99807-7_1
Ndichu, S., Kim, S., Ozawa, S., Misu, T., Makishima, K.: A machine learning approach to detection of JavaScript-based attacks using AST features and paragraph vectors. Appl. Soft Comput. 84, 105721 (2019). https://doi.org/10.1016/j.asoc.2019.105721
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). http://dl.acm.org/citation.cfm?id=2078195
Rubin, A., Kels, S., Hendler, D.: AMSI-based detection of malicious PowerShell code using contextual embeddings. arXiv e-prints arXiv:1905.09538 (May 2019)
Rusak, G., Al-Dujaili, A., O’Reilly, U.: AST-based deep learning for detecting malicious PowerShell. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15–19, 2018, pp. 2276–2278 (2018). https://doi.org/10.1145/3243734.3278496
Symantec: Symantec 2019 Internet security threat report (February 2019). https://docs.broadcom.com/docs/istr-24-2019-en
Ugarte, D., Maiorca, D., Cara, F., Giacinto, G.: PowerDrive: accurate de-obfuscation and analysis of PowerShell malware. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds.) DIMVA 2019. LNCS, vol. 11543, pp. 240–259. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22038-9_12
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tajiri, Y., Mimura, M. (2020). Detection of Malicious PowerShell Using Word-Level Language Models. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science(), vol 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-58208-1_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58207-4
Online ISBN: 978-3-030-58208-1
eBook Packages: Computer ScienceComputer Science (R0)