JaSt: Fully Syntactic Detection of Malicious (Obfuscated) JavaScript

Fass, Aurore; Krawczyk, Robert P.; Backes, Michael; Stock, Ben

doi:10.1007/978-3-319-93411-2_14

Conference paper
First Online: 08 June 2018

2256 Accesses
41 Citations

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10885)

Abstract

JavaScript is a browser scripting language initially created to enhance the interactivity of web sites and to improve their user-friendliness. However, as it offloads the work to the user’s browser, it can be used to engage in malicious activities such as crypto mining, drive-by download attacks, or redirections to web sites hosting malicious software. Given the prevalence of such nefarious scripts, the anti-virus industry has increased its focus on their detection. The attackers, in turn, make increasing use of obfuscation techniques so as to hinder analysis and the creation of corresponding signatures. Yet these malicious samples share syntactic similarities at an abstract level, which makes it possible to bypass obfuscation and detect even unknown malware variants.

In this paper, we present JaSt, a low-overhead solution that combines the extraction of features from the abstract syntax tree with a random forest classifier to detect malicious JavaScript instances. It is based on a frequency analysis of specific patterns, which are predictive of either benign or malicious samples. Even though the analysis is entirely static, it yields a high detection accuracy of almost 99.5% and has a low false-negative rate of 0.54%.
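The frequency analysis described above can be illustrated with a minimal sketch. It assumes a JavaScript parser has already flattened the abstract syntax tree into a sequence of node-type names, and it counts sliding-window n-grams over that sequence. The function names, the window size `n=2`, and the sample sequence are hypothetical choices made for illustration; they are not taken from the paper, and the classifier step is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_frequencies(units: Sequence[str], n: int = 2) -> Dict[NGram, float]:
    """Return the relative frequency of each n-gram of syntactic units.

    `units` is assumed to be a flat traversal of AST node-type names;
    the window size `n` is an illustrative default, not the paper's setting.
    """
    grams = [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]
    if not grams:
        # Sequence shorter than the window: no patterns to count.
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs: Dict[NGram, float], vocabulary: List[NGram]) -> List[float]:
    """Project frequencies onto a fixed n-gram vocabulary (0.0 if absent)."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


# Hypothetical node-type sequence for a short script.
units = [
    "VariableDeclaration", "VariableDeclarator", "Identifier",
    "CallExpression", "Identifier", "CallExpression", "Identifier",
]
freqs = ngram_frequencies(units, n=2)
vocabulary = sorted(freqs)  # in practice the vocabulary would come from a training set
vector = to_vector(freqs, vocabulary)
```

Fixed-length vectors of this kind are the sort of input a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) could be trained on, with one vector per script and a benign or malicious label.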

Notes

1. Malware don’t need Coffee, https://malware.dontneedcoffee.com.
2. Alexa top sites, http://www.alexa.com/topsites.

Acknowledgments

This work would not have been possible without the help of the German Federal Office for Information Security and Kafeine DNC which provided us with materials for our experiments. We would also like to thank the anonymous reviewers of this paper for their well-appreciated feedback. This work was partially supported by the German Federal Ministry of Education and Research (BMBF) through funding for the Center for IT-Security, Privacy and Accountability (CISPA) (FKZ: 16KIS0345).

Author information

Authors and Affiliations

CISPA, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Aurore Fass
German Federal Office for Information Security (BSI), Bonn, Germany
Robert P. Krawczyk
CISPA Helmholtz Center i.G., Saarland Informatics Campus, Saarbrücken, Germany
Michael Backes & Ben Stock

Corresponding author

Correspondence to Aurore Fass.

Editor information

Editors and Affiliations

Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Cristiano Giuffrida
CEA, Palaiseau, France
Sébastien Bardin
Université Paris-Saclay, Evry, France
Gregory Blanc

About this paper

Cite this paper

Fass, A., Krawczyk, R.P., Backes, M., Stock, B. (2018). JaSt: Fully Syntactic Detection of Malicious (Obfuscated) JavaScript. In: Giuffrida, C., Bardin, S., Blanc, G. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2018. Lecture Notes in Computer Science, vol 10885. Springer, Cham. https://doi.org/10.1007/978-3-319-93411-2_14

DOI: https://doi.org/10.1007/978-3-319-93411-2_14
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93410-5
Online ISBN: 978-3-319-93411-2
eBook Packages: Computer Science; Computer Science (R0)