Abstract
In this work, we evaluate the security of heuristic- and machine learning-based classifiers for the detection of malicious JavaScript code. Due to the prevalence of web attacks delivered through JavaScript injected into webpages, such classifiers serve as a last line of defense by labeling individual scripts as either benign or malicious. State-of-the-art classifiers work well at distinguishing currently known malicious scripts from existing legitimate functionality, often by employing training sets of known benign and malicious samples. However, we observe that real-world attackers can be adaptive, tailoring their attacks both to the benign content of the page and to the defense mechanisms protecting it.
We consider a variety of techniques that an adaptive adversary may use to overcome JavaScript classifiers, and introduce new threat models covering adaptive adversaries with varying degrees of knowledge of the classifier and of the dataset used to detect malicious scripts. We show that while no heuristic defense mechanism is a silver bullet against an adaptive adversary, some techniques are far more effective than others. Our work thus points to which techniques should be considered best practices in classifying malicious content, and serves as a call to arms for more advanced classification.
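To make the classification setting concrete, the sketch below mimics the general shape of syntactic detectors such as JaSt: extract token n-grams from a script and score them against profiles built from labeled samples. This is an illustrative toy, not the classifiers evaluated in the paper; all function names and the regex tokenizer are our own simplifications (a real system walks the parsed AST rather than regex-tokenizing source text).

```python
# Illustrative sketch only -- NOT the classifiers studied in the paper.
# Syntactic detectors score scripts by the n-grams of syntactic units
# they contain; here we approximate that with crude token classes.
import re
from collections import Counter

def token_ngrams(source: str, n: int = 2) -> Counter:
    """Tokenize JavaScript-like source and count token-class n-grams."""
    tokens = re.findall(r"[A-Za-z_$][\w$]*|\d+|\S", source)
    kinds = ["ID" if t[0].isalpha() or t[0] in "_$" else
             "NUM" if t[0].isdigit() else t
             for t in tokens]
    return Counter(tuple(kinds[i:i + n]) for i in range(len(kinds) - n + 1))

def score(sample: Counter, malicious: Counter, benign: Counter) -> float:
    """Naive malice score: fraction of n-grams seen in the malicious
    profile minus the fraction seen in the benign profile."""
    mal = sum(c for g, c in sample.items() if g in malicious)
    ben = sum(c for g, c in sample.items() if g in benign)
    total = sum(sample.values()) or 1
    return (mal - ben) / total
```

A positive score marks a script as suspicious; an adaptive adversary attacks exactly this kind of model by rewriting malicious code so its n-gram profile resembles the benign training data.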
Notes
- 1.
Experimentally, we determined that increasing the maximum tree-edit distance above 20 results in a sharp increase in the detection rate for the generated samples. This applies to both classifiers in our examination.
- 2.
It is in principle possible to extend the algorithm with backtracking during the gadget search process, enabling it to generate an arbitrary number of variants.
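The tree-edit-distance budget mentioned in note 1 can be sketched as follows. True tree-edit distance (e.g. Zhang–Shasha) operates on the AST directly; as a cheap stand-in, this toy computes Levenshtein distance over a preorder serialization of node types. The `within_budget` helper and the nested `(type, children)` AST encoding are our own illustrative assumptions, not the paper's implementation.

```python
# Illustrative proxy only: real systems compute tree-edit distance on
# the AST itself; string edit distance over a preorder traversal is a
# cheap approximation used here to show how a distance budget works.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def preorder(node):
    """Serialize a nested (type, children) AST tuple in preorder."""
    kind, children = node
    out = [kind]
    for child in children:
        out.extend(preorder(child))
    return out

def within_budget(orig_ast, variant_ast, max_dist=20):
    """Accept a variant only if its distance stays under the budget."""
    return levenshtein(preorder(orig_ast), preorder(variant_ast)) <= max_dist
```

Under this scheme, raising `max_dist` past the threshold reported in note 1 admits variants that diverge enough from benign structure to trip the detectors.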
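The backtracking extension described in note 2 can be sketched as a recursive search: each rewrite site ("slot") offers several candidate gadgets, and the search backtracks over the choices, yielding every combination that passes a caller-supplied filter. The function name and interface are hypothetical, chosen only to illustrate how backtracking yields an arbitrary number of variants.

```python
# Hypothetical sketch of the backtracking gadget search from note 2:
# enumerate all assignments of one gadget per slot, pruning with a
# caller-supplied acceptance predicate.
def variants(slots, accept, chosen=()):
    """Yield every gadget assignment over `slots` that `accept` approves."""
    if len(chosen) == len(slots):
        if accept(chosen):
            yield chosen
        return
    for gadget in slots[len(chosen)]:          # try each candidate,
        yield from variants(slots, accept, chosen + (gadget,))  # backtrack on return
```

Because the generator lazily explores the product of all slot choices, the number of producible variants grows multiplicatively with the gadget pool, which is why backtracking lifts the variant-count limit of the non-backtracking search.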
Acknowledgments
We thank the anonymous reviewers and our shepherd, Yuan Zhang, for their insightful comments. We further thank: Louis Narmour and Devin Dennis for their help in building infrastructure and dataset; Bruce Kapron and Somesh Jha for informative early discussions on the problems tackled in this paper.
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Hansen, N., De Carli, L., Davidson, D. (2020). Assessing Adaptive Attacks Against Trained JavaScript Classifiers. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_12
Print ISBN: 978-3-030-63085-0
Online ISBN: 978-3-030-63086-7