PSP-Mal: Evading Malware Detection via Prioritized Experience-based Reinforcement Learning with Shapley Prior

Authors:
Dazhi Zhan, Army Engineering University of PLA, China (ORCID: 0000-0003-2766-3405)
Wei Bai, Army Engineering University of PLA, China (ORCID: 0000-0002-9738-0112)
Xin Liu, Army Engineering University of PLA, China (ORCID: 0000-0003-3051-4793)
Yue Hu, National University of Defense Technology, China (ORCID: 0000-0002-8115-7020)
Lei Zhang, Academy of Military Sciences, China (ORCID: 0000-0001-8746-1106)
Shize Guo, Army Engineering University of PLA, China (ORCID: 0009-0007-9678-6632)
Zhisong Pan, Army Engineering University of PLA, China (ORCID: 0000-0001-8615-7313)

ACSAC '23: Proceedings of the 39th Annual Computer Security Applications Conference, December 2023, Pages 580–593. https://doi.org/10.1145/3627106.3627178

Published: 04 December 2023

ABSTRACT

With the widespread application of machine learning techniques in malware detection, researchers have proposed various adversarial attack methods that generate adversarial examples (AEs) of malware to evade detection. Previous studies have shown that a reinforcement learning (RL) framework can enable black-box attacks by performing a sequence of function-preserving operations, producing functional, evasive malware samples. However, in the black-box scenario it is difficult to obtain useful guidance and feedback from the environment for agent training, which leaves the RL framework unable to learn an effective evasion policy. In this paper, we propose the Shapley prior and establish a prior-guided RL framework, namely PSP-Mal, to generate AEs against Portable Executable (PE) malware detectors. Our framework improves on existing methods in three aspects: 1) We explore the feature effects of the black-box model by computing Shapley values and propose the Shapley prior to represent the expected impact of operations. 2) We establish a novel prioritized experience utilization mechanism guided by the Shapley prior in the RL framework. 3) We expand actions into item-content pairs and use Thompson sampling to choose effective content, which helps reduce randomness and ensure repeatability. We compare the attack performance of our framework with other methods, and experimental results demonstrate that our algorithm is more effective. The evasion rates of PSP-Mal against the LightGBM models trained on EMBER and SOREL-20M reach 76.88% and 72.03%, respectively.
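
The abstract does not specify how Thompson sampling is applied to content selection, so the following is only a generic Beta-Bernoulli sketch of the idea, not the authors' implementation. It assumes each candidate content has an unknown probability of lowering the detector's score, and that the feedback is a simple success/failure signal. The class name `ContentSampler`, the helper `simulate`, the content labels, and the success rates are all hypothetical.

```python
import random


class ContentSampler:
    """Beta-Bernoulli Thompson sampling over candidate contents (hypothetical names)."""

    def __init__(self, contents, seed=None):
        self.rng = random.Random(seed)
        # One Beta(alpha, beta) posterior per content, starting from a uniform prior.
        self.alpha = {c: 1.0 for c in contents}
        self.beta = {c: 1.0 for c in contents}

    def choose(self):
        # Draw one sample from each posterior and pick the content with the largest draw.
        draws = {c: self.rng.betavariate(self.alpha[c], self.beta[c]) for c in self.alpha}
        return max(draws, key=draws.get)

    def update(self, content, success):
        # A success raises alpha; a failure raises beta.
        if success:
            self.alpha[content] += 1.0
        else:
            self.beta[content] += 1.0

    def mean(self, content):
        # Posterior mean estimate of the success probability.
        return self.alpha[content] / (self.alpha[content] + self.beta[content])


def simulate(true_rates, rounds=2000, seed=0):
    """Run the sampler against made-up success rates (an assumption for illustration)."""
    sampler = ContentSampler(list(true_rates), seed=seed)
    env = random.Random(seed + 1)
    counts = {c: 0 for c in true_rates}
    for _ in range(rounds):
        content = sampler.choose()
        counts[content] += 1
        sampler.update(content, env.random() < true_rates[content])
    return sampler, counts


if __name__ == "__main__":
    sampler, counts = simulate({"content_a": 0.1, "content_b": 0.8})
    print(counts)
```

Because every choice is drawn from a posterior that sharpens with feedback, selections concentrate on contents that have worked before rather than staying uniformly random, which is one plausible reading of the abstract's claim about reduced randomness.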

Index Terms

1. Computing methodologies
  1. Machine learning
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published in

ACSAC '23: Proceedings of the 39th Annual Computer Security Applications Conference
December 2023
836 pages
ISBN:9798400708862
DOI:10.1145/3627106

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Results Reproduced / v1.1
Author Tags
Shapley value
adversarial example
evasion attack
malware detection
prioritized experience replay
reinforcement learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 104 of 497 submissions, 21%