DOI: 10.1145/3374664.3375741

Deceiving Portable Executable Malware Classifiers into Targeted Misclassification with Practical Adversarial Examples

Published: 16 March 2020

Abstract

Because of the volume of malware attacks in cyberspace, machine learning has become popular for automating malware detection and classification. In this work we play devil's advocate by investigating a new type of threat aimed at deceiving multi-class Portable Executable (PE) malware classifiers into targeted misclassification with practical adversarial samples. Using a malware dataset with tens of thousands of samples, we construct three types of PE malware classifiers: the first based on the frequencies of opcodes in the disassembled malware code (opcode classifier), the second on the list of API functions imported by each PE sample (API classifier), and the third on the list of system calls observed in dynamic execution (system call classifier). We develop a genetic algorithm augmented with different support functions to deceive these classifiers into misclassifying a PE sample into any target family. Using an Rbot malware sample whose source code is publicly available, we are able to create practical adversarial samples that deceive the opcode classifier into targeted misclassification with a success rate of 75%, the API classifier with a success rate of 83.3%, and the system call classifier with a success rate of 91.7%.
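To make the idea of a genetic search against an opcode-frequency classifier more concrete, the following is a minimal toy sketch in Python. It is not the authors' algorithm: the abstract does not describe their classifiers, fitness function, or support functions. Everything here is an assumption made for illustration, including the five-opcode vocabulary `OPCODES`, the two family centroids in `CENTROIDS`, the nearest-centroid `classify` function standing in for a trained classifier, the squared Euclidean `distance`, and the restriction that a genome may only add opcode occurrences (a stand-in for changes that keep the original code intact). It operates purely on count vectors, not on executables.

```python
import random
from collections import Counter

# Hypothetical five-opcode vocabulary; a real feature space is far larger.
OPCODES = ["mov", "push", "call", "add", "jmp"]

# Hypothetical family centroids standing in for a trained classifier.
CENTROIDS = {
    "family_a": [0.50, 0.20, 0.10, 0.10, 0.10],
    "family_b": [0.10, 0.10, 0.50, 0.20, 0.10],
}

def opcode_counts(opcodes):
    """Count occurrences of each vocabulary opcode in a disassembly listing."""
    counts = Counter(opcodes)
    return [counts[op] for op in OPCODES]

def frequencies(counts):
    """Normalize raw counts into a frequency vector."""
    total = sum(counts) or 1
    return [c / total for c in counts]

def distance(p, q):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def classify(freqs):
    """Toy nearest-centroid classifier over opcode frequencies."""
    return min(CENTROIDS, key=lambda fam: distance(freqs, CENTROIDS[fam]))

def perturbed(base_counts, genome):
    """genome[i] is the number of extra OPCODES[i] occurrences to add."""
    return frequencies([b + g for b, g in zip(base_counts, genome)])

def fitness(base_counts, genome, target):
    """Higher is better: near the target centroid, with few insertions."""
    freqs = perturbed(base_counts, genome)
    return -distance(freqs, CENTROIDS[target]) - 1e-5 * sum(genome)

def evolve(base_counts, target, pop_size=30, generations=60,
           max_insert=200, seed=0):
    """Elitist genetic algorithm with one-point crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, max_insert) for _ in OPCODES]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(base_counts, g, target), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(OPCODES))
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(len(OPCODES))    # mutate a single gene
            child[i] = max(0, min(max_insert, child[i] + rng.randint(-20, 20)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda g: fitness(base_counts, g, target))

if __name__ == "__main__":
    listing = ["mov"] * 50 + ["push"] * 20 + ["call", "add", "jmp"] * 10
    base = opcode_counts(listing)
    print("original label:", classify(frequencies(base)))
    best = evolve(base, target="family_b")
    print("insertions per opcode:", dict(zip(OPCODES, best)))
    print("perturbed label:", classify(perturbed(base, best)))
```

The insertion-only genome is a deliberate simplification: it models the constraint that an adversarial sample must keep its original behavior, so the search can only add to the opcode histogram rather than remove from it. A practical attack of the kind the abstract describes would also have to realize those changes in a working binary, which this sketch does not attempt.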

Published In

CODASPY '20: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy
March 2020
392 pages
ISBN: 9781450371070
DOI: 10.1145/3374664

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial machine learning
  2. malware classification

Qualifiers

  • Research-article

Conference

CODASPY '20

Acceptance Rates

Overall Acceptance Rate 149 of 789 submissions, 19%
