Accurate Malware Detection by Extreme Abstraction

Authors:

Orit Edelstein,

Benjamin Zeltser

ACSAC '18: Proceedings of the 34th Annual Computer Security Applications Conference

Pages 101 - 111

https://doi.org/10.1145/3274694.3274700

Published: 03 December 2018

Abstract

Modern malware applies a rich arsenal of evasion techniques to render dynamic analysis ineffective. In turn, dynamic analysis tools take great pains to hide themselves from malware; typically this entails trying to be as faithful as possible to the behavior of a real run. We present a novel approach to malware analysis that turns this idea on its head, using an extreme abstraction of the operating system that intentionally strays from real behavior. The key insight is that the presence of malicious behavior is sufficient evidence of malicious intent, even if the path taken is not one that could occur during a real run of the sample. By exploring multiple paths in a system that only approximates the behavior of a real system, we can discover behavior that would often be hard to elicit otherwise. We aggregate features from multiple paths and use a funnel-like configuration of machine learning classifiers to achieve high accuracy without incurring too much of a performance penalty. We describe our system, TAMALES (The Abstract Malware Analysis LEarning System), in detail and present machine learning results using a 330K sample set showing an FPR (False Positive Rate) of 0.10% with a TPR (True Positive Rate) of 99.11%, demonstrating that extreme abstraction can be extraordinarily effective in providing data that allows a classifier to accurately detect malware.
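
The aggregation and funnel ideas in the abstract can be illustrated with a minimal sketch. This is not the TAMALES implementation: the behavior labels, weights, thresholds, and the two stand-in scoring functions are all hypothetical, and "funnel" is read here as a cascade in which a cheap stage settles the easy samples and only uncertain ones reach a costlier stage. The last function shows how FPR and TPR are computed from labels and predictions.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple

# Hypothetical behavior labels and weights, invented for this sketch only.
SUSPICIOUS = {"write_remote_memory", "create_remote_thread", "delete_self"}
WEIGHTS = {"write_remote_memory": 0.6, "create_remote_thread": 0.6, "delete_self": 0.3}

Stage = Callable[[Set[str]], float]


def aggregate_paths(paths: Iterable[Set[str]]) -> Set[str]:
    """Union the behaviors seen on every explored path, feasible or not."""
    features: Set[str] = set()
    for path in paths:
        features |= path
    return features


def cheap_stage(features: Set[str]) -> float:
    """Fast stand-in scorer: 0.0 with no suspicious behavior, 1.0 with two or more."""
    return min(1.0, len(features & SUSPICIOUS) / 2)


def costly_stage(features: Set[str]) -> float:
    """Slower stand-in scorer that weights each suspicious behavior."""
    return min(1.0, sum(WEIGHTS.get(f, 0.0) for f in features))


def funnel(features: Set[str], stages: Sequence[Stage],
           low: float = 0.1, high: float = 0.9) -> bool:
    """Run stages in order and stop as soon as one is confident either way."""
    score = 0.5
    for stage in stages:
        score = stage(features)
        if score <= low or score >= high:
            break
    return score >= 0.5


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, TPR), where FPR = FP / (FP + TN) and TPR = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return fpr, tpr
```

In this sketch, a sample whose explored paths together show two suspicious behaviors is flagged by the cheap stage alone, while a sample showing only one is passed to the costly stage. That keeps the average cost per sample low, which is the motivation the abstract gives for the funnel.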

Index Terms

Accurate Malware Detection by Extreme Abstraction
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published In

ACSAC '18: Proceedings of the 34th Annual Computer Security Applications Conference

December 2018

766 pages

ISBN: 9781450365697

DOI: 10.1145/3274694

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

ACSA: Applied Computing Security Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ACSAC '18: 2018 Annual Computer Security Applications Conference

December 3 - 7, 2018

San Juan, PR, USA

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%
