DOI: 10.1145/3274694.3274700

Accurate Malware Detection by Extreme Abstraction

Published: 03 December 2018

Abstract

Modern malware applies a rich arsenal of evasion techniques to render dynamic analysis ineffective. In turn, dynamic analysis tools take great pains to hide themselves from malware; typically this entails trying to be as faithful as possible to the behavior of a real run. We present a novel approach to malware analysis that turns this idea on its head, using an extreme abstraction of the operating system that intentionally strays from real behavior. The key insight is that the presence of malicious behavior is sufficient evidence of malicious intent, even if the path taken is not one that could occur during a real run of the sample. By exploring multiple paths in a system that only approximates the behavior of a real system, we can discover behavior that would often be hard to elicit otherwise. We aggregate features from multiple paths and use a funnel-like configuration of machine learning classifiers to achieve high accuracy without incurring too much of a performance penalty. We describe our system, TAMALES (The Abstract Malware Analysis LEarning System), in detail and present machine learning results using a 330K sample set showing an FPR (False Positive Rate) of 0.10% with a TPR (True Positive Rate) of 99.11%, demonstrating that extreme abstraction can be extraordinarily effective in providing data that allows a classifier to accurately detect malware.
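The abstract names two mechanisms without detailing them: aggregating features across multiple explored paths, and a funnel-like arrangement of classifiers. The following is a minimal sketch of how such a pipeline could be wired together, not the TAMALES implementation. The behavior names, the two scoring stages, and the `low`/`high` thresholds are all hypothetical, and real stages would be trained classifiers rather than hand-written rules.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Assumption: each explored path is summarized as a set of behavior names.
Path = Set[str]
# Assumption: a stage maps aggregated features to a malice score in [0, 1].
Stage = Callable[[Dict[str, int]], float]


def aggregate_features(paths: Sequence[Path]) -> Dict[str, int]:
    """Count, for each behavior, the number of explored paths that showed it."""
    counts: Dict[str, int] = {}
    for path in paths:
        for behavior in path:
            counts[behavior] = counts.get(behavior, 0) + 1
    return counts


def funnel_classify(features: Dict[str, int], stages: List[Stage],
                    low: float = 0.2, high: float = 0.8) -> Tuple[str, int]:
    """Run stages in order (cheapest first) and stop at the first confident one.

    Returns the verdict and the index of the stage that produced it.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    score = 0.5
    for index, stage in enumerate(stages):
        score = stage(features)
        if score >= high:
            return "malicious", index
        if score <= low:
            return "benign", index
    # No stage was confident: fall back to the last stage's score.
    return ("malicious" if score >= 0.5 else "benign"), len(stages) - 1


def rates(labels: Sequence[bool], predictions: Sequence[bool]) -> Tuple[float, float]:
    """Return (FPR, TPR), where True means malicious."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    fpr = fp / negatives if negatives else 0.0
    tpr = tp / positives if positives else 0.0
    return fpr, tpr


def cheap_stage(features: Dict[str, int]) -> float:
    """Hypothetical fast rule: decisive only in obvious cases."""
    if "inject_remote_thread" in features:
        return 0.95
    if not features:
        return 0.05
    return 0.5


def costly_stage(features: Dict[str, int]) -> float:
    """Hypothetical slower scorer: fraction of suspicious behaviors seen."""
    suspicious = {"inject_remote_thread", "disable_defender", "write_autorun_key"}
    hits = sum(1 for behavior in features if behavior in suspicious)
    return min(1.0, hits / 2)


# Two explored paths; only the second reveals suspicious behavior.
paths = [{"read_file"}, {"read_file", "write_autorun_key", "disable_defender"}]
features = aggregate_features(paths)
verdict, stage_used = funnel_classify(features, [cheap_stage, costly_stage])
```

In this sketch a behavior seen on any explored path contributes to the feature vector, which mirrors the abstract's point that the path need not be one a real run would take. The costly stage runs only when the cheap stage is undecided, which is one way a funnel can limit the performance penalty.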

Published In

ACSAC '18: Proceedings of the 34th Annual Computer Security Applications Conference
December 2018
766 pages
ISBN:9781450365697
DOI:10.1145/3274694
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • ACSA: Applied Computing Security Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Analysis
  2. Classification
  3. Detection
  4. Malware

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACSAC '18

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%

Cited By

  • (2024) DCEL: Classifier Fusion Model for Android Malware Detection. Journal of Systems Engineering and Electronics 35:1 (163-177). DOI: 10.23919/JSEE.2024.000018. Online publication date: Feb-2024
  • (2024) Malicious Log Detection Using Machine Learning to Maximize the Partial AUC. 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC) (339-344). DOI: 10.1109/CCNC51664.2024.10454779. Online publication date: 6-Jan-2024
  • (2023) Re-measuring the Label Dynamics of Online Anti-Malware Engines from Millions of Samples. Proceedings of the 2023 ACM on Internet Measurement Conference (253-267). DOI: 10.1145/3618257.3624800. Online publication date: 24-Oct-2023
  • (2021) Sharpshooting Most Beneficial Part of AUC for Detecting Malicious Logs. Data Mining (31-46). DOI: 10.1007/978-981-16-8531-6_3. Online publication date: 9-Dec-2021
  • (2020) Measuring and modeling the label dynamics of online anti-malware engines. Proceedings of the 29th USENIX Conference on Security Symposium (2361-2378). DOI: 10.5555/3489212.3489345. Online publication date: 12-Aug-2020
  • (2020) TuningMalconv: Malware Detection With Not Just Raw Bytes. IEEE Access 8 (140915-140922). DOI: 10.1109/ACCESS.2020.3014245. Online publication date: 2020
  • (2019) EIGER. Proceedings of the 35th Annual Computer Security Applications Conference (687-701). DOI: 10.1145/3359789.3359808. Online publication date: 9-Dec-2019
  • (2019) Unacceptable Behavior. Proceedings of the 14th ACM SIGSAC Workshop on Programming Languages and Analysis for Security (19-30). DOI: 10.1145/3338504.3357341. Online publication date: 15-Nov-2019
  • (2019) AndrEnsemble. Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security (307-314). DOI: 10.1145/3321705.3329854. Online publication date: 2-Jul-2019
