Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces

Yagemann, Carter; Sultana, Salmin; Chen, Li; Lee, Wenke

doi:10.1007/978-3-030-30215-3_17

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11723)

Included in the following conference series:

International Conference on Information Security

Abstract

This paper proposes Barnum, an offline control flow attack detection system that applies deep learning on hardware execution traces to model a program’s behavior and detect control flow anomalies. Our implementation analyzes document readers to detect exploits and ABI abuse. Recent work has proposed using deep learning based control flow classification to build more robust and scalable detection systems. These proposals, however, were not evaluated against different kinds of control flow attacks, programs, and adversarial perturbations.

We investigate anomaly detection approaches to improve the security coverage and scalability of control flow attack detection. Barnum is an end-to-end system consisting of three major components: (1) trace collection, (2) behavior modeling, and (3) anomaly detection via binary classification. It utilizes Intel® Processor Trace for low overhead execution tracing and applies deep learning on the basic block sequences reconstructed from the trace to train a normal program behavior model. Based on the path prediction accuracy of the model, Barnum then determines a decision boundary to classify benign vs. malicious executions.
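The idea of classifying executions by path prediction accuracy can be illustrated with a minimal sketch. This is not Barnum's implementation: a simple most-frequent-successor table stands in for the deep learning model, basic blocks are represented as small integers, and the helper names (`train_successor_model`, `prediction_accuracy`, `pick_threshold`, `is_anomalous`) and the `margin` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_successor_model(benign_traces):
    """Learn, for each basic block ID, its most frequent successor in benign traces.
    (Stand-in for the deep learning model described in the abstract.)"""
    counts = defaultdict(Counter)
    for trace in benign_traces:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    return {cur: succ.most_common(1)[0][0] for cur, succ in counts.items()}

def prediction_accuracy(model, trace):
    """Fraction of transitions in the trace whose next block the model predicts."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 1.0  # nothing to mispredict in an empty or single-block trace
    hits = sum(1 for cur, nxt in pairs if model.get(cur) == nxt)
    return hits / len(pairs)

def pick_threshold(model, benign_validation, margin=0.05):
    """Place the decision boundary slightly below the lowest benign accuracy.
    The margin is an illustrative assumption, not a value from the paper."""
    return min(prediction_accuracy(model, t) for t in benign_validation) - margin

def is_anomalous(model, trace, threshold):
    """Flag a trace whose prediction accuracy falls below the boundary."""
    return prediction_accuracy(model, trace) < threshold

# Toy data: block sequences as they might be reconstructed from a trace.
benign_train = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 6, 5]]
benign_validation = [[1, 2, 3, 4, 5], [1, 2, 3, 6, 5]]
suspicious = [1, 2, 9, 9, 7, 5]  # diverges into blocks never seen in training

model = train_successor_model(benign_train)
threshold = pick_threshold(model, benign_validation)
flagged = is_anomalous(model, suspicious, threshold)
```

In this toy setting the suspicious trace is flagged because most of its transitions are ones the benign model never predicts, while both validation traces stay above the boundary.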

We evaluate Barnum against 8 families of attacks on Adobe Acrobat Reader and 9 on Microsoft Word on Windows 7. Both readers are complex programs with over 50 dynamically linked libraries, just-in-time compiled code, and frequent network I/O. Barnum achieves a 0% false positive rate and a 2.4% false negative rate on a dataset of 1,250 benign and 1,639 malicious PDFs. It is also robust against evasion techniques, successfully detecting 500 adversarially perturbed PDFs.
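To put the reported rates in terms of sample counts, a rough back-of-the-envelope calculation follows. It assumes the rates are computed over the full sets of 1,250 benign and 1,639 malicious PDFs; the abstract does not report raw counts, so the results are estimates only, and `implied_count` is a hypothetical helper.

```python
def implied_count(rate_percent, population):
    """Estimate how many samples a percentage rate corresponds to (rounded)."""
    return round(rate_percent / 100 * population)

# Assumption: rates apply to the whole PDF dataset given in the abstract.
missed_malicious = implied_count(2.4, 1639)  # estimated false negatives
false_alarms = implied_count(0.0, 1250)      # estimated false positives
```

Under that assumption, a 2.4% false negative rate corresponds to roughly 39 missed malicious PDFs, and a 0% false positive rate to no benign PDFs flagged.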

Notes

1. https://tinyurl.com/y27clrfl

Acknowledgement

This research was supported, in part, by the Intel Science and Technology Center for Adversary-Resilient Security Analytics. Some malware samples were provided by the Georgia Tech Research Institute Apiary framework. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

Author information

Authors and Affiliations

Georgia Institute of Technology, Atlanta, GA, 30332, USA
Carter Yagemann & Wenke Lee
Security and Privacy Research, Intel Labs, Hillsboro, OR, 97124, USA
Salmin Sultana & Li Chen

Corresponding author

Correspondence to Carter Yagemann.

Editor information

Editors and Affiliations

The Ohio State University, Columbus, OH, USA
Zhiqiang Lin
University of Maryland, College Park, MD, USA
Charalampos Papamanthou
Stony Brook University, Stony Brook, NY, USA
Michalis Polychronakis

About this paper

Yagemann, C., Sultana, S., Chen, L., Lee, W. (2019). Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces. In: Lin, Z., Papamanthou, C., Polychronakis, M. (eds) Information Security. ISC 2019. Lecture Notes in Computer Science, vol 11723. Springer, Cham. https://doi.org/10.1007/978-3-030-30215-3_17

DOI: https://doi.org/10.1007/978-3-030-30215-3_17
Published: 02 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30214-6
Online ISBN: 978-3-030-30215-3
eBook Packages: Computer Science, Computer Science (R0)
