Abstract
This paper proposes Barnum, an offline control flow attack detection system that applies deep learning to hardware execution traces to model a program’s behavior and detect control flow anomalies. Our implementation analyzes document readers to detect exploits and ABI abuse. Recent work has proposed deep-learning-based control flow classification as a way to build more robust and scalable detection systems. These proposals, however, were not evaluated against different kinds of control flow attacks, programs, and adversarial perturbations.
We investigate anomaly detection approaches to improve the security coverage and scalability of control flow attack detection. Barnum is an end-to-end system consisting of three major components: (1) trace collection, (2) behavior modeling, and (3) anomaly detection via binary classification. It uses Intel® Processor Trace for low-overhead execution tracing and applies deep learning to the basic block sequences reconstructed from the trace to train a model of normal program behavior. Based on the path prediction accuracy of the model, Barnum then determines a decision boundary to classify benign vs. malicious executions.
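The last step, turning path prediction accuracy into a benign/malicious decision, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a simple n-gram lookup table stands in for the deep learning model, the function names (train_ngram, path_prediction_accuracy, fit_boundary, classify) are hypothetical, and the rule of placing the boundary a fixed margin below the lowest benign accuracy is one plausible choice, not necessarily the one Barnum uses.

```python
from collections import Counter, defaultdict

def train_ngram(traces, context=3):
    """Stand-in for the behavior model: learn the most frequent next basic
    block for each window of `context` preceding blocks (NOT a deep model)."""
    table = defaultdict(Counter)
    for trace in traces:
        for i in range(context, len(trace)):
            table[tuple(trace[i - context:i])][trace[i]] += 1

    def predict_next(window):
        counts = table.get(window)
        # Unseen windows yield None, which never matches a real block ID.
        return counts.most_common(1)[0][0] if counts else None

    return predict_next

def path_prediction_accuracy(trace, predict_next, context=3):
    """Fraction of basic blocks the model predicts correctly in one trace."""
    total = len(trace) - context
    if total <= 0:
        return 1.0
    hits = sum(
        predict_next(tuple(trace[i - context:i])) == trace[i]
        for i in range(context, len(trace))
    )
    return hits / total

def fit_boundary(benign_accuracies, margin=0.05):
    """Assumed rule: boundary sits `margin` below the worst benign accuracy."""
    return min(benign_accuracies) - margin

def classify(accuracy, boundary):
    """Traces the model predicts poorly are flagged as anomalous."""
    return "malicious" if accuracy < boundary else "benign"

if __name__ == "__main__":
    # Toy basic-block ID sequences; real input would come from decoded traces.
    benign = [[1, 2, 3, 4] * 5, [1, 2, 3, 4] * 8]
    model = train_ngram(benign)
    boundary = fit_boundary([path_prediction_accuracy(t, model) for t in benign])
    hijacked = [1, 2, 3, 4, 1, 2, 3, 9, 9, 9, 9, 9]  # diverges to unseen blocks
    print(classify(path_prediction_accuracy(hijacked, model), boundary))
```

The model is trained on benign executions only, so an execution that leaves the learned control flow paths shows up as a drop in prediction accuracy rather than as a match against a known attack signature.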
We evaluate Barnum against 8 families of attacks on Adobe Acrobat Reader and 9 on Microsoft Word on Windows 7. Both readers are complex programs with over 50 dynamically linked libraries, just-in-time compiled code, and frequent network I/O. Barnum achieves a 0% false positive rate and a 2.4% false negative rate on a dataset of 1,250 benign and 1,639 malicious PDFs. Barnum is also robust against evasion techniques, as it successfully detects 500 adversarially perturbed PDFs.
Acknowledgement
This research was supported, in part, by the Intel Science and Technology Center for Adversary-Resilient Security Analytics. Some malware samples were provided by the Georgia Tech Research Institute Apiary framework. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yagemann, C., Sultana, S., Chen, L., Lee, W. (2019). Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces. In: Lin, Z., Papamanthou, C., Polychronakis, M. (eds) Information Security. ISC 2019. Lecture Notes in Computer Science, vol 11723. Springer, Cham. https://doi.org/10.1007/978-3-030-30215-3_17
DOI: https://doi.org/10.1007/978-3-030-30215-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30214-6
Online ISBN: 978-3-030-30215-3
eBook Packages: Computer Science, Computer Science (R0)