
Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces

  • Conference paper
Information Security (ISC 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11723)


Abstract

This paper proposes Barnum, an offline control flow attack detection system that applies deep learning to hardware execution traces to model a program’s behavior and detect control flow anomalies. Our implementation analyzes document readers to detect exploits and ABI abuse. Recent work has proposed using deep-learning-based control flow classification to build more robust and scalable detection systems. These proposals, however, have not been evaluated against different kinds of control flow attacks, programs, and adversarial perturbations.

We investigate anomaly detection approaches to improve the security coverage and scalability of control flow attack detection. Barnum is an end-to-end system consisting of three major components: (1) trace collection, (2) behavior modeling, and (3) anomaly detection via binary classification. It uses Intel® Processor Trace for low-overhead execution tracing and applies deep learning to the basic block sequences reconstructed from the trace to train a model of normal program behavior. Based on the model’s path prediction accuracy, Barnum then determines a decision boundary to classify executions as benign or malicious.
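The modeling and detection stages described above (train on benign traces, score an execution by how well the model predicts its path, then threshold the score) can be sketched as follows. This is a minimal illustration under stated assumptions, not Barnum's implementation: a simple bigram table that predicts the most frequent successor of each basic block stands in for the paper's deep learning model, basic blocks are assumed to be already decoded from the Intel PT trace into integer IDs, and the names `BigramPathModel`, `fit_boundary`, and `is_malicious`, as well as the "lowest benign validation score" boundary rule, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

BasicBlock = int  # assumption: a block is identified by an integer ID (e.g. its address)


class BigramPathModel:
    """Toy next-basic-block predictor trained only on benign traces.

    Stand-in for a deep learning model: it predicts the successor most often
    observed after the current block during training.
    """

    def __init__(self) -> None:
        self.successors: Dict[BasicBlock, Counter] = defaultdict(Counter)

    def fit(self, traces: Sequence[Sequence[BasicBlock]]) -> "BigramPathModel":
        for trace in traces:
            for cur, nxt in zip(trace, trace[1:]):
                self.successors[cur][nxt] += 1
        return self

    def predict_next(self, block: BasicBlock) -> Optional[BasicBlock]:
        counts = self.successors.get(block)  # .get() does not insert new keys
        if not counts:
            return None  # block never seen in benign training data
        return counts.most_common(1)[0][0]

    def path_accuracy(self, trace: Sequence[BasicBlock]) -> float:
        """Fraction of transitions whose successor was predicted correctly."""
        pairs = list(zip(trace, trace[1:]))
        if not pairs:
            return 1.0  # nothing to predict
        hits = sum(1 for cur, nxt in pairs if self.predict_next(cur) == nxt)
        return hits / len(pairs)


def fit_boundary(model: BigramPathModel, benign_validation: List[List[BasicBlock]]) -> float:
    # Hypothetical rule: the lowest accuracy seen on held-out benign traces,
    # so no validation trace falls below the boundary.
    return min(model.path_accuracy(t) for t in benign_validation)


def is_malicious(model: BigramPathModel, trace: Sequence[BasicBlock], boundary: float) -> bool:
    # Executions the benign model predicts poorly are treated as anomalous.
    return model.path_accuracy(trace) < boundary


model = BigramPathModel().fit([[1, 2, 3, 4, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
boundary = fit_boundary(model, [[1, 2, 3, 4, 2, 3, 4, 5]])  # 6/7
print(is_malicious(model, [1, 2, 3, 4, 5], boundary))  # False
print(is_malicious(model, [1, 2, 9, 9, 9], boundary))  # True (accuracy 0.25)
```

A trace that wanders into control flow the benign model never saw, such as `[1, 2, 9, 9, 9]`, gets a low path prediction accuracy and falls below the boundary. A deep sequence model can condition on longer histories than the single preceding block used here.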

We evaluate Barnum against 8 families of attacks on Adobe Acrobat Reader and 9 on Microsoft Word, running on Windows 7. Both readers are complex programs with over 50 dynamically linked libraries, just-in-time compiled code, and frequent network I/O. Barnum achieves a 0% false positive rate and a 2.4% false negative rate on a dataset of 1,250 benign and 1,639 malicious PDFs. It is also robust against evasion techniques, successfully detecting 500 adversarially perturbed PDFs.
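The reported rates follow the usual definitions: FPR = FP / (FP + TN) over benign files and FNR = FN / (FN + TP) over malicious files. The sketch below (function names are illustrative, and it assumes the reported percentage is rounded to one decimal place) computes both and checks which miss counts are consistent with 2.4% of 1,639 malicious PDFs; the exact count is not stated in this abstract.

```python
from typing import List, Tuple


def error_rates(fp: int, tn: int, fn: int, tp: int) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) from confusion counts."""
    fpr = fp / (fp + tn)  # share of benign files wrongly flagged
    fnr = fn / (fn + tp)  # share of malicious files missed
    return fpr, fnr


def miss_counts_matching(rate_pct: float, total: int, decimals: int = 1) -> List[int]:
    """All miss counts k whose rate 100*k/total rounds to rate_pct."""
    return [k for k in range(total + 1)
            if round(100 * k / total, decimals) == rate_pct]


BENIGN, MALICIOUS = 1250, 1639
print(miss_counts_matching(2.4, MALICIOUS))  # [39, 40]
fpr, fnr = error_rates(fp=0, tn=BENIGN, fn=39, tp=MALICIOUS - 39)
print(f"FPR={fpr:.1%}  FNR={fnr:.1%}")  # FPR=0.0%  FNR=2.4%
```

Only 39 or 40 missed samples round to 2.4%, so the reported rate corresponds to roughly 40 malicious PDFs escaping detection, assuming it is computed over the full malicious set.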


Notes

  1. https://tinyurl.com/y27clrfl.


Acknowledgement

This research was supported, in part, by the Intel Science and Technology Center for Adversary-Resilient Security Analytics. Some malware samples were provided by the Georgia Tech Research Institute Apiary framework. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

Author information

Corresponding author

Correspondence to Carter Yagemann.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yagemann, C., Sultana, S., Chen, L., Lee, W. (2019). Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces. In: Lin, Z., Papamanthou, C., Polychronakis, M. (eds) Information Security. ISC 2019. Lecture Notes in Computer Science, vol 11723. Springer, Cham. https://doi.org/10.1007/978-3-030-30215-3_17

  • DOI: https://doi.org/10.1007/978-3-030-30215-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30214-6

  • Online ISBN: 978-3-030-30215-3

  • eBook Packages: Computer Science, Computer Science (R0)
