A Dynamic Malicious Document Detection Method Based on Multi-Memory Features

Wang, Yuanyuan; Li, Gengwang; Yu, Min; Chow, Kam-Pui; Jiang, Jianguo; Meng, Xiang; Huang, Weiqing

doi:10.1007/978-3-031-42991-0_11


Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 687)

Included in the following conference series:

IFIP International Conference on Digital Forensics


Abstract

The massive use of Microsoft Office documents underscores the need for effective malicious document detection techniques. Most detection methods characterize document behavior using application programming interface traces or other descriptive information, but they ignore memory information because of the inherent difficulties in using it. Since many malicious behavior patterns manifest only in memory, these detection methods are vulnerable to ubiquitous evasion attacks. One difficulty in extracting malicious behavior information from memory is that only high-coverage memory dump sequences are meaningful, yet no established method exists for capturing them. Another difficulty is that no efficient method exists for representing the numerous long memory dump sequences associated with malicious document samples.

This chapter describes a multi-memory-feature-based method that leverages memory information to detect malicious documents. The detection method employs a high-coverage memory dump service and a multiple memory dump sequence reduction approach. The memory dump service hooks system application programming interfaces to cover the entire lifetimes of processes while also monitoring the initial Office process and every spawned subprocess. The multiple memory dump sequence reduction approach efficiently represents each memory dump in terms of the difference from its adjacent dump. Ablation experiments demonstrate that the memory dump sequence reduction approach performs best using a long short-term memory classifier, yielding an accuracy of 98.27%. Experiments also demonstrate that the detection method outperforms state-of-the-art methods based on application programming interfaces in terms of accuracy and precision.
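The idea of representing each memory dump by its difference from the adjacent dump can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each dump is a dictionary mapping page base addresses to page contents; the page granularity, the hypothetical DumpDelta structure and the three per-step counts returned by delta_features are assumptions, not the encoding used in the chapter. Each dump is replaced by the pages added, removed or modified relative to the preceding dump, and reconstruct replays the deltas to show that this reduction loses no information.

```python
from typing import Dict, List, NamedTuple

# Hypothetical dump representation (an assumption): page base address -> page contents.
Dump = Dict[int, bytes]


class DumpDelta(NamedTuple):
    """Difference between a dump and the dump that precedes it."""
    added: Dict[int, bytes]     # pages present now but absent before
    removed: List[int]          # addresses of pages that disappeared
    modified: Dict[int, bytes]  # pages whose contents changed (new contents)


def diff_dumps(prev: Dump, curr: Dump) -> DumpDelta:
    added = {a: p for a, p in curr.items() if a not in prev}
    removed = sorted(a for a in prev if a not in curr)
    modified = {a: p for a, p in curr.items() if a in prev and prev[a] != p}
    return DumpDelta(added, removed, modified)


def reduce_sequence(dumps: List[Dump]) -> List[DumpDelta]:
    """Represent each dump by its difference from the preceding dump.

    The first dump is diffed against an empty dump, so all of its pages
    appear as 'added' and it is effectively stored in full.
    """
    deltas: List[DumpDelta] = []
    prev: Dump = {}
    for curr in dumps:
        deltas.append(diff_dumps(prev, curr))
        prev = curr
    return deltas


def reconstruct(deltas: List[DumpDelta]) -> List[Dump]:
    """Replay the deltas to recover the original dumps."""
    dumps: List[Dump] = []
    state: Dump = {}
    for d in deltas:
        gone = set(d.removed)
        state = {a: p for a, p in state.items() if a not in gone}
        state.update(d.added)
        state.update(d.modified)
        dumps.append(dict(state))
    return dumps


def delta_features(d: DumpDelta) -> List[int]:
    """Hypothetical per-step features: counts of added, removed, modified pages."""
    return [len(d.added), len(d.removed), len(d.modified)]


if __name__ == "__main__":
    d0 = {0x1000: b"\x00" * 4, 0x2000: b"ABCD"}
    d1 = {0x1000: b"\x00" * 4, 0x2000: b"ABCE", 0x3000: b"shel"}
    d2 = {0x2000: b"ABCE", 0x3000: b"shel"}
    deltas = reduce_sequence([d0, d1, d2])
    print([delta_features(d) for d in deltas])  # [[2, 0, 0], [1, 0, 1], [0, 1, 0]]
    assert reconstruct(deltas) == [d0, d1, d2]
```

Storing only changed pages keeps a long dump sequence compact when most memory is unchanged between snapshots. A per-step vector such as the one from delta_features could then be fed to a sequence classifier like the long short-term memory model mentioned in the abstract, although the features the chapter actually uses are not given here.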



Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yuanyuan Wang, Gengwang Li, Min Yu, Jianguo Jiang, Xiang Meng & Weiqing Huang
University of Hong Kong, Hong Kong, China
Kam-Pui Chow


Corresponding author

Correspondence to Min Yu.

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH, USA
Gilbert Peterson
Keplinger Hall 3315, University of Tulsa, Tulsa, OK, USA
Sujeet Shenoi


About this chapter

Cite this chapter

Wang, Y. et al. (2023). A Dynamic Malicious Document Detection Method Based on Multi-Memory Features. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XIX. DigitalForensics 2023. IFIP Advances in Information and Communication Technology, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-031-42991-0_11


DOI: https://doi.org/10.1007/978-3-031-42991-0_11
Published: 19 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42990-3
Online ISBN: 978-3-031-42991-0
eBook Packages: Computer Science, Computer Science (R0)
