Abstract
The widespread use of Microsoft Office documents underscores the need for effective malicious document detection techniques. Most detection methods characterize document behavior using application programming interface traces or other descriptive information, but ignore memory information due to inherent difficulties. Since many malicious behavior patterns are only manifested in memory, these detection methods are vulnerable to ubiquitous evasion attacks. One difficulty in extracting malicious behavior information from memory is that only high-coverage memory dump sequences are meaningful, but no established method exists for obtaining them. Another difficulty is that no efficient method exists for representing the numerous long memory dump sequences associated with malicious document samples.
This chapter describes a multi-memory-feature-based method that leverages memory information to detect malicious documents. The detection method employs a high-coverage memory dump service and a multiple memory dump sequence reduction approach. The memory dump service hooks system application programming interfaces to cover the entire lifetimes of processes while also monitoring the initial Office process and every spawned subprocess. The multiple memory dump sequence reduction approach efficiently represents each memory dump in terms of the difference from its adjacent dump. Ablation experiments demonstrate that the memory dump sequence reduction approach performs best using a long short-term memory classifier, yielding an accuracy of 98.27%. Experiments also demonstrate that the detection method outperforms state-of-the-art methods based on application programming interfaces in terms of accuracy and precision.
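The idea of representing each memory dump by its difference from the adjacent dump can be illustrated with a minimal sketch. The abstract does not say how the difference is computed, so everything below is an assumption made for illustration: the fixed page granularity, the use of SHA-256 page hashes and the function names `page_hashes`, `changed_pages` and `reduce_sequence` are all hypothetical and are not the authors' implementation.

```python
import hashlib
from typing import List

PAGE_SIZE = 4096  # assumed page granularity; not specified in the abstract


def page_hashes(dump: bytes, page_size: int = PAGE_SIZE) -> List[bytes]:
    """Hash each fixed-size page of a raw memory dump."""
    return [
        hashlib.sha256(dump[i:i + page_size]).digest()
        for i in range(0, len(dump), page_size)
    ]


def changed_pages(prev: bytes, curr: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Return indices of pages in `curr` that differ from, or are absent in, `prev`."""
    prev_h = page_hashes(prev, page_size)
    curr_h = page_hashes(curr, page_size)
    return [
        i for i, h in enumerate(curr_h)
        if i >= len(prev_h) or h != prev_h[i]
    ]


def reduce_sequence(dumps: List[bytes], page_size: int = PAGE_SIZE) -> List[List[int]]:
    """Describe each dump after the first by the pages changed since its predecessor."""
    return [
        changed_pages(prev, curr, page_size)
        for prev, curr in zip(dumps, dumps[1:])
    ]


# Toy example with 8-byte "pages": the second page changes, then a third page appears.
dumps = [b"A" * 8 + b"B" * 8, b"A" * 8 + b"C" * 8, b"A" * 8 + b"C" * 8 + b"D" * 4]
print(reduce_sequence(dumps, page_size=8))  # [[1], [2]]
```

Under these assumptions, a long sequence of large dumps collapses into a short sequence of small change descriptions, which is the kind of compact sequential input a long short-term memory classifier could consume.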
Copyright information
© 2023 IFIP International Federation for Information Processing
About this chapter
Cite this chapter
Wang, Y. et al. (2023). A Dynamic Malicious Document Detection Method Based on Multi-Memory Features. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XIX. DigitalForensics 2023. IFIP Advances in Information and Communication Technology, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-031-42991-0_11
DOI: https://doi.org/10.1007/978-3-031-42991-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42990-3
Online ISBN: 978-3-031-42991-0
eBook Packages: Computer Science, Computer Science (R0)