Abstract
As cyberattacks continue to escalate, malware has become increasingly diverse, posing significant security threats to enterprises, government agencies, and individual users. Malware developers often use packing and obfuscation to evade detection, which makes traditional detection methods less effective. This study proposes a malware family classification method based on feature fusion and a two-layer classification framework. First, readable characters, bytes, and opcodes are extracted from the malware binary and disassembly files. Frequency and semantic features are extracted from both the opcodes and the readable characters and are then combined through frequency fusion and semantic fusion. A Markov image is generated from the byte transition probability matrix. Next, a two-layer classification framework that combines deep learning and traditional machine learning is built on the fused features and the Markov image, integrating the strengths of the different feature dimensions and models. In the first layer, each feature is classified separately; in the second layer, the per-feature prediction probabilities are fused. Experimental results show that the proposed method achieves a malware family classification accuracy of 99.4%, outperforming the compared methods, and improves the Macro-F1 score by 1.4% over the best of them. The approach reduces the impact of packing, obfuscation, and data imbalance on classification performance, providing an effective solution for malware classification.
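To make the Markov image step concrete, the following is a minimal sketch of how a byte transition probability matrix can be computed from a raw binary. It is not the authors' implementation: the function name `markov_image`, the row-wise normalization, and the sample byte string are assumptions chosen for illustration.

```python
def markov_image(data: bytes) -> list[list[float]]:
    """Return a 256x256 row-normalized byte transition matrix.

    Entry [i][j] estimates the probability that byte value j
    immediately follows byte value i in `data`.
    """
    counts = [[0] * 256 for _ in range(256)]
    # Count each adjacent byte pair (current byte -> next byte).
    for cur, nxt in zip(data, data[1:]):
        counts[cur][nxt] += 1

    image = []
    for row in counts:
        total = sum(row)
        # Rows for byte values that never have a successor stay all zero.
        image.append([c / total if total else 0.0 for c in row])
    return image


# Hypothetical sample; a real input would be the bytes of a malware binary.
sample = bytes([0x00, 0x01, 0x00, 0x01, 0x00, 0x02])
img = markov_image(sample)
```

Each row of the resulting matrix is a probability distribution over the next byte, so the matrix can be rendered as a fixed-size 256 by 256 grayscale image regardless of file length, which is what makes it convenient as input to an image classifier.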
Data availability
Data will be made available on request.
Acknowledgements
This work was supported by the Project of the Key Laboratory of Wireless Sensor Networks in University of Sichuan Province (WSN2022001).
Funding
Key Laboratory of Wireless Sensor Networks in University of Sichuan Province, WSN2022001.
Author information
Contributions
Hao Yan: Visualization, Writing – original draft. Huanzhou Li: Methodology, Supervision. Jian Zhang: Formal analysis, Project administration. Zhangguo Tang: Conceptualization, Data curation. Hancheng Long: Processed the data. Min Zhu: Software. Tianyue Zhang: Prepared the figures. Linglong Luo: Investigation.
Ethics declarations
Conflicts of interests
The authors declare no competing interests.
About this article
Cite this article
Yan, H., Zhang, J., Tang, Z. et al. Malware classification method based on feature fusion. Int. J. Inf. Secur. 24, 97 (2025). https://doi.org/10.1007/s10207-025-01013-3
DOI: https://doi.org/10.1007/s10207-025-01013-3