Malware classification method based on feature fusion

Yan, Hao; Zhang, Jian; Tang, Zhangguo; Long, Hancheng; Zhu, Min; Zhang, Tianyue; Luo, Linglong; Li, Huanzhou

doi:10.1007/s10207-025-01013-3

Regular Contribution
Published: 16 March 2025

Volume 24, article number 97 (2025)

International Journal of Information Security

Hao Yan^1,2,
Jian Zhang^1,2,
Zhangguo Tang^1,2,
Hancheng Long^1,2,
Min Zhu^1,2,
Tianyue Zhang^1,2,
Linglong Luo^1,2 &
Huanzhou Li^1,2

Abstract

With the continuous escalation of cyberattacks, malware has become increasingly diverse, posing significant security threats to enterprises, government agencies, and individual users. Malware developers often use techniques such as packing and obfuscation to evade detection, which makes traditional detection methods less effective. This study proposes a malware family classification method based on feature fusion and a two-layer classification framework. First, readable characters, bytes, and opcodes are extracted from the malware binary and disassembly files. Frequency and semantic features are extracted from both the opcodes and the readable characters, followed by frequency fusion and semantic fusion. A Markov image is generated from the byte transition probability matrix. Next, a two-layer classification framework that combines deep learning and traditional machine learning is built on the fused features and the Markov image. This framework integrates the advantages of different feature dimensions and models. In the first layer, each feature is classified on its own; in the second layer, the prediction probabilities obtained for each feature are fused. Experimental results show that the proposed method achieves a malware family classification accuracy of 99.4%, outperforming the other compared methods. The Macro-F1 score also improves by 1.4% over the best of the compared methods. The approach reduces the impact of packing, obfuscation, and data imbalance on classification performance, providing an effective solution for malware classification.
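
To illustrate the Markov image step mentioned in the abstract, the following is a minimal Python sketch of one common way to build such an image: count how often each byte value is followed by each other byte value, normalize every row into transition probabilities, and map the result to pixel intensities. The function names, the scaling by the largest probability, and the toy input are assumptions made for this example; the full article is not available on this page, so the authors' actual preprocessing may differ.

```python
from collections import Counter


def markov_image(data: bytes) -> list[list[float]]:
    """Return a 256x256 matrix of byte transition probabilities.

    Entry [i][j] estimates P(next byte = j | current byte = i), so every
    row with at least one observed transition sums to 1.
    """
    # Count every adjacent byte pair (i, j) in the input.
    pair_counts = Counter(zip(data, data[1:]))

    # Total number of transitions leaving each byte value i.
    row_totals = [0] * 256
    for (i, _), n in pair_counts.items():
        row_totals[i] += n

    # Normalize counts row by row; unseen rows stay all zero.
    matrix = [[0.0] * 256 for _ in range(256)]
    for (i, j), n in pair_counts.items():
        matrix[i][j] = n / row_totals[i]
    return matrix


def to_grayscale(matrix: list[list[float]]) -> list[list[int]]:
    """Scale probabilities to 0-255 intensities (an assumed mapping)."""
    peak = max(max(row) for row in matrix) or 1.0
    return [[round(255 * p / peak) for p in row] for row in matrix]


# Toy input for illustration only, not a real malware sample.
sample = bytes([0, 1, 0, 1, 0, 2])
image = to_grayscale(markov_image(sample))
```

The resulting 256x256 grid has a fixed size regardless of file length, which is what makes this kind of representation convenient as input to an image classifier.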

Data availability

Data will be made available on request.

Acknowledgements

This work was supported by the Project of the Key Laboratory of Wireless Sensor Networks in University of Sichuan Province (WSN2022001).

Funding

Key Laboratory of Wireless Sensor Networks in University of Sichuan Province, WSN2022001.

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, China
Hao Yan, Jian Zhang, Zhangguo Tang, Hancheng Long, Min Zhu, Tianyue Zhang, Linglong Luo & Huanzhou Li
Institute of Network and Communication Technology, Sichuan Normal University, Chengdu, 610101, China
Hao Yan, Jian Zhang, Zhangguo Tang, Hancheng Long, Min Zhu, Tianyue Zhang, Linglong Luo & Huanzhou Li

Contributions

Hao Yan: Visualization, Writing – original draft. Huanzhou Li: Methodology, Supervision. Jian Zhang: Formal analysis, Project administration. Zhangguo Tang: Conceptualization, Data curation. Hancheng Long: Processed the data. Min Zhu: Software. Tianyue Zhang: Prepared the figures. Linglong Luo: Investigation.

Corresponding author

Correspondence to Huanzhou Li.

Ethics declarations

Conflicts of interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, H., Zhang, J., Tang, Z. et al. Malware classification method based on feature fusion. Int. J. Inf. Secur. 24, 97 (2025). https://doi.org/10.1007/s10207-025-01013-3

Accepted: 27 February 2025
Published: 16 March 2025
DOI: https://doi.org/10.1007/s10207-025-01013-3
