Malware classification method based on feature fusion

  • Regular Contribution
International Journal of Information Security

Abstract

With cyberattacks continuing to escalate, malware has become increasingly diverse and poses significant security threats to enterprises, government agencies, and individual users. Malware developers often employ techniques such as packing and obfuscation to evade detection, which makes traditional detection methods less effective. This study proposes a malware family classification method based on feature fusion and a two-layer classification framework. First, readable characters, bytes, and opcodes are extracted from the malware binary and disassembly files. Frequency and semantic features are extracted from both the opcodes and the readable characters, followed by frequency fusion and semantic fusion. A Markov image is generated from the byte transition probability matrix. Next, a two-layer classification framework that combines deep learning and traditional machine learning is built on the fused features and the Markov image, integrating the advantages of the different feature dimensions and models. In the first layer, each feature is classified separately; in the second layer, the prediction probabilities obtained for each feature are fused. Experimental results show that the proposed method achieves a malware family classification accuracy of 99.4%, outperforming the compared methods, and improves the Macro-F1 score by 1.4% over the best of them. The approach reduces the impact of packing, obfuscation, and data imbalance on classification performance, providing an effective solution for malware classification.
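
The abstract does not spell out how the Markov image is built, so the following is only a minimal sketch of one common construction, not the authors' implementation. It assumes that the matrix counts transitions between consecutive bytes, that each row is normalised into probabilities, and that probabilities are scaled to 8-bit grey levels relative to the largest entry. The function names `byte_transition_matrix` and `to_grayscale` are hypothetical.

```python
def byte_transition_matrix(data: bytes) -> list[list[float]]:
    """Estimate P(next byte | current byte) as a 256 x 256 matrix."""
    counts = [[0] * 256 for _ in range(256)]
    # Count every pair of consecutive bytes (current, next).
    for cur, nxt in zip(data, data[1:]):
        counts[cur][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: bytes that never precede another byte keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def to_grayscale(matrix: list[list[float]]) -> list[list[int]]:
    """Scale probabilities to 0..255 grey levels (assumed scaling, peak maps to 255)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0] * len(row) for row in matrix]
    return [[round(p / peak * 255) for p in row] for row in matrix]


if __name__ == "__main__":
    sample = b"\x00\x01\x00\x01\x00\x02"  # toy input, not real malware bytes
    probs = byte_transition_matrix(sample)
    image = to_grayscale(probs)
    print(probs[0][1], probs[0][2], image[1][0])
```

Because the matrix is always 256 x 256 regardless of file size, an image built this way has a fixed shape, which is convenient as input to a convolutional model.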

Data availability

Data will be made available on request.

Acknowledgements

This work was supported by the Project of the Key Laboratory of Wireless Sensor Networks in University of Sichuan Province (WSN2022001).

Funding

Key Laboratory of Wireless Sensor Networks in University of Sichuan Province, WSN2022001.

Author information

Contributions

Hao Yan: Visualization, Writing – original draft. Huanzhou Li: Methodology, Supervision. Jian Zhang: Formal analysis, Project administration. Zhangguo Tang: Conceptualization, Data curation. Hancheng Long: Processed the data. Min Zhu: Software. Tianyue Zhang: Prepared the figures. Linglong Luo: Investigation.

Corresponding author

Correspondence to Huanzhou Li.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, H., Zhang, J., Tang, Z. et al. Malware classification method based on feature fusion. Int. J. Inf. Secur. 24, 97 (2025). https://doi.org/10.1007/s10207-025-01013-3

  • DOI: https://doi.org/10.1007/s10207-025-01013-3
