A malware visualization method based on transition probability matrix suitable for imbalanced family classification

Applied Intelligence

Abstract

Information technology brings remarkable convenience and productivity, but it also introduces security risks that may threaten our property, data, or even reputation. Malicious software is a principal vehicle for such attacks. The key step in dealing with malicious software is to identify and classify it accurately. Although traditional static and dynamic analysis approaches can accomplish this task to some extent, they have intrinsic defects: difficulty extracting features from variants, vulnerability to code obfuscation and encryption, and excessive resource consumption. Recently, CNN-based malware classification methods, which use CNN models to classify visualized malware images, have offered a promising alternative. However, most mainstream CNN models require fixed-size inputs, while malware samples vary in size and therefore yield visualization images of varying sizes. Simply resizing these images loses malware features and lowers classification accuracy. In this paper, we propose a malware visualization method based on the transition probabilities of malware operation codes (opcodes), which generates images of a uniform size as inputs for CNN models, so the conventional resizing operation can be avoided. The proposed method is compatible with most mainstream CNN models. Moreover, it can address problems arising from insufficient or imbalanced datasets, which may challenge the classification abilities of CNN models. Experimental results demonstrate the compatibility and classification performance of the proposed method in terms of accuracy, precision, recall, and F1-score. For reproducible research, the source code and trained models of the proposed method are available at https://github.com/xchuxiao23/mal_cls.
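To illustrate why a transition-probability representation yields a uniform image size, the following is a minimal sketch, not the authors' implementation. It assumes a first-order transition matrix over a small, hypothetical opcode vocabulary (`VOCAB`); the function names, the vocabulary, and the 8-bit scaling are illustrative assumptions that do not appear in the abstract.

```python
from collections import Counter

# Hypothetical opcode vocabulary (an assumption for illustration only).
VOCAB = ["mov", "push", "pop", "call", "ret"]


def transition_matrix(opcodes, vocab=VOCAB):
    """Return a len(vocab) x len(vocab) matrix of first-order opcode
    transition probabilities. Opcodes outside vocab are skipped, so their
    neighbours are treated as adjacent (a simplification)."""
    index = {op: i for i, op in enumerate(vocab)}
    seq = [index[op] for op in opcodes if op in index]
    counts = Counter(zip(seq, seq[1:]))  # counts of consecutive (src, dst) pairs
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for (src, dst), c in counts.items():
        matrix[src][dst] = float(c)
    for row in matrix:
        total = sum(row)
        if total > 0:  # opcodes never seen as a source keep an all-zero row
            for j in range(n):
                row[j] /= total
    return matrix


def to_grayscale(matrix):
    """Scale probabilities in [0, 1] to 8-bit pixel intensities."""
    return [[round(p * 255) for p in row] for row in matrix]


# Two samples of very different lengths map to images of the same shape.
short_seq = ["push", "mov", "call", "pop", "ret"]
long_seq = short_seq * 200 + ["mov", "mov", "push"]
img_short = to_grayscale(transition_matrix(short_seq))
img_long = to_grayscale(transition_matrix(long_seq))
```

In this sketch the image side length depends only on the vocabulary size, not on the sample length, which is the property that removes the need for resizing before a fixed-input CNN.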

Data Availability and Access

The datasets analysed during the current study are publicly available: BIG2015, https://www.kaggle.com/c/malware-classification/data; CICMaldroid2020, https://www.unb.ca/cic/datasets/maldroid-2020.html; and Drebin, https://drebin.mlsec.org/.

Funding

This work was supported in part by the Key R&D Program of Shandong Province, China under Grant 2021CXGC010107; in part by the National Key Research and Development Program of China under Grant 2020YFB1805402; and in part by the National Natural Science Foundation of China under Grant 61972051 and Grant 62032002.

Author information

Contributions

Conceptualization, H.P. and W.W.; methodology, W.W., C.X. and H.P.; software, C.X. and W.W.; validation, W.W., H.P., C.X. and L.L.; formal analysis, W.W. and C.X.; writing, W.W., C.X. and Y.L.

Corresponding author

Correspondence to Haipeng Peng.

Ethics declarations

Conflict of Interests

The authors declare that they do not have any conflict of interest.

Ethical and Informed Consent

Ethics approval was not required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, W., Peng, H., Xu, C. et al. A malware visualization method based on transition probability matrix suitable for imbalanced family classification. Appl Intell 55, 236 (2025). https://doi.org/10.1007/s10489-024-05911-2

  • DOI: https://doi.org/10.1007/s10489-024-05911-2

Keywords