Abstract
Information technology brings us not only marvelous convenience and productivity but also potential security risks, which may threaten our property, data, and even reputation. Malicious software (malware) is a common instrument of such attacks. Fundamentally, the key step in dealing with malware is to identify and classify it accurately. Although traditional static and dynamic analysis approaches can accomplish this task to some extent, they have intrinsic defects: difficulty in extracting features from malware variants, vulnerability to code obfuscation and encryption, or excessive resource consumption. Recently, CNN-based malware classification methods, which employ convolutional neural network (CNN) models to classify visualized malware images, have offered a promising way to accomplish malware classification tasks. However, most mainstream CNN models require inputs of a fixed size, whereas malware samples of different sizes frequently yield visualization images of different sizes. Simply resizing these images loses malware features, which lowers classification accuracy. In this paper, we propose a malware visualization method based on the transition probabilities of malware operation codes (opcodes), which generates images of a uniform size as inputs for CNN models. As a result, conventional resizing operations can be avoided. The proposed method is compatible with most mainstream CNN models. Moreover, the proposed method can address problems concerning insufficient or imbalanced datasets, which may challenge the classification abilities of CNN models. Experimental results demonstrate the excellent compatibility and classification performance of the proposed method in terms of accuracy, precision, recall, and F1-score. For reproducible research, the source code and trained models of the proposed method are available at https://github.com/xchuxiao23/mal_cls.
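The central idea above, turning an opcode sequence of any length into an image of fixed size through transition probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository is authoritative): the opcode vocabulary `VOCAB`, the helper names `transition_probability_matrix` and `to_grayscale`, the choice to ignore transitions involving unknown opcodes, and the linear mapping of probabilities to 8-bit gray levels are all hypothetical choices made here. The matrix is a first-order (Markov chain) estimate: entry (i, j) is the fraction of transitions out of opcode i that go directly to opcode j.

```python
from collections import Counter
from typing import Dict, List, Sequence


def transition_probability_matrix(
    opcodes: Sequence[str], vocab: Sequence[str]
) -> List[List[float]]:
    """Estimate first-order opcode transition probabilities.

    Returns an n x n matrix (n = len(vocab)) whose entry [i][j] is the
    fraction of transitions out of vocab[i] that go directly to vocab[j].
    The shape depends only on the vocabulary, not on the sample length.
    """
    index: Dict[str, int] = {op: k for k, op in enumerate(vocab)}
    n = len(vocab)
    # Count consecutive opcode pairs; any pair involving an opcode outside
    # the (hypothetical) vocabulary is ignored.
    counts = Counter(
        (index[a], index[b])
        for a, b in zip(opcodes, opcodes[1:])
        if a in index and b in index
    )
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        matrix[i][j] = float(c)
    # Normalise each row to sum to 1; rows for opcodes with no outgoing
    # transitions stay all-zero.
    for row in matrix:
        total = sum(row)
        if total > 0:
            row[:] = [c / total for c in row]
    return matrix


def to_grayscale(matrix: List[List[float]]) -> List[List[int]]:
    """Map probabilities in [0, 1] linearly to 8-bit gray levels (assumed mapping)."""
    return [[round(p * 255) for p in row] for row in matrix]


if __name__ == "__main__":
    VOCAB = ["mov", "push", "call", "jmp"]  # hypothetical 4-opcode vocabulary
    short_sample = ["push", "mov", "call", "mov", "jmp"]
    long_sample = short_sample * 1000
    for sample in (short_sample, long_sample):
        image = to_grayscale(transition_probability_matrix(sample, VOCAB))
        print(len(sample), "opcodes ->", len(image), "x", len(image[0]), image)
```

Both the 5-opcode and the 5000-opcode sample yield a 4 x 4 image; only the row for `jmp` differs, because the repeated sample adds `jmp -> push` transitions. The image side length is fixed by the vocabulary size rather than by the sample length, which is the property the abstract relies on to avoid resizing before feeding a CNN.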
Data Availability and Access
The datasets analysed during the current study are publicly available: BIG2015, https://www.kaggle.com/c/malware-classification/data; CICMaldroid2020, https://www.unb.ca/cic/datasets/maldroid-2020.html; Drebin, https://drebin.mlsec.org/.
Funding
This work was supported in part by the Key R&D Program of Shandong Province, China under Grant 2021CXGC010107; in part by the National Key Research and Development Program of China under Grant 2020YFB1805402; in part by the National Natural Science Foundation of China under Grant 61972051 and Grant 62032002.
Author information
Contributions
Conceptualization, H.P. and W.W.; methodology, W.W., C.X. and H.P.; software, C.X. and W.W.; validation, W.W., H.P., C.X. and L.L.; formal analysis, W.W. and C.X.; writing, W.W., C.X. and Y.L.
Ethics declarations
Conflict of Interests
The authors declare that they do not have any conflict of interest.
Ethical and Informed Consent
Ethics approval was not required for this research.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, W., Peng, H., Xu, C. et al. A malware visualization method based on transition probability matrix suitable for imbalanced family classification. Appl Intell 55, 236 (2025). https://doi.org/10.1007/s10489-024-05911-2
DOI: https://doi.org/10.1007/s10489-024-05911-2