A malware visualization method based on transition probability matrix suitable for imbalanced family classification

Wu, Wei; Peng, Haipeng; Xu, Chuxiao; Liu, Yuhong; Li, Lixiang

doi:10.1007/s10489-024-05911-2

Published: 28 December 2024

Volume 55, article number 236, (2025)

Applied Intelligence

Abstract

Information technology brings remarkable convenience and productivity, but also security risks that may threaten our property, data, or even reputation. Malicious software is an instrument of such attacks. The key step in dealing with malicious software is to identify and classify it accurately. Although traditional static and dynamic analysis approaches can accomplish this task to some extent, they have intrinsic defects in terms of variant feature extraction, vulnerability to code obfuscation and encryption, or excessive resource consumption. Recently, CNN-based malware classification methods, which employ CNN models to classify visualized malware images, have offered a promising way to accomplish malware classification tasks. However, most mainstream CNN models require inputs of a fixed size, while malware samples of varying sizes frequently produce visualization images of varying sizes. Simply resizing these images loses malware features and lowers classification accuracy. In this paper, we propose a malware visualization method based on the transition probabilities of malware operation codes, which generates images of a uniform size as inputs for CNN models, so the conventional resizing operation can be avoided. The proposed method is compatible with most mainstream CNN models. Moreover, it can address problems caused by insufficient or imbalanced datasets, which may challenge the classification abilities of CNN models. Experimental results demonstrate the excellent compatibility and classification performance of the proposed method in terms of accuracy, precision, recall and F1-score. For reproducible research, the source code and trained models of the proposed method are available at https://github.com/xchuxiao23/mal_cls.
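As an illustration of why opcode transition probabilities yield images of a uniform size, the following is a minimal sketch and not the authors' implementation (which is in the repository linked above). It assumes first-order transitions between adjacent opcodes, a fixed opcode vocabulary, row normalisation, and a linear mapping of probabilities to 8-bit grayscale; the function names `transition_matrix` and `to_grayscale` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Sequence


def transition_matrix(opcodes: Sequence[str], vocab: Sequence[str]) -> List[List[float]]:
    """Row-normalised first-order transition probabilities over a fixed vocabulary.

    Assumption: entry [i][j] estimates P(next = vocab[j] | current = vocab[i]).
    """
    index: Dict[str, int] = {op: i for i, op in enumerate(vocab)}
    n = len(vocab)
    counts = [[0] * n for _ in range(n)]
    # Count each adjacent pair (a -> b); pairs containing an opcode
    # outside the vocabulary are skipped.
    for a, b in zip(opcodes, opcodes[1:]):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    matrix: List[List[float]] = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def to_grayscale(matrix: List[List[float]]) -> List[List[int]]:
    """Map probabilities in [0, 1] linearly to pixel intensities in [0, 255]."""
    return [[round(255 * p) for p in row] for row in matrix]


if __name__ == "__main__":
    vocab = ["mov", "push", "call"]  # toy vocabulary, for illustration only
    short_sample = ["mov", "push", "mov", "call", "mov", "push"]
    long_sample = short_sample * 50 + ["call", "call"]
    for sample in (short_sample, long_sample):
        image = to_grayscale(transition_matrix(sample, vocab))
        # The image side length equals len(vocab), regardless of sample length.
        print(len(sample), "opcodes ->", len(image), "x", len(image[0]), "image")
```

Because the matrix dimensions depend only on the vocabulary size, samples of any length map to images of the same shape, which is the property that removes the need for resizing before the CNN.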

Data Availability and Access

The datasets analysed during the current study are available from the following sources: BIG2015, https://www.kaggle.com/c/malware-classification/data; CICMaldroid2020, https://www.unb.ca/cic/datasets/maldroid-2020.html; and Drebin, https://drebin.mlsec.org/.

Funding

This work was supported in part by the Key R&D Program of Shandong Province, China under Grant 2021CXGC010107; in part by the National Key Research and Development Program of China under Grant 2020YFB1805402; and in part by the National Natural Science Foundation of China under Grant 61972051 and Grant 62032002.

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
Wei Wu
Information Security Center, State Key Laboratory of Networking and Switching Technology and National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Haipeng Peng, Chuxiao Xu, Yuhong Liu & Lixiang Li

Contributions

Conceptualization, H.P. and W.W.; methodology, W.W., C.X. and H.P.; software, C.X. and W.W.; validation, W.W., H.P., C.X. and L.L.; formal analysis, W.W. and C.X.; writing, W.W., C.X. and Y.L.

Corresponding author

Correspondence to Haipeng Peng.

Ethics declarations

Conflict of Interests

The authors declare that they do not have any conflict of interest.

Ethical and Informed Consent

Ethics approval was not required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, W., Peng, H., Xu, C. et al. A malware visualization method based on transition probability matrix suitable for imbalanced family classification. Appl Intell 55, 236 (2025). https://doi.org/10.1007/s10489-024-05911-2

Accepted: 22 October 2024
Published: 28 December 2024
DOI: https://doi.org/10.1007/s10489-024-05911-2
