ABSTRACT
In recent years, the volume and variety of malware have grown, increasing the need for better malware detection and classification systems. Deep convolutional neural networks (CNNs) have recently proven very successful for malware classification, owing to their performance on image classification. However, their effectiveness degrades when the malware families are imbalanced. In this paper, we propose a malware classification framework using a CNN-based deep learning architecture, combined with the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset across malware families.
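As a rough illustration of the oversampling idea behind SMOTE, the following minimal sketch creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. It is not the authors' implementation: the function name `smote_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for this example, and the dense pairwise-distance step would need a more memory-efficient neighbour search on real image data.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic samples by interpolating
    between minority-class samples and their k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise squared Euclidean distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # Pick a random base sample and one of its k neighbours for each new point
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the connecting segment
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class: four points on the unit square (illustrative data only)
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by that family rather than duplicating existing samples exactly.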
Our proposed method converts the binary files into grayscale images and balances them with SMOTE; we then use the balanced images to train the CNN architecture to detect and identify malware families. We use transfer learning based on an existing deep learning model, VGG16, which was pre-trained on the ImageNet dataset (≥ 10 million images).
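The binary-to-image step can be sketched as follows, assuming the common convention of mapping each byte to one 8-bit pixel and wrapping the byte stream into rows of a fixed width. The function name `bytes_to_grayscale`, the width of 256, the zero padding of the last row, and the sample bytes are assumptions for illustration; the abstract does not state the exact layout used.

```python
import numpy as np

def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Hypothetical helper: map each byte to one pixel (0-255) and wrap the
    stream into rows of `width` pixels, zero-padding the final row."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)

# Illustrative input: 770 bytes standing in for the contents of a binary file
sample = bytes(range(256)) * 3 + b"\x90\x90"
image = bytes_to_grayscale(sample, width=256)

# Models pre-trained on ImageNet, such as VGG16, expect three colour channels,
# so one option (an assumption here) is to repeat the gray channel three times.
rgb = np.stack([image] * 3, axis=-1)
```

In practice the resulting array would still need resizing to the input resolution of the pre-trained network before training.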
For evaluation, an extensive experiment was conducted using the Microsoft Malware dataset. The results show that our approach is efficient, with an average accuracy of 98%.