DOI: 10.1145/3454127.3457631
research-article

TRANSFER LEARNING AND SMOTE ALGORITHM FOR IMAGE-BASED MALWARE CLASSIFICATION

Published: 26 November 2021

ABSTRACT

In recent years, the volume and variety of malware have grown, which increases the need for better malware detection and classification systems. Deep convolutional neural networks (CNNs) have recently proven very successful for malware classification because of their performance on image classification. However, their effectiveness degrades when the malware families are imbalanced. In this paper, we propose a malware classification framework based on a CNN deep learning architecture that includes the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset across malware families.
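As an illustration of the oversampling idea, the following is a minimal sketch of SMOTE-style interpolation, not the implementation used in the paper. The function name `smote_oversample`, the parameters `n_new`, `k` and `seed`, and the choice to pick base samples at random are all assumptions made for this example. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(samples, n_new, k=5, seed=0):
    """Create n_new synthetic feature vectors from minority-class samples.

    Simplified SMOTE-style sketch (hypothetical helper): a base sample is
    drawn at random, one of its k nearest neighbours is chosen, and a new
    point is interpolated between the two.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Indices of the k nearest neighbours (Euclidean), excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(base, samples[j]),
        )[:k]
        neighbour = samples[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with three 2-D feature vectors (illustrative data only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority, n_new=4, k=2)
```

For image data, each image would first be flattened into a feature vector. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package is likely a better choice than a hand-written version.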

Our method converts the binary files into grayscale images, balances the resulting image set with SMOTE, and then uses it to train the CNN architecture to detect and identify malware families. We apply transfer learning from an existing deep learning model, VGG16, that was previously trained on the ImageNet dataset (≥ 10 million images).
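The two steps described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the abstract does not give the image width, the padding scheme, or the classifier head, so `bytes_to_grayscale`, its `width` default, the zero padding, and the frozen-base design in `build_vgg16_classifier` are all hypothetical choices. TensorFlow is imported lazily so that the conversion helper runs without it.

```python
def bytes_to_grayscale(data, width=256):
    """Map each byte (0-255) to one pixel and wrap rows at a fixed width.

    The last row is zero-padded so that every row has the same length.
    Width and padding are assumptions for this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final, shorter row
        rows.append(row)
    return rows


def build_vgg16_classifier(num_classes, input_shape=(224, 224, 3)):
    """Build a VGG16-based classifier with ImageNet weights (assumed design).

    The convolutional base is frozen and only a small new head is trained.
    """
    from tensorflow import keras  # lazy import; requires TensorFlow

    base = keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pretrained filters fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Five bytes wrapped at width 2 give three rows, the last one padded.
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=2)
```

Because VGG16 with ImageNet weights expects three-channel input, a grayscale image would need to be resized and its single channel repeated three times before being passed to the model; that preprocessing is omitted here.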

For evaluation, we conducted extensive experiments on the Microsoft Malware dataset. The results show that our approach is effective, with an average accuracy of 98%.

Published in

NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security
April 2021
410 pages
ISBN: 9781450388719
DOI: 10.1145/3454127

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
