An Evaluation of Image-Based Malware Classification Using Machine Learning

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1287)

Abstract

This paper investigates image-based malware classification using machine learning techniques. In this recent approach, malware binaries are converted into images (i.e. malware images) before being fed to machine learning models such as k-nearest neighbour (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM) or Convolutional Neural Networks (CNN). Because the approach classifies malware by image texture rather than by signatures or by behaviours collected through malware analysis, it is not hindered when the signatures of a new malware variant have not yet been collected or its behaviours have not yet been updated.
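The binary-to-image conversion can be sketched as follows. This is a minimal sketch assuming a common convention (not necessarily the exact one used in this paper): the binary is read as a stream of 8-bit unsigned integers, and each byte becomes one grayscale pixel, laid out row by row. The fixed width of 256 pixels and the helper names `bytes_to_image` and `file_to_image` are illustrative assumptions; some schemes instead choose the width from the file size.

```python
from pathlib import Path
from typing import List

# A malware image as rows of 8-bit grayscale pixel values (0 = black, 255 = white).
Image = List[List[int]]


def bytes_to_image(data: bytes, width: int = 256) -> Image:
    """Lay out each byte of a binary as one grayscale pixel, row by row.

    The fixed default width of 256 is an illustrative assumption; the final
    row is zero-padded so that every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows: Image = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # bytes -> ints in 0..255
        row.extend([0] * (width - len(row)))   # zero-pad a short final row
        rows.append(row)
    return rows


def file_to_image(path: str, width: int = 256) -> Image:
    """Read a binary from disk and convert it (hypothetical helper)."""
    return bytes_to_image(Path(path).read_bytes(), width)
```

For example, `bytes_to_image(b"\x00\x10\xff", width=2)` returns `[[0, 16], [255, 0]]`: three bytes fill one full row and half of a second, zero-padded row. Because the image height grows with file size, images are typically resized to a fixed square size before classification.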

This paper evaluates the classification performance of various machine learning classifiers (k-NN, NB, SVM and CNN) fed with malware images of various sizes (128 × 128, 64 × 64, 32 × 32 and 16 × 16). Experimental results on three datasets, Malimg, Malheur and BIG2015, show that k-NN outperforms the other classifiers on all three, with accuracies of 97.9%, 94.41% and 95.63% respectively. In contrast, NB performed poorly at image-based malware classification. The results also indicate that k-NN accuracy peaks at an input image size of 32 × 32 and tends to decrease when larger input images (64 × 64, 128 × 128) provide too much feature information.
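The size comparison described above (resize every malware image to a fixed square size, flatten it into a feature vector, classify it with k-NN) can be sketched in dependency-free Python. This is a sketch under stated assumptions, not the paper's implementation: nearest-neighbour resampling, pixel scaling to [0, 1], Euclidean distance, the default k = 1 and the function names are all illustrative choices. In practice a library classifier such as scikit-learn's `KNeighborsClassifier` would normally replace the hand-written `knn_predict`.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Image = List[List[int]]  # 8-bit grayscale pixels, row-major


def resize_nearest(img: Image, size: int) -> Image:
    """Resample an image to size x size by nearest-neighbour sampling (assumed choice)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_feature_vector(img: Image) -> List[float]:
    """Flatten the image row by row and scale pixels to [0, 1]."""
    return [p / 255.0 for row in img for p in row]


def knn_predict(train_X: Sequence[List[float]], train_y: Sequence[str],
                x: List[float], k: int = 1) -> str:
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(x, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_accuracy(train: Sequence[Tuple[Image, str]],
                 test: Sequence[Tuple[Image, str]],
                 size: int, k: int = 1) -> float:
    """Resize all (image, family) pairs to size x size and score k-NN on the test pairs."""
    train_X = [to_feature_vector(resize_nearest(img, size)) for img, _ in train]
    train_y = [label for _, label in train]
    correct = sum(
        knn_predict(train_X, train_y,
                    to_feature_vector(resize_nearest(img, size)), k) == label
        for img, label in test
    )
    return correct / len(test)


if __name__ == "__main__":
    # Synthetic stand-ins for two malware families: uniformly dark vs. bright images.
    dark = [[10] * 64 for _ in range(64)]
    bright = [[240] * 64 for _ in range(64)]
    train = [(dark, "family_a"), (bright, "family_b")]
    test = [([[20] * 64 for _ in range(64)], "family_a")]
    for size in (128, 64, 32, 16):
        print(f"{size} x {size}: accuracy = {knn_accuracy(train, test, size):.2f}")
```

The synthetic data only checks that the pipeline runs; reproducing the reported trend would require real labelled malware images (e.g. from Malimg) and a held-out split, calling `knn_accuracy` once per input size.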

Author information

Corresponding author

Correspondence to Tran The Son.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Son, T.T., Lee, C., Le-Minh, H., Aslam, N., Raza, M., Long, N.Q. (2020). An Evaluation of Image-Based Malware Classification Using Machine Learning. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_11

  • DOI: https://doi.org/10.1007/978-3-030-63119-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63118-5

  • Online ISBN: 978-3-030-63119-2

  • eBook Packages: Computer Science, Computer Science (R0)