An Evaluation of Image-Based Malware Classification Using Machine Learning

Son, Tran The; Lee, Chando; Le-Minh, Hoa; Aslam, Nauman; Raza, Moshin; Long, Nguyen Quoc

doi:10.1007/978-3-030-63119-2_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1287)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

This paper investigates image-based malware classification using machine learning techniques. In this recent approach, malware binaries are converted into images (i.e. malware images) before being fed to machine learning models such as k-nearest neighbour (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM) or Convolutional Neural Networks (CNN). The approach classifies malware by image texture rather than by signatures or behaviours collected through malware analysis, so it is not hampered when the signatures of a new malware variant have not yet been collected or its behaviours have not yet been updated.
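The conversion step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state how bytes are laid out or resized, so the fixed row width, the zero padding, the nearest-neighbour resampling and the function names `bytes_to_image` and `resize_nearest` are all assumptions made for the example.

```python
import math

def bytes_to_image(data, width=256):
    """Lay bytes out row by row as 8-bit grayscale pixels (assumed layout).

    The last row is zero-padded so that every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]

def resize_nearest(image, size):
    """Resample to size x size with nearest-neighbour sampling (assumed method)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[(r * src_h) // size][(c * src_w) // size] for c in range(size)]
        for r in range(size)
    ]

# Stand-in for a real binary: 4096 bytes cycling through 0..255.
sample = bytes(i % 256 for i in range(4096))
img = bytes_to_image(sample, width=64)  # 64 x 64 grayscale layout
thumb = resize_nearest(img, 32)         # 32 x 32 classifier input
```

Each byte maps to one pixel intensity in the range 0 to 255, so structurally similar binaries yield visually similar textures, which is the property the classifiers rely on.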

This paper evaluates the classification performance of several machine learning classifiers (i.e. k-NN, NB, SVM, CNN) fed with malware images of various dimensions (i.e. 128 × 128, 64 × 64, 32 × 32, 16 × 16). Experimental results on three datasets, Malimg, Malheur and BIG2015, show that k-NN outperforms the other classifiers on all three, with high accuracy (97.9%, 94.41% and 95.63% respectively). In contrast, NB performs poorly on image-based malware classification. The results also indicate that the accuracy of k-NN peaks at an input image size of 32 × 32 and tends to decrease when larger input images (i.e. 64 × 64, 128 × 128) provide too much feature information.
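The kind of comparison described above, a k-NN classifier scored at several input sizes, can be sketched in plain Python. The data here is synthetic and trivially separable, so the numbers it prints say nothing about Malimg, Malheur or BIG2015. The value k = 3, the squared Euclidean distance and the helper `make_toy_family` are assumptions for illustration, since the abstract does not give the paper's settings.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to `query`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def make_toy_family(rng, base, n, size):
    """Hypothetical stand-in for one malware family: n flattened size x size
    images whose pixels lie within 10 of a base intensity."""
    return [
        [min(255, max(0, base + rng.randint(-10, 10))) for _ in range(size * size)]
        for _ in range(n)
    ]

rng = random.Random(0)
results = {}
for size in (16, 32):
    train_x = make_toy_family(rng, 60, 20, size) + make_toy_family(rng, 180, 20, size)
    train_y = ["A"] * 20 + ["B"] * 20
    test_x = make_toy_family(rng, 60, 5, size) + make_toy_family(rng, 180, 5, size)
    test_y = ["A"] * 5 + ["B"] * 5
    results[size] = accuracy(train_x, train_y, test_x, test_y, k=3)

print(results)  # accuracy per input size on the toy data
```

On real malware images the interesting part is how `results` changes with `size`: larger inputs add dimensions to the distance computation, which is one plausible reading of why the paper reports lower k-NN accuracy at 64 × 64 and 128 × 128 than at 32 × 32.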

Author information

Authors and Affiliations

Vietnam – Korea University of Information and Communication Technology, Da Nang, Vietnam
Tran The Son & Chando Lee
Faculty of Engineering and Environment, Northumbria University, Newcastle-upon-Tyne, UK
Hoa Le-Minh, Nauman Aslam & Moshin Raza
IT Faculty, FTP University, Da Nang, Vietnam
Nguyen Quoc Long

Corresponding author

Correspondence to Tran The Son.

Editor information

Editors and Affiliations

Wroclaw University of Economics and Business, Wrocław, Poland
Marcin Hernes
Wrocław University of Science and Technology, Wrocław, Poland
Krystian Wojtkiewicz
University of Newcastle, Newcastle, Australia
Edward Szczerbicki

About this paper

Cite this paper

Son, T.T., Lee, C., Le-Minh, H., Aslam, N., Raza, M., Long, N.Q. (2020). An Evaluation of Image-Based Malware Classification Using Machine Learning. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_11

DOI: https://doi.org/10.1007/978-3-030-63119-2_11
Published: 19 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63118-5
Online ISBN: 978-3-030-63119-2
eBook Packages: Computer Science, Computer Science (R0)
