Using convolutional neural networks for classification of malware represented as images

Gibert, Daniel; Mateu, Carles; Planes, Jordi; Vicens, Ramon

doi:10.1007/s11416-018-0323-0

Original Paper
Published: 27 August 2018

Volume 15, pages 15–28 (2019)
Journal of Computer Virology and Hacking Techniques

Daniel Gibert¹,² (ORCID: orcid.org/0000-0002-2448-1297),
Carles Mateu²,
Jordi Planes² &
…
Ramon Vicens¹

4284 Accesses
122 Citations
1 Altmetric

Abstract

Malicious files detected every year number in the millions. One of the main reasons for this volume is that malware authors mutate their files to evade detection: files belonging to the same family, with the same malicious behavior, are constantly modified or obfuscated so that they look like different files. To analyze and classify such large numbers of files effectively, we need to categorize them into groups and identify their respective families on the basis of their behavior. In this paper, malicious software is visualized as grayscale images, because this representation captures minor changes while retaining the global structure, which helps to detect variations. Motivated by the visual similarity between malware samples of the same family, we propose a file-agnostic deep learning approach for malware categorization that efficiently groups malicious software into families based on a set of discriminant patterns extracted from their visualization as images. The suitability of our approach is evaluated against two benchmarks: the MalImg dataset and the Microsoft Malware Classification Challenge dataset. Experimental comparison demonstrates its superior performance with respect to state-of-the-art techniques.
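
As an illustration of the visualization idea, the following minimal Python sketch maps each byte of a file to one grayscale pixel and arranges the pixels row by row. The abstract does not specify how the images are built, so the function name bytes_to_grayscale, the fixed width parameter, and the zero padding of the last row are assumptions made for this example, not the authors' stated method.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay out raw bytes row by row as a 2D grid of 8-bit pixel intensities.

    Assumption for illustration: one byte per pixel, fixed row width,
    and zero padding of the final row.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Each byte is already an integer in 0..255, i.e. a valid gray level.
    pixels = list(data)
    # Pad the final row with zeros (black) so every row has the same width.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Two hypothetical byte sequences that differ in a single byte.
variant_a = bytes(range(16))
variant_b = bytes(range(8)) + b"\xff" + bytes(range(9, 16))

image_a = bytes_to_grayscale(variant_a, width=4)
image_b = bytes_to_grayscale(variant_b, width=4)

# Count the pixels that differ between the two images.
changed = sum(
    pa != pb
    for row_a, row_b in zip(image_a, image_b)
    for pa, pb in zip(row_a, row_b)
)
```

In this toy case a one-byte modification changes a single pixel while the rest of the image keeps the same layout, which is the kind of local change with preserved global structure that the abstract describes.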



Acknowledgements

We would like to thank the Blueliv Labs team, especially Daniel Solís, and Àngel Puigventós for their support and the feedback provided during the development of this work. This work has been partially funded by the Spanish MICINN Projects TIN2014-53234-C2-2-R, TIN2015-71799-C2-2-P and by AGAUR DI-2016-091.

Author information

Authors and Affiliations

Blueliv, Leap in Value, Barcelona, Spain
Daniel Gibert & Ramon Vicens
University of Lleida, Lleida, Spain
Daniel Gibert, Carles Mateu & Jordi Planes

Corresponding author

Correspondence to Daniel Gibert.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Gibert, D., Mateu, C., Planes, J. et al. Using convolutional neural networks for classification of malware represented as images. J Comput Virol Hack Tech 15, 15–28 (2019). https://doi.org/10.1007/s11416-018-0323-0

Received: 17 February 2018
Accepted: 10 August 2018
Published: 27 August 2018
Issue Date: 11 March 2019
DOI: https://doi.org/10.1007/s11416-018-0323-0
