Deep Learning for Classification of Malware System Call Sequences

Kolosnjaji, Bojan; Zarras, Apostolis; Webster, George; Eckert, Claudia

doi:10.1007/978-3-319-50127-7_11

Bojan Kolosnjaji²¹,
Apostolis Zarras²¹,
George Webster²¹ &
…
Claudia Eckert²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9992))

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

7922 Accesses
7 Altmetric

Abstract

The increase in number and variety of malware samples amplifies the need for improvement in automatic detection and classification of the malware variants. Machine learning is a natural choice to cope with this increase, because it addresses the need of discovering underlying patterns in large-scale datasets. Nowadays, neural network methodology has been grown to the state that can surpass limitations of previous machine learning methods, such as Hidden Markov Models and Support Vector Machines. As a consequence, neural networks can now offer superior classification accuracy in many domains, such as computer vision or natural language processing. This improvement comes from the possibility of constructing neural networks with a higher number of potentially diverse layers and is known as Deep Learning.

In this paper, we attempt to transfer these performance improvements to model the malware system call sequences for the purpose of malware classification. We construct a neural network based on convolutional and recurrent network layers in order to obtain the best features for classification. This way we get a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, being able to achieve an average of 85.6% on precision and 89.4% on recall using this combined neural network architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Malware Detection with Sequence-Based Machine Learning and Deep Learning

Malware Classification Based on Graph Convolutional Neural Networks and Static Call Graph Features

A Comparison of Neural Network Architectures for Malware Classification Based on Noriben Operation Sequences

References

PEInfo Service. https://github.com/crits/crits_services/tree/master/peinfo_service
VirusTotal, May 2015. http://www.virustotal.com
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems. arXiv preprint arXiv:1603.04467 (2015)
Attaluri, S., McGhee, S., Stamp, M.: Profile Hidden Markov Models and metamorphic virus detection. J. Comput. Virol. 5(2), 151–169 (2009)
Article Google Scholar
Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74320-0_10
Chapter Google Scholar
Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: ISOC Network and Distributed System Security Symposium (NDSS) (2009)
Google Scholar
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
Article MathSciNet MATH Google Scholar
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Python for Scientific Computing Conference (SciPy) (2010)
Google Scholar
Bu, Z., et al.: McAfee Threats Report: Second Quarter 2012 (2012)
Google Scholar
Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)
Google Scholar
Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7), 1895–1923 (1998)
Article Google Scholar
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Google Scholar
Guarnieri, C., Tanasi, A., Bremer, J., Schloesser, M.: The Cuckoo Sandbox (2012)
Google Scholar
Heller, K., Svore, K., Keromytis, A.D., Stolfo, S.: One class support vector machines for detecting anomalous windows registry accesses. In: Workshop on Data Mining for Computer Security (DMSEC) (2003)
Google Scholar
Huang, W., Stokes, J.W.: MtNet: a multi-task neural network for dynamic malware classification. In: Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) (2016)
Google Scholar
Kolosnjaji, B., Zarras, A., Lengyel, T., Webster, G., Eckert, C.: Adaptive semantics-aware malware classification. In: Caballero, J., Zurutuza, U., Rodríguez, R.J. (eds.) DIMVA 2016. LNCS, vol. 9721, pp. 419–439. Springer, Heidelberg (2016). doi:10.1007/978-3-319-40667-1_21
Chapter Google Scholar
Lengyel, T.K., Maresca, S., Payne, B.D., Webster, G.D., Vogl, S., Kiayias, A.: Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In: Annual Computer Security Applications Conference (ACSAC) (2014)
Google Scholar
Maxwell, K.: Maltrieve, April 2015. https://github.com/krmaxwell/maltrieve
Pascanu, R., Stokes, J.W., Sanossian, H., Marinescu, M., Thomas, A.: Malware classification with recurrent networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
Google Scholar
Perdisci, R., ManChon, U.: VAMO: towards a fully automated malware clustering validity analysis. In: Annual Computer Security Applications Conference (ACSAC) (2012)
Google Scholar
Pfoh, J., Schneider, C., Eckert, C.: Leveraging string kernels for malware detection. In: Lopez, J., Huang, X., Sandhu, R. (eds.) NSS 2013. LNCS, vol. 7873, pp. 206–219. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38631-2_16
Chapter Google Scholar
Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Zamboni, D. (ed.) DIMVA 2008. LNCS, vol. 5137, pp. 108–125. Springer, Heidelberg (2008). doi:10.1007/978-3-540-70542-0_6
Chapter Google Scholar
Roberts, J.-M.: Virus Share, November 2015. https://virusshare.com/
Saxe, J., Berlin, K.: Features, deep neural network based malware detection using two dimensional binary program arXiv preprint arXiv:1508.03096 (2015)
Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: IEEE Symposium on Security and Privacy (2001)
Google Scholar
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
MathSciNet MATH Google Scholar
Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: Botfinder: finding bots in network traffic without deep packet inspection. In International Conference on Emerging Networking Experiments and Technologies (CoNEXT) (2012)
Google Scholar
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). 85
MATH Google Scholar
VirusTotal. File Statistics, November 2015. https://www.virustotal.com/en/statistics/
Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Conference on Computer and Communications Security (CCS) (2002)
Google Scholar
Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: IEEE Symposium on Security and Privacy (1999)
Google Scholar
Webster, G.D., Hanif, Z.D., Ludwig, A.L.P., Lengyel, T.K., Zarras, A., Eckert, C.: SKALD: a scalable architecture for feature extraction, multi-user analysis, and real-time information sharing. In: Bishop, M., Nascimento, A.C.A. (eds.) ISC 2016. LNCS, vol. 9866, pp. 231–249. Springer, Heidelberg (2016). doi:10.1007/978-3-319-45871-7_15
Chapter Google Scholar
Xiao, H., Eckert, C.: Efficient online sequence prediction with side information. In: IEEE International Conference on Data Mining (ICDM) (2013)
Google Scholar
Xiao, H., Stibor, T.: A supervised topic transition model for detecting malicious system call sequences. In: Workshop on Knowledge Discovery, Modeling and Simulation (2011)
Google Scholar

Download references

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Bojan Kolosnjaji, Apostolis Zarras, George Webster & Claudia Eckert

Authors

Bojan Kolosnjaji
View author publications
You can also search for this author in PubMed Google Scholar
Apostolis Zarras
View author publications
You can also search for this author in PubMed Google Scholar
George Webster
View author publications
You can also search for this author in PubMed Google Scholar
Claudia Eckert
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bojan Kolosnjaji .

Editor information

Editors and Affiliations

University of Tasmania, Hobart, Australia
Byeong Ho Kang
Auckland University of Technology, Auckland, New Zealand
Quan Bai

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kolosnjaji, B., Zarras, A., Webster, G., Eckert, C. (2016). Deep Learning for Classification of Malware System Call Sequences. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science(), vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_11

Download citation

DOI: https://doi.org/10.1007/978-3-319-50127-7_11
Published: 29 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics