A Hierarchical Graph-Based Neural Network for Malware Classification

Wang, Shuai; Zhao, Yuran; Liu, Gongshen; Su, Bo

doi:10.1007/978-3-030-92273-3_51

Conference paper
First Online: 05 December 2021

2213 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13111)

Abstract

In recent years, malware classification models based on machine learning and deep learning have developed rapidly. Although these models have yielded promising results, many of them generalize poorly because they lack good semantic information. To address this problem, we first look for an appropriate representation of the program and convert the program into a hierarchical graph structure composed of one Function Call Graph and many Control Flow Graphs. Based on this graph structure, we implement a malware classification model with better semantic representation and stronger generalization ability by using BERT and a Graph Attention Network. Experiments on two different datasets demonstrate that our model outperforms other state-of-the-art models.
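
As a rough illustration of the hierarchical representation described in the abstract, the sketch below models a program as one function call graph in which each function node is assumed to own a control flow graph. All class and function names are hypothetical. The embedding step is a deliberately simple placeholder (hand-made counts and mean pooling) and is not the BERT and Graph Attention Network model of the paper; it only shows the bottom-up order in which such a structure can be reduced to a single program vector.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ControlFlowGraph:
    """CFG of one function: nodes are basic blocks, edges are jumps."""
    blocks: Dict[int, List[str]]  # block id -> instructions (toy strings)
    edges: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class HierarchicalGraph:
    """One function call graph; each function node owns a CFG (assumption)."""
    functions: Dict[str, ControlFlowGraph]
    calls: List[Tuple[str, str]] = field(default_factory=list)  # (caller, callee)


def embed_block(instructions: List[str]) -> List[float]:
    """Placeholder block embedding (NOT BERT): [instruction count, call count]."""
    n_calls = sum(1 for ins in instructions if ins.startswith("call"))
    return [float(len(instructions)), float(n_calls)]


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed_program(graph: HierarchicalGraph) -> List[float]:
    """Reduce the hierarchy bottom-up: blocks -> functions -> program."""
    # Lower level: pool the block vectors of each CFG into a function vector.
    func_vecs = {
        name: mean([embed_block(b) for b in cfg.blocks.values()])
        for name, cfg in graph.functions.items()
    }
    # Upper level: one round of unweighted neighbor averaging over call edges
    # (a stand-in for attention-based aggregation, not the paper's layer).
    updated = {}
    for name, vec in func_vecs.items():
        callees = [func_vecs[dst] for src, dst in graph.calls if src == name]
        updated[name] = mean([vec] + callees)
    # Readout: pool all function vectors into one program vector.
    return mean(list(updated.values()))


# Toy program with two functions; "main" calls "decrypt".
sample = HierarchicalGraph(
    functions={
        "main": ControlFlowGraph(
            blocks={0: ["push ebp", "call decrypt"], 1: ["ret"]},
            edges=[(0, 1)],
        ),
        "decrypt": ControlFlowGraph(blocks={0: ["xor eax, eax", "ret"]}),
    },
    calls=[("main", "decrypt")],
)
program_vector = embed_program(sample)
```

In a learned model, the placeholder `embed_block` would be replaced by a trained instruction or block encoder, and the unweighted averaging by a trainable aggregation; the two-level traversal order is the only part of this sketch meant to carry over.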

Acknowledgments

This research work has been funded by the National Natural Science Foundation of China (No. 61772337).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Shuai Wang, Yuran Zhao, Gongshen Liu & Bo Su

Authors

Shuai Wang
Yuran Zhao
Gongshen Liu
Bo Su

Corresponding author

Correspondence to Gongshen Liu.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Wang, S., Zhao, Y., Liu, G., Su, B. (2021). A Hierarchical Graph-Based Neural Network for Malware Classification. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_51

DOI: https://doi.org/10.1007/978-3-030-92273-3_51
Published: 05 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science, Computer Science (R0)
