A Hierarchical Graph-Based Neural Network for Malware Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13111)

Abstract

In recent years, malware classification models based on machine learning and deep learning have developed rapidly. Although these models have yielded promising results, many of them generalize poorly because they lack good semantic information. To solve this problem, we first look for an appropriate representation of the program and convert each program into a hierarchical graph structure composed of one Function Call Graph and many Control Flow Graphs. Based on this graph structure, we implement a malware classification model with better semantic representation and stronger generalization ability by using BERT and a Graph Attention Network. Experiments on two different datasets demonstrate that our model outperforms other state-of-the-art models.
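The hierarchical structure named in the abstract, one Function Call Graph whose nodes each carry a Control Flow Graph, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class names, method names, and the toy program (`main`, `decode`, and their instructions) are hypothetical, and the BERT and Graph Attention Network stages are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ControlFlowGraph:
    """Lower level: basic blocks of one function and the jumps between them."""
    blocks: Dict[int, List[str]] = field(default_factory=dict)  # block id -> instructions
    edges: Set[Tuple[int, int]] = field(default_factory=set)    # (source block, target block)

    def add_block(self, block_id: int, instructions: List[str]) -> None:
        self.blocks[block_id] = list(instructions)

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.blocks or dst not in self.blocks:
            raise KeyError("both blocks must exist before they are connected")
        self.edges.add((src, dst))


@dataclass
class HierarchicalProgramGraph:
    """Upper level: one call graph whose nodes each own a ControlFlowGraph."""
    functions: Dict[str, ControlFlowGraph] = field(default_factory=dict)
    calls: Set[Tuple[str, str]] = field(default_factory=set)    # (caller, callee)

    def add_function(self, name: str, cfg: ControlFlowGraph) -> None:
        self.functions[name] = cfg

    def add_call(self, caller: str, callee: str) -> None:
        if caller not in self.functions or callee not in self.functions:
            raise KeyError("both functions must exist before they are connected")
        self.calls.add((caller, callee))

    def callees(self, name: str) -> List[str]:
        # Functions called directly by `name`, in a stable order.
        return sorted(callee for caller, callee in self.calls if caller == name)


def build_toy_program() -> HierarchicalProgramGraph:
    """Hypothetical two-function program, used only to show the nesting."""
    main_cfg = ControlFlowGraph()
    main_cfg.add_block(0, ["call decode", "test eax, eax", "jz block_2"])
    main_cfg.add_block(1, ["mov eax, 1", "ret"])
    main_cfg.add_block(2, ["xor eax, eax", "ret"])
    main_cfg.add_edge(0, 1)  # fall-through branch
    main_cfg.add_edge(0, 2)  # taken branch

    decode_cfg = ControlFlowGraph()
    decode_cfg.add_block(0, ["xor eax, eax", "ret"])

    program = HierarchicalProgramGraph()
    program.add_function("main", main_cfg)
    program.add_function("decode", decode_cfg)
    program.add_call("main", "decode")
    return program
```

Calling `build_toy_program()` yields a graph with two function nodes and one call edge; a model along the lines described would presumably embed each function's blocks first and then propagate information over the call edges, though those details are beyond what the abstract states.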

Notes

  1. https://virusshare.com

  2. https://virustotal.com

  3. https://rada.re/n/

Acknowledgments

This research work has been funded by the National Natural Science Foundation of China (No. 61772337).

Corresponding author

Correspondence to Gongshen Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, S., Zhao, Y., Liu, G., Su, B. (2021). A Hierarchical Graph-Based Neural Network for Malware Classification. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_51

  • DOI: https://doi.org/10.1007/978-3-030-92273-3_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92272-6

  • Online ISBN: 978-3-030-92273-3

  • eBook Packages: Computer Science, Computer Science (R0)
