Android Malware Detection Using BERT

  • Conference paper
Applied Cryptography and Network Security Workshops (ACNS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13285)

Abstract

In this paper, we present two empirical studies that (1) detect Android malware and (2) classify Android malware into families. For (1), we reproduce the results of MalBERT by training BERT models on the manifests of 265k Android applications from the AndroZoo dataset (vs. 22k for MalBERT) in order to detect malware. The results reported in the MalBERT paper are excellent, and hard to believe given that a manifest only roughly represents an application. We therefore try to answer the following questions: Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to maintain or improve the results by reducing the size of the manifests? For (2), we investigate whether BERT can be used to classify Android malware into families. The results show that BERT can differentiate malware from goodware with 97% accuracy, and can classify malware into families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify applications successfully, and that BERT does not actually need them.
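The abstract does not spell out how a manifest is turned into BERT input, how permissions are removed, or how manifests are shrunk. As a rough illustration only, the sketch below flattens a manifest into a text sequence, optionally drops permission declarations (mirroring the permission ablation the abstract describes), and truncates the sequence to a token budget as one simple way of reducing manifest size. It assumes the manifest has already been decoded from Android's binary XML format into plain XML (for example with a tool such as apktool). The function `manifest_to_text`, the tag-plus-`android:name` flattening scheme, the set of tags treated as permissions, and the whitespace split standing in for BERT's WordPiece tokenizer are all hypothetical, not the authors' pipeline.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# android:name and other android:* attributes are namespaced in the parsed tree.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Tags treated as "permissions" for the ablation (an assumption of this sketch).
PERMISSION_TAGS = {"uses-permission", "uses-permission-sdk-23", "permission"}


def manifest_to_text(manifest_xml: str,
                     drop_permissions: bool = False,
                     max_tokens: Optional[int] = None) -> str:
    """Flatten a decoded AndroidManifest.xml into one space-separated string.

    Hypothetical scheme: every element contributes its tag name followed by
    its android:name value (when present), in document order.
    """
    root = ET.fromstring(manifest_xml)
    tokens: List[str] = []
    for elem in root.iter():  # depth-first pre-order walk, starting at <manifest>
        if drop_permissions and elem.tag in PERMISSION_TAGS:
            continue  # ablation: leave permission declarations out
        tokens.append(elem.tag)
        name = elem.get(ANDROID_NS + "name")
        if name:
            tokens.append(name)
    if max_tokens is not None:
        tokens = tokens[:max_tokens]  # crude size reduction: keep only a prefix
    return " ".join(tokens)


# A tiny made-up manifest used only to exercise the function.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <activity android:name=".MainActivity"/>
    <receiver android:name=".SmsReceiver"/>
  </application>
</manifest>"""

if __name__ == "__main__":
    print(manifest_to_text(SAMPLE_MANIFEST))
    print(manifest_to_text(SAMPLE_MANIFEST, drop_permissions=True))
    print(manifest_to_text(SAMPLE_MANIFEST, max_tokens=3))
```

On the sample, the three calls yield `manifest uses-permission android.permission.INTERNET uses-permission android.permission.SEND_SMS application activity .MainActivity receiver .SmsReceiver`, then `manifest application activity .MainActivity receiver .SmsReceiver`, then `manifest uses-permission android.permission.INTERNET`. Counting whitespace tokens only approximates what BERT sees: the TF Hub preprocessor listed in Note 4 applies WordPiece, which splits identifiers such as `android.permission.SEND_SMS` into several sub-tokens, and it truncates its output to a fixed sequence length (128 by default), so long manifests lose their tail unless they are shortened first.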

Notes

  1. https://www.virustotal.com.

  2. To obtain information about malware families, we rely on the AVclass tool [25].

  3. https://tfhub.dev.

  4. The exact model we used can be found at https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4?tf-hub-format=compressed. We also relied on the matching BERT preprocessor available at https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3?tf-hub-format=compressed.

  5. More information about this model, as well as about the other available models in this collection, can be found at https://tfhub.dev/google/collections/bert.
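Note 2 states that malware family labels come from the AVclass tool [25]. To make that step concrete, the sketch below shows its core idea in heavily simplified form: split each antivirus engine's label into tokens, discard generic tokens, and take a plurality vote across engines. It is not AVclass itself, which additionally normalizes labels and detects aliases between family names using curated data; the generic-token list, the four-character cut-off, the `min_votes` threshold, and the engine names and labels below are all made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical stand-in for AVclass's much larger curated list of generic
# tokens (platform, behaviour and heuristic words that are not family names).
GENERIC_TOKENS = {"android", "trojan", "malware", "generic", "riskware",
                  "adware", "agent", "variant", "heur"}


def label_tokens(av_label: str) -> List[str]:
    """Split one AV label into distinct lowercase candidate family tokens."""
    parts = re.split(r"[^a-z0-9]+", av_label.lower())
    keep = [p for p in parts
            if len(p) >= 4          # arbitrary cut-off for short suffixes
            and not p.isdigit()
            and p not in GENERIC_TOKENS]
    return list(dict.fromkeys(keep))  # de-duplicate, preserving order


def plurality_family(av_labels: Dict[str, str], min_votes: int = 2) -> Optional[str]:
    """Return the token named by the most engines, or None if support is too weak."""
    votes: Counter = Counter()
    for label in av_labels.values():
        votes.update(label_tokens(label))  # each engine votes once per token
    if not votes:
        return None
    family, count = votes.most_common(1)[0]  # ties go to the first token seen
    return family if count >= min_votes else None


# Made-up labels for one sample, keyed by hypothetical engine names.
EXAMPLE_LABELS = {
    "engine_a": "Android.Trojan.FakeInst.A",
    "engine_b": "Trojan:Android/FakeInst.B",
    "engine_c": "a variant of Android/TrojanSMS.Agent.XY",
    "engine_d": "Android.Fakeinst.12",
}

if __name__ == "__main__":
    print(plurality_family(EXAMPLE_LABELS))
```

Here three engines name `fakeinst` and one names `trojansms`, so the sketch returns `fakeinst`. Returning `None` when fewer than `min_votes` engines agree keeps weakly supported samples out of a family dataset instead of guessing; the threshold of two is an assumption of this sketch.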

References

  1. Wikipedia, the free encyclopedia. https://www.wikipedia.org/

  2. Google Play Store. https://play.google.com/

  3. TensorFlow. https://www.tensorflow.org/

  4. Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., Awajan, A.: Intelligent mobile malware detection using permission requests and API calls. Future Gener. Comput. Syst. 107, 509–521 (2020). https://doi.org/10.1016/j.future.2020.02.002, https://www.sciencedirect.com/science/article/pii/S0167739X19321223

  5. Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: AndroZoo: collecting millions of android apps for the research community. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 468–471. IEEE (2016)

  6. Alsoghyer, S., Almomani, I.: On the effectiveness of application permissions for android ransomware detection. In: 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 94–99 (2020). https://doi.org/10.1109/CDMA47397.2020.00022

  7. Alzaylaee, M.K., Yerima, S.Y., Sezer, S.: DL-droid: deep learning based android malware detection using real devices. Comput. Secur. 89, 101663 (2020). https://doi.org/10.1016/j.cose.2019.101663, https://www.sciencedirect.com/science/article/pii/S0167404819300161

  8. Arora, A., Peddoju, S.K., Conti, M.: PermPair: android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 15, 1968–1982 (2020). https://doi.org/10.1109/TIFS.2019.2950134

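  9. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), vol. 1, pp. 4171–4186 (2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423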

  10. Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. arXiv e-prints arXiv:2002.08155 (2020)

  11. He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: decoding-enhanced BERT with Disentangled Attention. arXiv e-prints arXiv:2006.03654 (2020)

  12. Jeffrey, M., Nathan, H., William, G., Ryan, B.: Machine learning-based android malware detection using manifest permissions (2021). https://doi.org/10.24251/HICSS.2021.839

  13. Jimenez, M., Rwemalika, R., Papadakis, M., Sarro, F., Le Traon, Y., Harman, M.: The importance of accounting for real-world labelling when predicting software vulnerabilities. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 695–705 (2019)

  14. Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv preprint arXiv:1907.11932 (2019)

  15. Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for android malware detection using deep learning. Digit. Invest. 24, S48–S59 (2018). https://doi.org/10.1016/j.diin.2018.01.007, https://www.sciencedirect.com/science/article/pii/S1742287618300392

  16. Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2019). https://doi.org/10.1109/TIFS.2018.2866319

  17. Lee, Y., Saxe, J., Harang, R.: CATBERT: context-aware tiny BERT for detecting social engineering emails. arXiv e-prints arXiv:2010.03484 (2020)

  18. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225 (2018). https://doi.org/10.1109/TII.2017.2789219

  19. Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607 (2020)

  20. Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607 (2020). https://doi.org/10.1109/ACCESS.2020.3006143

  21. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv e-prints arXiv:1907.11692 (2019)

  22. Oak, R., Du, M., Yan, D., Takawale, H., Amit, I.: Malware detection on highly imbalanced data through sequence modeling. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, AISec 2019, pp. 37–48. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338501.3357374

  23. Peiravian, N., Zhu, X.: Machine learning for android malware detection using permission and API calls, pp. 300–305 (2013). https://doi.org/10.1109/ICTAI.2013.53

  24. Rahali, A., Akhloufi, M.A.: MalBERT: using transformers for cybersecurity and malicious software detection (2021)

  25. Sebastián, M., Rivera, R., Kotzias, P., Caballero, J.: AVclass: a tool for massive malware labeling. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 230–253. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_11

  26. Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? arXiv e-prints arXiv:1905.05583 (2019)

  27. Sun, T., Daoudi, N., Allix, K., Bissyandé, T.F.: Android malware detection: looking beyond Dalvik bytecode. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 34–39. IEEE (2021)

  28. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

  29. Wu, D.J., Mao, C.H., Wei, T.E., Lee, H.M., Wu, K.P.: DroidMat: android malware detection through manifest and API calls tracing, pp. 62–69 (2012). https://doi.org/10.1109/AsiaJCIS.2012.18

  30. Xu, H., Liu, B., Shu, L., Yu, P.S.: BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv e-prints arXiv:1904.02232 (2019)

  31. Xu, Z., Fang, X., Yang, G.: Malbert: a novel pre-training method for malware detection. Comput. Secur. 111, 102458 (2021). https://doi.org/10.1016/j.cose.2021.102458, https://www.sciencedirect.com/science/article/pii/S0167404821002820

  32. Yang, W., et al.: End-to-end open-domain question answering with BERTserini. arXiv e-prints arXiv:1902.01718 (2019)

  33. Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. SIGCOMM Comput. Commun. Rev. 44(4), 371–372 (2014). https://doi.org/10.1145/2740070.2631434

  34. Zhu, J., et al.: Incorporating BERT into neural machine translation. arXiv e-prints arXiv:2002.06823 (2020)

Acknowledgment

This work was supported by the Luxembourg National Research Fund (FNR) (12696663). This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information

Corresponding author

Correspondence to Badr Souani.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Souani, B., Khanfir, A., Bartel, A., Allix, K., Le Traon, Y. (2022). Android Malware Detection Using BERT. In: Zhou, J., et al. Applied Cryptography and Network Security Workshops. ACNS 2022. Lecture Notes in Computer Science, vol 13285. Springer, Cham. https://doi.org/10.1007/978-3-031-16815-4_31

  • DOI: https://doi.org/10.1007/978-3-031-16815-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16814-7

  • Online ISBN: 978-3-031-16815-4

  • eBook Packages: Computer Science, Computer Science (R0)
