Android Malware Detection Using BERT

Souani, Badr; Khanfir, Ahmed; Bartel, Alexandre; Allix, Kevin; Le Traon, Yves

doi:10.1007/978-3-031-16815-4_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13285)

Included in the following conference series:

International Conference on Applied Cryptography and Network Security

Abstract

In this paper, we present two empirical studies that (1) detect Android malware and (2) classify Android malware into families. In the first study, we reproduce the results of MalBERT by training BERT models on the manifests of 265k Android applications (vs. 22k for MalBERT) taken from the AndroZoo dataset. The results reported in the MalBERT paper are excellent, yet hard to believe, since a manifest gives only a rough representation of an application. We therefore try to answer the following questions: Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to maintain or improve the results while reducing the size of the manifests? In the second study, we investigate whether BERT can be used to classify Android malware into families. The results show that BERT can distinguish malware from goodware with 97% accuracy and can classify malware into families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify successfully, and that BERT does not actually need them.
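The question about permissions suggests comparing a model trained on full manifests with one trained on manifests that have their permission declarations removed. The abstract does not say how the manifests were preprocessed, so the following is only a minimal sketch of one possible way to prepare such an input. The function name `strip_permissions`, the sample manifest, and the choice of which tags count as permission declarations are all assumptions made for illustration, not the authors' pipeline.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Register the prefix so serialized attributes keep their "android:" names.
ET.register_namespace("android", ANDROID_NS)

# Assumption: only these elements are treated as permission declarations.
PERMISSION_TAGS = {"uses-permission", "uses-permission-sdk-23"}


def strip_permissions(manifest_xml: str) -> str:
    """Return the manifest text with permission request elements removed."""
    root = ET.fromstring(manifest_xml)
    # Copy the element lists before removing children, because mutating
    # a tree while iterating over it is unsafe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in PERMISSION_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical decoded manifest, used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo">
        <activity android:name=".MainActivity" />
    </application>
</manifest>"""

if __name__ == "__main__":
    print(strip_permissions(SAMPLE))
```

The stripped text could then be fed to the same classifier as the full manifest, so that any difference in accuracy can be attributed to the missing permission entries.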

Notes

1. https://www.virustotal.com.
2. To obtain information about malware families, we rely on the AVclass tool [25].
3. https://tfhub.dev.
4. The exact model we used can be found at https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4?tf-hub-format=compressed. We note that we also relied on the matching BERT Pre-processor available at https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3?tf-hub-format=compressed.
5. More information about this model as well as about the other available models of this collection can be found at https://tfhub.dev/google/collections/bert.

References

The free encyclopedia. https://www.wikipedia.org/
Google play store. https://play.google.com/
Tensorflow. https://www.tensorflow.org/
Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., Awajan, A.: Intelligent mobile malware detection using permission requests and API calls. Future Gener. Comput. Syst. 107, 509–521 (2020). https://doi.org/10.1016/j.future.2020.02.002, https://www.sciencedirect.com/science/article/pii/S0167739X19321223
Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: AndroZoo: collecting millions of android apps for the research community. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 468–471. IEEE (2016)
Alsoghyer, S., Almomani, I.: On the effectiveness of application permissions for android ransomware detection. In: 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 94–99 (2020). https://doi.org/10.1109/CDMA47397.2020.00022
Alzaylaee, M.K., Yerima, S.Y., Sezer, S.: DL-droid: deep learning based android malware detection using real devices. Comput. Secur. 89, 101663 (2020). https://doi.org/10.1016/j.cose.2019.101663, https://www.sciencedirect.com/science/article/pii/S0167404819300161
Arora, A., Peddoju, S.K., Conti, M.: PermPair: android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 15, 1968–1982 (2020). https://doi.org/10.1109/TIFS.2019.2950134
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding, pp. 4171–4186 (2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. arXiv e-prints arXiv:2002.08155 (2020)
He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: decoding-enhanced BERT with Disentangled Attention. arXiv e-prints arXiv:2006.03654 (2020)
Jeffrey, M., Nathan, H., William, G., Ryan, B.: Machine learning-based android malware detection using manifest permissions (2021). https://doi.org/10.24251/HICSS.2021.839
Jimenez, M., Rwemalika, R., Papadakis, M., Sarro, F., Le Traon, Y., Harman, M.: The importance of accounting for real-world labelling when predicting software vulnerabilities. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 695–705 (2019)
Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv preprint arXiv:1907.11932 (2019)
Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for android malware detection using deep learning. Digit. Invest. 24, S48–S59 (2018). https://doi.org/10.1016/j.diin.2018.01.007, https://www.sciencedirect.com/science/article/pii/S1742287618300392
Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2019). https://doi.org/10.1109/TIFS.2018.2866319
Lee, Y., Saxe, J., Harang, R.: CATBERT: context-aware tiny BERT for detecting social engineering emails. arXiv e-prints arXiv:2010.03484 (2020)
Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225 (2018). https://doi.org/10.1109/TII.2017.2789219
Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607 (2020)
Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607 (2020). https://doi.org/10.1109/ACCESS.2020.3006143
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv e-prints arXiv:1907.11692 (2019)
Oak, R., Du, M., Yan, D., Takawale, H., Amit, I.: Malware detection on highly imbalanced data through sequence modeling. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, AISec 2019, pp. 37–48. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338501.3357374
Peiravian, N., Zhu, X.: Machine learning for android malware detection using permission and API calls, pp. 300–305 (2013). https://doi.org/10.1109/ICTAI.2013.53
Rahali, A., Akhloufi, M.A.: MalBERT: using transformers for cybersecurity and malicious software detection (2021)
Sebastián, M., Rivera, R., Kotzias, P., Caballero, J.: AVclass: a tool for massive malware labeling. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 230–253. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_11
Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? arXiv e-prints arXiv:1905.05583 (2019)
Sun, T., Daoudi, N., Allix, K., Bissyandé, T.F.: Android malware detection: looking beyond Dalvik bytecode. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 34–39. IEEE (2021)
Vaswani, A., et al.: Attention is all you need. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wu, D.J., Mao, C.H., Wei, T.E., Lee, H.M., Wu, K.P.: DroidMat: android malware detection through manifest and API calls tracing, pp. 62–69 (2012). https://doi.org/10.1109/AsiaJCIS.2012.18
Xu, H., Liu, B., Shu, L., Yu, P.S.: BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv e-prints arXiv:1904.02232 (2019)
Xu, Z., Fang, X., Yang, G.: Malbert: a novel pre-training method for malware detection. Comput. Secur. 111, 102458 (2021). https://doi.org/10.1016/j.cose.2021.102458, https://www.sciencedirect.com/science/article/pii/S0167404821002820
Yang, W., et al.: End-to-end open-domain question answering with BERTserini. arXiv e-prints arXiv:1902.01718 (2019)
Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. SIGCOMM Comput. Commun. Rev. 44(4), 371–372 (2014). https://doi.org/10.1145/2740070.2631434
Zhu, J., et al.: Incorporating BERT into neural machine translation. arXiv e-prints arXiv:2002.06823 (2020)

Acknowledgment

This work was supported by the Luxembourg National Research Fund (FNR) (12696663). This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

University of Luxembourg, Esch-sur-Alzette, Luxembourg
Badr Souani, Ahmed Khanfir, Kevin Allix & Yves Le Traon
Umeå University, Umeå, Sweden
Alexandre Bartel

Corresponding author

Correspondence to Badr Souani.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Jianying Zhou
University of Bristol, Bristol, UK
Sridhar Adepu
University of Malaga, Malaga, Spain
Cristina Alcaraz
Radboud University Nijmegen, Nijmegen, The Netherlands
Lejla Batina
Sapienza University of Rome, Rome, Italy
Emiliano Casalicchio
Singapore University of Technology and Design, Singapore, Singapore
Sudipta Chattopadhyay
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Chenglu Jin
University of Science and Technology of China, Hefei, China
Jingqiang Lin
University of Padua, Padua, Italy
Eleonora Losiouk
Concordia University, Montreal, QC, Canada
Suryadipta Majumdar
Technical University Denmark, Kongens Lyngby, Denmark
Weizhi Meng
Delft University of Technology, Delft, The Netherlands
Stjepan Picek
Zhejiang Gongshang University, Hangzhou, China
Jun Shao
University of Aizu, Aizu-Wakamatsu, Japan
Chunhua Su
City University of Hong Kong, Hong Kong, Hong Kong
Cong Wang
Delft University of Technology, Delft, The Netherlands
Yury Zhauniarovich
Rutgers University, Piscataway, NJ, USA
Saman Zonouz

About this paper

Cite this paper

Souani, B., Khanfir, A., Bartel, A., Allix, K., Le Traon, Y. (2022). Android Malware Detection Using BERT. In: Zhou, J., et al. Applied Cryptography and Network Security Workshops. ACNS 2022. Lecture Notes in Computer Science, vol 13285. Springer, Cham. https://doi.org/10.1007/978-3-031-16815-4_31

DOI: https://doi.org/10.1007/978-3-031-16815-4_31
Published: 24 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16814-7
Online ISBN: 978-3-031-16815-4
eBook Packages: Computer Science, Computer Science (R0)
