Deep learning for effective Android malware detection using API call graph embeddings

Pektaş, Abdurrahman; Acarman, Tankut

doi:10.1007/s00500-019-03940-5

Methodologies and Application
Published: 23 March 2019

Volume 24, pages 1027–1043, (2020)
Soft Computing

Abdurrahman Pektaş¹ & Tankut Acarman¹

Abstract

The high penetration of Android applications, along with their malicious variants, calls for efficient and effective malware detection methods to secure the mobile platform. An API call sequence derived from the API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. However, identifying similarities between graphs and applying graph matching algorithms for classification is slow and difficult to adapt to a new domain, and the results may be inaccurate. In this study, the authors use the API call graph as a graph representation of all possible execution paths that malware can follow at runtime. The API call graphs are embedded into a low-dimensional numeric feature vector, which is fed to a deep neural network. Then, similarity detection for each binary function is trained and tested effectively. The study also focuses on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters to find the best combination of hyper-parameters and reach the highest value of the statistical metrics. Experimental results show that the presented malware classification reaches 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
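
The four figures reported in the abstract are standard binary classification metrics. As a minimal sketch, assuming malware is treated as the positive class, they can be computed from confusion-matrix counts as shown below. The class name, function name, and example counts are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with malware as the positive class."""
    tp: int  # malware samples correctly flagged as malware
    fp: int  # benign samples wrongly flagged as malware
    tn: int  # benign samples correctly classified as benign
    fn: int  # malware samples missed by the classifier


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, and F-measure for the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0 or (c.tp + c.fp) == 0 or (c.tp + c.fn) == 0:
        raise ValueError("metrics are undefined for these counts")

    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    if precision + recall == 0:
        raise ValueError("F-measure is undefined when precision and recall are 0")
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not the paper's data).
example = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this definition, the reported F-measure of 98.65% is consistent with the harmonic mean of the reported precision (98.84%) and recall (98.47%).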

Author information

Authors and Affiliations

Computer Engineering Department, Galatasaray University, 34349, Ortaköy, Istanbul, Turkey
Abdurrahman Pektaş & Tankut Acarman

Corresponding author

Correspondence to Abdurrahman Pektaş.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pektaş, A., Acarman, T. Deep learning for effective Android malware detection using API call graph embeddings. Soft Comput 24, 1027–1043 (2020). https://doi.org/10.1007/s00500-019-03940-5

Published: 23 March 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s00500-019-03940-5
