Deep learning for effective Android malware detection using API call graph embeddings

  • Methodologies and Application
Soft Computing

Abstract

The high penetration of Android applications, along with their malicious variants, calls for efficient and effective malware detection methods to secure the mobile platform. An API call sequence derived from the API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. However, identifying similarities between graphs and applying graph matching algorithms for classification is slow and difficult to adapt to a new domain, and the results may be inaccurate. In this study, the authors use the API call graph as a graph representation of all possible execution paths that a malware sample can follow at runtime. The API call graphs are embedded into a low-dimensional numeric feature vector, which is fed to a deep neural network. Similarity detection for each binary function is then trained and tested effectively. The study also focuses on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters to find the best combination of hyper-parameters and reach the highest statistical metric values. Experimental results show that the presented malware classification reaches 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
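To make the idea of "following the API call graph, its branching, and order of calls" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a call graph stored as an adjacency list and enumerates the acyclic call paths from an entry point, which is one simple way to obtain API call sequences. The graph contents, the `TOY_GRAPH` name, and the `enumerate_paths` helper are hypothetical and chosen only for illustration; the abstract does not say how paths are extracted or which embedding algorithms are applied afterwards.

```python
from typing import Dict, List

# Hypothetical toy call graph: caller -> ordered list of callees.
# Node names are illustrative only and are not taken from the paper.
CallGraph = Dict[str, List[str]]

TOY_GRAPH: CallGraph = {
    "onCreate": ["getDeviceId", "openConnection"],
    "getDeviceId": ["sendTextMessage"],
    "openConnection": ["write"],
    "sendTextMessage": [],
    "write": [],
}


def enumerate_paths(graph: CallGraph, entry: str) -> List[List[str]]:
    """Return every acyclic call path that starts at `entry`.

    A path ends when the current node has no callees that are not
    already on the path.
    """
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        path = path + [node]
        # Skip callees already on the current path so cycles cannot recurse forever.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            paths.append(path)
            return
        for callee in callees:
            dfs(callee, path)

    dfs(entry, [])
    return paths


if __name__ == "__main__":
    for call_path in enumerate_paths(TOY_GRAPH, "onCreate"):
        print(" -> ".join(call_path))
```

Each returned list is one ordered call sequence. In a pipeline like the one the abstract describes, such sequences (or the graph itself) would then be mapped to a fixed-length numeric vector before being passed to a neural network; that step is omitted here because the abstract does not specify it.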

Notes

  1. https://github.com/secure-software-engineering/FlowDroid.

  2. http://amd.arguslab.org/.

Author information

Corresponding author

Correspondence to Abdurrahman Pektaş.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pektaş, A., Acarman, T. Deep learning for effective Android malware detection using API call graph embeddings. Soft Comput 24, 1027–1043 (2020). https://doi.org/10.1007/s00500-019-03940-5

  • DOI: https://doi.org/10.1007/s00500-019-03940-5
