Abstract
The wide adoption of Android applications, along with their malicious variants, calls for efficient and effective malware detection methods to secure the mobile platform. API call sequences derived from the API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. However, identifying similarities between graphs and applying graph matching algorithms for classification is slow and difficult to adapt to a new domain, and the results may be inaccurate. In this study, the authors use the API call graph as a graph representation of all possible execution paths that malware can follow at runtime. The API call graphs are embedded into a low-dimensional numeric vector feature set, which is fed to a deep neural network. Similarity detection for each binary function is then trained and tested effectively. The study also focuses on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters to find the best combination of hyper-parameters and reach the highest statistical metric values. Experimental results show that the presented malware classification reaches 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
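To make the idea of a call graph as a set of possible execution paths concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy call graph stored as an adjacency dictionary; the graph contents, the name `CALL_GRAPH`, and the helper `execution_paths` are hypothetical and chosen only for illustration. It enumerates every acyclic call path from an entry point, which yields the kind of ordered API call sequences that an embedding step could later turn into numeric vectors.

```python
from typing import Dict, List

# Hypothetical toy call graph (caller -> callees); names are illustrative only
# and are not taken from the paper's dataset.
CALL_GRAPH: Dict[str, List[str]] = {
    "onCreate": ["getDeviceId", "openConnection"],
    "getDeviceId": ["sendTextMessage"],
    "openConnection": ["write"],
    "sendTextMessage": [],
    "write": [],
}


def execution_paths(graph: Dict[str, List[str]], entry: str) -> List[List[str]]:
    """Enumerate every acyclic call path from `entry` using an iterative DFS.

    A path ends when the current node has no callee that is not already on
    the path, so cycles in the graph cannot cause infinite loops.
    """
    paths: List[List[str]] = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        # Skip callees already on this path to avoid revisiting a cycle.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            paths.append(path)
            continue
        # Push in reverse so paths are produced in declaration order.
        for callee in reversed(callees):
            stack.append((callee, path + [callee]))
    return paths


paths = execution_paths(CALL_GRAPH, "onCreate")
for p in paths:
    print(" -> ".join(p))
```

Exhaustive enumeration is only practical for small graphs, because the number of paths can grow exponentially with branching. That cost is one reason a learned low-dimensional embedding of the graph is attractive compared with direct path or graph matching.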
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Pektaş, A., Acarman, T. Deep learning for effective Android malware detection using API call graph embeddings. Soft Comput 24, 1027–1043 (2020). https://doi.org/10.1007/s00500-019-03940-5
DOI: https://doi.org/10.1007/s00500-019-03940-5