Deep neural architectures for large scale android malware analysis

Nauman, Mohammad; Tanveer, Tamleek Ali; Khan, Sohail; Syed, Toqeer Ali

doi:10.1007/s10586-017-0944-y

Published: 03 June 2017

Volume 21, pages 569–588, (2018)

Cluster Computing

Mohammad Nauman ORCID: orcid.org/0000-0003-0941-2549¹,
Tamleek Ali Tanveer²,
Sohail Khan³ &
…
Toqeer Ali Syed⁴


Abstract

Android is arguably the most widely used mobile operating system in the world. Due to its widespread use and huge user base, it has attracted a lot of attention from the unsavory crowd of malware writers. Traditionally, techniques to counter such malicious software involved manually analyzing code and figuring out whether it was malicious or benign. However, due to the immense pace at which newer malware families are surfacing, such an approach is no longer feasible. Machine learning offers a way to tackle this issue of speed by automating the classification task. While several efforts have been made to apply traditional machine learning techniques to Android malware detection, no reasonable effort has been made to utilize the newer deep learning models in this domain. In this paper, we apply several deep learning models, including fully connected, convolutional and recurrent neural networks as well as autoencoders and deep belief networks, to detect Android malware from a large-scale dataset of more than 55 GB of Android malware. Further, we apply Bayesian machine learning to this problem domain to see how it fares against the deep learning based models while also providing insights into the dataset. We show that we are able to achieve better results using these models than the state-of-the-art approaches. Our best model achieves an F1 score of 0.986 with an AUC of 0.983, compared to the existing best F1 score of 0.875 and AUC of 0.953.
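For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how an F1 score and an AUC can be computed for a binary malware/benign classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; the data below is made up and is NOT from the paper.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (malware = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]            # assumed classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

print("F1 :", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

F1 depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported side by side.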

Notes

Not to be confused with the basic, well-known machine learning model of Naïve Bayes.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of android malware in your pocket. In: Proceedings of the Annual Symposium on Network and Distributed System Security (NDSS) (2014)
Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., McDaniel, P.: FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: ACM SIGPLAN Notices, vol. 49, pp. 259–269, ACM (2014)
Barber, D.: Bayesian Reasoning and Machine Learning. Cambridge University Press, Cambridge (2012)
Barrera, D., Van Oorschot, P.: Secure software installation on smartphones. Secur. Priv. IEEE 9(3), 42–48 (2011)
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., Bouchard, N., Warde-Farley, D., Bengio, Y.: Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590 (2012)
Bedini, A.: HDF5 for Python. http://www.h5py.org
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
Biswas, A., Shapiro, V.: Approximate distance fields with non-vanishing gradients. Graph. Models 66(3), 133–159 (2004)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer (2010)
Box, G.E., Tiao, G.C.: Bayesian Inference in Statistical Analysis, vol. 40. Wiley, New York (2011)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062 (2014)
Chollet, F.: Keras: deep learning library for theano and tensorflow. (2015)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Gated feedback recurrent neural networks. arXiv preprint arXiv:1502.02367 (2015)
Dash, S.K., Suarez-Tangil, G., Khan, S., Tam, K., Ahmadi, M., Kinder, J., Cavallaro, L.: Droidscribe: classifying android malware based on runtime behavior. Mob. Secur. Technol. (MoST 2016) 7148, 1–12 (2016)
Date, P., Hendler, J.A., Carothers, C.D.: Design index for deep neural networks. Proc. Comput. Sci. 88, 131–138 (2016)
Davis, B., Chen, H.: RetroSkeleton: retrofitting android apps. In: Proceedings of the 11th International Conference on Mobile Systems, Applications and Services (MobiSys’13), pp. 25–28 (2013)
Enck, W., Gilbert, P., Chun, B.G., Cox, L.P., Jung, J., McDaniel, P., Sheth, A.N.: TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI’10), pp. 1–6 (2010)
Enck, W., Ongtang, M., McDaniel, P.: On lightweight mobile phone application certification. In: Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS’09), pp. 235–245. ACM (2009)
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010)
Franzke, B., Kosko, B.: Using noise to speed up markov chain monte carlo estimation. Proc. Comput. Sci. 53, 113–120 (2015)
Fuchs, A., Chaudhuri, A., Foster, J.: SCanDroid: automated security certification of Android applications. Technical reports (2009)
Funahashi, K.I., Nakamura, Y.: Approximation of dynamical systems by continuous time recurrent neural networks. Neural Netw. 6(6), 801–806 (1993)
Garcia, J., Hammad, M., Pedrood, B., Bagheri-Khaligh, A., Malek, S.: Obfuscation-resilient, efficient, and accurate detection and family identification of android malware. George Mason University, Technical reports (2015)
GData: Mobile malware report: Q2/2015. https://public.gdatasoftware.com/Presse/Publikationen/Malware_Reports/G_DATA_MobileMWR_Q2_2015_EN.pdf. Accessed 15 July 2016
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, vol. 2. Taylor & Francis, New York (2014)
Hastings, W.K.: Monte carlo sampling methods using markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Hernández-Lobato, J.M., Adams, R.P.: Probabilistic backpropagation for scalable learning of bayesian neural networks. arXiv preprint arXiv:1502.05336 (2015)
Hinton, G.: A practical guide to training restricted boltzmann machines. Momentum 9(1), 926 (2010)
Hinton, G.E., Dayan, P., Frey, B.J., Neal, R.M.: The wake-sleep algorithm for unsupervised neural networks. Science 268(5214), 1158 (1995)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Homan, M.D., Gelman, A.: The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623 (2014)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM (1998)
Jolliffe, I.: Principal Component Analysis. Wiley Online Library (2002)
Karakida, R., Okada, M., Amari, S.I.: Dynamical analysis of contrastive divergence learning. Neural Netw. 79, 78–87 (2016)
Kohonen, T.: Self-Organizing Maps, vol. 30. Springer, New York (2001)
LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. The Handb. Brain Theory Neural Netw. 3361(10), 1995 (1995)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Long, M., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:1605.06636 (2016)
Mansfield-Devine, S.: Android architecture: attacking the weak points. Netw. Secur. 2012(10), 5–12 (2012)
Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp. 2204–2212 (2014)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Ongtang, M., McLaughlin, S., Enck, W., McDaniel, P.: Semantically rich application-centric security in android. In: Proceedings of the Annual Computer Security Applications Conference (ACSAC’09), pp. 340–349. IEEE (2009)
Patil, A., Huard, D., Fonnesbeck, C.J.: PyMC: Bayesian stochastic modelling in python. J. Stat. Softw. 35(4), 1 (2010)
Peng, H., Gates, C., Sarma, B., Li, N., Qi, Y., Potharaju, R., Nita-Rotaru, C., Molloy, I.: Using probabilistic generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 241–252. ACM (2012)
Powell, M.J.: A fast algorithm for nonlinearly constrained optimization calculations. In: Numerical analysis, pp. 144–157. Springer (1978)
Salakhutdinov, R., Murray, I.: On the quantitative analysis of deep belief networks. In: Proceedings of the 25th International Conference on Machine Learning, pp. 872–879. ACM (2008)
Sarma, B.P., Li, N., Gates, C., Potharaju, R., Nita-Rotaru, C., Molloy, I.: Android permissions: a perspective combining risks and benefits. In: Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, pp. 13–22. ACM (2012)
Sermanet, P., Frome, A., Real, E.: Attention for fine-grained categorization. arXiv preprint arXiv:1412.7054 (2014)
Shabtai, A., Fledel, Y., Elovici, Y.: Securing android-powered mobile devices using selinux. Secur. Priv. IEEE 8(3), 36–44 (2010)
Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
Symantec: Internet security threat report, volume 20. https://www.symantec.com/security_response/publications/threatreport.jsp Accessed 15 July 2016
Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261 (2016)
Tripp, O., Rubin, J.: A bayesian approach to privacy enforcement in smartphones. In: USENIX Security (2014)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008)
VXShare: VirusShare. https://virusshare.com. Accessed 3 Jan 2017
Yan, L.K., Yin, H.: DroidScope: Seamlessly reconstructing the os and dalvik semantic views for dynamic android malware analysis. In: USENIX security symposium, pp. 569–584 (2012)
Yang, Z., Hu, Z., Deng, Y., Dyer, C., Smola, A.: Neural machine translation with recurrent attention modeling. arXiv preprint arXiv:1607.05108 (2016)
Zhou, Y., Jiang, X.: Dissecting android malware: Characterization and evolution. In: Security and Privacy (SP), 2012 IEEE Symposium on, pp. 95–109. IEEE (2012)

Acknowledgements

We would like to thank the maintainers of Drebin [2] and the VirusShare site [58] for making their datasets available to us. The computation-intensive MCMC sampling and neural network training were made possible by the generous contribution of the Tesla K40c GPU by NVIDIA Corporation. The content of this paper is not necessarily endorsed by any of the funding agencies.

Author information

Authors and Affiliations

National University of Computer and Emerging Sciences, 160 Industrial Estate, Jamrud Road, Peshawar, 25000, Pakistan
Mohammad Nauman
Institute of Management Sciences, Peshawar, 25000, Pakistan
Tamleek Ali Tanveer
Deanship of Preparatory Year and Supporting Studies, Computer Science Department, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia
Sohail Khan
Faculty of Computer and Information System, Islamic University of Madinah, Madinah, Kingdom of Saudi Arabia
Toqeer Ali Syed

Corresponding author

Correspondence to Mohammad Nauman.

About this article

Cite this article

Nauman, M., Tanveer, T.A., Khan, S. et al. Deep neural architectures for large scale android malware analysis. Cluster Comput 21, 569–588 (2018). https://doi.org/10.1007/s10586-017-0944-y

Received: 30 January 2017
Revised: 16 May 2017
Accepted: 22 May 2017
Published: 03 June 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10586-017-0944-y
