Skip to main content
Log in

Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing Aims and scope Submit manuscript

Abstract

Android security incidents occurred frequently in recent years. To improve the accuracy and efficiency of large-scale Android malware detection, in this work, we propose a hybrid model based on deep autoencoder (DAE) and convolutional neural network (CNN). First, to improve the accuracy of malware detection, we reconstruct the high-dimensional features of Android applications (apps) and employ multiple CNN to detect Android malware. In the serial convolutional neural network architecture (CNN-S), we use Relu, a non-linear function, as the activation function to increase sparseness and “dropout” to prevent over-fitting. The convolutional layer and pooling layer are combined with the full-connection layer to enhance feature extraction capability. Under these conditions, CNN-S shows powerful ability in feature extraction and malware detection. Second, to reduce the training time, we use deep autoencoder as a pre-training method of CNN. With the combination, deep autoencoder and CNN model (DAE-CNN) can learn more flexible patterns in a short time. We conduct experiments on 10,000 benign apps and 13,000 malicious apps. CNN-S demonstrates a significant improvement compared with traditional machine learning methods in Android malware detection. In details, compared with SVM, the accuracy with the CNN-S model is improved by 5%, while the training time using DAE-CNN model is reduced by 83% compared with CNN-S model.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  • Amos B, Turner H, White J (2013) Applying machine learning classifiers to dynamic android malware detection at scale. In: 9th international wireless communications and mobile computing conference (IWCMC), July 1–5, 2013, Sardinia, Italy, pp 1666–1671

  • Atat R, Liu L, Chen H, Wu J, Li H, Yi Y (2017) Enabling cyber-physical communication in 5G cellular networks: challenges, spatial spectrum sensing, and cyber-security. IET Cyber-Phys Syst Theory Appl 2(1):49–54

    Article  Google Scholar 

  • Bengio Y (2009) Learning deep architectures for AI. Foundations & Trends®. Mach Learn 2(1):1–127

    Article  MathSciNet  MATH  Google Scholar 

  • Chang X, Yang Y (2017) Semisupervised feature analysis by mining correlations among multiple tasks. IEEE Trans Neural Netw Learn Syst 28(10):2294–2305

    Article  MathSciNet  Google Scholar 

  • Chang X, Ma Z, Yang Y, Zeng Z, Hauptmann AG (2017a) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern 47(5):1180–1197

    Article  Google Scholar 

  • Chang X, Yu YL, Yang Y, Xing EP (2017b) Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans Pattern Anal Mach Intell 39(8):1617–1632

    Article  Google Scholar 

  • Chen X, Li J, Huang X, Ma J, Lou W (2015) New publicly verifiable databases with efficient updates. IEEE Trans Dependable Secure Comput 12(5):546–556

    Article  Google Scholar 

  • China Internet Security (2016) Research report. http://zt.360.cn/1101061855.php?dtid=1101061451&did=490301065. Accessed Dec 2017

  • Enck W, Gilbert P, Han S, Tendulkar V, Chun B, Cox LP, Jung J, McDaniel P, Sheth AN (2014) TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. Acm Trans Comput Syst (TOCS) 32(2):1–29

    Article  Google Scholar 

  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the 14th international conference on artificial intelligence and statistics (AISTATS), April 11–13, 2011, Ft. Lauderdale, FL, USA, pp 315–323

  • Guan X, Wang X, Zhang X (2009) Fast intrusion detection based on a non-negative matrix factorization model. J Netw Comput Appl 32(1):31–44

    Article  Google Scholar 

  • Gupta BB, Agrawal DP, Yamaguchi S (2016) Handbook of research on modern cryptographic solutions for computer and cyber Security, 1st edn. IGI Publishing, Hershey, PA. ISBN:1522501053 9781522501053

    Book  Google Scholar 

  • Hamedani K, Liu L, Atat R, Wu J, Yi Y (2018) Reservoir computing meets smart grids: attack detection using delayed feedback networks. IEEE Trans Ind Inf 14(2):734–743

    Article  Google Scholar 

  • Hinton GE, Zemel RS (1994) Autoencoders, minimum description length and Helmholtz free energy. Adv Neural Inf Process Syst (NIPS) 6:3–10

    Google Scholar 

  • Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. CoRR 1207:0580

    Google Scholar 

  • Hinton GE, Osindero S, Teh YW (2014) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

    Article  MathSciNet  MATH  Google Scholar 

  • Huang HD, Yu CM, Kao HY (2017) R2-D2: color-inspired convolutional neural network (cnn)-based android malware detections. CoRR 1705:04448

    Google Scholar 

  • Ibtihal M, Driss EO, Hassan N (2017) Homomorphic encryption as a service for outsourced images in mobile cloud computing environment. Int J Cloud Appl Comput 7(2):27–40

    Google Scholar 

  • Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), October 25–29, 2014 Doha, Qatar, pp 1746–1751

  • Klieber W, Flynn L, Bhosale A, Jia L, Bauer L (2014) Android taint flow analysis for app sets. In: Proceedings of the 3rd ACM SIGPLAN international workshop on the state of the art in Java program analysis, June 9–11, 2014, Edinburgh, United Kingdom, pp 1–6

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

    Article  Google Scholar 

  • Lee K, Choi HO, Min SD, Lee J, Gupta B, Nam Y (2017) A comparative evaluation of atrial fibrillation detection methods in koreans based on optical recordings using a smartphone. IEEE Access 5:11437–11443

    Article  Google Scholar 

  • Li Q, Li X (2015) Android malware detection based on static analysis of characteristic tree. In: 2015 International conference on cyber-enabled distributed computing and knowledge discovery (CyberC), September 17–19, 2015, Xi’an, China, pp 84–91

  • Li P, Li J, Huang Z, Gao CZ, Chen WB, Chen K (2017a) Privacy-preserving outsourced classification in cloud computing. Cluster Comput. https://doi.org/10.1007/s10586-017-0849-9

  • Li Z, Nie F, Chang X, Yang Y (2017b) Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans Knowl Data Eng 29(10):2100–2110

    Article  Google Scholar 

  • Li J, Sun L, Yan Q, Li Z, Srisa-an W, Ye H (2018a) Significant permission identification for machine learning based android malware detection. IEEE Trans Industr Inf PP(99):1–1

    Google Scholar 

  • Li Y, Wang G, Nie L, Wang Q (2018b) Distance metric optimization driven convolutional neural network for age invariant face recognition. Pattern Recogn 75C:51–62

    Article  Google Scholar 

  • Liu X, Liu J, Wang W, He Y, Zhang X (2018) Discovering and understanding android sensor usage behaviors with data flow analysis. World Wide Web 21(1):105–126

    Article  Google Scholar 

  • Lu L, Li Z, Wu Z, Lee W, Jiang G (2012) CHEX: statically vetting android apps for component hijacking vulnerabilities. In: ACM conference on computer and communications security, October 16–18, 2012, Raleigh, NC, USA, pp 229–240

  • Memos VA, Psannis KE, Ishibashi Y, Kim BG, Gupta B (2018) An efficient algorithm for media-based surveillance system (EAMSuS) in IoT smart city framework. Future Gener Comput Syst 83:619–628

    Article  Google Scholar 

  • Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, 2010, Haifa, Israel, pp 807–814

  • Nix R, Zhang J (2017) Classification of Android apps and malware using deep neural networks. Int Jt Conf Neural Netw (IJCNN) May 14–19, 2017, Alaska, USA, pp 1871–1878

  • Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assessment of mobile applications. Usenix Conf Secur 2013:527–542

    Google Scholar 

  • Pedregosa F, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. Mach Learn Res 12(10):2825–2830

    MathSciNet  MATH  Google Scholar 

  • Rastogi S, Bhushan K, Gupta BB (2016) Android applications repackaging detection techniques for smartphone devices. Procedia Comput Sci 78:26–32

    Article  Google Scholar 

  • Sarma BP, Li N, Gates C, Potharaju R, Nita-Rotaru C, Molloy I (2012) Android permissions: a perspective combining risks and benefits. In: ACM symposium on access control models and technologies. June 21–23, 2017, Indianapolis, IN, USA, pp 13–22

  • Shabtai A, Fledel Y, Elovici Y (2010) Automated static code analysis for classifying android applications using machine learning. In: 2010 international conference on computational intelligence and security (CIS), December 11–14, 2010, Nanning, Guangxi, China, pp 329–333

  • Shen C, Chen Y, Guan X, Maxion Y (2017) Pattern-growth based mining mouse-interaction behavior for an active user authentication system. IEEE Trans Dependable Secur Comput PP(99):1–1

    Article  Google Scholar 

  • Shen J, Zhou T, Chen X, Li J, Susilo W (2018a) Anonymous and traceable group data sharing in cloud computing. IEEE Trans Inf Forensics Secur 13(4):912–925

    Article  Google Scholar 

  • Shen C, Li Y, Chen Y, Guan X, Maxion R (2018b) Performance analysis of multi-motion sensor behavior for active smartphone authentication. IEEE Trans Inf Forensics Secur 13(1):48–62

    Article  Google Scholar 

  • Shen J, Gui Z, Ji S, Shen J, Tan H, Tang Y (2018c) Cloud-aided lightweight certificateless authentication protocol with anonymity for wireless body area networks. Netw Comput Appl 106:117–123

    Article  Google Scholar 

  • Shen C, Chen Y, Guan X (2018d) Performance evaluation of implicit smartphones authentication via sensor-behavior analysis. Inf Sci 430:538–553

    Article  Google Scholar 

  • Wang W, Battiti R (2006) Identifying intrusions in computer networks with principal component analysis. ARES 2006:270–279

    Google Scholar 

  • Wang W, Guan X, Zhang X (2008) Processing of massive audit data streams for real-time anomaly intrusion detection. Comput Commun 31(1):58–72

    Article  Google Scholar 

  • Wang W, Guyet T, Quiniou R, Cordier M, Masseglia F, Zhang X (2014a) Autonomic intrusion detection: adaptively detecting anomalies over unlabeled audit data streams in computer networks. Knowl Based Syst 70:103–117

    Article  Google Scholar 

  • Wang W, Wang X, Feng D, Liu J, Han Z, Zhang X (2014b) Exploring permission-induced risk in android applications for malicious application detection. IEEE Trans Inf Forensics Secur 9(11):1869–1882

    Article  Google Scholar 

  • Wang W, He Y, Liu J, Gombault S (2015) Constructing important features from massive network traffic for lightweight intrusion detection. IET Inf Secur 9(6):374–379

    Article  Google Scholar 

  • Wang X, Wang W, He Y, Liu J, Han Z, Zhang X (2017) Characterizing android apps’ behavior for effective detection of malapps at large scale. Future Gener Comput Syst 75:30–45

    Article  Google Scholar 

  • Wang Y, Zhu G, Shi Y (2018a) Transportation spherical watermarking. IEEE Trans Image Process 27(4):2063–2077

    Article  MathSciNet  MATH  Google Scholar 

  • Wang H, Wang W, Cui Z, Zhou X, Zhao J, Li Y (2018b) A new dynamic firefly algorithm for demand estimation of water resources. Inf Sci 438:95–106

    Article  MathSciNet  Google Scholar 

  • Wang W, Li Y, Wang X, Liu J, Zhang X (2018c) Detecting android malicious apps and categorizing benign apps with ensemble of classifiers. Future Gener Comput Syst 78:987–994

    Article  Google Scholar 

  • Wang W, Liu J, Pitsilis G, Zhang X (2018d) Abstracting massive data for lightweight intrusion detection in computer networks. Inf Sci 433–434:417–430

    Article  MathSciNet  Google Scholar 

  • Wu WC, Hung SH (2014) DroidDolphin: a dynamic android malware detection framework using big data and machine learning. In: Proceedings of the 2014 conference on research in adaptive and convergent systems (RACS), pp 247–252

  • Wu J, Guo S, Li J, Zeng D (2016a) Big data meet green challenges: greening big data. IEEE Syst J 10(3):873–887

    Article  Google Scholar 

  • Wu J, Guo S, Li J, Zeng D (2016b) Big data meet green challenges: big data toward green applications. IEEE Syst J 10(3):888–900

    Article  Google Scholar 

  • Xie D, Lai X, Lei X, Fan L (2018) Cognitive multiuser energy harvesting decode-and-forward relaying system with direct links. IEEE Access 6:5596–5606

    Article  Google Scholar 

  • Yerima SY, Sezer S, McWilliams G (2014) Analysis of Bayesian classification-based approaches for android malware detection. IET Inf Secur 8(1):25–36

    Article  Google Scholar 

  • Yuan Z, Lu Y, Xue Y (2016) Droiddetector: android malware characterization and detection using deep learning. Tsinghua Sci Technol 21(1):114–123

    Article  Google Scholar 

  • Zhang C, Liu C, Zhang X Almpanidis G (2017) An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst Appl 82:128–150

    Article  Google Scholar 

  • Zhou Y, Jiang X (2012) Dissecting android malware: characterization and evolution. In: IEEE symposium on security and privacy (SP), May 20–23, 2012, San Francisco, USA, pp 95–109

Download references

Acknowledgements

The work reported in this paper was supported in part by National Key R&D Program of China, under Grant 2017YFB0802805, in part by Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, under Grant AGK2015002, in part by ZTE Corporation Foundation, under Grant K17L00190, in part by funds of Science and Technology on Electronic Information Control Laboratory, under Grant K16GY00040, in part by the Fundamental Research funds for the central Universities of China, under Grant K17JB00060 and K17JB00020, and in part by Natural Science Foundation of China, under Grant U1736114 and 61672092.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wei Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, W., Zhao, M. & Wang, J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Human Comput 10, 3035–3043 (2019). https://doi.org/10.1007/s12652-018-0803-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-018-0803-6

Keywords

Navigation