Defect Prediction in Android Binary Executables Using Deep Neural Network

Wireless Personal Communications

Abstract

Software defect prediction locates defective code to help developers improve software security. However, existing studies on software defect prediction are mostly limited to source code, and defect prediction for Android binary executables (apks) has not been explored in previous studies. In this paper, we present an exploratory study of defect prediction in Android apks. We first propose smali2vec, a new approach for generating features that capture the characteristics of smali files (the files obtained by decompiling an apk). Smali2vec extracts both token and semantic features of the defective files in apks; such comprehensive features are needed to build accurate prediction models. We then use a deep neural network (DNN), one of the most common deep learning architectures, to train the defect prediction model. We apply the model to more than 90,000 smali files from 50 Android apks, and the results show that it achieves an AUC (area under the receiver operating characteristic curve) of 85.98%, indicating that it is capable of predicting defects in apks. Furthermore, the DNN is shown to perform better than the traditional shallow machine learning algorithms (e.g., support vector machine and naive Bayes) used in previous studies. The model has been used in our practical work and has helped locate many defective files in apks.
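
As a rough illustration of two ideas mentioned in the abstract, the sketch below counts opcode tokens in a smali snippet and computes AUC from predicted scores. It is not the smali2vec implementation: the five-opcode vocabulary, the sample method, and the function names `token_features` and `auc` are all assumptions made for this example, and the semantic features and the DNN itself are not shown.

```python
from collections import Counter

# Hypothetical, tiny opcode vocabulary; the real Dalvik instruction set is much larger.
VOCAB = ["const/4", "invoke-virtual", "move-result", "if-eqz", "return-void"]


def token_features(smali_text, vocab=VOCAB):
    """Count how often each vocabulary opcode starts an instruction line."""
    counts = Counter()
    for line in smali_text.splitlines():
        parts = line.split()
        # Skip blank lines, directives (.method), labels (:cond_0) and comments (#).
        if not parts or parts[0][0] in ".:#":
            continue
        counts[parts[0]] += 1
    return [counts[op] for op in vocab]


def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up smali method used only to exercise token_features.
SAMPLE = """
.method public run()V
    .locals 1
    const/4 v0, 0x0
    invoke-virtual {p0}, Lcom/example/Foo;->bar()I
    move-result v0
    if-eqz v0, :cond_0
    invoke-virtual {p0}, Lcom/example/Foo;->baz()V
    :cond_0
    return-void
.end method
"""

if __name__ == "__main__":
    print(token_features(SAMPLE))
    print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

In such a setup each smali file would yield one feature vector, and a classifier's scores for held-out files would be passed to `auc` together with the true defect labels.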

Notes

  1. Heartbleed, https://en.wikipedia.org/wiki/Heartbleed.

  2. OWASP, https://www.owasp.org/index.php/Category:Vulnerability.

  3. Github, https://github.com.

  4. Xda Forums, Compiling AOSP Standalone apps: https://forum.xda-developers.com/showthread.php?t=1800090.

  5. AVD, Android Vulnerabilities Database, http://android.scap.org.cn/.

  6. CVE, Common Vulnerabilities and Exposures, http://cve.mitre.org/.

  7. F. Dong, S.D. Zhang, S.H. Wang, DNN-based software defect prediction experimental data and code, https://github.com/breezedong/DNN-based-software-defect-prediction.

  8. JesusFreke, smali/baksmali, https://github.com/JesusFreke/smali/wiki.

  9. Dalvik opcodes: http://pallergabor.uw.hu/androidblog/dalvik_opcodes.html.

  10. Android Open Source Project, Dalvik bytecode: https://source.android.com/devices/tech/dalvik/dalvikbytecode.

  11. http://www.antlr.org/.

  12. Google, TensorFlow Wide & Deep Learning Tutorial, https://www.tensorflow.org.

Acknowledgements

This research is supported by the National Natural Science Foundation of China Project (No. 61401038) and the 2016 Frontier and Key Technology Innovation Project of Guangdong Province Science and Technology Department (No. 2016B010110002). It is also supported by the National Key Research and Development Program (2016QY06X1205) and Technology Research and Development Program of Sichuan, China (17ZDYF2583).

Author information

Corresponding author

Correspondence to Junfeng Wang.

Appendix: Results of Labelling of the Selected App Projects

See Table 7.

Table 7 Results of labelling of the selected application projects

Cite this article

Dong, F., Wang, J., Li, Q. et al. Defect Prediction in Android Binary Executables Using Deep Neural Network. Wireless Pers Commun 102, 2261–2285 (2018). https://doi.org/10.1007/s11277-017-5069-3

  • DOI: https://doi.org/10.1007/s11277-017-5069-3
