Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection

Journal of Intelligent Information Systems

Abstract

Recognizing malware before its installation plays a crucial role in keeping an Android device safe. In this paper we describe a supervised method that is able to analyse multiple types of information (e.g. permissions, API calls and network addresses) that can be retrieved through a broad static analysis of Android applications. In particular, we propose a novel multi-view machine learning approach to malware detection, which couples knowledge extracted via both clustering and classification. In an assessment, we evaluate the effectiveness of the proposed method using benchmark Android applications and established machine learning metrics.

Notes

  1. http://gs.statcounter.com/os-market-share/mobile/worldwide

  2. By changing the static analyser, the method can be generalised to any number of views.

  3. The advantage of accounting for aggregate information in Equation 7 is validated empirically in the application scenario of this study (see the results in Section 4.4.1).

  4. The impact of the parameter ρ on the fall-out and sensitivity is investigated in the application scenario of this study (see the results in Section 4.4.1).

  5. https://www.sec.cs.tu-bs.de/~danarp/drebin/

  6. The ANOVA analysis leaves out the competitor Drebin, as Arp et al. (2014) report only the sensitivity and fall-out of Drebin averaged over the ten trials of the experiment (and no result on AUC), while the ANOVA analysis is done on the series of results collected on the ten splits.
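
Notes 4 and 6 refer to sensitivity and fall-out. As a reminder of what these two metrics measure, the following is a minimal sketch using their standard definitions (true positive rate and false positive rate). It assumes malware is the positive class; the function names and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating `positive` as the malware label."""
    tp = fn = fp = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of malware samples flagged as malware."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def fall_out(fp: int, tn: int) -> float:
    """False positive rate: share of benign samples flagged as malware."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical labels, made up for illustration: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"fall-out    = {fall_out(fp, tn):.3f}")
```

A detector is preferable when it raises sensitivity without raising fall-out, which is why the two values are usually reported together.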

References

  • Alam, M. S., & Vuong, S. T. (2013). Random forest classification for detecting android malware. In Proceedings of the 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669.

  • Alzaylaee, M., Yerima, S., & Sezer, S. (2017). Improving dynamic analysis of android apps using hybrid test input generation. In International Conference on Cyber Security and Protection of Digital Services (Cyber Security 2017): Proceedings, pp. 1–8. IEEE. https://doi.org/10.1109/CyberSecPODS.2017.8074845.

  • Alzaylaee, M. K., Yerima, S. Y., & Sezer, S. (2020). Dl-droid: Deep learning based android malware detection using real devices. Computers & Security, 89, 101663. https://doi.org/10.1016/j.cose.2019.101663.

  • Andresini, G., Appice, A., & Malerba, D. (2020). Dealing with Class Imbalance in Android Malware Detection by Cascading Clustering and Classification, pp. 173–187. Springer International Publishing: Cham, Switzerland.

  • Appice, A., Guccione, P., & Malerba, D. (2017). A novel spectral-spatial co-training algorithm for the transductive classification of hyperspectral imagery data. Pattern Recognition, 63, 229–245.

  • Appice, A., & Malerba, D. (2016). A co-training strategy for multiple view clustering in process mining. IEEE Trans. Services Computing, 9(6), 832–845.

  • Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., & Rieck, K. (2014). DREBIN: Effective and explainable detection of android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium. The Internet Society.

  • Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics.

  • Bai, J., & Wang, J. (2016). Improving malware detection using multi-view ensemble learning. Security and Communication Networks, 9(17), 4227–4241.

  • Bhatia, T., & Kaushal, R. (2017). Malware detection in android based on dynamic analysis. In Proceedings of the 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber Security), pp. 1–6.

  • Bholowalia, P., & Kumar, A. (2014). EBK-means: A clustering technique based on elbow method and k-means in WSN. International Journal of Computer Applications, 105(9), 17–24.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Ceci, M., Appice, A., Viktor, H. L., Malerba, D., Paquet, E., & Guo, H. (2012). Transductive relational classification in the co-training paradigm. In Perner, P. (Ed.) Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition, LNCS, vol. 7376, pp. 11–25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_2.

  • Demontis, A., Melis, M., Biggio, B., Maiorca, D., Arp, D., Rieck, K., Corona, I., Giacinto, G., & Roli, F. (2017). Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. IEEE Transactions on Dependable and Secure Computing. PP. https://doi.org/10.1109/TDSC.2017.2700270.

  • Fan, M., Liu, J., Wang, W., Li, H., Tian, Z., & Liu, T. (2017). Dapasa: Detecting android piggybacked apps through sensitive subgraph analysis. IEEE Transactions on Information Forensics and Security, 12(8), 1772–1785. https://doi.org/10.1109/TIFS.2017.2687880.

  • Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from Imbalanced Data Sets. Springer.

  • Folino, G., & Pisani, F. (2016). Evolving meta-ensemble of classifiers for handling incomplete and unbalanced datasets in the cyber security domain. Applied Soft Computing, 47, 179–190.

  • Garcia-Ceja, E., Galván-Tejada, C. E., & Brena, R. (2018). Multi-view stacking for activity recognition with sound and accelerometer data. Information Fusion, 40, 45–56.

  • Goyal, R., Spognardi, A., Dragoni, N., & Argyriou, M. (2016). Safedroid: a distributed malware detection service for android. In Proceedings of the 2016 IEEE 9th International Conference on Service-Oriented Computing and Applications (SOCA), pp. 59–66.

  • Guo, S., Yuan, Q., Lin, F., Wang, F., & Ban, T. (2010). A malware detection algorithm based on multi-view fusion. In Wong, K.W., Mendis, B.S.U., & Bouzerdoum, A. (Eds.) Neural Information Processing. Models and Applications, pp. 259–266. Springer.

  • Idrees, F., & Rajarajan, M. (2014). Investigating the android intents and permissions for malware detection. In Proceedings of the IEEE 10th International Conference on Wireless and Mobile Computing, Networking and Communications, pp. 354–358.

  • Kang, B., Yerima, S. Y., Mclaughlin, K., & Sezer, S. (2016). N-opcode analysis for android malware classification and categorization. In 2016 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–7.

  • Kapratwar, A., Troia, F., & Stamp, M. (2017). Static and dynamic analysis of android malware. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy, pp. 653–662. SCITEPRESS.

  • Khorshidpour, Z., Hashemi, S., & Hamzeh, A. (2017). Evaluation of random forest classifier in security domain. Applied Intelligence, 47(2), 558–569. https://doi.org/10.1007/s10489-017-0907-2.

  • Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2013). Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3), 817–833.

  • Kumar, V. (2015). Multi-view ensemble learning using optimal feature set partitioning: An extended experiments and analysis in low dimensional scenario. Procedia Computer Science, 58, 499–506. Second International Symposium on Computer Vision and the Internet.

  • Last, M. (2016). Multi-target classification: Methodology and practical case studies. In Berendt, B., Bringmann, B., Fromont, É., Garriga, G.C., Miettinen, P., Tatti, N., & Tresp, V. (Eds.) Proceedings of the Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Part III, LNCS, vol. 9853, pp. 280–283. Springer.

  • Li, Y., Shen, T., Sun, X., Pan, X., & Mao, B. (2015). Detection, classification and characterization of android malware using api data dependency. In Thuraisingham, B., Wang, X., & Yegneswaran, V. (Eds.) Proceedings of the Security and Privacy in Communication Networks, pp. 23–40. Springer.

  • Lin, W., Wu, Z., Lin, L., Wen, A., & Li, J. (2017). An ensemble random forest algorithm for insurance big data analysis. IEEE Access, 5, 16568–16575.

  • Madjarov, G., Kocev, D., Gjorgjevikj, D., & Džeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084–3104.

  • Miller, S. T., & Busby-Earle, C. (2017). Multi-perspective machine learning a classifier ensemble method for intrusion detection. In Proceedings of the 2017 International Conference on Machine Learning and Soft Computing, ICMLSC ’17, pp. 7–12. ACM. https://doi.org/10.1145/3036290.3036303.

  • Milosevic, N., Dehghantanha, A., & Choo, K. K. R. (2017). Machine learning aided android malware classification. Computers and Electrical Engineering, 61, 266–274.

  • Narayanan, A., Chandramohan, M., Chen, L., & Liu, Y. (2018). A multi-view context-aware approach to android malware detection and malicious code localization. Empirical Software Engineering, 23(3), 1222–1274. https://doi.org/10.1007/s10664-017-9539-8.

  • Narayanan, A., Soh, C., Chen, L., Liu, Y., & Wang, L. (2018). Apk2vec: Semi-supervised multi-view representation learning for profiling android applications. In IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018, pp. 357–366. IEEE Computer Society. https://doi.org/10.1109/ICDM.2018.00051.

  • Nguyen-Vu, L., Ahn, J., & Jung, S. (2019). Android fragmentation in malware detection. Computers & Security, 87, 101573. https://doi.org/10.1016/j.cose.2019.101573.

  • NOKIA. (2019). Nokia threat intelligence report – 2019. White paper, online at https://pages.nokia.com/T003B6-Threat-Intelligence-Report-2019.html.

  • Painter, N., & Kadhiwala, B. (2017). Comparative analysis of android malware detection techniques. In Satapathy, S.C., Bhateja, V., & Joshi, A. (Eds.) Proceedings of the International Conference on Data Engineering and Communication Technology, pp. 131–139. Springer.

  • Papagiannopoulou, C., Tsoumakas, G., & Tsamardinos, I. (2015). Discovering and exploiting deterministic label relationships in multi-label learning. In Cao, L., Zhang, C., Joachims, T., Webb, G.I., Margineantu, D.D., & Williams, G. (Eds.) Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 915–924. ACM.

  • Peiravian, N., & Zhu, X. (2013). Machine learning for android malware detection using permission and api calls. In Proceedings of the IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305.

  • Rovelli, P., & Vigfússon, Ý. (2014). Pmds: Permission-based malware detection system. In Prakash, A., & Shyamasundar, R. (Eds.) Proceedings of the Information Systems Security, pp. 338–357. Springer.

  • Roy, S., DeLoach, J., Li, Y., Herndon, N., Caragea, D., Ou, X., Ranganath, V. P., Li, H., & Guevara, N. (2015). Experimental study with real-world data for android app security analysis using machine learning. In Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, pp. 81–90.

  • Sheen, S., Anitha, R., & Natarajan, V. (2015). Android based malware detection using a multifeature collaborative decision fusion approach. Neurocomputing, 151, 905–912.

  • Shiqi, L., Shengwei, T., Long, Y., Jiong, Y., & Hua, S. (2018). Android malicious code classification using deep belief network. KSII Transactions on Internet and Information Systems, 12, 454–475. https://doi.org/10.3837/tiis.2018.01.022.

  • Suarez-Tangil, G., Dash, S. K., Ahmadi, M., Kinder, J., Giacinto, G., & Cavallaro, L. (2017). Droidsieve: Fast and accurate classification of obfuscated android malware. In Proceedings of the 7th ACM on Conference on Data and Application Security and Privacy, CODASPY 2017, pp. 309–320.

  • Sun, S., Mao, L., Dong, Z., & Wu, L. (2019). Multiview Deep Learning, pp. 105–138. Singapore: Springer Singapore.

  • Taheri, R., Javidan, R., Shojafar, M., Pooranian, Z., Miri, A., & Conti, M. (2019). On defending against label flipping attacks on malware detection systems. arXiv:1908.04473.

  • Tajoddin, A., & Abadi, M. (2019). Ramd: registry-based anomaly malware detection using one-class ensemble classifiers. Applied Intelligence.

  • Talha, K. A., Alper, D. I., & Aydin, C. (2015). Apk auditor: Permission-based android malware detection system. Digital Investigation, 13, 1–14.

  • Tiwari, P. K., & Singh, U. (2015). Android users security via permission based analysis. In Abawajy, J.H., Mukherjea, S., Thampi, S.M., & Ruiz-Martínez, A. (Eds.) Proceedings of the Security in Computing and Communications, pp. 496–505. Springer.

  • Ucci, D., Aniello, L., & Baldoni, R. (2019). Survey of machine learning techniques for malware analysis. Computers & Security, 81, 123–147. https://doi.org/10.1016/j.cose.2018.11.001.

  • Valmarska, A., Miljkovic, D., Robnik-Šikonja, M., & Lavrač, N. (2017). Multi-view approach to Parkinson’s disease quality of life data analysis. In Appice, A., Ceci, M., Loglisci, C., Masciari, E., & Raś, Z.W. (Eds.) Proceedings of the 2016 New Frontiers in Mining Complex Patterns, Selected papers, pp. 163–178. Springer.

  • Vinayakumar, R., BarathiGanesh, H., Poornachandran, P., AnandKumar, M., & Soman, K. P. (2018). Deep-net: Deep neural network for cyber security use cases. arXiv:1812.03519.

  • Wen, L., & Yu, H. (2017). An android malware detection system based on machine learning. In Proceedings of the AIP Conference, vol. 1864. American Institute of Physics.

  • Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.

  • Yerima, S. Y., Sezer, S., & Muttik, I. (2014). Android malware detection using parallel machine learning classifiers. In Proceedings of the 8th International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 37–42.

  • Yu, J., Wang, M., & Tao, D. (2012). Semisupervised multiview distance metric learning for cartoon synthesis. IEEE Transactions on Image Processing, 21(11), 4636–4648.

  • Zhang, Y., Huang, Q., Ma, X., Yang, Z., & Jiang, J. (2016). Using multi-features and ensemble learning method for imbalanced malware classification. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 965–973.

  • Zhao, J., Xie, X., Xu, X., & Sun, S. (2017). Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38, 43–54.

  • Zhou, Y., & Jiang, X. (2012). Dissecting android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, pp. 95–109.

Acknowledgements

The authors wish to thank Lynn Rudd for her help in reading the manuscript, Dragi Kocev for his help in configuring the parameters used to learn the multi-target Random Forests, Daniel Arp for providing the benchmark data used in the empirical study, and the ReCaS-Bari resource team for providing the infrastructure to run the experimental study.

Author information

Corresponding author

Correspondence to Annalisa Appice.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Appice, A., Andresini, G. & Malerba, D. Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection. J Intell Inf Syst 55, 1–26 (2020). https://doi.org/10.1007/s10844-020-00598-6

  • DOI: https://doi.org/10.1007/s10844-020-00598-6
