Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection

Appice, Annalisa; Andresini, Giuseppina; Malerba, Donato

doi:10.1007/s10844-020-00598-6

Published: 04 May 2020

Volume 55, pages 1–26 (2020)

Journal of Intelligent Information Systems

Annalisa Appice^1,2, Giuseppina Andresini^1 & Donato Malerba^1,2

Abstract

Recognizing malware before its installation plays a crucial role in keeping an Android device safe. In this paper we describe a supervised method that analyses multiple types of information (e.g. permissions, API calls and network addresses) that can be retrieved through a broad static analysis of Android applications. In particular, we propose a novel multi-view machine learning approach to malware detection, which couples knowledge extracted via both clustering and classification. We evaluate the effectiveness of the proposed method using benchmark Android applications and established machine learning metrics.

Notes

http://gs.statcounter.com/os-market-share/mobile/worldwide
By changing the static analyser the method can be generalised to any number of views.
The advantage of accounting for aggregate information in Equation 7 is validated empirically in the application scenario of this study (see the results in Section 4.4.1).
The impact of the parameter ρ on the fall-out and sensitivity is investigated in the application scenario of this study (see the results in Section 4.4.1).
https://www.sec.cs.tu-bs.de/~danarp/drebin/
The ANOVA analysis leaves out the competitor Drebin, as Arp et al. (2014) report only the sensitivity and fall-out of Drebin averaged over the ten trials of the experiment (and no result on AUC), while the ANOVA is performed on the series of results collected on the ten splits.

References

Alam, M. S., & Vuong, S. T. (2013). Random forest classification for detecting android malware. In Proceedings of the 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669.
Alzaylaee, M., Yerima, S., & Sezer, S. (2017). Improving dynamic analysis of android apps using hybrid test input generation. In International Conference on Cyber Security and Protection of Digital Services (Cyber Security 2017): Proceedings, pp. 1–8. IEEE. https://doi.org/10.1109/CyberSecPODS.2017.8074845.
Alzaylaee, M. K., Yerima, S. Y., & Sezer, S. (2020). Dl-droid: Deep learning based android malware detection using real devices. Computers & Security, 89, 101663. https://doi.org/10.1016/j.cose.2019.101663.
Andresini, G., Appice, A., & Malerba, D. (2020). Dealing with Class Imbalance in Android Malware Detection by Cascading Clustering and Classification, pp. 173–187. Springer International Publishing: Cham, Switzerland.
Appice, A., Guccione, P., & Malerba, D. (2017). A novel spectral-spatial co-training algorithm for the transductive classification of hyperspectral imagery data. Pattern Recognition, 63, 229–245.
Appice, A., & Malerba, D. (2016). A co-training strategy for multiple view clustering in process mining. IEEE Trans. Services Computing, 9(6), 832–845.
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., & Rieck, K. (2014). DREBIN: Effective and explainable detection of android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium. The Internet Society.
Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics.
Bai, J., & Wang, J. (2016). Improving malware detection using multi-view ensemble learning. Security and Communication Networks, 9(17), 4227–4241.
Bhatia, T., & Kaushal, R. (2017). Malware detection in android based on dynamic analysis. In Proceedings of the 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber Security), pp. 1–6.
Bholowalia, P., & Kumar, A. (2014). EBK-means: A clustering technique based on elbow method and k-means in WSN. International Journal of Computer Applications, 105(9), 17–24.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Ceci, M., Appice, A., Viktor, H. L., Malerba, D., Paquet, E., & Guo, H. (2012). Transductive relational classification in the co-training paradigm. In Perner, P. (Ed.) Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition, LNCS, vol. 7376, pp. 11–25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_2.
Demontis, A., Melis, M., Biggio, B., Maiorca, D., Arp, D., Rieck, K., Corona, I., Giacinto, G., & Roli, F. (2017). Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. IEEE Transactions on Dependable and Secure Computing. PP. https://doi.org/10.1109/TDSC.2017.2700270.
Fan, M., Liu, J., Wang, W., Li, H., Tian, Z., & Liu, T. (2017). Dapasa: Detecting android piggybacked apps through sensitive subgraph analysis. IEEE Transactions on Information Forensics and Security, 12(8), 1772–1785. https://doi.org/10.1109/TIFS.2017.2687880.
Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from Imbalanced Data Sets. Springer.
Folino, G., & Pisani, F. (2016). Evolving meta-ensemble of classifiers for handling incomplete and unbalanced datasets in the cyber security domain. Applied Soft Computing, 47, 179–190.
Garcia-Ceja, E., Galván-Tejada, C. E., & Brena, R. (2018). Multi-view stacking for activity recognition with sound and accelerometer data. Information Fusion, 40, 45–56.
Goyal, R., Spognardi, A., Dragoni, N., & Argyriou, M. (2016). Safedroid: a distributed malware detection service for android. In Proceedings of the 2016 IEEE 9th International Conference on Service-Oriented Computing and Applications (SOCA), pp. 59–66.
Guo, S., Yuan, Q., Lin, F., Wang, F., & Ban, T. (2010). A malware detection algorithm based on multi-view fusion. In Wong, K.w., Mendis, B.S.U., & Bouzerdoum, A. (Eds.) Neural Information Processing. Models and Applications, pp. 259–266. Springer.
Idrees, F., & Rajarajan, M. (2014). Investigating the android intents and permissions for malware detection. In Proceedings of the IEEE 10th International Conference on Wireless and Mobile Computing, Networking and Communications, pp. 354–358.
Kang, B., Yerima, S. Y., Mclaughlin, K., & Sezer, S. (2016). N-opcode analysis for android malware classification and categorization. In 2016 International conference on cyber security and protection of digital services (cyber security), pp. 1–7.
Kapratwar, A., Troia, F., & Stamp, M. (2017). Static and dynamic analysis of android malware. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy, pp. 653–662. SCITEPRESS.
Khorshidpour, Z., Hashemi, S., & Hamzeh, A. (2017). Evaluation of random forest classifier in security domain. Applied Intelligence, 47(2), 558–569. https://doi.org/10.1007/s10489-017-0907-2.
Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2013). Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3), 817–833.
Kumar, V. (2015). Multi-view ensemble learning using optimal feature set partitioning: An extended experiments and analysis in low dimensional scenario. Procedia Computer Science, 58, 499–506. Second International Symposium on Computer Vision and the Internet.
Last, M. (2016). Multi-target classification: Methodology and practical case studies. In Berendt, B., Bringmann, B., Fromont, É., Garriga, G.C., Miettinen, P., Tatti, N., & Tresp, V. (Eds.) Proceedings of the Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Part III, LNCS, vol. 9853, pp. 280–283. Springer.
Li, Y., Shen, T., Sun, X., Pan, X., & Mao, B. (2015). Detection, classification and characterization of android malware using api data dependency. In Thuraisingham, B., Wang, X., & Yegneswaran, V. (Eds.) Proceedings of the Security and Privacy in Communication Networks, pp. 23–40. Springer.
Lin, W., Wu, Z., Lin, L., Wen, A., & Li, J. (2017). An ensemble random forest algorithm for insurance big data analysis. IEEE Access, 5, 16568–16575.
Madjarov, G., Kocev, D., Gjorgjevikj, D., & Džeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084–3104.
Miller, S. T., & Busby-Earle, C. (2017). Multi-perspective machine learning a classifier ensemble method for intrusion detection. In Proceedings of the 2017 International Conference on Machine Learning and Soft Computing, ICMLSC ’17, pp. 7–12. ACM. https://doi.org/10.1145/3036290.3036303.
Milosevic, N., Dehghantanha, A., & Choo, K. K. R. (2017). Machine learning aided android malware classification. Computers and Electrical Engineering, 61, 266–274.
Narayanan, A., Chandramohan, M., Chen, L., & Liu, Y. (2018). A multi-view context-aware approach to android malware detection and malicious code localization. Empirical Software Engineering, 23(3), 1222–1274. https://doi.org/10.1007/s10664-017-9539-8.
Narayanan, A., Soh, C., Chen, L., Liu, Y., & Wang, L. (2018). Apk2vec: Semi-supervised multi-view representation learning for profiling android applications. In IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018, pp. 357–366. IEEE Computer Society. https://doi.org/10.1109/ICDM.2018.00051.
Nguyen-Vu, L., Ahn, J., & Jung, S. (2019). Android fragmentation in malware detection. Computers & Security, 87, 101573. https://doi.org/10.1016/j.cose.2019.101573.
NOKIA. (2019). Nokia threat intelligence report – 2019. White paper, online at https://pages.nokia.com/T003B6-Threat-Intelligence-Report-2019.html.
Painter, N., & Kadhiwala, B. (2017). Comparative analysis of android malware detection techniques. In Satapathy, S.C., Bhateja, V., & Joshi, A. (Eds.) Proceedings of the International Conference on Data Engineering and Communication Technology, pp. 131–139. Springer.
Papagiannopoulou, C., Tsoumakas, G., & Tsamardinos, I. (2015). Discovering and exploiting deterministic label relationships in multi-label learning. In Cao, L., Zhang, C., Joachims, T., Webb, G.I., Margineantu, D.D., & Williams, G. (Eds.) Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 915–924. ACM.
Peiravian, N., & Zhu, X. (2013). Machine learning for android malware detection using permission and api calls. In Proceedings of the IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305.
Rovelli, P., & Vigfússon, Ý. (2014). Pmds: Permission-based malware detection system. In Prakash, A., & Shyamasundar, R. (Eds.) Proceedings of the Information Systems Security, pp. 338–357. Springer.
Roy, S., DeLoach, J., Li, Y., Herndon, N., Caragea, D., Ou, X., Ranganath, V. P., Li, H., & Guevara, N. (2015). Experimental study with real-world data for android app security analysis using machine learning. In Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, pp. 81–90.
Sheen, S., Anitha, R., & Natarajan, V. (2015). Android based malware detection using a multifeature collaborative decision fusion approach. Neurocomputing, 151, 905–912.
Shiqi, L., Shengwei, T., Long, Y., Jiong, Y., & Hua, S. (2018). Android malicious code classification using deep belief network. KSII Transactions on Internet and Information Systems, 12, 454–475. https://doi.org/10.3837/tiis.2018.01.022.
Suarez-Tangil, G., Dash, S. K., Ahmadi, M., Kinder, J., Giacinto, G., & Cavallaro, L. (2017). Droidsieve: Fast and accurate classification of obfuscated android malware. In Proceedings of the 7th ACM on Conference on Data and Application Security and Privacy, CODASPY 2017, pp. 309–320.
Sun, S., Mao, L., Dong, Z., & Wu, L. (2019). Multiview Deep Learning, (pp. 105–138). Singapore: Springer Singapore.
Taheri, R., Javidan, R., Shojafar, M., Pooranian, Z., Miri, A., & Conti, M. (2019). On defending against label flipping attacks on malware detection systems. arXiv:1908.04473.
Tajoddin, A., & Abadi, M. (2019). Ramd: registry-based anomaly malware detection using one-class ensemble classifiers. Applied Intelligence.
Talha, K. A., Alper, D. I., & Aydin, C. (2015). Apk auditor: Permission-based android malware detection system. Digital Investigation, 13, 1–14.
Tiwari, P. K., & Singh, U. (2015). Android users security via permission based analysis. In Abawajy, J.H., Mukherjea, S., Thampi, S.M., & Ruiz-Martínez, A. (Eds.) Proceedings of the Security in Computing and Communications, pp. 496–505. Springer.
Ucci, D., Aniello, L., & Baldoni, R. (2019). Survey of machine learning techniques for malware analysis. Computers & Security, 81, 123–147. https://doi.org/10.1016/j.cose.2018.11.001.
Valmarska, A., Miljkovic, D., Robnik-Šikonja, M., & Lavrač, N. (2017). Multi-view approach to Parkinson’s disease quality of life data analysis. In Appice, A., Ceci, M., Loglisci, C., Masciari, E., & Raś, Z.W. (Eds.) Proceedings of the 2016 New Frontiers in Mining Complex Patterns, Selected papers, pp. 163–178. Springer.
Vinayakumar, R., BarathiGanesh, H., Poornachandran, P., AnandKumar, M., & Soman, K. P. (2018). Deep-net: Deep neural network for cyber security use cases. arXiv:1812.03519.
Wen, L., & Yu, H. (2017). An android malware detection system based on machine learning. In Proceedings of the AIP Conference, vol. 1864. American Institute of Physics.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Yerima, S. Y., Sezer, S., & Muttik, I. (2014). Android malware detection using parallel machine learning classifiers. In Proceedings of the 8th International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 37–42.
Yu, J., Wang, M., & Tao, D. (2012). Semisupervised multiview distance metric learning for cartoon synthesis. IEEE Transactions on Image Processing, 21(11), 4636–4648.
Zhang, Y., Huang, Q., Ma, X., Yang, Z., & Jiang, J. (2016). Using multi-features and ensemble learning method for imbalanced malware classification. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 965–973.
Zhao, J., Xie, X., Xu, X., & Sun, S. (2017). Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38, 43–54.
Zhou, Y., & Jiang, X. (2012). Dissecting android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, pp. 95–109.

Acknowledgements

The authors wish to thank Lynn Rudd for her help in reading the manuscript, Dragi Kocev for his help in configuring the parameters used to learn the multi-target Random Forests, Daniel Arp for providing the benchmark data used in the empirical study, and the ReCaS-Bari resource team for providing the infrastructure to run the experimental study.

Author information

Authors and Affiliations

Department of Informatics, Università degli Studi di Bari Aldo Moro, via Orabona 4, I-70125 Bari, Italy
Annalisa Appice, Giuseppina Andresini & Donato Malerba
Consorzio Interuniversitario Nazionale per l’Informatica - CINI, via Orabona 4, I-70125 Bari, Italy
Annalisa Appice & Donato Malerba

Corresponding author

Correspondence to Annalisa Appice.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Appice, A., Andresini, G. & Malerba, D. Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection. J Intell Inf Syst 55, 1–26 (2020). https://doi.org/10.1007/s10844-020-00598-6

Received: 03 October 2019
Revised: 29 January 2020
Accepted: 17 March 2020
Published: 04 May 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s10844-020-00598-6
