Malware detection and classification using community detection and social network analysis

Reddy, Varshini; Kolli, Naimisha; Balakrishnan, N.

doi:10.1007/s11416-021-00387-x

Original Paper
Published: 14 May 2021

Volume 17, pages 333–346, (2021)
Journal of Computer Virology and Hacking Techniques

Abstract

Despite the efforts of antivirus vendors and researchers to counter the threat of malware and its growth, malware remains a rampant problem that causes significant economic and intellectual property loss. Malware developers evade commercial detection tools by introducing minor code changes and obfuscation, creating variants of known malware families. The volume of malware variants being introduced is increasing every day, creating a need for new methods that detect and classify malware quickly and scalably. To this end, we propose a novel technique that exploits community detection properties and social network analysis concepts. The proposed method is based on system call graphs built from the system calls observed during the execution of malware files. To study the inherent characteristics of different malware families, we extract features conforming to community and social network properties and use them for classification. We present a set of 5 models, ranging from one that uses only OS-level actions to one that includes community-level and social network features. The highest performance arises when community-level and social network features are used in combination with malware class-level features. A suite of 9 machine learning algorithms has been used, and their results have been compared. Our evaluation demonstrates that the combined approach outperforms many previously used methods in malware detection and classification, achieving precision, recall, and accuracy of more than 0.97 using Multilayer Perceptron and k-Nearest Neighbors.
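
To make the idea of graph-derived features concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes an undirected, unweighted graph whose edges link consecutive distinct system calls in a trace; the abstract does not say whether the paper's graphs are directed or weighted. The function names, the sample trace, and the hand-chosen two-community partition are all hypothetical. In practice a community detection algorithm would produce the partition; here it is fixed by hand so that the modularity score can be checked.

```python
from collections import defaultdict


def build_call_graph(trace):
    """Return undirected adjacency sets linking consecutive, distinct calls."""
    adj = defaultdict(set)
    for a, b in zip(trace, trace[1:]):
        if a != b:  # assumption: repeated calls do not create self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def sna_features(adj):
    """Graph-level social network features: size, density, degree centrality."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    centrality = (
        {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()} if n > 1 else {}
    )
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "max_degree_centrality": max(centrality.values(), default=0.0),
    }


def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        members = set(community)
        # Edges with both endpoints inside the community.
        internal = sum(len(adj[v] & members) for v in members) / 2
        # Sum of degrees of the community's nodes.
        degree_sum = sum(len(adj[v]) for v in members)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Hypothetical trace: a file-handling cluster followed by a registry cluster.
trace = [
    "NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile",
    "NtCreateKey", "NtSetValueKey", "NtQueryValueKey", "NtCreateKey",
]
graph = build_call_graph(trace)
features = sna_features(graph)
# Hand-chosen partition standing in for a community detection result.
partition = [
    {"NtOpenFile", "NtReadFile", "NtClose"},
    {"NtCreateKey", "NtSetValueKey", "NtQueryValueKey"},
]
features["modularity"] = modularity(graph, partition)
print(features)
```

A feature dictionary like this one, computed per sample, could then be turned into a vector and passed to any of the classifiers the abstract mentions. The specific features the paper uses are not listed in the abstract, so the ones shown here are only illustrative.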

Availability of data and material

Most of the software used is publicly available, such as ClamAV, Cuckoo Sandbox, and the ML packages. The feature extraction and feature reduction code is custom-built.

Acknowledgements

The authors thank the reviewers for their very perceptive comments and suggestions, which improved the quality of the paper significantly.

Funding

This work was supported by grants from the Ministry of Communication and Information Technology of the Government of India under the Information Security and Awareness (ISEA) program.

Author information

Authors and Affiliations

Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, 560012, India
Varshini Reddy, Naimisha Kolli & N. Balakrishnan

Corresponding author

Correspondence to N. Balakrishnan.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest among themselves, with the organization they work for, or with any private or public entity.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Reddy, V., Kolli, N. & Balakrishnan, N. Malware detection and classification using community detection and social network analysis. J Comput Virol Hack Tech 17, 333–346 (2021). https://doi.org/10.1007/s11416-021-00387-x

Received: 15 August 2020
Accepted: 28 April 2021
Published: 14 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11416-021-00387-x
