Abstract
Ransomware is a critical security concern, and developing applications for ransomware detection is paramount. Machine learning models are helpful in detecting and classifying ransomware. However, the high dimensionality of ransomware datasets divided into various feature groups such as API calls, Directory, and Registry logs has made it difficult for researchers to create effective machine learning models. Class imbalance also leads to poor results when classifying ransomware families. To tackle these challenges, in this paper we propose a three-stage feature selection method that effectively reduces the dimensionality of the data and considers the varying importance of the different feature groups in the classification of ransomware families. We also applied cost-sensitive learning and re-sampling of the training data using SMOTE to address data imbalance. We applied these techniques to the Elderan ransomware dataset. Our results show that the proposed feature selection method significantly improves the detection of ransomware compared to other state-of-art studies using the same dataset. Furthermore, the data balancing techniques (cost-sensitive learning and SMOTE) were effective in the multi-class classification of ransomware.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abbasi, M.S., Al-Sahaf, H., Welch, I.: Particle swarm optimization: a wrapper-based feature selection method for ransomware detection and classification. In: Castillo, P.A., Jiménez Laredo, J.L., Fernández de Vega, F. (eds.) EvoApplications 2020. LNCS, vol. 12104, pp. 181–196. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43722-0_12
Almomani, I., et al.: Android ransomware detection based on a hybrid evolutionary approach in the context of highly imbalanced data. IEEE Access 9, 57674–57691 (2021)
Almousa, M., Basavaraju, S., Anwar, M.: Api-based ransomware detection using machine learning-based threat detection models. In: 2021 18th International Conference on Privacy, Security and Trust (PST), pp. 1–7. IEEE (2021)
Aurangzeb, S., Anwar, H., Naeem, M.A., Aleem, M.: BigRC-EML: big-data based ransomware classification using ensemble machine learning. Clust. Comput. 25(5), 3405–3422 (2022)
Avila, R., Khoury, R., Pere, C., Khanmohammadi, K.: Employing feature selection to improve the performance of intrusion detection systems. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds.) FPS 2021. LNCS, vol. 13291, pp. 93–112. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-08147-7_7
Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newslett. 6(1), 20–29 (2004)
Beaman, C., Barkworth, A., Akande, T.D., Hakak, S., Khan, M.K.: Ransomware: recent advances, analysis, challenges and future research directions. Comput. Secur. 111, 102490 (2021)
Bolón-Canedo, V., Alonso-Betanzos, A.: Ensembles for feature selection: a review and future trends. Inf. Fusion 52, 1–12 (2019)
Brownlee, J.: Imbalanced classification with Python: Better Metrics, Balance Skewed Classes, Cost-sensitive Learning. Machine Learning Mastery (2020)
Cai, J., Luo, J., Wang, S., Yang, S.: Feature selection in machine learning: a new perspective. Neurocomputing 300, 70–79 (2018)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chen, Q., Bridges, R.A.: Automated behavioral analysis of malware: a case study of wannacry ransomware. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 454–460. IEEE (2017)
Collier, R.: NHS ransomware attack spreads worldwide (2017)
Cyber Security Policy: Securing cyber resilience in health and care: October 2018 progress update (2018). https://www.gov.uk/government/publications/securing-cyber-resilience-in-health-and-care-october-2018-update
Goyal, M., Kumar, R.: Machine learning for malware detection on balanced and imbalanced datasets. In: 2020 International Conference on Decision Aid Sciences and Application (DASA), pp. 867–871. IEEE (2020)
Khan, F., Ncube, C., Ramasamy, L.K., Kadry, S., Nam, Y.: A digital DNA sequencing engine for ransomware detection using machine learning. IEEE Access 8, 119710–119719 (2020)
Kshetri, N., Voas, J.: Do crypto-currencies fuel ransomware? IT Prof. 19(5), 11–15 (2017)
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2017)
Ma, Y., He, H.: Imbalanced Learning: Foundations, Algorithms, and Applications (2013)
McIntosh, T., Kayes, A., Chen, Y.P.P., Ng, A., Watters, P.: Ransomware mitigation in the modern era: a comprehensive review, research challenges, and future directions. ACM Comput. Surv. (CSUR) 54(9), 1–36 (2021)
Meland, P.H., Bayoumy, Y.F.F., Sindre, G.: The ransomware-as-a-service economy within the darknet. Comput. Secur. 92, 101762 (2020)
Moreira, C.C., de Sales Jr, C.D.S., Moreira, D.C.: Understanding ransomware actions through behavioral feature analysis. J. Commun. Inf. Syst. 37(1), 61–76 (2022)
Pang, Y., Peng, L., Chen, Z., Yang, B., Zhang, H.: Imbalanced learning based on adaptive weighting and gaussian function synthesizing with an application on android malware detection. Inf. Sci. 484, 95–112 (2019)
Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19(4), 639–668 (2011)
Sgandurra, D., Muñoz-González, L., Mohsen, R., Lupu, E.C.: Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020 (2016)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J 27(3), 379–423 (1948)
Thabtah, F., Hammoud, S., Kamalov, F., Gonsalves, A.: Data imbalance in classification: experimental evaluation. Inf. Sci. 513, 429–441 (2020)
Thai-Nghe, N., Gantner, Z., Schmidt-Thieme, L.: Cost-sensitive learning methods for imbalanced data. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2010)
Urdan, T.C.: Statistics in Plain English. Routledge, Abingdon (2011)
Wu, D., Guo, P., Wang, P.: Malware detection based on cascading XGboost and cost sensitive. In: 2020 International Conference on Computer Communication and Network Security (CCNS), pp. 201–205. IEEE (2020)
Acknowledgements
This work was funded by Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Onwuegbuche, F.C., Jurcut, A.D., Pasquale, L. (2023). Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction. In: Dolev, S., Gudes, E., Paillier, P. (eds) Cyber Security, Cryptology, and Machine Learning. CSCML 2023. Lecture Notes in Computer Science, vol 13914. Springer, Cham. https://doi.org/10.1007/978-3-031-34671-2_20
Download citation
DOI: https://doi.org/10.1007/978-3-031-34671-2_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34670-5
Online ISBN: 978-3-031-34671-2
eBook Packages: Computer ScienceComputer Science (R0)