Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction

Onwuegbuche, Faithful Chiagoziem; Jurcut, Anca Delia; Pasquale, Liliana

doi:10.1007/978-3-031-34671-2_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13914)

Included in the following conference series:

International Symposium on Cyber Security, Cryptology, and Machine Learning

Abstract

Ransomware is a critical security concern, and developing applications for ransomware detection is paramount. Machine learning models are helpful in detecting and classifying ransomware. However, ransomware datasets are high-dimensional and divided into various feature groups, such as API calls, Directory, and Registry logs, which has made it difficult for researchers to create effective machine learning models. Class imbalance also leads to poor results when classifying ransomware families. To tackle these challenges, we propose a three-stage feature selection method that reduces the dimensionality of the data and accounts for the varying importance of the different feature groups in the classification of ransomware families. We also apply cost-sensitive learning and re-sampling of the training data using SMOTE to address data imbalance. We applied these techniques to the EldeRan ransomware dataset. Our results show that the proposed feature selection method significantly improves the detection of ransomware compared to other state-of-the-art studies using the same dataset. Furthermore, the data balancing techniques (cost-sensitive learning and SMOTE) were effective in the multi-class classification of ransomware.
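The two data balancing ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the "balanced" weighting heuristic (n_samples / (n_classes * class_count)) are assumptions chosen for illustration, and the over-sampler is a simplified SMOTE-style interpolation rather than a full SMOTE implementation. As in the abstract, re-sampling would be applied to the training data only.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so their errors cost more.
    (Assumed heuristic; the abstract does not state the weighting used.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def smote_like_oversample(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Nearest neighbours of `base` within the same class, excluding itself.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy training set: class 0 is the majority, class 1 the minority.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.0, 0.3],
     [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

weights = balanced_class_weights(y)

minority = [x for x, label in zip(X, y) if label == 1]
new_points = smote_like_oversample(minority, n_new=4, k=1)
X_balanced = X + new_points
y_balanced = y + [1] * len(new_points)

print(weights)
print(Counter(y_balanced))
```

In practice the two techniques are alternatives or complements: the weights would be passed to a classifier's loss, while the synthetic points enlarge the minority classes before training.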

Acknowledgements

This work was funded by Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183).

Author information

Authors and Affiliations

SFI Centre for Research Training in Machine Learning (ML-Labs), Dublin, Ireland
Faithful Chiagoziem Onwuegbuche
School of Computing, University College Dublin, Dublin, Ireland
Faithful Chiagoziem Onwuegbuche, Anca Delia Jurcut & Liliana Pasquale

Corresponding author

Correspondence to Faithful Chiagoziem Onwuegbuche.

Editor information

Editors and Affiliations

Ben-Gurion University of the Negev, Be’er Sheva, Israel
Shlomi Dolev
Ben-Gurion University of the Negev, Be’er Sheva, Israel
Ehud Gudes
Zama, Meythet, France
Pascal Paillier

Cite this paper

Onwuegbuche, F.C., Jurcut, A.D., Pasquale, L. (2023). Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction. In: Dolev, S., Gudes, E., Paillier, P. (eds) Cyber Security, Cryptology, and Machine Learning. CSCML 2023. Lecture Notes in Computer Science, vol 13914. Springer, Cham. https://doi.org/10.1007/978-3-031-34671-2_20

DOI: https://doi.org/10.1007/978-3-031-34671-2_20
Published: 21 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34670-5
Online ISBN: 978-3-031-34671-2
eBook Packages: Computer Science, Computer Science (R0)
