ABSTRACT
Ransomware attacks remain a prominent cybersecurity threat and the subject of considerable research activity. Despite frequent high-profile public reports of ransomware attacks, we found a paucity of open behavioral activity data for large collections of real-world ransomware binaries. The lack of such open datasets creates barriers to research that might otherwise lead to innovative approaches to ransomware mitigation. We have constructed a dataset of ransomware activity logs and corresponding provenance graphs, derived from the sandboxed execution of all ransomware-tagged binaries in the widely known MalwareBazaar repository. We also provide the code for orchestrating the log collection and provenance inference steps, so that other researchers can customize and extend it for their own analyses. We hope that the dataset will facilitate the discovery of innovative and effective ransomware mitigation strategies.
Index Terms
- Towards Reproducible Ransomware Analysis