DOI: 10.1145/3585059.3611416
research-article

A Comprehensive Dataset Towards Hands-on Experience Enhancement in a Research-Involved Cybersecurity Program

Published: 11 October 2023

Abstract

Undergraduate research activities have been shown to enhance learning outcomes, improve student retention, and foster critical thinking. Cybersecurity in particular is a highly practical discipline, for which passive learning or observation alone is far from sufficient. By actively engaging in research, students can better comprehend and apply the knowledge they have acquired, and thereby develop their ability to solve practical problems. At the same time, well-established datasets are of paramount importance for AI-driven cybersecurity applications, yet pertinent data are usually scarce. In this paper, we introduce SecAtlas, a comprehensive dataset for research-involved cybersecurity programs that uses real-world cases. Constructing the dataset exposes students to a variety of realistic samples of cyber-attacks and vulnerabilities, allowing them to experience the complexity of cybersecurity and acquire relevant skills. More importantly, SecAtlas greatly facilitates students' engagement in subsequent data-centric cybersecurity research. Finally, we conduct a set of studies to evaluate the collected dataset. The results demonstrate the effectiveness of applying the proposed dataset to practical security projects.

Cited By

  • (2024) ChatGPT as an Assembly Language Interpreter for Computing Education. Journal of Computing Sciences in Colleges 40:2 (73-82). DOI: 10.5555/3715622.3715633. Online publication date: 1-Oct-2024
  • (2024) Vulnerability discovery based on source code patch commit mining: a systematic literature review. International Journal of Information Security 23:2 (1513-1526). DOI: 10.1007/s10207-023-00795-8. Online publication date: 6-Jan-2024
  • (2023) ProvSec: Open Cybersecurity System Provenance Analysis Benchmark Dataset with Labels. International Journal of Networked and Distributed Computing 11:2 (112-123). DOI: 10.1007/s44227-023-00014-9. Online publication date: 15-Nov-2023

    Published In

    SIGITE '23: Proceedings of the 24th Annual Conference on Information Technology Education
    October 2023
    230 pages
    ISBN:9798400701306
    DOI:10.1145/3585059
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Cybersecurity education
    2. Dataset
    3. Software patch
    4. Static binary analysis
    5. System provenance

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGITE '23

    Acceptance Rates

    Overall Acceptance Rate 176 of 429 submissions, 41%
