A Comprehensive Dataset Towards Hands-on Experience Enhancement in a Research-Involved Cybersecurity Program

Authors:

Gang Qian

SIGITE '23: Proceedings of the 24th Annual Conference on Information Technology Education

Pages 118 - 124

https://doi.org/10.1145/3585059.3611416

Published: 11 October 2023

Abstract

Undergraduate research activities have been shown to enhance learning outcomes, improve student retention, and foster critical thinking. Cybersecurity in particular is a highly practical discipline, for which passive learning or observation alone is far from sufficient. By actively engaging in research, students can better comprehend and apply the knowledge they have acquired, and thus cultivate their ability to solve practical problems. At the same time, well-established datasets are of paramount importance for AI-driven cybersecurity applications, yet pertinent data are usually scarce. In this paper, we introduce SecAtlas, a comprehensive dataset built from real-world cases for research-involved cybersecurity programs. Constructing the dataset exposes students to a variety of realistic samples of cyber-attacks and vulnerabilities, letting them experience the complexity of cybersecurity and acquire relevant skills. More importantly, SecAtlas facilitates student engagement in subsequent data-centric cybersecurity research. Finally, we conduct a set of studies to evaluate the collected dataset. The results demonstrate the effectiveness of applying the proposed dataset to practical security projects.

Cited By

Zuo F, Tompkins C, Qian G, Rhee J, Qu X, Yang B. (2024). ChatGPT as an Assembly Language Interpreter for Computing Education. Journal of Computing Sciences in Colleges 40:2 (73-82). Online publication date: 1-Oct-2024. https://dl.acm.org/doi/10.5555/3715622.3715633

Zuo F, Rhee J. (2024). Vulnerability discovery based on source code patch commit mining: a systematic literature review. International Journal of Information Security 23:2 (1513-1526). Online publication date: 6-Jan-2024. https://dl.acm.org/doi/10.1007/s10207-023-00795-8

Shrestha M, Kim Y, Oh J, Rhee J, Choe Y, Zuo F, Park M, Qian G. (2023). ProvSec: Open Cybersecurity System Provenance Analysis Benchmark Dataset with Labels. International Journal of Networked and Distributed Computing 11:2 (112-123). Online publication date: 15-Nov-2023. https://doi.org/10.1007/s44227-023-00014-9

Index Terms

Social and professional topics → Professional topics → Computing education → Computing education programs → Information technology education

Published In

SIGITE '23: Proceedings of the 24th Annual Conference on Information Technology Education

October 2023

230 pages

ISBN: 9798400701306

DOI: 10.1145/3585059

Editors: Ying Xie (Kennesaw State University, USA), Becky Rutherfoord (Kennesaw State University, USA), Hyesung Park (Georgia Gwinnett College, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGITE: ACM Special Interest Group on Information Technology Education

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SIGITE '23: The 24th Annual Conference on Information Technology Education

Sponsor: SIGITE

October 11-14, 2023

Marietta, GA, USA

Acceptance Rates

Overall Acceptance Rate 176 of 429 submissions, 41%
