Vulnerabilities and Security Patches Detection in OSS: A Survey

Published: 07 October 2024

Abstract

Over the past decade, Open Source Software (OSS) has experienced rapid growth and widespread adoption, attributed to its openness and editability. However, this expansion has also brought significant security challenges, particularly the introduction and propagation of software vulnerabilities. Despite the use of machine learning and formal methods to tackle these issues, there remains a notable lack of comprehensive surveys that summarize and analyze both Vulnerability Detection (VD) and Security Patch Detection (SPD) in OSS. This article seeks to bridge this gap through an extensive survey that evaluates 127 technical studies published between 2014 and 2023, structured around the Vulnerability-Patch lifecycle. We begin by delineating the six critical events that constitute the Vulnerability-Patch lifecycle, leading to an in-depth exploration of the Vulnerability-Patch ecosystem. We then systematically review the databases commonly used in VD and SPD, and analyze their characteristics. Subsequently, we examine existing VD methods, focusing on traditional and deep learning-based approaches. Additionally, we organize current security patch identification methods by kernel type and discuss techniques for detecting the presence of security patches. Based on our comprehensive review, we identify open research questions and propose future research directions that merit further exploration.

References

[1]
Mahmoud Alfadel, Diego Elias Costa, and Emad Shihab. 2023. Empirical analysis of security vulnerabilities in Python packages. Empirical Software Engineering 28, 3 (2023), 59.
[2]
Wenyan An, Liwei Chen, Jinxin Wang, Gewangzi Du, Gang Shi, and Dan Meng. 2020. AVDHRAM: Automated vulnerability detection based on hierarchical representation and attention mechanism. In Proceedings of the 2020 IEEE International Conference on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, and Social Computing and Networking (ISPA/BDCloud/SocialCom/SustainCom’20). IEEE, 337–344.
[3]
Android. 2023. Android Security Bulletins. Retrieved September 10, 2024 from https://source.android.com/security/bulletin
[4]
Android. 2023. Dalvik Executable Format. Retrieved September 10, 2024 from https://source.android.com/devices/tech/dalvik/dex-format.html
[5]
Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra. 2019. Getafix: Learning to fix bugs automatically. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–27.
[6]
Deepali Bassi and Hardeep Singh. 2023. A systematic literature review on software vulnerability prediction models. IEEE Access 11 (2023), 110289–110311.
[7]
Rohan Bavishi, Hiroaki Yoshida, and Mukul R. Prasad. 2019. Phoenix: Automated data-driven synthesis of repairs for static analysis violations. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 613–624.
[8]
Stefan Bellon, Rainer Koschke, Giulio Antoniol, Jens Krinke, and Ettore Merlo. 2007. Comparison and evaluation of clone detection tools. IEEE Transactions on Software Engineering 33, 9 (2007), 577–591.
[9]
Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated collection of vulnerabilities and their fixes from open-source software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering. 30–39.
[10]
Philippe Biondi, Raphaël Rigo, Sarah Zennou, and Xavier Mehrenberger. 2017. BinCAT: Purrfecting binary static analysis. In Proceedings of the Symposium sur la sécurité des technologies de l’information et des communications.
[11]
Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2329–2344.
[12]
Quang-Cuong Bui, Riccardo Scandariato, and Nicolás E. Díaz Ferreyra. 2022. Vul4j: A dataset of reproducible Java vulnerabilities geared towards the study of program repair techniques. In Proceedings of the 19th International Conference on Mining Software Repositories. 464–468.
[13]
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08), Vol. 8. 209–224.
[14]
Cristian Cadar and Dawson Engler. 2005. Execution generated test cases: How to make systems code crash itself. In Proceedings of the International SPIN Workshop on Model Checking of Software. 2–23.
[15]
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2008. EXE: Automatically generating inputs of death. ACM Transactions on Information and System Security 12, 2 (2008), 1–38.
[16]
Sicong Cao, Xiaobing Sun, Lili Bo, Ying Wei, and Bin Li. 2021. BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection. Information and Software Technology 136 (2021), 106576.
[17]
Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2022. Deep learning based vulnerability detection: Are we there yet? IEEE Transactions on Software Engineering 48 (2022), 3280–3296.
[18]
Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. 2016. BINGO: Cross-architecture cross-OS binary search. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 678–689.
[19]
Checkmarx. 2023. Home Page. Retrieved September 10, 2024 from https://www.checkmarx.com/
[20]
Peng Chen, Jianzhong Liu, and Hao Chen. 2019. Matryoshka: Fuzzing deeply nested branches. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 499–513.
[21]
Tianyu Chen, Lin Li, Taotao Qian, Zeyu Wang, Guangtai Liang, Ding Li, Qianxiang Wang, and Tao Xie. 2023. Identifying vulnerability patches by comprehending code commits with comprehensive change contexts. arXiv preprint arXiv:2310.02530 (2023).
[22]
Xiarun Chen, Qien Li, Zhou Yang, Yongzhi Liu, Shaosen Shi, Chenglin Xie, and Weiping Wen. 2021. VulChecker: Achieving more effective taint analysis by identifying sanitizers automatically. In Proceedings of the 2021 IEEE 20th International Conference on Trust, Security, and Privacy in Computing and Communications (TrustCom’21). IEEE, 774–782.
[23]
Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, and David Wagner. 2023. DiverseVul: A new vulnerable source code dataset for deep learning based vulnerability detection. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions, and Defenses. 654–668.
[24]
Yaohui Chen, Peng Li, Jun Xu, Shengjian Guo, Rundong Zhou, Yulong Zhang, Tao Wei, and Long Lu. 2020. SAVIOR: Towards bug-driven hybrid testing. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP’20). IEEE, 1580–1596.
[25]
Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, and Yulei Sui. 2021. DeepWukong: Statically detecting software vulnerabilities using deep graph neural network. ACM Transactions on Software Engineering and Methodology 30, 3 (2021), 1–33.
[26]
Boris Chernis and Rakesh Verma. 2018. Machine learning methods for software vulnerability detection. In Proceedings of the 4th ACM International Workshop on Security and Privacy Analytics. 31–39.
[27]
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
[28]
Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, and Hai Jin. 2024. Graph neural networks for vulnerability detection: A counterfactual explanation. arXiv preprint arXiv:2404.15687 (2024).
[29]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[30]
Lei Cui, Zhiyu Hao, Yang Jiao, Haiqiang Fei, and Xiaochun Yun. 2020. VulDetector: Detecting vulnerabilities using weighted feature graph comparison. IEEE Transactions on Information Forensics and Security 16 (2020), 2004–2017.
[31]
Cyber Safety Review Board. 2021. Review of the December 2021 Log4j Event. Retrieved September 9, 2024 from https://www.cisa.gov/sites/default/files/publications/CSRB-Report-on-Log4-July-11-2022_508.pdf
[32]
Xiaowen Da, Limin Mao, and Mingjie Wu. 2017. Research on a vulnerability location technology based on patch matching and static taint analysis. Netinfo Security 17, 9 (2017), 5–9.
[33]
Hanjun Dai, Bo Dai, and Le Song. 2016. Discriminative embeddings of latent variable models for structured data. In Proceedings of the International Conference on Machine Learning. 2702–2711.
[34]
Hoa Khanh Dam, Truyen Tran, Trang Pham, Shien Wee Ng, John Grundy, and Aditya Ghose. 2017. Automatic feature learning for vulnerability prediction. arXiv preprint arXiv:1708.02368 (2017).
[35]
Yaniv David, Nimrod Partush, and Eran Yahav. 2016. Statistical similarity of binaries. ACM SIGPLAN Notices 51, 6 (2016), 266–280.
[36]
Yaniv David, Nimrod Partush, and Eran Yahav. 2018. FirmUp: Precise static detection of common vulnerabilities in firmware. ACM SIGPLAN Notices 53, 2 (2018), 392–404.
[37]
Yaniv David and Eran Yahav. 2014. Tracelet-based code search in executables. ACM SIGPLAN Notices 49, 6 (2014), 349–360.
[38]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional Transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[39]
Steven H. H. Ding, Benjamin C. M. Fung, and Philippe Charland. 2016. Kam1n0: MapReduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 461–470.
[40]
Steven H. H. Ding, Benjamin C. M. Fung, and Philippe Charland. 2019. Asm2Vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP’19). IEEE, 472–489.
[41]
Trevor Dunlap, Elizabeth Lin, William Enck, and Bradley Reaves. 2023. VFCFinder: Seamlessly pairing security advisories and patches. arXiv preprint arXiv:2311.01532 (2023).
[42]
Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. 2014. Blanket execution: Dynamic similarity testing for program binaries and components. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security’14). 303–317.
[43]
William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. 2014. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems 32, 2 (2014), 1–29.
[44]
Michael English, Chris Exton, Irene Rigon, and Brendan Cleary. 2009. Fault detection and prediction in an open-source software project. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering. 1–11.
[45]
Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the Network and Distributed System Security Symposium (NDSS’16), Vol. 52. 58–79.
[46]
Jiahao Fan, Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. A C/C++ code vulnerability dataset with code changes and CVE summaries. In Proceedings of the 17th International Conference on Mining Software Repositories. 508–512.
[47]
Mohammad Reza Farhadi, Benjamin C. M. Fung, Philippe Charland, and Mourad Debbabi. 2014. BinClone: Detecting code clones in malware. In Proceedings of the 2014 8th International Conference on Software Security and Reliability (SERE’14). IEEE, 78–87.
[48]
Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 480–491.
[49]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155 (2020).
[50]
Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems 9, 3 (1987), 319–349.
[51]
Michael Fu and Chakkrit Tantithamthavorn. 2022. LineVul: A Transformer-based line-level vulnerability prediction. In Proceedings of the 19th International Conference on Mining Software Repositories. 608–620.
[52]
Michael Fu, Chakkrit Kla Tantithamthavorn, Van Nguyen, and Trung Le. 2023. ChatGPT for vulnerability detection, classification, and repair: How far are we? In Proceedings of the 2023 30th Asia-Pacific Software Engineering Conference (APSEC’23). IEEE, 632–636.
[53]
Seyed Mohammad Ghaffarian and Hamid Reza Shahriari. 2017. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Computing Surveys 50, 4 (2017), 1–36.
[54]
Git. 2023. Home Page. Retrieved September 10, 2024 from https://git-scm.com/
[55]
GitHub. 2023. Google/Honggfuzz. Retrieved September 10, 2024 from https://github.com/google/honggfuzz
[56]
Google. 2024. Open Source Vulnerability. Retrieved September 10, 2024 from https://osv.dev/
[57]
Google Project Zero. 2019. Five Years of “Make 0Day Hard.” Retrieved September 9, 2024 from https://i.blackhat.com/USA-19/Thursday/us-19-Hawkes-Project-Zero-Five-Years-Of-Make-0day-Hard.pdf
[58]
Marco Gori, Gabriele Monfardini, and Franco Scarselli. 2005. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2. 729–734.
[59]
Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18, 5–6 (2005), 602–610.
[60]
Zhibin Guan, Xiaomeng Wang, Wei Xin, Jiajie Wang, and Li Zhang. 2020. A survey on deep learning-based source code defect analysis. In Proceedings of the 2020 5th International Conference on Computer and Communication Systems (ICCCS’20). IEEE, 167–171.
[61]
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2020. GraphCodeBERT: Pre-training code representations with data flow. arXiv preprint arXiv:2009.08366 (2020).
[62]
Hazim Hanif, Mohd Hairul Nizam Md. Nasir, Mohd Faizal Ab Razak, Ahmad Firdaus, and Nor Badrul Anuar. 2021. The rise of software vulnerability: Taxonomy of software vulnerabilities detection and machine learning approaches. Journal of Network and Computer Applications 179 (2021), 103009.
[63]
David Hin, Andrey Kan, Huaming Chen, and M. Ali Babar. 2022. LineVD: Statement-level vulnerability detection using graph neural networks. In Proceedings of the 19th International Conference on Mining Software Repositories. 596–607.
[64]
Yutao Hu, Suyuan Wang, Wenke Li, Junru Peng, Yueming Wu, Deqing Zou, and Hai Jin. 2023. Interpreters for GNN-based vulnerability detection: Are we there yet? In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1407–1419.
[65]
Yikun Hu, Yuanyuan Zhang, Juanru Li, and Dawu Gu. 2017. Binary code clone detection across architectures and compiling configurations. In Proceedings of the 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC’17). IEEE, 88–98.
[66]
Emanuele Iannone, Roberta Guadagni, Filomena Ferrucci, Andrea De Lucia, and Fabio Palomba. 2022. The secret life of software vulnerabilities: A large-scale empirical study. IEEE Transactions on Software Engineering 49, 1 (2022), 44–63.
[67]
Sanghoon Jeon and Huy Kang Kim. 2021. AutoVAS: An automated vulnerability analysis system with a deep learning approach. Computers & Security 106 (2021), 102308.
[68]
Jiajun Jiang, Luyao Ren, Yingfei Xiong, and Lingming Zhang. 2019. Inferring program transformations from singular examples via big code. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19). IEEE, 255–266.
[69]
Zheyue Jiang, Yuan Zhang, Jun Xu, Qi Wen, Zhenghe Wang, Xiaohan Zhang, Xinyu Xing, Min Yang, and Zhemin Yang. 2020. PDiff: Semantic-based patch presence testing for downstream kernels. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 1149–1163.
[70]
Joern. 2023. Home Page. Retrieved September 10, 2024 from http://mlsec.org/joern/
[71]
Rauli Kaksonen, Marko Laakso, and Ari Takanen. 2001. Software security assessment through specification mutations and fault injection. In Communications and Multimedia Security Issues of the New Century. Springer, 173–183.
[72]
Wooseok Kang, Byoungho Son, and Kihong Heo. 2022. TRACER: Signature-based static analysis for detecting recurring vulnerabilities. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1695–1708.
[73]
Evangelos Katsadouros and Charalampos Patrikakis. 2022. A survey on vulnerability prediction using GNNs. In Proceedings of the 26th Pan-Hellenic Conference on Informatics. 38–43.
[74]
Staffs Keele and others. 2007. Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report, Ver. 2.3. EBSE.
[75]
Soolin Kim, Jusop Choi, Muhammad Ejaz Ahmed, Surya Nepal, and Hyoungshick Kim. 2022. VulDeBERT: A vulnerability detection system using BERT. In Proceedings of the 2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW’22). IEEE, 69–74.
[76]
Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh. 2017. VUDDY: A scalable approach for vulnerable code clone discovery. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP’17). IEEE, 595–614.
[77]
Jesse Kornblum. 2006. Identifying almost identical files using context triggered piecewise hashing. Digital Investigation 3 (2006), 91–97.
[78]
Zhe Lang, Shouguo Yang, Yiran Cheng, Xiaoling Zhang, Zhiqiang Shi, and Limin Sun. 2021. PMatch: Semantic-based patch detection for binary programs. In Proceedings of the 2021 IEEE International Performance, Computing, and Communications Conference (IPCCC’21). IEEE, 1–10.
[79]
Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. 1188–1196.
[80]
Triet Huynh Minh Le, David Hin, Roland Croft, and M. Ali Babar. 2021. DeepCVA: Automated commit-level vulnerability assessment with deep multi-task learning. In Proceedings of the 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE’21). IEEE, 717–729.
[81]
Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision. Springer, 319–345.
[82]
Seungsoo Lee, Changhoon Yoon, Chanhee Lee, Seungwon Shin, Vinod Yegneswaran, and Phillip A. Porras. 2017. DELTA: A security assessment framework for software-defined networks. In Proceedings of the Network and Distributed System Security Symposium (NDSS’17).
[83]
Hongzhe Li, Hyuckmin Kwon, Jonghoon Kwon, and Heejo Lee. 2014. A scalable approach for vulnerability discovery based on security patches. In Proceedings of the International Conference on Applications and Techniques in Information Security. 109–122.
[84]
Hongrui Li, Lili Zhou, Mingming Xing, and Hafsah Binti Taha. 2021. Vulnerability detection algorithm of lightweight Linux Internet of Things application with symbolic execution method. In Proceedings of the 2021 International Symposium on Computer Technology and Information Science (ISCTIS’21). 24–27.
[85]
Liuqing Li, He Feng, Wenjie Zhuang, Na Meng, and Barbara Ryder. 2017. CCLearner: A deep learning-based clone detection approach. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME’17). IEEE, 249–260.
[86]
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[87]
Yi Li, Shaohua Wang, and Tien N. Nguyen. 2021. Vulnerability detection with fine-grained interpretations. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 292–303.
[88]
Zhen Li, Deqing Zou, Shouhuai Xu, Zhaoxuan Chen, Yawei Zhu, and Hai Jin. 2021. VulDeeLocator: A deep learning-based fine-grained vulnerability detector. IEEE Transactions on Dependable and Secure Computing 19, 4 (2021), 2821–2837.
[89]
Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Hanchao Qi, and Jie Hu. 2016. VulPecker: An automated vulnerability detection system based on code similarity analysis. In Proceedings of the 32nd Annual Conference on Computer Security Applications. 201–213.
[90]
Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2021. SySeVR: A framework for using deep learning to detect software vulnerabilities. IEEE Transactions on Dependable and Secure Computing 19, 4 (2021), 2244–2258.
[91]
Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. VulDeePecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681 (2018).
[92]
Guanjun Lin, Sheng Wen, Qing-Long Han, Jun Zhang, and Yang Xiang. 2020. Software vulnerability detection using deep neural networks: A survey. Proceedings of the IEEE 108, 10 (2020), 1825–1848.
[93]
Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan, and Yang Xiang. 2017. POSTER: Vulnerability discovery with function representation learning from unlabeled projects. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2539–2541.
[94]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[95]
LLVM Compiler Infrastructure. 2023. LibFuzzer. Retrieved September 10, 2024 from https://llvm.org/docs/LibFuzzer.html
[96]
Fan Long, Peter Amidon, and Martin Rinard. 2017. Automatic inference of code transforms for patch generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 727–739.
[97]
Aravind Machiry, Nilo Redini, Eric Camellini, Christopher Kruegel, and Giovanni Vigna. 2020. SPIDER: Enabling fast patch propagation in related software repositories. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP’20). IEEE, 1562–1579.
[98]
Valentin J. M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo. 2019. The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering 47, 11 (2019), 2312–2331.
[99]
George Mathew, Chris Parnin, and Kathryn T. Stolee. 2020. SLACC: Simion-based language agnostic code clones. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 210–221.
[100]
Michal Zalewski. 2023. American Fuzzy Lop. Retrieved September 10, 2024 from https://lcamtuf.coredump.cx/afl/
[101]
Microsoft. 2023. Microsoft Security Blog. Retrieved September 10, 2024 from https://www.microsoft.com/security/blog/
[102]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[103]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26 (2013), 1–9.
[104]
Yisroel Mirsky, George Macon, Michael Brown, Carter Yagemann, Matthew Pruett, Evan Downing, Sukarno Mertoguno, and Wenke Lee. 2023. VulChecker: Graph-based vulnerability localization in source code. In Proceedings of the 31st USENIX Security Symposium.
[105]
Mozilla Security. 2023. Peach fuzzing platform. Retrieved September 12, 2024 from https://community.peachfuzzer.com/WhatIsPeach.html
[106]
National Institute of Standards and Technology. 2023. National Vulnerability Database. Retrieved September 10, 2024 from https://nvd.nist.gov/vuln
[107]
National Institute of Standards and Technology. 2023. NIST Software Assurance Reference Dataset. Retrieved September 10, 2024 from https://samate.nist.gov/SARD
[108]
National Institute of Standards and Technology. 2023. Common Platform Enumeration. Retrieved September 10, 2024 from https://nvd.nist.gov/products/cpe
[109]
Weina Niu, Xiaosong Zhang, Xiaojiang Du, Lingyuan Zhao, Rong Cao, and Mohsen Guizani. 2020. A deep learning based static taint analysis approach for IoT software vulnerability location. Measurement 152 (2020), 107139.
[110]
Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, and Haipeng Cai. 2024. Chain-of-thought prompting of large language models for discovering and fixing software vulnerabilities. arXiv preprint arXiv:2402.17230 (2024).
[111]
Shengyi Pan, Jiayuan Zhou, Filipe Roseiro Cogo, Xin Xia, Lingfeng Bao, Xing Hu, Shanping Li, and Ahmed E. Hassan. 2022. Automated unearthing of dangerous issue reports. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 834–846.
[112]
Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. 2015. Cross-architecture bug search in binary executables. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 709–724.
[113]
Serena Elisa Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, and Cédric Dangremont. 2019. A manually-curated dataset of fixes to vulnerabilities of open-source software. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR’19). IEEE, 383–387.
[114]
Moumita Das Purba, Arpita Ghosh, Benjamin J. Radford, and Bill Chu. 2023. Software vulnerability detection using large language models. In Proceedings of the 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW’23). IEEE, 112–119.
[115]
Weizhong Qiang, Yuehua Liao, Guozhong Sun, Laurence T. Yang, Deqing Zou, and Hai Jin. 2017. Patch-related vulnerability detection based on symbolic execution. IEEE Access 5 (2017), 20777–20784.
[116]
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Preprint.
[117]
David A. Ramos and Dawson Engler. 2015. Under-constrained symbolic execution: Correctness checking for real code. In Proceedings of the 24th USENIX Security Symposium (USENIX Security’15). 49–64.
[118]
Sofia Reis and Rui Abreu. 2021. A ground-truth dataset of real security patches. arXiv preprint arXiv:2110.09635 (2021).
[119]
Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 49–61.
[120]
Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. 2017. Learning syntactic program transformations from examples. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE’17). IEEE, 404–415.
[121]
Chanchal Kumar Roy and James R. Cordy. 2007. A survey on software clone detection research. Queen’s School of Computing TR 541, 115 (2007), 64–68.
[122]
Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. 2018. Automated vulnerability detection in source code using deep representation learning. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA’18). IEEE, 757–762.
[123]
Fayozbek Rustamov, Juhwan Kim, Jihyeon Yu, and Joobeom Yun. 2021. Exploratory review of hybrid fuzzing for automated vulnerability detection. IEEE Access 9 (2021), 131166–131190.
[124]
Andreas Saebjornsen. 2014. Detecting Fine-Grained Similarity in Binaries. University of California, Davis.
[125]
Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes. 2018. Oreo: Detection of clones in the twilight zone. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 354–365.
[126]
Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes. 2016. SourcererCC: Scaling code clone detection to big-code. In Proceedings of the 38th International Conference on Software Engineering. 1157–1168.
[127]
Arthur D. Sawadogo, Tegawendé F. Bissyandé, Naouel Moha, Kevin Allix, Jacques Klein, Li Li, and Yves Le Traon. 2022. SSPCatcher: Learning to catch security patches. Empirical Software Engineering 27, 6 (2022), 151.
[128]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[129]
Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. 2010. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE, 317–331.
[130]
Abubakar Omari Abdallah Semasaba, Wei Zheng, Xiaoxue Wu, and Samuel Akwasi Agyemang. 2020. Literature survey of deep learning-based vulnerability analysis on source code. IET Software 14, 6 (2020), 654–664.
[131]
Lucas Serrano, Van-Anh Nguyen, Ferdian Thung, Lingxiao Jiang, David Lo, Julia Lawall, and Gilles Muller. 2020. SPINFER: Inferring semantic patches for the Linux kernel. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC’20). 235–248.
[132]
Muhammad Shahzad, Muhammad Zubair Shafiq, and Alex X. Liu. 2012. A large scale exploratory analysis of software vulnerability life cycles. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE’12). IEEE, 771–781.
[133]
Ridwan Shariffdeen, Xiang Gao, Gregory J. Duck, Shin Hwei Tan, Julia Lawall, and Abhik Roychoudhury. 2021. Automated patch backporting in Linux (experience paper). In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. 633–645.
[134]
Zhidong Shen and Si Chen. 2020. A survey of automatic software vulnerability detection, program repair, and defect prediction techniques. Security and Communication Networks 2020 (2020), 1–16.
[135]
Abdullah Sheneamer and Jugal Kalita. 2016. A survey of software clone detection techniques. International Journal of Computer Applications 137, 10 (2016), 1–21.
[136]
Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SOK: (State of) the art of war: Offensive techniques in binary analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP’16). IEEE, 138–157.
[137]
Kanchan Singh, Sakshi S. Grover, and Ranjini Kishen Kumar. 2022. Cyber security vulnerability detection using natural language processing. In Proceedings of the 2022 IEEE World AI IoT Congress (AIIoT’22). IEEE, 174–178.
[138]
Benjamin Steenhoek, Md. Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, and Wei Le. 2024. A comprehensive study of the capabilities of large language models for vulnerability detection. arXiv preprint arXiv:2403.17218 (2024).
[139]
Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2016. Driller: Augmenting fuzzing through selective symbolic execution. In Proceedings of the Network and Distributed System Security Symposium (NDSS’16), Vol. 16. 1–16.
[140]
Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. Roy, and Mohammad Mamun Mia. 2014. Towards a big data curated benchmark of inter-project code clones. In Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution. IEEE, 476–480.
[141]
Jeffrey Svajlenko and Chanchal K. Roy. 2020. A survey on the evaluation of clone detection performance and benchmarking. arXiv preprint arXiv:2006.15682 (2020).
[142]
Xin Tan, Yuan Zhang, Jiajun Cao, Kun Sun, Mi Zhang, and Min Yang. 2022. Understanding the practice of security patch management across multiple branches in OSS projects. In Proceedings of the ACM Web Conference 2022. 767–777.
[143]
Xin Tan, Yuan Zhang, Chenyuan Mi, Jiajun Cao, Kun Sun, Yifan Lin, and Min Yang. 2021. Locating the security patches for disclosed OSS vulnerabilities with vulnerability-commit correlation ranking. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 3282–3299.
[144]
Wei Tang, Mingwei Tang, Minchao Ban, Ziguo Zhao, and Mingjun Feng. 2023. CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection. Journal of Systems and Software 199 (2023), 111623.
[145]
Xunzhu Tang, Zhenghan Chen, Kisub Kim, Haoye Tian, Saad Ezzini, and Jacques Klein. 2023. Just-in-time security patch detection—LLM at the rescue for data augmentation. arXiv preprint arXiv:2312.01241 (2023).
[146]
Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, and Tegawende F. Bissyande. 2023. Multilevel semantic embedding of software patches: A fine-to-coarse grained approach towards security patch detection. arXiv preprint arXiv:2308.15233 (2023).
[147]
The MITRE Corporation. 2021. Common Vulnerabilities and Exposures. Retrieved September 10, 2024 from https://www.cve.org
[148]
The MITRE Corporation. 2021. CVE Details. Retrieved September 10, 2024 from https://www.cvedetails.com/
[149]
The MITRE Corporation. 2024. Common Weakness Enumeration. Retrieved September 10, 2024 from https://cwe.mitre.org/
[150]
Frank Tip. 1994. A Survey of Program Slicing Techniques. Centrum voor Wiskunde en Informatica, Amsterdam.
[151]
Ubuntu. 2023. Ubuntu CVE Reports. Retrieved September 10, 2024 from https://ubuntu.com/security/cves
[152]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017), 1–11.
[153]
Spandan Veggalam, Sanjay Rawat, Istvan Haller, and Herbert Bos. 2016. IFuzzer: An evolutionary interpreter fuzzer using genetic programming. In Proceedings of the European Symposium on Research in Computer Security. 581–601.
[154]
Huanting Wang, Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Songfang Huang, Dingyi Fang, Yansong Feng, Lizhong Bian, and Zheng Wang. 2020. Combining graph-based learning with automated data collection for code vulnerability detection. IEEE Transactions on Information Forensics and Security 16 (2020), 1943–1958.
[155]
Jingjing Wang, Minhuan Huang, Yuanping Nie, and Jin Li. 2021. Static analysis of source code vulnerability using machine learning techniques: A survey. In Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD’21). IEEE, 76–86.
[156]
Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, and Chanchal K. Roy. 2018. CCAligner: A token based large-gap clone detector. In Proceedings of the 40th International Conference on Software Engineering. 1066–1077.
[157]
Shu Wang, Xinda Wang, Kun Sun, Sushil Jajodia, Haining Wang, and Qi Li. 2022. GraphSPD: Graph-based security patch detection with enriched code semantics. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP’23). IEEE, 604–621.
[158]
Xinda Wang, Kun Sun, Archer Batcheller, and Sushil Jajodia. 2019. Detecting “0-day” vulnerability: An empirical study of secret security patch in OSS. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’19). IEEE, 485–492.
[159]
Xinda Wang, Shu Wang, Pengbin Feng, Kun Sun, and Sushil Jajodia. 2021. PatchDB: A large-scale security patch dataset. In Proceedings of the 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’21). IEEE, 149–160.
[160]
Xinda Wang, Shu Wang, Pengbin Feng, Kun Sun, Sushil Jajodia, Sanae Benchaaboun, and Frank Geck. 2021. PatchRNN: A deep learning-based system for security patch identification. In Proceedings of the 2021 IEEE Military Communications Conference (MILCOM’21). IEEE, 595–600.
[161]
Laura Wartschinski, Yannic Noller, Thomas Vogel, Timo Kehrer, and Lars Grunske. 2022. VUDENC: Vulnerability detection with deep learning on a natural codebase for Python. Information and Software Technology 144 (2022), 106809.
[162]
Huihui Wei and Ming Li. 2017. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI’17). 3034–3040.
[163]
Mark Weiser. 1984. Program slicing. IEEE Transactions on Software Engineering SE-10, 4 (1984), 352–357.
[164]
Xin-Cheng Wen, Yupan Chen, Cuiyun Gao, Hongyu Zhang, Jie M. Zhang, and Qing Liao. 2023. Vulnerability detection with graph simplification and enhanced graph representation learning. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE’23). IEEE, 2275–2286.
[165]
Seunghoon Woo, Hyunji Hong, Eunjin Choi, and Heejo Lee. 2022. MOVERY: A precise approach for modified vulnerable code clone discovery from modified open-source software components. In Proceedings of the 31st USENIX Security Symposium (USENIX Security’22). 3037–3053.
[166]
Bozhi Wu, Shangqing Liu, Ruitao Feng, Xiaofei Xie, Jingkai Siow, and Shang-Wei Lin. 2022. Enhancing security patch identification by capturing structures in commits. IEEE Transactions on Dependable and Secure Computing 2022 (2022), 1–15.
[167]
Yueming Wu, Deqing Zou, Shihan Dou, Wei Yang, Duo Xu, and Hai Jin. 2022. VulCNN: An image-inspired scalable vulnerability detection system. In Proceedings of the 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE’22). IEEE, 2365–2376.
[168]
Yang Xiao, Bihuan Chen, Chendong Yu, Zhengzi Xu, Zimu Yuan, Feng Li, Binghong Liu, Yang Liu, Wei Huo, Wei Zou, and Wenchang Shi. 2020. MVP: Detecting vulnerabilities using patch-enhanced vulnerability signatures. In Proceedings of the 29th USENIX Security Symposium (USENIX Security’20). 1165–1182.
[169]
Congying Xu, Bihuan Chen, Chenhao Lu, Kaifeng Huang, Xin Peng, and Yang Liu. 2021. TRACER: Finding patches for open source software vulnerabilities. arXiv preprint arXiv:2112.02240 (2021).
[170]
Yifei Xu, Zhengzi Xu, Bihuan Chen, Fu Song, Yang Liu, and Ting Liu. 2020. Patch based vulnerability matching for binary programs. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 376–387.
[171]
Zhengzi Xu, Bihuan Chen, Mahinthan Chandramohan, Yang Liu, and Fu Song. 2017. SPAIN: Security patch analysis for binaries towards understanding the pain and pills. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE’17). IEEE, 462–472.
[172]
Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of the 2014 IEEE Symposium on Security and Privacy. IEEE, 590–604.
[173]
Fabian Yamaguchi, Christian Wressnegger, Hugo Gascon, and Konrad Rieck. 2013. Chucky: Exposing missing checks in source code for vulnerability discovery. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security. 499–510.
[174]
Shouguo Yang, Zhengzi Xu, Yang Xiao, Zhe Lang, Wei Tang, Yang Liu, Zhiqiang Shi, Hong Li, and Limin Sun. 2023. Towards practical binary code similarity detection: Vulnerability verification via patch semantic analysis. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1–29.
[175]
Yuan Yuan, Weiqiang Kong, Gang Hou, Yan Hu, Masahiko Watanabe, and Akira Fukuda. 2020. From local to global semantic clone detection. In Proceedings of the 2019 6th International Conference on Dependable Systems and Their Applications (DSA’20). IEEE, 13–24.
[176]
Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang, and Taesoo Kim. 2018. QSYM: A practical concolic execution engine tailored for hybrid fuzzing. In Proceedings of the 27th USENIX Security Symposium (USENIX Security’18). 745–761.
[177]
Peng Zeng, Guanjun Lin, Lei Pan, Yonghang Tai, and Jun Zhang. 2020. Software vulnerability analysis and discovery using deep learning techniques: A survey. IEEE Access 8 (2020), 197158–197172.
[178]
Chenyuan Zhang, Hao Liu, Jiutian Zeng, Kejing Yang, Yuhong Li, and Hui Li. 2023. Prompt-enhanced software vulnerability detection using ChatGPT. arXiv preprint arXiv:2308.12697 (2023).
[179]
Hang Zhang and Zhiyun Qian. 2018. Precise and accurate patch presence test for binaries. In Proceedings of the 27th USENIX Security Symposium (USENIX Security’18). 887–902.
[180]
Haibo Zhang and Kouichi Sakurai. 2021. A survey of software clone detection from security perspective. IEEE Access 9 (2021), 48157–48173.
[181]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. 2019. A novel neural source code representation based on abstract syntax tree. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 783–794.
[182]
Yingzhou Zhang. 2023. LLVM-Slicing. Retrieved September 10, 2024 from https://github.com/zhangyz/llvm-slicing
[183]
Yifan Zhang, Junwen Yang, Haoyu Dong, Qingchen Wang, Huajie Shao, Kevin Leach, and Yu Huang. 2022. ASTRO: An AST-assisted approach for generalizable neural clone detection. arXiv preprint arXiv:2208.08067 (2022).
[184]
Zheng Zhang, Hang Zhang, Zhiyun Qian, and Billy Lau. 2021. An investigation of the Android kernel patch ecosystem. In Proceedings of the 30th USENIX Security Symposium (USENIX Security’21). 3649–3666.
[185]
Gang Zhao and Jeff Huang. 2018. DeepSim: Deep learning code functional similarity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 141–151.
[186]
Lei Zhao, Yue Duan, Heng Yin, and Jifeng Xuan. 2019. Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing. In Proceedings of the Network and Distributed System Security Symposium (NDSS’19).
[187]
Qianchong Zhao, Cheng Huang, and Liuhu Dai. 2023. VULDEFF: Vulnerability detection method based on function fingerprints and code differences. Knowledge-Based Systems 260 (2023), 110139.
[188]
Weining Zheng, Yuan Jiang, and Xiaohong Su. 2021. Vu1SPG: Vulnerability detection based on slice property graph representation learning. In Proceedings of the 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE’21). IEEE, 457–467.
[189]
Jiayuan Zhou, Michael Pacheco, Zhiyuan Wan, Xin Xia, David Lo, Yuan Wang, and Ahmed E. Hassan. 2021. Finding a needle in a haystack: Automated mining of silent vulnerability fixes. In Proceedings of the 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE’21). IEEE, 705–716.
[190]
Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Advances in Neural Information Processing Systems 32 (2019), 1–11.
[191]
Yaqin Zhou and Asankhaya Sharma. 2017. Automated identification of security issues from commit messages and bug reports. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 914–919.
[192]
Yaqin Zhou, Jing Kai Siow, Chenyu Wang, Shangqing Liu, and Yang Liu. 2021. SPI: Automated identification of security patches via commits. ACM Transactions on Software Engineering and Methodology 31, 1 (2021), 1–27.
[193]
Noah Ziems and Shaoen Wu. 2021. Security vulnerability detection using deep learning natural language processing. In Proceedings of the 2021 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS’21). IEEE, 1–6.
[194]
Deqing Zou, Hanchao Qi, Zhen Li, Song Wu, Hai Jin, Guozhong Sun, Sujuan Wang, and Yuyi Zhong. 2017. SCVD: A new semantics-based approach for cloned vulnerable code detection. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. 325–344.
[195]
Deqing Zou, Sujuan Wang, Shouhuai Xu, Zhen Li, and Hai Jin. 2019. μVulDeePecker: A deep learning-based system for multiclass vulnerability detection. IEEE Transactions on Dependable and Secure Computing 18, 5 (2019), 2224–2236.
[196]
Yue Zou, Bihuan Ban, Yinxing Xue, and Yun Xu. 2020. CCGraph: A PDG-based code clone detector with approximate graph matching. In Proceedings of the 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20). IEEE, 931–942.
[197]
Fei Zuo and Junghwan Rhee. 2024. Vulnerability discovery based on source code patch commit mining: A systematic literature review. International Journal of Information Security 12 (2024), 1513–1526.

Published In

ACM Computing Surveys  Volume 57, Issue 1
January 2025
984 pages
EISSN:1557-7341
DOI:10.1145/3696794
Editors: David Atienza, Michela Milano

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2024
Online AM: 09 September 2024
Accepted: 22 August 2024
Revised: 26 July 2024
Received: 09 June 2023
Published in CSUR Volume 57, Issue 1

Author Tags

  1. Open source software
  2. vulnerability detection
  3. security patch detection
  4. software security
  5. AI security

Qualifiers

  • Survey

Funding Sources

  • National Key R&D Program of China
  • Natural Science Basic Research Program of Shaanxi Province
  • Fundamental Research Funds for the Central Universities
  • Tencent Security Yunding Lab
