DOI: 10.1145/3643991.3644885
research-article

AW4C: A Commit-Aware C Dataset for Actionable Warning Identification

Published: 02 July 2024

Abstract

Excessive non-actionable warnings generated by static program analysis tools can hinder developers from using these tools effectively. Learning-based approaches to actionable warning identification have shown promise in boosting developer productivity, minimizing the risk of bugs, and reducing code smells. However, the small size of existing datasets has limited the choice of models available to machine learning researchers, and the lack of aligned fix commits limits the range of research these datasets can support. In this paper, we present AW4C, an actionable warning dataset for C that contains 38,134 actionable warnings mined from more than 500 GitHub repositories. The warnings are generated with Cppcheck and, most importantly, each warning is precisely mapped to the commit in which the corrective action occurred. To the best of our knowledge, this is the largest publicly available actionable warning dataset for the C programming language to date. The dataset is suitable for training machine learning and deep learning models and can support a wide range of tasks, such as actionable warning identification and vulnerability detection. Furthermore, we have released our dataset¹ and a general framework for collecting actionable warnings from GitHub² to help other researchers replicate our work and validate their own ideas.
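The abstract says each Cppcheck warning is mapped to the commit where the corrective action occurred. As a rough illustration of what such a mapping can look like, here is a minimal sketch of one common heuristic, not necessarily the procedure used to build AW4C. It assumes Cppcheck was run with `--template="{file}:{line}:{severity}:{id}:{message}"` on the parent and child of a commit. A warning reported before the commit, absent after it, and located in a file the commit modified is treated as a candidate actionable warning attributed to that commit. `CppcheckWarning`, `parse_cppcheck`, `candidate_actionable`, the sample output, and the commit hash `abc123` are all hypothetical names and data for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class CppcheckWarning:
    file: str
    line: int
    severity: str
    check_id: str
    message: str

    def key(self) -> Tuple[str, str, str]:
        # Line numbers shift when unrelated code changes, so the matching key
        # deliberately ignores them (a simplifying assumption of this sketch).
        return (self.file, self.check_id, self.message)


def parse_cppcheck(output: str) -> List[CppcheckWarning]:
    """Parse output assumed to use --template="{file}:{line}:{severity}:{id}:{message}"."""
    warnings = []
    for raw in output.splitlines():
        # maxsplit=4 keeps any colons inside the message text intact
        parts = raw.split(":", 4)
        if len(parts) != 5 or not parts[1].isdigit():
            continue  # skip progress lines such as "Checking ..." or malformed lines
        file, line, severity, check_id, message = parts
        warnings.append(CppcheckWarning(file, int(line), severity, check_id, message.strip()))
    return warnings


def candidate_actionable(before: List[CppcheckWarning], after: List[CppcheckWarning],
                         changed_files: Set[str],
                         commit: str) -> List[Tuple[CppcheckWarning, str]]:
    """Return warnings that disappear in `commit` from a file that `commit` modified."""
    # Count surviving keys so duplicate warnings are matched one-to-one.
    remaining = Counter(w.key() for w in after)
    result = []
    for w in before:
        k = w.key()
        if remaining[k] > 0:
            remaining[k] -= 1  # still reported after the commit: not fixed here
        elif w.file in changed_files:
            result.append((w, commit))
    return result


if __name__ == "__main__":
    # Illustrative Cppcheck output for the parent and child of a hypothetical commit.
    before_out = (
        "src/buf.c:42:error:arrayIndexOutOfBounds:"
        "Array 'buf[10]' accessed at index 10, which is out of bounds.\n"
        "src/util.c:7:style:unusedVariable:Unused variable: tmp"
    )
    after_out = "src/util.c:9:style:unusedVariable:Unused variable: tmp"
    hits = candidate_actionable(parse_cppcheck(before_out), parse_cppcheck(after_out),
                                {"src/buf.c"}, "abc123")
    for w, c in hits:
        print(c, w.file, w.line, w.check_id)  # prints: abc123 src/buf.c 42 arrayIndexOutOfBounds
```

Matching on file, check ID, and message rather than line number tolerates line shifts caused by unrelated edits. However, a warning can also vanish because its code was deleted or moved to another file, so candidates found this way would still need filtering before being treated as genuine fixes.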

Information

Published In

ACM Conferences
MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. static program analysis
  2. actionable warning identification

Conference

MSR '24

Article Metrics

  • Total citations: 0
  • Total downloads: 35
  • Downloads (last 12 months): 35
  • Downloads (last 6 weeks): 4

Reflects downloads up to 17 Feb 2025.
