AW4C: A Commit-Aware C Dataset for Actionable Warning Identification

Authors:

Xiaohong Zhang,

Dan Yang

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

Pages 133 - 137

https://doi.org/10.1145/3643991.3644885

Published: 02 July 2024

Abstract

Static program analysis tools generate so many non-actionable warnings that developers struggle to use them effectively. Learning-based approaches to actionable warning identification have shown promise in boosting developer productivity, minimizing the risk of bugs, and reducing code smells. However, existing datasets are small, which limits the model choices available to machine learning researchers, and they lack aligned fix commits, which limits the range of research they can support. In this paper, we present AW4C, an actionable warning dataset for C that contains 38,134 actionable warnings mined from more than 500 repositories on GitHub. The warnings are generated with Cppcheck and, most importantly, each warning is precisely mapped to the commit where the corrective action occurred. To the best of our knowledge, this is the largest publicly available actionable warning dataset for the C programming language to date. The dataset is suited for use in machine learning and deep learning models and can support a wide range of tasks, such as actionable warning identification and vulnerability detection. Furthermore, we have released our dataset¹ and a general framework for collecting actionable warnings on GitHub² so that other researchers can replicate our work and validate their own ideas.
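
One way to read the mapping of each warning to its corrective commit is as a before/after comparison: a warning reported on a commit's parent but no longer reported on the commit itself is a candidate actionable warning tied to that commit. The Python sketch below illustrates only that comparison. It is an assumption, not the pipeline used to build AW4C, which the abstract does not describe. The `StaticWarning` record, the `resolved_warnings` helper, and the sample warnings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticWarning:
    """Hypothetical record for one analyzer warning (not the AW4C schema)."""
    file: str      # path of the flagged source file
    rule_id: str   # analyzer rule identifier
    message: str   # warning text reported by the analyzer


def resolved_warnings(before, after):
    """Return warnings reported before a commit but not after it.

    Line numbers are left out of the record on purpose: they shift when
    unrelated code is edited, so they make a poor identity key.
    """
    still_present = set(after)
    return [w for w in before if w not in still_present]


# Illustrative data only: warnings on a parent commit and on its child.
parent = [
    StaticWarning("src/io.c", "memleak", "Memory leak: buf"),
    StaticWarning("src/net.c", "nullPointer", "Null pointer dereference: p"),
]
child = [
    StaticWarning("src/net.c", "nullPointer", "Null pointer dereference: p"),
]

for w in resolved_warnings(parent, child):
    print(f"{w.file}: [{w.rule_id}] {w.message}")
```

A real collection framework would also have to deal with renamed files and with messages whose text changes between revisions; this sketch ignores both.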

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging
  2. Software notations and tools
    1. Software maintenance tools

Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

April 2024

788 pages

ISBN: 9798400705878

DOI: 10.1145/3643991

Chair: Diomidis Spinellis
Program Chair: Alberto Bacchelli
Program Co-chair: Eleni Constantinou

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MSR '24

Sponsor: SIGSOFT

MSR '24: 21st International Conference on Mining Software Repositories

April 15 - 16, 2024

Lisbon, Portugal
