DOI: 10.1145/3643991.3644870
Research article · Open access

TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs

Published: 02 July 2024

Abstract

We call a program that passes existing tests but still contains bugs a buggy plausible program. Bugs in such a program can slip past the testing environment and reach production, with unpredictable consequences. Discovering and fixing such bugs is therefore a fundamental and critical problem. However, no existing bug dataset is designed to collect this kind of bug, which poses a significant obstacle to relevant research. To address this gap, we introduce TrickyBugs, a bug dataset of 3,043 buggy plausible programs drawn from human-written submissions to 324 real-world competition coding tasks. We identified the buggy plausible programs among approximately 400,000 submissions, and none of the bugs in TrickyBugs had been previously detected. We hope that TrickyBugs can effectively facilitate research on automated program repair, fault localization, test generation, and test adequacy.
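
To make the definition concrete, the sketch below shows what a buggy plausible program can look like. It is a minimal, hypothetical Python example written for this summary, not a program taken from the TrickyBugs dataset: the function passes all of its existing tests, yet a corner-case input still produces a wrong answer.

# Hypothetical "buggy plausible program": it passes the existing tests,
# but a corner-case input exposes a bug.
# (Illustrative sketch only; not drawn from the TrickyBugs dataset.)

def max_pair_product(nums):
    """Return the largest product of two distinct elements of nums."""
    s = sorted(nums)
    # Bug: only considers the two largest values, missing the case
    # where the two most negative values yield a larger product.
    return s[-1] * s[-2]

# Existing tests: every assertion passes, so the program looks correct.
assert max_pair_product([1, 2, 3]) == 6
assert max_pair_product([5, 0, 4]) == 20

# Corner case that slips past the tests: the correct answer is
# (-10) * (-9) = 90, but the program returns 2 * 1 = 2.
print(max_pair_product([-10, -9, 1, 2]))  # prints 2, not 90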



Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024, 788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. software testing
  2. test generation
  3. test adequacy
  4. program repair
  5. benchmark


Funding Sources

  • ERC Advanced Grant

Conference

MSR '24
