ABSTRACT
Bugs appear in almost any software development effort. Fixing all of them, or even a large share, requires considerable time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have analyzed bug tracking data to better understand the problem and thus provide means to reduce costs and improve the efficiency of bug fixing. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity in 55 projects from the Apache Software Foundation, distributed across 9 categories. We mined this information from the Jira issue tracking system, considering two perspectives on reports with closed/resolved status: static (the latest version of each report) and dynamic (the changes that reports have undergone over time). We also extract information about the commits that fix these bugs, when such commits exist, from the projects' version-control system (Git). To illustrate and characterize the proposed dataset, we also provide an analysis of the changes that occur in the reports. Since data extraction is an error-prone, nontrivial task, we believe initiatives like this one can support researchers in further, more detailed investigations.
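The two perspectives and the report-to-commit link described above can be illustrated with a minimal Python sketch. It is not the authors' extraction pipeline. It assumes an issue dictionary shaped like the JSON that Jira's REST API returns when a changelog is requested (`key`, `fields`, `changelog.histories[].items[]`), and it assumes that fix commits mention the issue key in their message. The function names, the selected fields, and the `{"hash", "message"}` commit records are hypothetical choices made for illustration.

```python
import re
from typing import Dict, List

# Assumed key format: upper-case project prefix, hyphen, number ("HADOOP-1234").
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def static_view(issue: dict) -> dict:
    """Latest state of a report (the 'static' perspective)."""
    fields = issue["fields"]
    return {
        "key": issue["key"],
        "status": fields["status"]["name"],
        # Priority may be absent or null on some reports.
        "priority": (fields.get("priority") or {}).get("name"),
    }


def dynamic_view(issue: dict) -> List[dict]:
    """One row per recorded field change (the 'dynamic' perspective).

    Rows keep the order in which the histories appear in the input.
    """
    rows = []
    for history in issue.get("changelog", {}).get("histories", []):
        for item in history["items"]:
            rows.append({
                "key": issue["key"],
                "when": history["created"],
                "field": item["field"],
                "from": item.get("fromString"),
                "to": item.get("toString"),
            })
    return rows


def linked_commits(issue_key: str, commits: List[Dict[str, str]]) -> List[str]:
    """Hashes of commits whose message mentions exactly this issue key."""
    return [
        c["hash"]
        for c in commits
        if issue_key in ISSUE_KEY.findall(c["message"])
    ]
```

Matching whole keys rather than substrings keeps a commit for `HADOOP-12345` from being attributed to `HADOOP-1234`. Reports with no matching commit simply yield an empty list, which mirrors the "if they exist" caveat in the abstract.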
From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects