DOI: 10.1145/3345629.3345639
research-article

From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects

Published: 18 September 2019

ABSTRACT

Bugs appear in almost any software development effort. Solving all, or at least a large part, of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have analyzed bug tracking data to better understand the problem and thus provide means to reduce costs and improve the efficiency of bug fixing. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity in 55 projects from the Apache Software Foundation, distributed across 9 categories. We mined this information from the Jira issue tracking system, considering reports with closed/resolved status from two perspectives: static (the latest version of each report) and dynamic (the changes made to each report over time). We also extract information about the commits that fix these bugs, when such commits exist, from the projects' version-control system (Git). To illustrate and characterize the proposed dataset, we also provide an analysis of the changes that occur in the reports. Since data extraction is an error-prone and nontrivial task, we believe initiatives like this one can support researchers in further, more detailed investigations.
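As an illustration of the report-to-commit linking mentioned in the abstract, the sketch below shows one common heuristic: scanning commit messages for Jira-style issue keys (for example `HADOOP-101`). The abstract does not say how the authors performed the linking, so this is an assumption rather than their method. The function name `link_commits_to_issues`, the key pattern, and the sample commits are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: issue keys follow the usual Jira shape PROJECT-123.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def link_commits_to_issues(commits, known_keys):
    """Map each known issue key to the hashes of commits that mention it.

    commits: iterable of (commit_hash, commit_message) pairs.
    known_keys: set of issue keys taken from the mined bug reports.
    """
    links = defaultdict(list)
    for sha, message in commits:
        for key in ISSUE_KEY.findall(message):
            # Ignore keys that do not belong to a mined report.
            if key in known_keys:
                links[key].append(sha)
    return dict(links)


# Hypothetical sample data, not taken from the dataset.
commits = [
    ("a1b2c3", "HADOOP-101: fix NPE in block reader"),
    ("d4e5f6", "Refactor tests"),
    ("0718aa", "Fix race condition (HADOOP-101, HADOOP-205)"),
]
known = {"HADOOP-101", "HADOOP-205", "HADOOP-999"}
links = link_commits_to_issues(commits, known)
```

Reports with no matching commit, such as the hypothetical `HADOOP-999`, simply get no entry, which mirrors the abstract's remark that fixing commits are extracted only if they exist.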

Published in

PROMISE'19: Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering
September 2019
103 pages
ISBN: 9781450372336
DOI: 10.1145/3345629

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 64 of 125 submissions, 51%