DOI: 10.1145/2597073.2597099

Characterizing and predicting blocking bugs in open source projects

Published: 31 May 2014

Abstract

As software becomes increasingly important, so does its quality. Prior work has therefore focused on software quality and proposed many prediction models, for example to locate software bugs or to estimate their fix time. One especially severe type of bug, however, is the blocking bug: a software bug that prevents other bugs from being fixed. Blocking bugs may increase maintenance costs, reduce overall quality, and delay the release of software systems.
In this paper, we study blocking bugs in six open source projects and propose a model to predict them. Our goal is to help developers identify these blocking bugs early on. We collect the bug reports from the bug tracking systems of the projects and extract 14 different factors related to, for example, the textual description of the bug, the location the bug is found in, and the people involved with the bug. Based on these factors, we build decision trees for each project to predict whether a bug will be a blocking bug. We then analyze these decision trees to determine which factors best indicate blocking bugs. Our results show that our prediction models achieve F-measures of 15-42%, which is a two- to four-fold improvement over the baseline random predictors. We also find that the most important factors in determining blocking bugs are the comment text, the comment size, the number of developers in the CC list of the bug report, and the reporter's experience. Our analysis shows that our models reduce the median time to identify a blocking bug by 3-18 days.
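
To make the comparison against baseline random predictors concrete, the following Python sketch computes an F-measure and the expected F-measure of a random predictor. The abstract does not say how the baseline is built, so the sketch assumes a predictor that flags each bug as blocking with probability 0.5; under that assumption its expected precision equals the share of blocking bugs and its expected recall is 0.5. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def random_baseline_f(blocking_ratio: float, flag_probability: float = 0.5) -> float:
    """Expected F-measure of a predictor that flags bugs at random.

    Assumption (not stated in the abstract): each bug is flagged as
    blocking with probability `flag_probability`, independently of its
    true label. Expected precision is then the share of blocking bugs,
    and expected recall is `flag_probability`.
    """
    if not 0.0 <= blocking_ratio <= 1.0:
        raise ValueError("blocking_ratio must be between 0 and 1")
    return f_measure(blocking_ratio, flag_probability)


def improvement_factor(model_f: float, blocking_ratio: float) -> float:
    """How many times better a model's F-measure is than the random baseline."""
    baseline = random_baseline_f(blocking_ratio)
    if baseline == 0:
        raise ValueError("baseline F-measure is zero; the factor is undefined")
    return model_f / baseline


if __name__ == "__main__":
    # Hypothetical project: 5% of bugs are blocking, model F-measure is 0.30.
    print(round(random_baseline_f(0.05), 3))
    print(round(improvement_factor(0.30, 0.05), 2))
```

For a hypothetical project in which 5% of bugs are blocking, `random_baseline_f(0.05)` is about 0.09, so a model with an F-measure of 0.30 would be roughly a 3.3-fold improvement. This illustrates why a modest absolute F-measure can still be a large gain when blocking bugs are rare.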

      Published In

      MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories
      May 2014
      427 pages
      ISBN: 9781450328630
      DOI: 10.1145/2597073
      General Chair: Premkumar Devanbu
      Program Chairs: Sung Kim, Martin Pinzger
      Sponsors

      In-Cooperation

      • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Code Metrics
      2. Post-release Defects
      3. Process Metrics

      Qualifiers

      • Article

      Conference

      ICSE '14