Article

Characterizing and predicting blocking bugs in open source projects

Authors:

Harold Valdivia Garcia,

Emad Shihab

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories

Pages 72 - 81

https://doi.org/10.1145/2597073.2597099

Published: 31 May 2014

Abstract

As software becomes increasingly important, so does its quality. Prior work has therefore proposed many prediction models, for example to identify the location of software bugs or to estimate their fixing time. One particularly severe type of bug, however, is the blocking bug. Blocking bugs are software bugs that prevent other bugs from being fixed. They may increase maintenance costs, reduce overall quality, and delay the release of software systems.

In this paper, we study blocking bugs in six open source projects and propose a model to predict them. Our goal is to help developers identify these blocking bugs early on. We collect bug reports from the projects' bug tracking systems and extract 14 factors related to, for example, the textual description of the bug, the location in which the bug is found, and the people involved with the bug. Based on these factors, we build decision trees for each project to predict whether a bug will be a blocking bug. We then analyze these decision trees to determine which factors best indicate blocking bugs. Our results show that our prediction models achieve F-measures of 15-42%, a two- to four-fold improvement over baseline random predictors. We also find that the most important factors in determining blocking bugs are the comment text, the comment size, the number of developers in the CC list of the bug report, and the reporter's experience. Our analysis shows that our models reduce the median time to identify a blocking bug by 3-18 days.
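To make the comparison against a random predictor concrete, the sketch below computes the F-measure as the harmonic mean of precision and recall and derives the expected score of a random baseline. It is illustrative only: the function names, the 5% blocking-bug share, and the assumption that the baseline flags each bug as blocking with probability 0.5 are hypothetical and not taken from the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def random_baseline_f_measure(blocking_ratio: float, guess_rate: float = 0.5) -> float:
    """Expected F-measure of a predictor that flags bugs at random.

    Assumption (not from the abstract): the baseline flags each bug as
    blocking with probability `guess_rate`. Its expected precision is then
    the share of blocking bugs, and its expected recall is `guess_rate`.
    """
    return f_measure(precision=blocking_ratio, recall=guess_rate)


# Hypothetical project in which 5% of all bugs are blocking bugs.
baseline = random_baseline_f_measure(blocking_ratio=0.05)
print(f"random baseline F-measure: {baseline:.3f}")
```

Because blocking bugs are rare, the expected precision of a random guess is low, which pulls the baseline F-measure down; this is why a model score that looks modest in absolute terms can still be a multiple of the baseline.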

Index Terms

Characterizing and predicting blocking bugs in open source projects
1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics
2. Information systems
  1. Information systems applications

Published In

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories

May 2014

427 pages

ISBN:9781450328630

DOI:10.1145/2597073

General Chair: Premkumar Devanbu, University of California at Davis, USA

Program Chairs: Sung Kim, Hong Kong University of Science and Technology, China; Martin Pinzger, University of Klagenfurt, Austria

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICSE '14

Sponsor:

SIGSOFT

ICSE '14: 36th International Conference on Software Engineering

May 31 - June 1, 2014

Hyderabad, India
