Predicting the delay of issues with due dates in software projects

Choetkiertikul, Morakot; Dam, Hoa Khanh; Tran, Truyen; Ghose, Aditya

doi:10.1007/s10664-016-9496-7

Published: 19 January 2017

Volume 22, pages 1223–1263 (2017)

Empirical Software Engineering

Morakot Choetkiertikul¹,
Hoa Khanh Dam¹,
Truyen Tran² &
…
Aditya Ghose¹


Abstract

Issue-tracking systems (e.g. JIRA) are increasingly used in software projects. An issue may represent a software bug, a new requirement or user story, or even a project task. A deadline can be imposed on an issue either explicitly, by assigning it a due date, or implicitly, by assigning it to a release so that it inherits the release’s deadline. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether an issue is at risk of being delayed against its deadline. A set of features (hereafter called risk factors) characterizing delayed issues was extracted from eight open source projects: Apache, Duraspace, Java.net, JBoss, JIRA, Moodle, Mulesoft, and WSO2. Risk factors with good discriminative power were selected to build predictive models that predict whether the resolution of an issue is at risk of being delayed. Our predictive models are able to predict both the extent of the delay and the likelihood of the delay occurring. The evaluation results demonstrate the effectiveness of our predictive models, which achieve on average 79% precision, 61% recall, 68% F-measure, and 83% Area Under the ROC Curve. Our predictive models also have low error rates: on average 0.66 for Macro-averaged Mean Cost-Error and 0.72 for Macro-averaged Mean Absolute Error.
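
The classification measures reported above can be illustrated with a minimal sketch. It assumes a binary labelling (1 = delayed, 0 = on time) and uses made-up labels; the function name `precision_recall_f1` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for the positive class (1 = delayed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # delayed, predicted delayed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # on time, predicted delayed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # delayed, predicted on time

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for ten issues (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

The figures in the abstract are averages, and an average of F-measures generally differs from the F-measure computed from averaged precision and recall, so the three reported values need not satisfy the formula exactly.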

Notes

https://issues.apache.org/jira
https://jira.duraspace.org
https://java.net/jira/
https://issues.jboss.org
https://jira.atlassian.com
https://tracker.moodle.org
https://www.mulesoft.org/jira/
https://wso2.org/jira/
https://www.atlassian.com/software/jira
Here we deal with only 4 classes but the formula can be easily generalized to n classes.
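
The formula the last note refers to is not reproduced in this excerpt. As an illustration of a measure that generalizes from 4 to n ordinal classes, the following is a minimal sketch of macro-averaged mean absolute error, one of the error measures named in the abstract. It assumes the standard definition (the mean, over classes, of the mean absolute difference between predicted and true class index). The function name, the class encoding 0 to 3, and the sample labels are hypothetical, and this is not claimed to be the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def macro_averaged_mae(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class mean absolute errors, over classes present in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Group absolute errors by the true class of each item.
    errors_by_class: Dict[int, List[int]] = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors_by_class[t].append(abs(p - t))

    # Each class contributes equally, regardless of how many items it has.
    per_class_mae = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class_mae) / len(per_class_mae)


# Hypothetical ordinal classes 0..3 (assumed encoding, illustration only).
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 1, 0, 1, 3, 2, 1]
score = macro_averaged_mae(y_true, y_pred)
print(score)  # 0.8125
```

Because each class is averaged separately, a rare class weighs as much as a frequent one, and nothing in the function depends on the number of classes being 4.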

Author information

Authors and Affiliations

School of Computing and Information Technology, Faculty of Engineering and Information Sciences, University of Wollongong, Wollongong, Australia
Morakot Choetkiertikul, Hoa Khanh Dam & Aditya Ghose
School of Information Technology, Deakin University, Deakin, Australia
Truyen Tran

Corresponding author

Correspondence to Hoa Khanh Dam.

Additional information

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei

About this article

Cite this article

Choetkiertikul, M., Dam, H.K., Tran, T. et al. Predicting the delay of issues with due dates in software projects. Empir Software Eng 22, 1223–1263 (2017). https://doi.org/10.1007/s10664-016-9496-7

Published: 19 January 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10664-016-9496-7
