Predicting the delay of issues with due dates in software projects

Empirical Software Engineering

Abstract

Issue-tracking systems (e.g. JIRA) have increasingly been used in many software projects. An issue could represent a software bug, a new requirement or a user story, or even a project task. A deadline can be imposed on an issue either explicitly, by assigning a due date to it, or implicitly, by assigning it to a release so that it inherits the release's deadline. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether an issue is at risk of being delayed against its deadline. A set of features (hereafter called risk factors) characterizing delayed issues was extracted from eight open source projects: Apache, Duraspace, Java.net, JBoss, JIRA, Moodle, Mulesoft, and WSO2. Risk factors with good discriminative power were selected to build predictive models that predict whether the resolution of an issue is at risk of being delayed. Our predictive models are able to predict both the extent of the delay and the likelihood of the delay occurring. The evaluation results demonstrate the effectiveness of our predictive models, which achieve on average 79% precision, 61% recall, 68% F-measure, and 83% Area Under the ROC Curve. Our predictive models also have low error rates: on average 0.66 Macro-averaged Mean Cost-Error and 0.72 Macro-averaged Mean Absolute Error.
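
The deadline rule described in the abstract can be illustrated in code. The snippet below is a minimal sketch, not the authors' implementation: it takes an issue's explicit due date when one is present, otherwise inherits the deadline of the release the issue is assigned to, and then buckets the number of days late into four ordinal delay classes. The `Issue` record, the class names, and the day thresholds are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Issue:
    # Hypothetical minimal issue record; real issue-tracker exports carry many more fields.
    key: str
    resolved: date
    due_date: Optional[date] = None      # explicit deadline, if any
    release_date: Optional[date] = None  # deadline of the release the issue is assigned to


def effective_deadline(issue: Issue) -> Optional[date]:
    """An explicit due date takes precedence; otherwise inherit the release's deadline."""
    if issue.due_date is not None:
        return issue.due_date
    return issue.release_date


# Hypothetical upper bounds (in days late) for each ordinal class; anything later is "major".
THRESHOLDS = [(0, "non-delay"), (7, "minor"), (30, "medium")]


def delay_class(issue: Issue) -> Optional[str]:
    deadline = effective_deadline(issue)
    if deadline is None:
        return None  # no deadline at all, so a delay is undefined
    days_late = (issue.resolved - deadline).days
    for upper, label in THRESHOLDS:
        if days_late <= upper:
            return label
    return "major"


example = Issue("DEMO-1", resolved=date(2015, 3, 20), release_date=date(2015, 3, 10))
print(delay_class(example))  # medium: inherits the release deadline and is 10 days late
```

Treating the delay as a small set of ordered classes rather than a raw day count is what makes ordinal error measures such as macro-averaged MAE applicable; in practice the thresholds would have to be chosen per project.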

Notes

  1. https://issues.apache.org/jira

  2. https://jira.duraspace.org

  3. https://java.net/jira/

  4. https://issues.jboss.org

  5. https://jira.atlassian.com

  6. https://tracker.moodle.org

  7. https://www.mulesoft.org/jira/

  8. https://wso2.org/jira/

  9. https://www.atlassian.com/software/jira

  10. Here we deal with only four classes, but the formula generalizes easily to n classes.
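
Macro-averaged Mean Absolute Error, one of the error measures reported in the abstract, is defined by Baccianella et al. (2009) for ordinal classes as the mean absolute error computed separately within each true class and then averaged over the classes, so every class counts equally regardless of its size. As an illustration of an ordinal measure that extends from four classes to any number n of classes, the sketch below computes it for arbitrary n. Encoding classes as integer ranks and averaging only over classes that occur in the ground truth are assumptions of this sketch, not details taken from the paper.

```python
from collections import defaultdict
from typing import Sequence


def macro_mae(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Macro-averaged MAE over ordinal class ranks (after Baccianella et al. 2009).

    Classes are assumed to be encoded as integer ranks, e.g. a hypothetical
    0 = non-delay, 1 = minor, 2 = medium, 3 = major. Works for any number of classes.
    Only classes that occur in y_true contribute to the average.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = defaultdict(list)  # true class -> absolute errors of its items
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)


print(macro_mae([0, 0, 0, 0, 3], [0, 0, 0, 0, 0]))  # 1.5
```

In this example the plain MAE would be 0.6, but the macro-averaged MAE is 1.5, because the error on the single minority-class item is not diluted by the correctly predicted majority class.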

References

  • Abdelmoez W, Kholief M, Elsalmy FM (2012) Bug Fix-Time Prediction Model Using Naïve Bayes Classifier. In: Proceedings of the 22nd International Conference on Computer Theory and Applications (ICCTA), October, pp 13–15

  • Anvik J, Murphy GC (2011) Reducing the effort of bug report triage. ACM Trans Softw Eng Methodol 20(3):1–35

  • Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? ACM Press, New York, USA

  • Baccianella S, Esuli A, Sebastiani F (2009) Evaluation measures for ordinal regression. In: Proceedings of the 9th International Conference on Intelligent Systems Design and Applications (ISDA). IEEE, pp 283–287

  • Belsley DA, Kuh E, Welsch RE (2005) Regression diagnostics: Identifying influential data and sources of collinearity, vol 571. John Wiley & Sons

  • Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008a) What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM Press, New York, USA, pp 308–318

  • Bettenburg N, Premraj R, Zimmermann T (2008b) Duplicate bug reports considered harmful … really? In: Proceedings of the International Conference on Software Maintenance (ICSM), pp 337–345

  • Bhattacharya P, Neamtiu I (2011) Bug-fix time prediction models: can we do better? In: Proceedings of the 8th Working Conference on Mining Software Repositories (MSR). ACM, pp 207–210

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet Allocation. J Mach Learn Res 3(4–5):993–1022

  • Boehm B (1989) Software risk management. Springer

  • Boehm B (1991) Software risk management: principles and practices. IEEE Softw 8(1):32–41

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Bright P (2015) What Windows as a service and a 'free upgrade' mean at home and at work. https://goo.gl/Fzwflg

  • Chawla N, Cieslak D (2006) Evaluating probability estimates from decision trees. American Association for Artificial Intelligence (AAAI), pp 1–6

  • Choetkiertikul M, Dam HK, Tran T, Ghose A (2015a) Characterization and prediction of issue-related risks in software projects. In: Proceedings of the 12th Working Conference on Mining Software Repositories (MSR). IEEE, pp 280–291

  • Choetkiertikul M, Dam HK, Tran T, Ghose A (2015b) Predicting delays in software projects using networked classification. In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp 353–364

  • Conforti R, de Leoni M, La Rosa M, van der Aalst WM, ter Hofstede AH (2015) A recommendation system for predicting risks across multiple business process instances. Decis Support Syst 69:1–19

  • da Costa DA, Abebe SL, McIntosh S, Kulesza U, Hassan AE (2014) An Empirical Study of Delays in the Integration of Addressed Issues. In: Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), pp 281–290

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232

  • Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378

  • Garg A, Roth D (2001) Understanding Probabilistic Classifiers. Lecture Notes in Computer Science 2167

  • Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recogn Lett 31(14):2225–2236

  • Giger E, Pinzger M, Gall H (2010) Predicting the fix time of bugs. In: Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering (RSSE). ACM, pp 52–56

  • Standish Group (2004) Chaos report. Tech. rep., Standish Group, West Yarmouth, Massachusetts

  • Guo PJ, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. In: Proceedings of the 32nd International Conference on Software Engineering (ICSE), vol 1, pp 495–504

  • Guyon I, Elisseeff A (2003) An Introduction to Variable and Feature Selection. J Mach Learn Res 3:1157–1182

  • Han WM, Huang SJ (2007) An empirical analysis of risk components and performance on software projects. J Syst Softw 80(1):42–50

  • Hodge VJ, Austin J (2004) A Survey of Outlier Detection Methodologies. Artif Intell Rev 22(2):85–126

  • Hooimeijer P, Weimer W (2007) Modeling bug report quality. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE). ACM Press, pp 34–44

  • Hu Y, Huang J, Chen J, Liu M, Xie K (2007) Software Project Risk Management Modeling with Neural Network and Support Vector Machine Approaches. In: Proceedings of the 3rd International Conference on Natural Computation (ICNC), vol 3, pp 358–362

  • Hu Y, Zhang X, Ngai E, Cai R, Liu M (2013) Software project risk analysis using Bayesian networks with causality constraints. Decis Support Syst 56:439–449

  • Ibrahim WM, Bettenburg N, Shihab E, Adams B, Hassan AE (2010) Should I contribute to this discussion? In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR), pp 181–190

  • Iqbal A (2014) Understanding Contributor to Developer Turnover Patterns in OSS Projects: A Case Study of Apache Projects. ISRN Softw Eng 2014:10–20

  • Jalbert N, Weimer W (2008) Automated duplicate detection for bug tracking systems. In: Proceedings of the International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). IEEE, pp 52–61

  • Hosmer DW Jr, Lemeshow S (2004) Applied logistic regression, 3rd edn. Wiley

  • Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort-aware models. In: Proceedings of the IEEE International Conference on Software Maintenance (ICSM). IEEE, pp 1–10

  • Kaufman S, Perlich C (2012) Leakage in Data Mining: Formulation, Detection, and Avoidance. ACM Trans Knowl Discov Data (TKDD) 6(15):556–563

  • Kim S, Zimmermann T, Pan K, Whitehead EJ Jr (2006) Automatic Identification of Bug-Introducing Changes. In: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp 81–90

  • Kochhar PS, Thung F, Lo D (2014) Automatic fine-grained issue report reclassification. In: Proceedings of the IEEE International Conference on Engineering of Complex Computer Systems (ICECCS), pp 126–135

  • Kohavi R (1996) Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD), pp 202–207

  • Lam X, Vu T, Le T (2008) Addressing cold-start problem in recommendation systems. In: Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, pp 208–211

  • Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, pp 1–10

  • Lee SI, Lee H, Abbeel P, Ng AY (2006) Efficient L1 regularized logistic regression. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI). AAAI Press, pp 401–409

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Trans Softw Eng 34(4):485–496

  • Letier E, Stefan D, Barr ET (2014) Uncertainty, risk, and information value in software requirements and architecture. In: Proceedings of the 36th International Conference on Software Engineering (ICSE). ACM Press, New York, USA, pp 883–894

  • Marks L, Zou Y, Hassan AE (2011) Studying the fix-time for bugs in large open source projects. In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering (Promise). ACM Press, pp 1–8

  • Menard S (2002) Applied logistic regression analysis, vol 106, 2nd edn. SAGE University paper

  • Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: Proceedings of the International Conference on Software Maintenance (ICSM). IEEE, pp 346–355

  • Michael B, Blumberg S, Laartz J (2012) Delivering large-scale IT projects on time, on budget, and on value. Tech. rep

  • Murphy G, Čubranić D (2004) Automatic bug triage using text categorization. In: Proceedings of the 16th International Conference on Software Engineering & Knowledge Engineering (SEKE), pp 92–97

  • Neumann D (2002) An enhanced neural network technique for software risk analysis. IEEE Trans Softw Eng 28(9):904–912

  • Panjer LD (2007) Predicting Eclipse Bug Lifetimes. In: Proceedings of the 4th International Workshop on Mining Software Repositories (MSR), pp 29–32

  • Pika A, van der Aalst WM, Fidge CJ, ter Hofstede AH, Wynn MT (2013) Profiling event logs to configure risk indicators for process delays. In: Proceedings of the 25th International Conference on Advanced Information Systems Engineering (CAiSE). Springer, pp 465–481

  • Porter AA, Siy HP, Votta LG (1997) Understanding the effects of developer activities on inspection interval. ACM Press

  • Qin X, Salter-Townshend M, Cunningham P (2014) Exploring the Relationship between Membership Turnover and Productivity in Online Communities. In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc

  • Rahman MM, Ruhe G, Zimmermann T (2009) Optimized assignment of developers for fixing bugs: an initial evaluation for Eclipse projects. In: Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, pp 439–442

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of Duplicate Defect Reports Using Natural Language Processing. In: Proceedings of the 29th International Conference on Software Engineering (ICSE). IEEE, pp 499–510

  • Shihab E, Ihara A, Kamei Y, Ibrahim WM, Ohira M, Adams B, Hassan AE, Matsumoto K (2012) Studying re-opened bugs in open source software. Empir Softw Eng 18(5):1005–1042

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

  • Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp 253–262

  • Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of the Working Conference on Reverse Engineering (WCRE), pp 205–214

  • Tian Y, Lo D, Xia X, Sun C (2015) Automated prediction of bug report priority using multi-factor analysis. Empir Softw Eng 20(5):1354–1383

  • Valdivia Garcia H, Shihab E (2014) Characterizing and predicting blocking bugs in open source projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR). ACM Press, pp 72–81

  • Wallace L, Keil M (2004) Software project risks and their effect on outcomes. Commun ACM 47(4):68–73

  • Wang LM, Li XL, Cao CH, Yuan SM (2006) Combining decision tree and Naive Bayes for classification. Knowl-Based Syst 19(7):511–515

  • Wang Q, Zhu J, Yu B (2005) Combining Classifiers in Software Quality Prediction: A Neural Network Approach. In: Proceedings of the 2nd International Symposium on Neural Networks. Springer, Berlin, Heidelberg, pp 921–926

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th International Conference on Software Engineering (ICSE), pp 461–470

  • Weiss C, Premraj R, Zimmermann T, Zeller A (2007) How Long Will It Take to Fix This Bug? In: Proceedings of the 4th International Workshop on Mining Software Repositories (MSR), pp 1–8

  • Wolfson J, Bandyopadhyay S, Elidrisi M, Vazquez-Benitez G, Musgrove D, Adomavicius G, Johnson P, O'Connor P (2014) A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data. Stat Med:21–42

  • Xia X, Lo D, Shihab E, Wang X, Zhou B (2014a) Automatic, high accuracy prediction of reopened bugs. Autom Softw Eng 22(1):75–109

  • Xia X, Lo D, Wen M, Shihab E, Zhou B (2014b) An empirical study of bug report field reassignment. In: Proceedings of the Conference on Software Maintenance, Reengineering, and Reverse Engineering, pp 174–183

  • Xia X, Lo D, Shihab E, Wang X, Yang X (2015) ELBlocker: Predicting blocking bugs with ensemble imbalance learning. Inf Softw Technol 61:93–106

  • Xu R, leqiu Q, Xinhai J (2003) CMM-based software risk control optimization. In: Proceedings of the 5th IEEE Workshop on Mobile Computing Systems and Applications. IEEE, pp 499–503

  • Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 694–699

  • Zanoni M, Perin F, Fontana FA, Viscusi G (2014) Dual analysis for recommending developers to resolve bugs. Journal of Software: Evolution and Process 26(12):1172–1192

  • Zimmermann T, Nagappan N, Guo PJ, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: Proceedings of the 34th International Conference on Software Engineering (ICSE). IEEE Press, pp 1074–1083

Author information

Correspondence to Hoa Khanh Dam.

Additional information

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei

About this article

Cite this article

Choetkiertikul, M., Dam, H.K., Tran, T. et al. Predicting the delay of issues with due dates in software projects. Empir Software Eng 22, 1223–1263 (2017). https://doi.org/10.1007/s10664-016-9496-7

  • DOI: https://doi.org/10.1007/s10664-016-9496-7
