
The impact of context metrics on just-in-time defect prediction


Abstract

Traditional just-in-time defect prediction approaches use the changed lines of software changes to predict defective changes during software development. However, they disregard the information around the changed lines. Our main hypothesis is that such information has an impact on the likelihood that a change is defective. To take advantage of this information in defect prediction, we consider the n lines (n = 1, 2, …) that precede and follow the changed lines, which we call context lines, and propose metrics that measure them, which we call “context metrics.” Specifically, these context metrics are defined as the number of words/keywords in the context lines. In a large-scale empirical study using six open source software projects, we compare the defect prediction performance of our context metrics; traditional code churn metrics (e.g., the number of modified subsystems); our extended context metrics, which measure not only the context lines but also the changed lines; and combination metrics, which use two extended context metrics in one prediction model. The results show that the context metrics that consider the context lines of added lines achieve the best median values in all cases according to a statistical test. Moreover, a small number of context lines is suitable for the context metric that counts words, whereas a larger number of context lines is suitable for the context metric that counts keywords. Finally, the combination metrics of two extended context metrics significantly outperform all other studied metrics in all studied projects with respect to the area under the receiver operating characteristic curve (AUC) and the Matthews correlation coefficient (MCC).
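
To make the central construct concrete, the following minimal Python sketch (our own illustration; the function name, tokenization, and diff handling are assumptions, not the paper's implementation) counts the words in the context lines that surround the added lines of a unified-diff hunk:

```python
import re

def added_context_word_count(diff_lines, n=1):
    """Count words in the n unchanged lines that precede and follow
    each added ('+') line of a unified-diff hunk (a hedged sketch)."""
    context_idx = set()
    for i, line in enumerate(diff_lines):
        if line.startswith('+') and not line.startswith('+++'):
            for j in range(max(0, i - n), min(len(diff_lines), i + n + 1)):
                if diff_lines[j].startswith(' '):  # unchanged context line
                    context_idx.add(j)
    return sum(len(re.findall(r'\w+', diff_lines[j])) for j in context_idx)

hunk = [" int total = 0;",
        "+total += price;",
        " return total;"]
print(added_context_word_count(hunk, n=1))  # 5 words in the two context lines
```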



Notes

  1. Note that a chunk can be of types ‘+’, ‘-’, and ‘all’ at once. In this case, the chunk includes at least two lines, with at least one ‘+’ line and at least one ‘-’ line (see the chunk-type sketch after these notes).

  2. The keywords refer to the reserved words (statements) of C++ that are listed by Microsoft Visual Studio (Microsoft 2016). Because the reserved words of C++ and Java are almost the same, we also use these keywords for the projects in Java. We split the reserved words that include underscores; for instance, we convert “__if_exists” into “if” and “exists” (see the splitting sketch after these notes).

  3. Here, a source file is a file whose name ends in .java, .c, .h, .cpp, .hpp, .cxx, or .hxx, since we analyze both C++ and Java projects.

  4. https://github.com/doofuslarge/lscp. lscp separates complex identifiers into their component words (e.g., it converts GetBoolArg into Get, Bool, and Arg; see the splitting sketch after these notes).

  5. Note that while higher values of AUC and MCC are better, lower values of the Brier score are better. This is because the Brier score is the mean squared difference between the predicted probabilities and the actual binary labels (the formula is given after these notes).

  6. https://github.com/klainfo/ScottKnottESD

  7. COMB is the combination of the two extended context metrics NCCW and gotoNCCKW. Hence, we study NCCW and gotoNCCKW instead of COMB (a sketch of such a combination follows these notes).

  8. Here, the coefficients are the entries of the left-singular vectors. We conduct the PCA using singular value decomposition.

  9. The first principal component is the linear combination of the input metrics that retains the largest share of the variance of the original metrics.

  10. Metrics that account for a larger share of the variance of the studied metrics have larger coefficients in the first principal component (notes 8–10 are illustrated in the sketch after these notes).
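
Under one reading of note 1, a chunk's type set can be computed as follows (a minimal sketch; the function name is ours):

```python
def chunk_types(chunk_lines):
    """Return the set of types for a diff chunk: '+' if it contains added
    lines, '-' if it contains deleted lines, and 'all' if it contains both."""
    types = set()
    if any(line.startswith('+') for line in chunk_lines):
        types.add('+')
    if any(line.startswith('-') for line in chunk_lines):
        types.add('-')
    if {'+', '-'} <= types:
        types.add('all')
    return types

print(chunk_types(['+int x;', '-int y;']))  # {'+', '-', 'all'}
```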
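
The splitting described in notes 2 and 4 can be approximated in Python as follows (lscp itself is a separate tool with more elaborate heuristics; this regex is our own sketch):

```python
import re

def split_identifier(identifier):
    """Split an identifier on underscores and camel-case boundaries."""
    words = []
    for part in identifier.split('_'):
        words += re.findall(r'[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+', part)
    return words

print(split_identifier('GetBoolArg'))   # ['Get', 'Bool', 'Arg']  (note 4)
print(split_identifier('__if_exists'))  # ['if', 'exists']        (note 2)
```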
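
For note 5: given N predictions with predicted defect probabilities p_i and actual labels y_i ∈ {0, 1}, the Brier score is

$$\mathrm{Brier\ score} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - y_i\right)^2,$$

so a perfect predictor scores 0 and no predictor can score worse than 1.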
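
For note 7, a sketch of using two metrics together in one prediction model (the classifier choice and toy values below are our own assumptions, not necessarily the paper's experimental setup):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data; the column names follow the paper's metrics, the values are invented.
changes = pd.DataFrame({
    'NCCW':      [12, 3, 40, 7, 25, 1],
    'gotoNCCKW': [ 2, 0,  5, 1,  3, 0],
    'defective': [ 1, 0,  1, 0,  1, 0],
})
X, y = changes[['NCCW', 'gotoNCCKW']], changes['defective']
model = RandomForestClassifier(random_state=0).fit(X, y)
print(model.predict_proba(X)[:, 1])  # predicted defect probabilities
```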
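
Notes 8–10 can be summarized with a small numpy sketch (toy data; the matrix is oriented with metrics as rows so that the metric coefficients appear in the left-singular vectors, as in note 8):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))        # rows: metrics, columns: changes (toy)
X -= X.mean(axis=1, keepdims=True)   # center each metric

U, S, Vt = np.linalg.svd(X, full_matrices=False)
coeffs = U[:, 0]                     # note 8: the first left-singular vector
explained = S[0]**2 / (S**2).sum()   # note 9: variance retained by the first PC
print(coeffs, explained)             # note 10: larger |coefficient| -> more variance
```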

References

  • Aversano L, Cerulo L, Del Grosso C (2007) Learning from bug-introducing changes to prevent fault prone code. In: Proceedings of the 9th international workshop on principles of software evolution (IWPSE), pp 19–26. ACM

  • Basili V R, Briand L C, Melo W L (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761

  • Bettenburg N, Nagappan M, Hassan A E (2012) Think locally, act globally: Improving defect and effort prediction models. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR), pp 60–69. IEEE Press

  • Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS ONE 12(6):e0177678

  • Bowes D, Hall T, Gray D (2012) Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In: Proceedings of the 8th international conference on predictive models in software engineering, pp 109–118. ACM

  • Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Mining 10(1):35

  • Cohen J (1988) Statistical power analysis for the behavioral sciences

  • D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of the 7th working conference on mining software repositories (MSR), pp 31–41. IEEE

  • Farrar D E, Glauber R R (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49(1):92–107

  • Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 172–181. ACM

  • Ghotra B, McIntosh S, Hassan A E (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 789–800. IEEE Press

  • Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661

  • Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304

  • Halstead M H (1977) Elements of software science. Elsevier, New York

  • Han J, Moraga C (1995) The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Proceedings of the international workshop on artificial neural networks, pp 195–201. Springer

  • Hassan A E (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE), pp 78–88. IEEE

  • Hata H, Mizuno O, Kikuno T (2012) Bug prediction based on fine-grained module histories. In: Proceedings of the 34th international conference on software engineering (ICSE), pp 200–210. IEEE

  • Hindle A, Godfrey M W, Holt R C (2008) Reading beside the lines: Indentation as a proxy for complexity metric. In: Proceedings of the 16th international conference on program comprehension (ICPC), pp 133–142. IEEE

  • Ho T K (1995) Random decision forests. In: Proceedings of the 3rd international conference on document analysis and recognition, vol 1, pp 278–282. IEEE

  • Jiang T, Tan L, Kim S (2013) Personalized defect prediction. In: Proceedings of the 28th international conference on automated software engineering (ASE), pp 279–289. IEEE

  • Kamei Y, Fukushima T, McIntosh S, Yamashita K, Ubayashi N, Hassan A E (2016) Studying just-in-time defect prediction using cross-project models. Empir Softw Eng 21(5):2072–2106

  • Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773

  • Karunanithi N (1993) A neural network approach for software reliability growth modeling in the presence of code churn. In: Proceedings of the 4th international symposium on software reliability engineering, pp 310–317. IEEE

  • Khoshgoftaar T M, Allen E B, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of the 7th international symposium on software reliability engineering, pp 364–371. IEEE

  • Khoshgoftaar T M, Szabo R M (1994) Improving code churn predictions during the system test and maintenance phases. In: Proceedings of the international conference on software maintenance (ICSM), pp 58–67. IEEE

  • Kim S, Whitehead Jr E J (2006) How long did it take to fix bugs? In: Proceedings of the 2006 international workshop on mining software repositories (MSR), pp 173–174. ACM

  • Kim S, Whitehead Jr E J, Zhang Y (2008) Classifying software changes: Clean or buggy? IEEE Trans Softw Eng 34(2):181–196

  • Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 481–490. IEEE

  • Kim S, Zimmermann T, Whitehead Jr E J, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering (ICSE), pp 489–498. IEEE

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496

  • Li J, He P, Zhu J, Lyu M R (2017) Software defect prediction via convolutional neural network. In: Proceedings of the 2017 software quality, reliability and security (QRS), pp 318–328. IEEE

  • McCabe T J (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320

  • McDonald J H (2014) Handbook of biological statistics, 3rd edn. Sparky House Publishing, Baltimore

  • Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17(4):375–407

  • Microsoft (2016) Overview of C++ statements. https://docs.microsoft.com/ja-jp/cpp/cpp/overview-of-cpp-statements

  • Mizuno O, Kikuno T (2007) Training on errors experiment to detect fault-prone software modules by spam filter. In: Proceedings of the 6th joint meeting on foundations of software engineering (ESEC/FSE), pp 405–414. ACM

  • Mockus A, Votta L G (2000) Identifying reasons for software changes using historic databases. In: Proceedings of the international conference on software maintenance (ICSM), pp 120–130. IEEE

  • Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE), pp 181–190. IEEE

  • Munson J C, Elbaum S G (1998) Code churn: A measure for estimating the impact of code change. In: Proceedings of the international conference on software maintenance, pp 24–31. IEEE

  • Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th international conference on Software engineering, pp 284–292. ACM

  • Ohlsson M C, Von Mayrhauser A, McGuire B, Wohlin C (1999) Code decay analysis of legacy software through successive releases. In: Proceedings of the IEEE aerospace conference, pp 69–81. IEEE

  • Oram A, Wilson G (2010) Making software: What really works, and why we believe it. O’Reilly Media, Inc.

  • Ostrand T J, Weyuker E J, Bell R M (2004) Where the bugs are. In: ACM SIGSOFT Software Engineering Notes, vol 29, pp 86–96. ACM

  • Quinlan R (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers

  • Rice M E, Harris G T (2005) Comparing effect sizes in follow-up studies: ROC area, Cohen’s d, and r. Law Hum Behav 29(5):615–620

  • Romanski P, Kotthoff L (2018) FSelector. R package

  • Rosen C, Grawi B, Shihab E (2015) Commit guru: Analytics and risk prediction of software commits. In: Proceedings of the 10th joint meeting on foundations of software engineering, ESEC/FSE, pp 966–969. ACM

  • Shannon C E (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423

  • Shepperd M, Bowes D, Hall T (2014) Researcher bias: The use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616

  • Shihab E (2012) An exploration of challenges limiting pragmatic software defect prediction. Ph.D. thesis, Queen’s University, Canada

  • Shihab E, Hassan A E, Adams B, Jiang Z M (2012) An industrial study on the risk of software changes. In: Proceedings of the 20th international symposium on the foundations of software engineering (FSE), p 62. ACM

  • Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2nd international workshop on mining software repositories (MSR), pp 1–5. ACM

  • Stevenson A, Lindberg C A (2010) New Oxford American dictionary. Oxford University Press, Oxford

  • Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 99–108. IEEE

  • Tantithamthavorn C, Hassan A E (2018) An experience report on defect modelling in practice: Pitfalls and challenges. In: Proceedings of the 40th international conference on software engineering: Software engineering in practice track (ICSE-SEIP), to appear

  • Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 321–332. ACM

  • Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2017) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Softw Eng 43(1):1–18

  • Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology

  • Thomas S W (2015) lscp: A lightweight source code preprocessor. https://github.com/doofuslarge/lscp

  • Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 297–308. ACM

  • Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: Proceedings of the 2015 software quality, reliability and security (QRS), pp 17–26. IEEE

  • Zhang F, Zheng Q, Zou Y, Hassan A E (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 309–320. ACM

  • Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE), pp 9–19. IEEE

  • Zwillinger D, Kokoska S (1999) CRC standard probability and statistics tables and formulae. CRC Press

Acknowledgment

This work was partially supported by NSERC Canada and by JSPS KAKENHI Japan (Grant Number JP16K12415).

Author information

Corresponding author

Correspondence to Masanari Kondo.

Additional information

Communicated by: Nachiappan Nagappan

Appendix

In this appendix, we show the actual values of the three evaluation measures corresponding to the rank results in RQ1, RQ2, and RQ3. In addition, we show the correlations between NCCW and the modified NCCKWs.

Table 15 The median values of AUC for each context metric variant and studied project
Table 16 The median values of MCC for each context metric variant and studied project
Table 17 The median values of Brier score for each context metric variant and studied project
Table 18 The median values of AUC for each context metric, each extended context metric, COMB, each indentation metric, the combined change metrics, and each individual change metric
Table 19 The median values of MCC for each context metric, each extended context metric, COMB, each indentation metric, the combined change metrics, and each individual change metric
Table 20 The median values of Brier score for each context metric, each extended context metric, COMB, each indentation metric, the combined change metrics, and each individual change metric
Table 21 Spearman rank correlation between NCCW and modified NCCKWs in the studied projects


Cite this article

Kondo, M., German, D.M., Mizuno, O. et al. The impact of context metrics on just-in-time defect prediction. Empir Software Eng 25, 890–939 (2020). https://doi.org/10.1007/s10664-019-09736-3
