Abstract
Traditional just-in-time defect prediction approaches use the changed lines of a software change to predict whether the change is defective. However, they disregard the information around those changed lines. Our main hypothesis is that this surrounding information affects the likelihood that a change is defective. To exploit this information for defect prediction, we consider the n lines (n = 1, 2, …) that precede and follow the changed lines (which we call context lines), and propose metrics that measure them, which we call “Context Metrics.” Specifically, these context metrics are defined as the number of words/keywords in the context lines. In a large-scale empirical study on six open source software projects, we compare the prediction performance of our context metrics, traditional code churn metrics (e.g., the number of modified subsystems), our extended context metrics that measure not only the context lines but also the changed lines, and combination metrics that use two extended context metrics in one prediction model. The results show that the context metrics that consider the context lines of added lines achieve the best median value in all cases according to a statistical test. Moreover, a small number of context lines is suitable for the context metric that counts words, whereas a larger number of context lines is suitable for the context metric that counts keywords. Finally, the combination metrics of two extended context metrics significantly outperform all other studied metrics in all studied projects with respect to the area under the receiver operating characteristic curve (AUC) and the Matthews correlation coefficient (MCC).
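To make the metric definition concrete, the following minimal sketch (in Python; the function name, the counting rules, and the example hunk are our own illustrative assumptions, not the authors' implementation) counts the words in the n context lines that surround the added lines of a unified-diff hunk:

    # Illustrative sketch: count the words in the n context lines that
    # precede and follow each added ('+') line of a unified-diff hunk.
    def context_word_count(hunk_lines, n=1):
        total = 0
        for i, line in enumerate(hunk_lines):
            if not line.startswith('+'):
                continue
            lo, hi = max(0, i - n), min(len(hunk_lines), i + n + 1)
            for ctx in hunk_lines[lo:i] + hunk_lines[i + 1:hi]:
                if not ctx.startswith('+'):        # other added lines are not context
                    total += len(ctx[1:].split())  # drop the diff marker, count words
        return total

    hunk = [' int totalSupply;',
            '+int maxSupply = MAX_MONEY;',
            ' return totalSupply;']
    print(context_word_count(hunk, n=1))  # 4 words in the surrounding context lines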
Notes
Note that a chunk can be of type ‘+’, ‘-’, and ‘all’ at once. In that case, the chunk contains at least two lines: at least one ‘+’ line and at least one ‘-’ line.
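A minimal sketch of this typing rule (our own illustration in Python; the chunk representation as a list of marker-prefixed lines is assumed):

    # Illustrative sketch: a chunk with at least one '+' line and at least
    # one '-' line is of type '+', '-', and 'all' at once.
    def chunk_types(chunk_lines):
        types = set()
        if any(l.startswith('+') for l in chunk_lines):
            types.add('+')
        if any(l.startswith('-') for l in chunk_lines):
            types.add('-')
        if {'+', '-'} <= types:
            types.add('all')
        return types

    print(chunk_types(['+int a;', '-int b;']))  # {'+', '-', 'all'}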
The keywords refer to the reserved words (statements) of C++ as listed by Microsoft Visual Studio (Microsoft 2016). Because the reserved words of C++ and Java are almost identical, we use the same keywords for the Java projects. We split reserved words that contain underscores; for instance, we convert “__if_exists” into “if” and “exists”.
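As a small illustration of this conversion (the exact tokenization is our assumption), in Python:

    # Split underscore-separated reserved words into their component words,
    # e.g., '__if_exists' -> ['if', 'exists'].
    def split_reserved_word(word):
        return [part for part in word.split('_') if part]

    print(split_reserved_word('__if_exists'))  # ['if', 'exists']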
Here, a source file is a file whose name ends in .java, .c, .h, .cpp, .hpp, .cxx, or .hxx, since we analyze both C++ and Java projects.
https://github.com/doofuslarge/lscp. lscp separates complex identifiers into their component words (e.g., it converts GetBoolArg into Get, Bool, and Arg).
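A rough approximation of that splitting step (lscp itself is a separate tool; this regex-based Python sketch is only an assumption about its splitting behavior):

    import re

    # Approximate lscp's identifier splitting: break camel-case identifiers
    # at case boundaries, e.g., GetBoolArg -> ['Get', 'Bool', 'Arg'].
    def split_identifier(name):
        return re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', name)

    print(split_identifier('GetBoolArg'))  # ['Get', 'Bool', 'Arg']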
Note that while higher values of AUC and MCC are better, lower values of the Brier score are better. This is because the Brier score is the mean squared difference between the predicted probabilities and the actual binary labels.
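For reference, with N predictions, predicted probabilities $p_i$, and actual labels $o_i \in \{0, 1\}$, the Brier score is

$$\text{Brier score} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2,$$

so a perfect predictor scores 0 and worse predictors score closer to 1.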
COMB is the combination of the two context metrics NCCW and gotoNCCKW. Hence, we study NCCW and gotoNCCKW instead of COMB.
Here, the coefficients refer to the left-singular vectors. We conduct the PCA using singular value decomposition.
The first principal component is the linear combination of the input metrics that retains the largest share of the variance of the original metrics.
Metrics that account for a larger share of the variance of the studied metrics have larger coefficients in the first principal component.
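A minimal numpy sketch of this procedure (the data are random and purely illustrative; we take the principal axes from the SVD of the standardized metrics matrix, which are the singular vectors referred to above):

    import numpy as np

    # Illustrative data: 100 changes, 5 standardized metrics (one per column).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each metric

    # SVD of the standardized data; the rows of Vt (equivalently, the
    # left-singular vectors of X.T) are the principal-axis coefficient vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    first_pc_coefficients = Vt[0]              # coefficients of the 1st component

    # Metrics with larger absolute coefficients contribute more to the
    # variance captured by the first principal component.
    print(np.abs(first_pc_coefficients).argsort()[::-1])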
References
Aversano L, Cerulo L, Del Grosso C (2007) Learning from bug-introducing changes to prevent fault prone code. In: Proceedings of the 9th international workshop on principles of software evolution (IWPSE), pp 19–26. ACM
Basili V R, Briand L C, Melo W L (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761
Bettenburg N, Nagappan M, Hassan A E (2012) Think locally, act globally: Improving defect and effort prediction models. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR), pp 60–69. IEEE Press
Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS One 12(6):e0177678
Bowes D, Hall T, Gray D (2012) Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In: Proceedings of the 8th international conference on predictive models in software engineering, pp 109–118. ACM
Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Mining 10(1):35
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of the 7th working conference on mining software repositories (MSR), pp 31–41. IEEE
Farrar D E, Glauber R R (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49(1):92–107
Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 172–181. ACM
Ghotra B, McIntosh S, Hassan A E (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 789–800. IEEE Press
Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661
Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304
Halstead M H (1977) Elements of software science. Elsevier, New York
Han J, Moraga C (1995) The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Proceedings of the international workshop on artificial neural networks, pp 195–201. Springer
Hassan A E (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE), pp 78–88. IEEE
Hata H, Mizuno O, Kikuno T (2012) Bug prediction based on fine-grained module histories. In: Proceedings of the 34th international conference on software engineering (ICSE), pp 200–210. IEEE
Hindle A, Godfrey M W, Holt R C (2008) Reading beside the lines: Indentation as a proxy for complexity metric. In: Proceedings of the 16th international conference on program comprehension (ICPC), pp 133–142. IEEE
Ho T K (1995) Random decision forests. In: Proceedings of the 3rd international conference on document analysis and recognition, vol 1, pp 278–282. IEEE
Jiang T, Tan L, Kim S (2013) Personalized defect prediction. In: Proceedings of the 28th international conference on automated software engineering (ASE), pp 279–289. IEEE
Kamei Y, Fukushima T, McIntosh S, Yamashita K, Ubayashi N, Hassan A E (2016) Studying just-in-time defect prediction using cross-project models. Empir Softw Eng 21(5):2072–2106
Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773
Karunanithi N (1993) A neural network approach for software reliability growth modeling in the presence of code churn. In: Proceedings of the 4th international symposium on software reliability engineering, pp 310–317. IEEE
Khoshgoftaar T M, Allen E B, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of the 7th international symposium on software reliability engineering, pp 364–371. IEEE
Khoshgoftaar T M, Szabo R M (1994) Improving code churn predictions during the system test and maintenance phases. In: Proceedings of the international conference on software maintenance (ICSM), pp 58–67. IEEE
Kim S, Whitehead Jr E J (2006) How long did it take to fix bugs? In: Proceedings of the 2006 international workshop on mining software repositories (MSR), pp 173–174. ACM
Kim S, Whitehead Jr E J, Zhang Y (2008) Classifying software changes: Clean or buggy? IEEE Trans Softw Eng 34(2):181–196
Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 481–490. IEEE
Kim S, Zimmermann T, Whitehead Jr E J, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering (ICSE), pp 489–498. IEEE
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Li J, He P, Zhu J, Lyu M R (2017) Software defect prediction via convolutional neural network. In: Proceedings of the 2017 software quality, reliability and security (QRS), pp 318–328. IEEE
McCabe T J (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320
McDonald J H (2014) Handbook of biological statistics, 3rd edn. Sparky House Publishing, Baltimore
Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17(4):375–407
Microsoft (2016) Overview of C++ statements. https://docs.microsoft.com/ja-jp/cpp/cpp/overview-of-cpp-statements
Mizuno O, Kikuno T (2007) Training on errors experiment to detect fault-prone software modules by spam filter. In: Proceedings of the 6th joint meeting on foundations of software engineering (ESEC/FSE), pp 405–414. ACM
Mockus A, Votta L G (2000) Identifying reasons for software changes using historic databases. In: Proceedings of the international conference on software maintenance (ICSM), pp 120–130. IEEE
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE), pp 181–190. IEEE
Munson J C, Elbaum S G (1998) Code churn: A measure for estimating the impact of code change. In: Proceedings of the international conference on software maintenance, pp 24–31. IEEE
Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th international conference on Software engineering, pp 284–292. ACM
Ohlsson M C, Von Mayrhauser A, McGuire B, Wohlin C (1999) Code decay analysis of legacy software through successive releases. In: Proceedings of the IEEE aerospace conference, pp 69–81. IEEE
Oram A, Wilson G (2010) Making software: What really works, and why we believe it. O’Reilly Media, Inc.
Ostrand T J, Weyuker E J, Bell R M (2004) Where the bugs are. In: ACM SIGSOFT Software Engineering Notes, vol 29, pp 86–96. ACM
Quinlan R (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers
Rice M E, Harris G T (2005) Comparing effect sizes in follow-up studies: ROC area, Cohen’s d, and r. Law Hum Behav 29(5):615–620
Romanski P, Kotthoff L (2018) FSelector: Selecting attributes. R package
Rosen C, Grawi B, Shihab E (2015) Commit guru: Analytics and risk prediction of software commits. In: Proceedings of the 10th joint meeting on foundations of software engineering, ESEC/FSE, pp 966–969. ACM
Shannon C E (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423
Shepperd M, Bowes D, Hall T (2014) Researcher bias: The use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616
Shihab E (2012) An exploration of challenges limiting pragmatic software defect prediction. Ph.D. thesis, Queen’s University, Canada
Shihab E, Hassan A E, Adams B, Jiang Z M (2012) An industrial study on the risk of software changes. In: Proceedings of the 20th international symposium on the foundations of software engineering (FSE), p. 62. ACM
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2nd international workshop on mining software repositories (MSR), pp 1–5. ACM
Stevenson A, Lindberg C A (2010) New Oxford American dictionary. Oxford University Press, Oxford
Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 99–108. IEEE
Tantithamthavorn C, Hassan A E (2018) An experience report on defect modelling in practice: Pitfalls and challenges. In: Proceedings of the 40th international conference on software engineering: Software engineering in practice track (ICSE-SEIP), to appear
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 321–332. ACM
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2017) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Softw Eng 43(1):1–18
Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology
Thomas S W (2015) lscp: A lightweight source code preprocessor
Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 297–308. ACM
Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: Proceedings of the 2015 software quality, reliability and security (QRS), pp 17–26. IEEE
Zhang F, Zheng Q, Zou Y, Hassan A E (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 309–320. ACM
Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE), pp 9–19. IEEE
Zwillinger D, Kokoska S (1999) CRC standard probability and statistics tables and formulae. CRC Press
Acknowledgment
This work was partially supported by NSERC Canada as well as JSPS KAKENHI Japan (Grant Numbers: JP16K12415).
Additional information
Communicated by: Nachiappan Nagappan
Appendix
In this appendix, we show the actual values of the three evaluation measures corresponding to the rank results in RQ1, RQ2, and RQ3. In addition, we show the correlations across the modified NCCKW metrics.
Cite this article
Kondo, M., German, D.M., Mizuno, O. et al. The impact of context metrics on just-in-time defect prediction. Empir Software Eng 25, 890–939 (2020). https://doi.org/10.1007/s10664-019-09736-3