Abstract
Traditional just-in-time defect prediction approaches use the changed lines of a software change to predict whether the change is defective. However, they disregard the information around those changed lines. Our main hypothesis is that this surrounding information affects the likelihood that a change is defective. To exploit this information for defect prediction, we consider the n lines (n = 1, 2, …) that precede and follow the changed lines (which we call context lines), and propose metrics that measure them, which we call “Context Metrics.” Specifically, these context metrics are defined as the number of words/keywords in the context lines. In a large-scale empirical study on six open source software projects, we compare the prediction performance of our context metrics, traditional code churn metrics (e.g., the number of modified subsystems), our extended context metrics that measure not only the context lines but also the changed lines, and combination metrics that use two extended context metrics in one prediction model. The results show that the context metrics that consider the context lines of added lines achieve the best median value in all cases according to a statistical test. Moreover, a small number of context lines is suitable for the context metric that counts words, whereas a larger number of context lines is suitable for the context metric that counts keywords. Finally, the combination metrics of two extended context metrics significantly outperform all other studied metrics in all studied projects with respect to the area under the receiver operating characteristic curve (AUC) and the Matthews correlation coefficient (MCC).
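To make the metric definition concrete, the following minimal sketch (in Python; the function name, the counting rules, and the example hunk are our own illustrative assumptions, not the authors' implementation) counts the words in the n context lines that surround the added lines of a unified-diff hunk:

    # Illustrative sketch: count the words in the n context lines that
    # precede and follow each added ('+') line of a unified-diff hunk.
    def context_word_count(hunk_lines, n=1):
        total = 0
        for i, line in enumerate(hunk_lines):
            if not line.startswith('+'):
                continue
            lo, hi = max(0, i - n), min(len(hunk_lines), i + n + 1)
            for ctx in hunk_lines[lo:i] + hunk_lines[i + 1:hi]:
                if not ctx.startswith('+'):        # other added lines are not context
                    total += len(ctx[1:].split())  # drop the diff marker, count words
        return total

    hunk = [' int totalSupply;',
            '+int maxSupply = MAX_MONEY;',
            ' return totalSupply;']
    print(context_word_count(hunk, n=1))  # 4 words in the surrounding context lines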
Notes
Note that a chunk can be of type ‘+’, ‘-’, and ‘all’ at once. In that case, the chunk contains at least two lines: at least one ‘+’ line and at least one ‘-’ line.
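A minimal sketch of this typing rule (our own illustration in Python; the chunk representation as a list of marker-prefixed lines is assumed):

    # Illustrative sketch: a chunk with at least one '+' line and at least
    # one '-' line is of type '+', '-', and 'all' at once.
    def chunk_types(chunk_lines):
        types = set()
        if any(l.startswith('+') for l in chunk_lines):
            types.add('+')
        if any(l.startswith('-') for l in chunk_lines):
            types.add('-')
        if {'+', '-'} <= types:
            types.add('all')
        return types

    print(chunk_types(['+int a;', '-int b;']))  # {'+', '-', 'all'}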
The keywords refer to the reserved words (statements) of C++ as listed by Microsoft Visual Studio (Microsoft 2016). Because the reserved words of C++ and Java are almost identical, we use the same keywords for the Java projects. We split reserved words that contain underscores; for instance, we convert “__if_exists” into “if” and “exists”.
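As a small illustration of this conversion (the exact tokenization is our assumption), in Python:

    # Split underscore-separated reserved words into their component words,
    # e.g., '__if_exists' -> ['if', 'exists'].
    def split_reserved_word(word):
        return [part for part in word.split('_') if part]

    print(split_reserved_word('__if_exists'))  # ['if', 'exists']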
Here, a source file is a file whose name ends in .java, .c, .h, .cpp, .hpp, .cxx, or .hxx, since we analyze both C++ and Java projects.
https://github.com/doofuslarge/lscp. lscp separates complex identifiers into their component words (e.g., it converts GetBoolArg into Get, Bool, and Arg).
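A rough approximation of that splitting step (lscp itself is a separate tool; this regex-based Python sketch is only an assumption about its splitting behavior):

    import re

    # Approximate lscp's identifier splitting: break camel-case identifiers
    # at case boundaries, e.g., GetBoolArg -> ['Get', 'Bool', 'Arg'].
    def split_identifier(name):
        return re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', name)

    print(split_identifier('GetBoolArg'))  # ['Get', 'Bool', 'Arg']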
Note that while higher values of AUC and MCC are better, lower values of the Brier score are better. This is because the Brier score is the mean squared difference between the predicted probabilities and the actual binary labels.
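For reference, with N predictions, predicted probabilities $p_i$, and actual labels $o_i \in \{0, 1\}$, the Brier score is

$$\text{Brier score} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2,$$

so a perfect predictor scores 0 and worse predictors score closer to 1.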
COMB is the combination of the two context metrics NCCW and gotoNCCKW. Hence, we study NCCW and gotoNCCKW instead of COMB.
Here, the coefficients refer to the left-singular vectors. We conduct the PCA using singular value decomposition.
The first principal component is the linear combination of the input metrics that retains the largest share of the variance of the original metrics.
Metrics that account for a larger share of the variance of the studied metrics have larger coefficients in the first principal component.
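A minimal numpy sketch of this procedure (the data are random and purely illustrative; we take the principal axes from the SVD of the standardized metrics matrix, which are the singular vectors referred to above):

    import numpy as np

    # Illustrative data: 100 changes, 5 standardized metrics (one per column).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each metric

    # SVD of the standardized data; the rows of Vt (equivalently, the
    # left-singular vectors of X.T) are the principal-axis coefficient vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    first_pc_coefficients = Vt[0]              # coefficients of the 1st component

    # Metrics with larger absolute coefficients contribute more to the
    # variance captured by the first principal component.
    print(np.abs(first_pc_coefficients).argsort()[::-1])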
References
Aversano L, Cerulo L, Del Grosso C (2007) Learning from bug-introducing changes to prevent fault prone code. In: Proceedings of the 9th international workshop on principles of software evolution (IWPSE), pp 19–26. ACM
Basili V R, Briand L C, Melo W L (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761
Bettenburg N, Nagappan M, Hassan A E (2012) Think locally, act globally: Improving defect and effort prediction models. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR), pp 60–69. IEEE Press
Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS One 12(6):e0177678
Bowes D, Hall T, Gray D (2012) Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In: Proceedings of the 8th international conference on predictive models in software engineering, pp 109–118. ACM
Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Mining 10(1):35
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of the 7th working conference on mining software repositories (MSR), pp 31–41. IEEE
Farrar D E, Glauber R R (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49(1):92–107
Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 172–181. ACM
Ghotra B, McIntosh S, Hassan A E (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 789–800. IEEE Press
Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661
Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304
Halstead M H (1977) Elements of software science. Elsevier, New York
Han J, Moraga C (1995) The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Proceedings of the international workshop on artificial neural networks, pp 195–201. Springer
Hassan A E (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE), pp 78–88. IEEE
Hata H, Mizuno O, Kikuno T (2012) Bug prediction based on fine-grained module histories. In: Proceedings of the 34th international conference on software engineering (ICSE), pp 200–210. IEEE
Hindle A, Godfrey M W, Holt R C (2008) Reading beside the lines: Indentation as a proxy for complexity metric. In: Proceedings of the 16th international conference on program comprehension (ICPC), pp 133–142. IEEE
Ho T K (1995) Random decision forests. In: Proceedings of the 3rd international conference on document analysis and recognition, vol 1, pp 278–282. IEEE
Jiang T, Tan L, Kim S (2013) Personalized defect prediction. In: Proceedings of the 28th international conference on automated software engineering (ASE), pp 279–289. IEEE
Kamei Y, Fukushima T, McIntosh S, Yamashita K, Ubayashi N, Hassan A E (2016) Studying just-in-time defect prediction using cross-project models. Empir Softw Eng 21(5):2072–2106
Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773
Karunanithi N (1993) A neural network approach for software reliability growth modeling in the presence of code churn. In: Proceedings of the 4th international symposium on software reliability engineering, pp 310–317. IEEE
Khoshgoftaar T M, Allen E B, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of the 7th international symposium on software reliability engineering, pp 364–371. IEEE
Khoshgoftaar T M, Szabo R M (1994) Improving code churn predictions during the system test and maintenance phases. In: Proceedings of the international conference on software maintenance (ICSM), pp 58–67. IEEE
Kim S, Whitehead Jr E J (2006) How long did it take to fix bugs? In: Proceedings of the 2006 international workshop on mining software repositories (MSR), pp 173–174. ACM
Kim S, Whitehead Jr E J, Zhang Y (2008) Classifying software changes: Clean or buggy? IEEE Trans Softw Eng 34(2):181–196
Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 481–490. IEEE
Kim S, Zimmermann T, Whitehead Jr E J, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering (ICSE), pp 489–498. IEEE
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Li J, He P, Zhu J, Lyu M R (2017) Software defect prediction via convolutional neural network. In: Proceedings of the 2017 software quality, reliability and security (QRS), pp 318–328. IEEE
McCabe T J (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320
McDonald J H (2014) Handbook of biological statistics, 3rd edn. Sparky House Publishing, Baltimore
Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17(4):375–407
Microsoft (2016) Overview of C++ statements. https://docs.microsoft.com/ja-jp/cpp/cpp/overview-of-cpp-statements
Mizuno O, Kikuno T (2007) Training on errors experiment to detect fault-prone software modules by spam filter. In: Proceedings of the 6th joint meeting on foundations of software engineering (ESEC/FSE), pp 405–414. ACM
Mockus A, Votta L G (2000) Identifying reasons for software changes using historic databases. In: Proceedings of the international conference on software maintenance (ICSM), pp 120–130. IEEE
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE), pp 181–190. IEEE
Munson J C, Elbaum S G (1998) Code churn: A measure for estimating the impact of code change. In: Proceedings of the international conference on software maintenance, pp 24–31. IEEE
Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th international conference on Software engineering, pp 284–292. ACM
Ohlsson M C, Von Mayrhauser A, McGuire B, Wohlin C (1999) Code decay analysis of legacy software through successive releases. In: Proceedings of the IEEE aerospace conference, pp 69–81. IEEE
Oram A, Wilson G (2010) Making software: What really works, and why we believe it. O’Reilly Media, Inc.
Ostrand T J, Weyuker E J, Bell R M (2004) Where the bugs are. In: ACM SIGSOFT Software Engineering Notes, vol 29, pp 86–96. ACM
Quinlan R (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers
Rice M E, Harris G T (2005) Comparing effect sizes in follow-up studies: ROC area, Cohen’s d, and r. Law Hum Behav 29(5):615–620
Romanski P, Kotthoff L (2018) FSelector: Selecting attributes. R package
Rosen C, Grawi B, Shihab E (2015) Commit guru: Analytics and risk prediction of software commits. In: Proceedings of the 10th joint meeting on foundations of software engineering, ESEC/FSE, pp 966–969. ACM
Shannon C E (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423
Shepperd M, Bowes D, Hall T (2014) Researcher bias: The use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616
Shihab E (2012) An exploration of challenges limiting pragmatic software defect prediction. Ph.D. thesis, Queen’s University, Canada
Shihab E, Hassan A E, Adams B, Jiang Z M (2012) An industrial study on the risk of software changes. In: Proceedings of the 20th international symposium on the foundations of software engineering (FSE), p. 62. ACM
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2nd international workshop on mining software repositories (MSR), pp 1–5. ACM
Stevenson A, Lindberg C A (2010) New Oxford American dictionary. Oxford University Press, Oxford
Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the 37th international conference on software engineering (ICSE), pp 99–108. IEEE
Tantithamthavorn C, Hassan A E (2018) An experience report on defect modelling in practice: Pitfalls and challenges. In: Proceedings of the 40th international conference on software engineering: Software engineering in practice track (ICSE-SEIP), to appear
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 321–332. ACM
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2017) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Softw Eng 43(1):1–18
Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology
Thomas S W (2015) lscp: A lightweight source code preprocessor
Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 297–308. ACM
Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: Proceedings of the 2015 software quality, reliability and security (QRS), pp 17–26. IEEE
Zhang F, Zheng Q, Zou Y, Hassan A E (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 309–320. ACM
Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE), pp 9–19. IEEE
Zwillinger D, Kokoska S (1999) CRC standard probability and statistics tables and formulae. CRC Press
Acknowledgment
This work was partially supported by NSERC Canada as well as JSPS KAKENHI Japan (Grant Numbers: JP16K12415).
Additional information
Communicated by: Nachiappan Nagappan
Appendix
In this appendix, we show the actual values of the three evaluation measures corresponding to the rank results in RQ1, RQ2, and RQ3. In addition, we show the correlations across the modified NCCKW metrics.
Cite this article
Kondo, M., German, D.M., Mizuno, O. et al. The impact of context metrics on just-in-time defect prediction. Empir Software Eng 25, 890–939 (2020). https://doi.org/10.1007/s10664-019-09736-3