Abstract
The purpose of this study is to determine which type of file-level software metric exhibits the best fault prediction performance. Studies have shown that software process metrics are better predictors of software faults than software product metrics. However, there is a need for a specific software process metric that can guarantee the best fault prediction performance consistently across different experimental contexts. We collected software metrics data from Open Source Software projects. We used logistic regression and linear regression algorithms to predict the bug status and the number of bugs corresponding to a file, respectively. The prediction performance of these models was evaluated against numerical and graphical prediction model performance measures. We found that change burst metrics exhibit the best numerical performance measures, the highest fault detection probability, and the lowest cost of misclassifying software components.
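The modeling setup described in the abstract can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the change burst metric names and the synthetic data are assumptions made for the example, and scikit-learn stands in for whatever tooling the authors used.

```python
# Illustrative sketch: logistic regression for a file's binary bug status
# and linear regression for its number of bugs, as in the study design.
# Metrics and data below are synthetic placeholders, not the paper's dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, mean_absolute_error

rng = np.random.default_rng(0)
n_files = 200
# Hypothetical file-level change burst metrics, e.g. number of bursts,
# maximum burst size, total churn inside bursts.
X = rng.poisson(lam=3.0, size=(n_files, 3)).astype(float)
bug_count = rng.poisson(lam=X.sum(axis=1) / 6.0)  # synthetic bug counts
bug_status = (bug_count > 0).astype(int)          # buggy vs. clean file

X_tr, X_te, yc_tr, yc_te, ys_tr, ys_te = train_test_split(
    X, bug_count, bug_status, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, ys_tr)  # predicts bug status
reg = LinearRegression().fit(X_tr, yc_tr)    # predicts number of bugs

# Numerical performance measures of the kind the study evaluates.
auc = roc_auc_score(ys_te, clf.predict_proba(X_te)[:, 1])
mae = mean_absolute_error(yc_te, reg.predict(X_te))
print(f"AUC = {auc:.2f}, MAE = {mae:.2f}")
```

AUC and MAE are shown only as representative numerical measures; the study also uses graphical measures (e.g. cost curves) not reproduced here.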
Acknowledgements
We would like to thank Dr. Rabah Mazouzi for his technical support.
Funding
Funding was provided by the French Government Embassy in Nairobi.
Cite this article
Ndenga, M.K., Ganchev, I., Mehat, J. et al. Performance and cost-effectiveness of change burst metrics in predicting software faults. Knowl Inf Syst 60, 275–302 (2019). https://doi.org/10.1007/s10115-018-1241-7