Abstract
The purpose of this study is to determine which type of file-level software metric exhibits the best fault prediction performance. Studies have shown that software process metrics are better predictors of software faults than software product metrics. However, there is a need for a specific software process metric that can guarantee the best fault prediction performance consistently across different experimental contexts. We collected software metrics data from Open Source Software projects. We used logistic regression and linear regression algorithms to predict the bug status and the number of bugs corresponding to a file, respectively. The prediction performance of these models was evaluated against numerical and graphical prediction model performance measures. We found that change burst metrics exhibit the best numerical performance measures, the highest fault detection probability, and the lowest cost of misclassifying software components.
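The modeling setup described in the abstract can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the change burst metric names and the synthetic data are assumptions made for the example, and scikit-learn stands in for whatever tooling the authors used.

```python
# Illustrative sketch: logistic regression for a file's binary bug status
# and linear regression for its number of bugs, as in the study design.
# Metrics and data below are synthetic placeholders, not the paper's dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, mean_absolute_error

rng = np.random.default_rng(0)
n_files = 200
# Hypothetical file-level change burst metrics, e.g. number of bursts,
# maximum burst size, total churn inside bursts.
X = rng.poisson(lam=3.0, size=(n_files, 3)).astype(float)
bug_count = rng.poisson(lam=X.sum(axis=1) / 6.0)  # synthetic bug counts
bug_status = (bug_count > 0).astype(int)          # buggy vs. clean file

X_tr, X_te, yc_tr, yc_te, ys_tr, ys_te = train_test_split(
    X, bug_count, bug_status, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, ys_tr)  # predicts bug status
reg = LinearRegression().fit(X_tr, yc_tr)    # predicts number of bugs

# Numerical performance measures of the kind the study evaluates.
auc = roc_auc_score(ys_te, clf.predict_proba(X_te)[:, 1])
mae = mean_absolute_error(yc_te, reg.predict(X_te))
print(f"AUC = {auc:.2f}, MAE = {mae:.2f}")
```

AUC and MAE are shown only as representative numerical measures; the study also uses graphical measures (e.g. cost curves) not reproduced here.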
Acknowledgements
We would like to thank Dr. Rabah Mazouzi for his technical support.
Funding
Funding was provided by the French Government Embassy in Nairobi.
Cite this article
Ndenga, M.K., Ganchev, I., Mehat, J. et al. Performance and cost-effectiveness of change burst metrics in predicting software faults. Knowl Inf Syst 60, 275–302 (2019). https://doi.org/10.1007/s10115-018-1241-7