
Performance and cost-effectiveness of change burst metrics in predicting software faults

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

The purpose of this study is to determine the type of file-level software metric that exhibits the best fault prediction performance. Studies have shown that software process metrics are better predictors of software faults than software product metrics. However, a specific software process metric that consistently yields the best fault prediction performance across different experimental contexts has yet to be identified. We collected software metrics data from Open Source Software projects and used logistic regression to predict the bug status of a file and linear regression to predict its number of bugs. The prediction performance of these models was evaluated against numerical and graphical performance measures. We found that change burst metrics exhibit the best numerical performance measures, the highest fault detection probability, and the lowest cost of misclassifying software components.
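As a rough illustration of the modelling setup the abstract describes, the sketch below fits a logistic regression for a file's bug status and a linear regression for its bug count, then scores each model. This is a minimal sketch in Python with scikit-learn, not the authors' code: the data are synthetic, and names such as n_files and the three metric columns are hypothetical stand-ins for the file-level change burst metrics the study collects.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_absolute_error, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the file-level data set: three process-metric
# columns (imagine change bursts, churn, commit count) for 500 files.
n_files = 500
X = rng.normal(size=(n_files, 3))
n_bugs = rng.poisson(lam=np.exp(0.8 * X[:, 0]))  # bug count per file
is_buggy = (n_bugs > 0).astype(int)              # binary bug status

X_tr, X_te, yb_tr, yb_te, nb_tr, nb_te = train_test_split(
    X, is_buggy, n_bugs, test_size=0.3, random_state=0)

# Classification: does the file contain at least one bug?
clf = LogisticRegression().fit(X_tr, yb_tr)
auc = roc_auc_score(yb_te, clf.predict_proba(X_te)[:, 1])
pd = recall_score(yb_te, clf.predict(X_te))  # probability of detection

# Regression: how many bugs does the file contain?
reg = LinearRegression().fit(X_tr, nb_tr)
mae = mean_absolute_error(nb_te, reg.predict(X_te))

print(f"AUC={auc:.2f}  PD={pd:.2f}  MAE={mae:.2f}")
```

The AUC and the probability of detection (recall on buggy files) echo the kind of numerical measures the abstract refers to; graphical measures such as ROC or cost curves would be plotted from the same predicted probabilities.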



Notes

  1. http://jmeter.apache.org/.

  2. https://wiki.gnome.org/Apps/Gedit.

  3. https://poi.apache.org/.

  4. https://www.gimp.org/.

  5. https://github.com/.

  6. https://www.bugzilla.org/.

  7. https://subversion.apache.org/.

  8. https://scitools.com.


Acknowledgements

We would like to thank Dr. Rabah Mazouzi for his technical support.

Funding

Funding was provided by the French Embassy in Nairobi.

Author information


Corresponding author

Correspondence to Malanga Kennedy Ndenga.


About this article


Cite this article

Ndenga, M.K., Ganchev, I., Mehat, J. et al. Performance and cost-effectiveness of change burst metrics in predicting software faults. Knowl Inf Syst 60, 275–302 (2019). https://doi.org/10.1007/s10115-018-1241-7

