An Open-Source Software Metric Tool for Defect Prediction, Its Case Study and Lessons We Learned

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 925)

Abstract

The number of research papers on defect prediction has increased sharply over the last decade or so. One of the main driving forces has been publicly available defect prediction datasets, such as those in the PROMISE repository. These datasets make it possible for numerous researchers to conduct various experiments on defect prediction without having to collect data themselves. However, there are potential problems that have been ignored. First, the knowledge accumulated in the research community is, over time, likely to overfit to the datasets that are repeatedly used in numerous studies. Second, as software development practices commonly employed in the field evolve over time, these changes may affect the relation between defect-proneness and software metrics, which would not be reflected in the existing datasets. These risks can be addressed to a significant degree if new datasets can be prepared easily. As a step toward that goal, we introduce an open-source software metric tool, SMD (Software Metric tool for Defect prediction), that can generate code metrics and process metrics for a given Java software project in a Git repository. In our case study, where we compare existing datasets with datasets re-generated from the same software projects using our tool, we found that the two are not identical, despite the fact that the metric values we obtained conform to the definitions of their corresponding metrics. We learned that there are subtle factors to consider when generating and using metrics for defect prediction.

B. Gabdrakhmanov and A. Tolkachev contributed equally to the work.

Notes

  1. Note that the NASA datasets mentioned in [7] are available in the PROMISE repository.

References

  1. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  2. D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of MSR 2010, 7th IEEE working conference on mining software repositories. IEEE CS Press, pp 31–41

  3. Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 37th IEEE/ACM international conference on software engineering, ICSE 2015, Florence, Italy, 16–24 May 2015, vol 1, pp 789–800

  4. Jureczko M (2011) Significance of different software metrics in defect prediction. Softw Eng Int J 1(1):86–95

  5. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering, PROMISE, p 9

  6. Madeyski L, Kawalerowicz M (2017) Continuous defect prediction: the idea and a related dataset. In: Proceedings of the 14th international conference on mining software repositories. IEEE Press, pp 515–518

  7. Malhotra R (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518

  8. Osman H (2017) An extensive analysis of efficient bug prediction configurations. In: Proceedings of the 13th international conference on predictive models and data analytics in software engineering. ACM, pp 107–116

  9. Parr T (2013) The definitive ANTLR 4 reference, 2nd edn. Pragmatic Bookshelf

  10. Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of ICSE, the ACM/IEEE international conference on software engineering, pp 181–190

  11. Rahman F, Devanbu PT (2013) How, and why, process metrics are better. In: 35th International conference on software engineering (ICSE), pp 432–441

  12. Shepperd MJ, Bowes D, Hall T (2014) Researcher bias: the use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616

  13. Shepperd MJ, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215

  14. Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 1–5

  15. Spinellis D (2005) Tool writing: a forgotten art? (software tools). IEEE Softw 22(4):9–11

  16. Varela ASN, Pérez-González HG, Martínez-Perez FE, Soubervielle-Montalvo C (2017) Source code metrics: a systematic mapping study. J Syst Softw 128:164–197

  17. Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: Proceedings of the third international workshop on predictor models in software engineering

Author information

Corresponding author

Correspondence to Jooyong Yi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gabdrakhmanov, B., Tolkachev, A., Succi, G., Yi, J. (2020). An Open-Source Software Metric Tool for Defect Prediction, Its Case Study and Lessons We Learned. In: Ciancarini, P., Mazzara, M., Messina, A., Sillitti, A., Succi, G. (eds) Proceedings of 6th International Conference in Software Engineering for Defence Applications. SEDA 2018. Advances in Intelligent Systems and Computing, vol 925. Springer, Cham. https://doi.org/10.1007/978-3-030-14687-0_7
