Abstract
The number of research papers on defect prediction has increased sharply over the last decade or so. One of the main driving forces behind this growth has been publicly available datasets for defect prediction, such as the PROMISE repository, which allow numerous researchers to conduct experiments on defect prediction without having to collect data themselves. However, two potential problems have been largely ignored. First, there is a risk that the knowledge accumulated in the research community will, over time, overfit to the datasets that are repeatedly used across studies. Second, as the software development practices commonly employed in the field evolve, these changes may affect the relation between defect-proneness and software metrics, and such shifts are not reflected in the existing datasets. Both risks can be mitigated to a significant degree if new datasets can be prepared easily. As a step toward that goal, we introduce an open-source software metric tool, SMD (Software Metric tool for Defect prediction), which generates code metrics and process metrics for a given Java software project in a Git repository. In a case study comparing existing datasets with datasets re-generated from the same software projects using our tool, we found that the two are not identical, even though the metric values we obtained conform to the definitions of their corresponding metrics. We learned that there are subtle factors to consider when generating and using metrics for defect prediction.
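The abstract mentions that process metrics can be extracted from a project's Git history. As one illustration of the kind of metric involved, the sketch below counts how many commits touched each Java file (a revision-count process metric, in the spirit of Moser et al.). It parses the textual output of `git log --name-only` rather than invoking Git itself; the function name and input format are illustrative assumptions, not SMD's actual implementation.

```python
from collections import Counter

def count_revisions(git_log: str) -> Counter:
    """Count how many commits touched each .java file.

    `git_log` is assumed to be the output of a command such as
    `git log --name-only --pretty=format:COMMIT`, i.e. one marker
    line per commit followed by the paths changed in that commit.
    (Hypothetical helper for illustration; not SMD's actual code.)
    """
    revisions = Counter()
    for line in git_log.splitlines():
        path = line.strip()
        # Each occurrence of a .java path means one commit modified it.
        if path.endswith(".java"):
            revisions[path] += 1
    return revisions

# Example: two commits, the first touching two files.
sample_log = (
    "COMMIT\n"
    "src/Foo.java\n"
    "src/Bar.java\n"
    "COMMIT\n"
    "src/Foo.java\n"
)
print(count_revisions(sample_log))
```

A real tool must additionally follow file renames (e.g. `git log --follow`) and decide how to attribute changes across release boundaries, which is one source of the subtle dataset differences the paper discusses.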
B. Gabdrakhmanov and A. Tolkachev—These authors contributed equally to the work.
Notes
1. Note that the NASA datasets mentioned in [7] are available in the PROMISE repository.
References
Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of MSR 2010, 7th IEEE working conference on mining software repositories. IEEE CS Press, pp 31–41
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 37th IEEE/ACM international conference on software engineering, ICSE 2015, Florence, Italy, 16–24 May 2015, vol 1, pp 789–800
Jureczko M (2011) Significance of different software metrics in defect prediction. Softw Eng Int J 1(1):86–95
Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering, PROMISE, p 9
Madeyski L, Kawalerowicz M (2017) Continuous defect prediction: the idea and a related dataset. In: Proceedings of the 14th international conference on mining software repositories. IEEE Press, pp 515–518
Malhotra R (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518
Osman H (2017) An extensive analysis of efficient bug prediction configurations. In: Proceedings of the 13th international conference on predictive models and data analytics in software engineering. ACM, pp 107–116
Parr T (2013) The definitive ANTLR 4 reference, 2nd edn. Pragmatic Bookshelf
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of ICSE, the ACM/IEEE international conference on software engineering, pp 181–190
Rahman F, Devanbu PT (2013) How, and why, process metrics are better. In: 35th International conference on software engineering (ICSE), pp 432–441
Shepperd MJ, Bowes D, Hall T (2014) Researcher bias: the use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616
Shepperd MJ, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 1–5
Spinellis D (2005) Tool writing: a forgotten art? (software tools). IEEE Softw 22(4):9–11
Varela ASN, Pérez-González HG, Martínez-Perez FE, Soubervielle-Montalvo C (2017) Source code metrics: a systematic mapping study. J Syst Softw 128:164–197
Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the third international workshop on predictor models in software engineering
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Gabdrakhmanov, B., Tolkachev, A., Succi, G., Yi, J. (2020). An Open-Source Software Metric Tool for Defect Prediction, Its Case Study and Lessons We Learned. In: Ciancarini, P., Mazzara, M., Messina, A., Sillitti, A., Succi, G. (eds) Proceedings of 6th International Conference in Software Engineering for Defence Applications. SEDA 2018. Advances in Intelligent Systems and Computing, vol 925. Springer, Cham. https://doi.org/10.1007/978-3-030-14687-0_7
Print ISBN: 978-3-030-14686-3
Online ISBN: 978-3-030-14687-0