Review
A systematic review of software fault prediction studies
Introduction
This paper reviews journal articles and conference papers on software fault prediction to evaluate the progress made and to direct future research on this software engineering problem. Researchers have used many different approaches to predict software faults before the testing process, such as genetic programming (Evett, Khoshgoftaar, Chien, & Allen, 1998), neural networks (Thwin & Quah, 2003), case-based reasoning (El Emam, Benlarbi, Goel, & Rai, 2001), fuzzy logic (Yuan, Khoshgoftaar, Allen, & Ganesan, 2000), Dempster–Shafer networks (Guo, Cukic, & Singh, 2003), decision trees (Khoshgoftaar & Seliya, 2002), Naïve Bayes (Menzies, Greenwald, & Frank, 2007), and logistic regression (Denaro et al., 2003; Schneidewind, 2001). We applied the Artificial Immune Systems paradigm to fault prediction during our Fault Prediction Research Program (Catal & Diri, 2007a, 2007b, 2008).
This review does not describe all these prediction models in detail for practitioners. Our aim is to classify studies with respect to the metrics, methods, and datasets used in these prediction papers. We evaluated papers published before and after 2005 separately because the PROMISE repository was created in 2005. The PROMISE repository is a collection of public datasets for building repeatable, refutable, and verifiable models of software engineering. It was inspired by the UCI Machine Learning Repository, which is widely used by machine learning researchers (Sayyad & Menzies, 2005).
Jorgensen and Shepperd (2007) provided a systematic review of software development cost estimation studies, and our review methodology is similar to theirs. To our knowledge, this is the first study that provides a systematic review of software fault prediction studies from different perspectives. We posed the eight research questions shown in Table 1, and these questions helped us collect the necessary information from the papers during our review process.
This paper is organized as follows: Section 2 describes the review process. Section 3 reports the results. Section 4 suggests issues for future research on software fault prediction.
Inclusion criteria
We included a paper in our review if it describes research on software fault prediction or software quality prediction. We excluded position proceedings and papers that do not include experimental results. We examined the papers with respect to their publication years, datasets, metrics, techniques, evaluation criteria, and results. Inclusion was based on how closely a study relates to the fault prediction research topic. The exclusion did not take into account the publication
Results
Twenty-seven journal papers and 47 conference proceedings papers were evaluated systematically in this review. The papers were published between 1990 and 2007. Fig. 1 plots the publication year on the x-axis and the number of reviewed papers published in that year on the y-axis.
Sixty-one percent of the papers are conference proceedings, 36% are journal papers, and 3% are book chapters.
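The tallies described above (papers per year, share per publication type, and the split around 2005) can be reproduced with a few lines of code once each paper has been classified. The following is only a minimal sketch: the `Paper` record, the function names, and the sample entries are hypothetical and are not the reviewed papers, and assigning 2005 itself to the later group is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one reviewed paper."""
    year: int
    venue_type: str  # e.g. "journal", "conference", "book chapter"


def papers_per_year(papers):
    """Count papers per publication year (the data behind a year-vs-count plot)."""
    return dict(sorted(Counter(p.year for p in papers).items()))


def venue_shares(papers):
    """Return each venue type's share of the papers as a rounded percentage."""
    counts = Counter(p.venue_type for p in papers)
    total = sum(counts.values())
    return {venue: round(100 * n / total) for venue, n in counts.items()}


def split_by_year(papers, pivot=2005):
    """Partition papers into those before the pivot year and those from it onward.

    Placing the pivot year in the second group is an assumption.
    """
    before = [p for p in papers if p.year < pivot]
    after = [p for p in papers if p.year >= pivot]
    return before, after


# Made-up records for illustration only; NOT the papers in this review.
sample = [
    Paper(1998, "conference"),
    Paper(2001, "journal"),
    Paper(2003, "conference"),
    Paper(2005, "conference"),
    Paper(2007, "journal"),
]

if __name__ == "__main__":
    print(papers_per_year(sample))
    print(venue_shares(sample))
    before, after = split_by_year(sample)
    print(len(before), len(after))
```

Keeping the classification in plain records like this makes it straightforward to recompute the distributions if papers are added to or removed from the review set.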
Each subsection of this section will address each research
Conclusion
This paper reviewed software fault prediction papers published in conference proceedings and journals to evaluate the progress and direct future research on software fault prediction. We evaluated papers with a specific focus on types of metrics, methods, and datasets and did not describe all the prediction models in detail. The aim was to classify studies with respect to metrics, methods, and datasets that have been used in fault prediction papers. We evaluated papers published before and
Acknowledgements
This study is supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant 107E213. The findings and opinions in this study belong solely to the authors, and are not necessarily those of the sponsor.
References (36)
- et al. Comparing case-based reasoning classifiers for predicting high risk software components. Journal of Systems and Software (2001).
- et al. A survey of component based system quality assurance and assessment. Information and Software Technology (2005).
- Abreu, F. B. e., & Carapuca, R. (1994). Object-oriented software engineering: Measuring and controlling the development...
- Abreu, F. B. e., & Melo, W. (1996). Evaluating the impact of object-oriented design on software quality. In Proceedings...
- et al. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering (2002).
- Bibi, S., Tsoumakas, G., Stamelos, I., & Vlahvas, I. (2006). Software defect prediction using regression via...
- Catal, C., & Diri, B. (2007a). Software defect prediction using artificial immune recognition system. In Proceedings of...
- et al. Software fault prediction with object-oriented metrics based artificial immune recognition system.
- et al. A fault prediction model with limited fault data to improve test process. Product focused software process improvement.
- et al. A metrics suite for object-oriented design. IEEE Transactions on Software Engineering (1994).
- Towards industrially relevant fault-proneness models. International Journal of Software Engineering and Knowledge Engineering.
- Component based measurement: Few useful guidelines. SIGSOFT Software Engineering Notes.
- Few important considerations for deriving interface complexity metric for component-based systems. SIGSOFT Software Engineering Notes.
- Predicting fault prone modules by the Dempster–Shafer belief networks.
- Elements of software science.
- Fault prediction using early lifecycle data.