
Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study

Published in Applied Intelligence.

Abstract

An effective software fault prediction (SFP) model can help developers detect faults early and thus enhance the overall reliability and quality of a software project. Variations in the prediction performance of learning techniques across different software systems make it difficult to select a suitable learning technique for fault prediction modeling. Evaluations of previously presented SFP approaches have shown that no single machine learning-based model provides the best accuracy in every context, highlighting the need to use multiple techniques to build the SFP model. To address this problem, we present and discuss a software fault prediction approach that selects the most appropriate learning techniques, from a set of competitive and accurate ones, to build a fault prediction model. In this work, we apply the discussed SFP approach to five Eclipse project datasets and nine object-oriented (OO) project datasets and report the findings of the experimental study. We used several performance measures, i.e., AUC, accuracy, sensitivity, and specificity, to assess the discussed approach's performance. Further, we performed a cost-benefit analysis to evaluate the economic viability of the approach. Results showed that the presented approach predicted the software's faults effectively, with highest achieved values of 0.816 (AUC), 0.835 (accuracy), 0.98 (sensitivity), and 0.903 (specificity). The cost-benefit analysis showed that the approach can help reduce the overall software testing cost.
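The core idea of the abstract, selecting a learner per prediction from a pool of competitive techniques rather than committing to one model, can be illustrated with a minimal sketch. The selection criterion below (local accuracy on the k nearest training modules) is an assumption for illustration; the paper's actual selection mechanism may differ.

```python
# Minimal sketch of dynamic classifier selection for fault prediction.
# Assumed detail (not from the paper): a learner's competence for a test
# module is its accuracy on that module's k nearest training neighbors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for software-metric data (faulty vs. non-faulty modules).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Pool of competitive base learners, each fitted on the training modules.
pool = [GaussianNB(), LogisticRegression(max_iter=500),
        DecisionTreeClassifier(random_state=0)]
for clf in pool:
    clf.fit(X_train, y_train)

# For each test module, pick the learner that is most accurate on the
# module's local region, then use that learner for the prediction.
knn = NearestNeighbors(n_neighbors=7).fit(X_train)
preds = []
for x in X_test:
    _, idx = knn.kneighbors([x])
    region_X, region_y = X_train[idx[0]], y_train[idx[0]]
    best = max(pool, key=lambda c: (c.predict(region_X) == region_y).mean())
    preds.append(best.predict([x])[0])

accuracy = (np.array(preds) == y_test).mean()
print(f"dynamic-selection accuracy: {accuracy:.3f}")
```

Because the chosen learner can vary from module to module, the ensemble can outperform any single fixed model when different techniques dominate in different regions of the metric space.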



Author information

Correspondence to Sandeep Kumar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this study, we used the Python libraries scikit-learn and imbalanced-learn to implement the base learners and the presented approach. The following parameter values were set for these learning techniques and for the presented approach.

Technique: parameter values

Naïve Bayes: priors=None, var_smoothing=1e-9 (epsilon: absolute additive value to variances; sigma: variance of each feature per class; theta: mean of each feature per class)

Logistic Regression: penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=500, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None

K-nearest Neighbor: n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2 (power parameter), metric='euclidean', metric_params=None, n_jobs=None

Decision Tree: criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, ccp_alpha=0.0

Support Vector Machine: C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None

Multilayer Perceptron: hidden_layer_sizes=100, activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000

SMOTE: sampling_strategy='auto', random_state=None, k_neighbors=5, n_jobs=None

X-means clustering (Weka): binValue=1.0, cutOffFactor=0.5, debugLevel=0, distanceF=EuclideanDistance (-R first-last), maxIterations=100, maxKMeans=1000, maxKMeansForChildren=1000, maxNumClusters=10, minNumClusters=2, seed=10, useKDTree=false
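For reference, the scikit-learn configurations above are almost entirely library defaults; the snippet below instantiates the base learners with the listed values (only max_iter=500 for logistic regression departs from the defaults). The X-means configuration is a Weka setting and has no scikit-learn equivalent, and SMOTE is noted in a comment since it lives in the separate imbalanced-learn package.

```python
# Base learners configured with the appendix's parameter values.
# All values are scikit-learn defaults except LogisticRegression's
# max_iter=500; unspecified parameters keep their defaults.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

learners = {
    "Naive Bayes": GaussianNB(priors=None, var_smoothing=1e-9),
    "Logistic Regression": LogisticRegression(
        penalty="l2", tol=0.0001, C=1.0, solver="lbfgs", max_iter=500),
    "K-nearest Neighbor": KNeighborsClassifier(
        n_neighbors=5, weights="uniform", algorithm="auto",
        leaf_size=30, p=2),
    "Decision Tree": DecisionTreeClassifier(
        criterion="gini", splitter="best", min_samples_split=2,
        min_samples_leaf=1, ccp_alpha=0.0),
    "SVM": SVC(C=1.0, kernel="rbf", degree=3, gamma="scale", tol=0.001),
    "MLP": MLPClassifier(
        hidden_layer_sizes=(100,), activation="relu", solver="adam",
        alpha=0.0001, learning_rate="constant", learning_rate_init=0.001,
        max_iter=200),
}
# Oversampling (from imbalanced-learn, not scikit-learn):
# SMOTE(sampling_strategy="auto", random_state=None, k_neighbors=5)
```

Each learner can then be fitted with the usual `learners[name].fit(X_train, y_train)` call.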


About this article


Cite this article

Rathore, S.S., Kumar, S. Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study. Appl Intell 51, 8945–8960 (2021). https://doi.org/10.1007/s10489-021-02346-x

