
Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study

Published in Applied Intelligence.

Abstract

An effective software fault prediction (SFP) model can help developers detect faults early and thus enhance the overall reliability and quality of a software project. Variations in the prediction performance of learning techniques across different software systems make it difficult to select a suitable learning technique for fault prediction modeling. Evaluations of previously presented SFP approaches have shown that no single machine learning-based model provides the best accuracy in every context, highlighting the need to use multiple techniques to build the SFP model. To address this problem, we present and discuss a software fault prediction approach that selects the most appropriate learning techniques, from a set of competitive and accurate ones, to build a fault prediction model. In this work, we apply the discussed SFP approach to five Eclipse project datasets and nine object-oriented (OO) project datasets and report the findings of the experimental study. We used several performance measures, i.e., AUC, accuracy, sensitivity, and specificity, to assess the discussed approach's performance. Further, we performed a cost-benefit analysis to evaluate the economic viability of the approach. Results showed that the presented approach predicted the software's faults effectively, with highest achieved values of 0.816 (AUC), 0.835 (accuracy), 0.98 (sensitivity), and 0.903 (specificity). The cost-benefit analysis showed that the approach can help reduce the overall software testing cost.
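The core idea of the abstract, selecting a learner per prediction from a pool of competitive techniques rather than committing to one model, can be illustrated with a minimal sketch. The selection criterion below (local accuracy on the k nearest training modules) is an assumption for illustration; the paper's actual selection mechanism may differ.

```python
# Minimal sketch of dynamic classifier selection for fault prediction.
# Assumed detail (not from the paper): a learner's competence for a test
# module is its accuracy on that module's k nearest training neighbors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for software-metric data (faulty vs. non-faulty modules).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Pool of competitive base learners, each fitted on the training modules.
pool = [GaussianNB(), LogisticRegression(max_iter=500),
        DecisionTreeClassifier(random_state=0)]
for clf in pool:
    clf.fit(X_train, y_train)

# For each test module, pick the learner that is most accurate on the
# module's local region, then use that learner for the prediction.
knn = NearestNeighbors(n_neighbors=7).fit(X_train)
preds = []
for x in X_test:
    _, idx = knn.kneighbors([x])
    region_X, region_y = X_train[idx[0]], y_train[idx[0]]
    best = max(pool, key=lambda c: (c.predict(region_X) == region_y).mean())
    preds.append(best.predict([x])[0])

accuracy = (np.array(preds) == y_test).mean()
print(f"dynamic-selection accuracy: {accuracy:.3f}")
```

Because the chosen learner can vary from module to module, the ensemble can outperform any single fixed model when different techniques dominate in different regions of the metric space.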



Author information

Correspondence to Sandeep Kumar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this study, we used the Python libraries scikit-learn and imbalanced-learn to implement the base learners and the presented approach. The following parameter values were set for these learning techniques and for the presented approach.

Technique: parameter values

Naïve Bayes: priors=None, var_smoothing=1e-9 (epsilon: absolute additive value to variances; sigma: variance of each feature per class; theta: mean of each feature per class)

Logistic Regression: penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=500, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None

K-nearest Neighbor: n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2 (power parameter), metric='euclidean', metric_params=None, n_jobs=None

Decision Tree: criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, ccp_alpha=0.0

Support Vector Machine: C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None

Multilayer Perceptron: hidden_layer_sizes=100, activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000

SMOTE: sampling_strategy='auto', random_state=None, k_neighbors=5, n_jobs=None

X-means clustering (Weka): binValue=1.0, cutOffFactor=0.5, debugLevel=0, distanceF=EuclideanDistance (-R first-last), maxIterations=100, maxKMeans=1000, maxKMeansForChildren=1000, maxNumClusters=10, minNumClusters=2, seed=10, useKDTree=false
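For reference, the scikit-learn configurations above are almost entirely library defaults; the snippet below instantiates the base learners with the listed values (only max_iter=500 for logistic regression departs from the defaults). The X-means configuration is a Weka setting and has no scikit-learn equivalent, and SMOTE is noted in a comment since it lives in the separate imbalanced-learn package.

```python
# Base learners configured with the appendix's parameter values.
# All values are scikit-learn defaults except LogisticRegression's
# max_iter=500; unspecified parameters keep their defaults.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

learners = {
    "Naive Bayes": GaussianNB(priors=None, var_smoothing=1e-9),
    "Logistic Regression": LogisticRegression(
        penalty="l2", tol=0.0001, C=1.0, solver="lbfgs", max_iter=500),
    "K-nearest Neighbor": KNeighborsClassifier(
        n_neighbors=5, weights="uniform", algorithm="auto",
        leaf_size=30, p=2),
    "Decision Tree": DecisionTreeClassifier(
        criterion="gini", splitter="best", min_samples_split=2,
        min_samples_leaf=1, ccp_alpha=0.0),
    "SVM": SVC(C=1.0, kernel="rbf", degree=3, gamma="scale", tol=0.001),
    "MLP": MLPClassifier(
        hidden_layer_sizes=(100,), activation="relu", solver="adam",
        alpha=0.0001, learning_rate="constant", learning_rate_init=0.001,
        max_iter=200),
}
# Oversampling (from imbalanced-learn, not scikit-learn):
# SMOTE(sampling_strategy="auto", random_state=None, k_neighbors=5)
```

Each learner can then be fitted with the usual `learners[name].fit(X_train, y_train)` call.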


About this article


Cite this article

Rathore, S.S., Kumar, S. Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study. Appl Intell 51, 8945–8960 (2021). https://doi.org/10.1007/s10489-021-02346-x

