Skip to main content
Log in

Heterogeneous stacked ensemble classifier for software defect prediction

  • 1211: AIoT Support and Applications with Multimedia
  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Software defect prediction (SDP) plays an important role to ensure that software meets quality standards; by highlighting the modules which are prone to errors and hence allows to focus the test efforts on them. Class imbalance nature of the defect dataset hinders the defect predictors to correctly classify the buggy modules. Here, we introduce a novel heterogenous ensemble classifier built with stacking methodology to overcome this problem of imbalanced datasets and hence, significant improvement in the prediction power is being proposed. Stacked ensemble is achieved with the best known classifiers from SDP literature as base classifiers (artificial neural network, nearest neighbor, tree based classifier, Bayesian classifier and support vector machines). For experimental work, five public datasets from NASA corpus are used. A comparative analysis for the proposed heterogenous stacking based ensemble method is made with the base classifiers and with the state-of-the art ensemble based SDP models over the evaluation criteria of ROC, AUC and accuracy. It is found that the proposed heterogenous stacking based ensemble classifier outperforms the base classifiers by 12% in terms of AUC score and by 8% in terms of Accuracy. It improves the performance of state-of-the-art ensemble methods by 4% in terms of AUC score and by 9% in terms of Accuracy. It can be concluded from the comparative analysis that the proposed SDP classifier is best performer among the candidate SDP classifiers statistically.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Balogun AO, Lafenwa-Balogun FB, Mojeed HA, Adeyemo VE, Akande ON, Akintola AG, Bajeh AO, Usman-Hamza FE (2020) SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction. Computational Science and Its Applications – ICCSA 2020: 20th International Conference, Cagliari, Italy, July 1–4, 2020. Proceedings, Part VI 12254:615–631. https://doi.org/10.1007/978-3-030-58817-5_45

    Article  Google Scholar 

  2. Boucher A, Badri M (2018) Software metrics thresholds calculation techniques to predict fault-proneness: an empirical comparison. Inf Softw Technol 96:38–67

    Article  Google Scholar 

  3. Chen L, Fang B, Shang Z et al (2018) Tackling class overlap and imbalance problems in software defect prediction. Softw Qual J 26:97–125. https://doi.org/10.1007/s11219-016-9342-6

    Article  Google Scholar 

  4. Erturk E, Sezer EA (2015) A comparison of some soft computing methods for software fault prediction. Expert Syst Appl 42:1872–1879

    Article  Google Scholar 

  5. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging-, boosting- and hybrid-based approaches. IEEE Trans Syst Man Cybernetics Part C 42(4):463–484

    Article  Google Scholar 

  6. Goyal S, Bhatia P (2020) Comparison of machine learning techniques for software quality prediction. Int J Knowl Syst Sci IJKSS 11(2):21–40. https://doi.org/10.4018/IJKSS.2020040102

    Article  Google Scholar 

  7. Goyal S, Bhatia PK (2020 ) Empirical software measurements with machine learning. In: Bansal A, Jain A, Jain S, Jain V, Choudhary A (eds) Computational intelligence techniques and their applications to software engineering problems, pp 49–64. CRC Press, Boca Raton. https://doi.org/10.1201/9781003079996

  8. Haixiang G, Yijing Li, Jennifer Shang Gu, Mingyun HY, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239

    Article  Google Scholar 

  9. Haykin S (2010) Neural networks and learning machines, 3/e. PHI Learning, India

    Google Scholar 

  10. Huda S, Liu K, Abdelrazek M, Ibrahim A, Alyahya S, Al-Dossari H, Ahmad S (2018) An Ensemble Oversampling Model for Class Imbalance Problem in Software Defect Prediction. IEEE Access 6:24184–24195. https://doi.org/10.1109/access.2018.2817572

    Article  Google Scholar 

  11. Khuat TT, Le MH (2020) Evaluation of Sampling-based ensembles of classifiers on imbalanced data for software defect prediction problems. SN Comput Sci 1:108. https://doi.org/10.1007/s42979-020-0119-4

    Article  Google Scholar 

  12. Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf Softw Technol 58:388–402

    Article  Google Scholar 

  13. Lee HK, Kim SB (2018) An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Syst Appl 98:72–83

    Article  Google Scholar 

  14. Lehmann EL, Romano JP (2008) Testing statistical hypothesis: springer texts in statistics. Springer, New York

    Google Scholar 

  15. Miholca, D., G., Czibula, I., Czibula. A novel approach for software defect prediction through hybridizing gradual relational association rules with artificial neural networks. J. Information Sciences. Feb 2018

  16. (NASA 2015) https://www.nasa.gov/sites/default/files/files/Space_Math_VI_2015.pdf. Accessed 23 Aug 2018

  17. Ozakıncı R, Tarhan A (2018) Early software defect prediction: ¨a systematic map and review. J Syst Softw 144:216–239. https://doi.org/10.1016/j.jss.2018.06.025

    Article  Google Scholar 

  18. Rathore S, Kumar S (2017) Towards an ensemble-based system for predicting the number of software faults. Expert Syst Appl 82:357–382

    Article  Google Scholar 

  19. (PROMISE) http://promise.site.uottawa.ca/SERepository. Accessed 23 Aug 2018

  20. Rathore SS, Kumar S (2019) A study on software fault prediction techniques. Artif Intell Rev 51(2):255–327. https://doi.org/10.1007/s10462-017-9563-5

    Article  Google Scholar 

  21. Sayyad S, Menzies T (2005) The PROMISE repository of software engineering databases. Canada: University of Ottawa, http://promise.site.uottawa.ca/SERepository.

  22. Siers MJ, Islam MZ (2015) Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf Syst 51:62–71

    Article  Google Scholar 

  23. Son LH, Pritam N, Khari M, Kumar R, Phuong PTM, Thong PH (2019) Empirical study of software defect prediction: a systematic mapping. Symmetry. MDPI AG. https://doi.org/10.3390/sym11020212

  24. Tong H, Liu B, Wang S (2018) Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Softw Technol 96:94–111. https://doi.org/10.1016/j.infsof.2017.11.008

    Article  Google Scholar 

  25. Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443

    Article  Google Scholar 

  26. Wang T, Zhang Z, Jing X, Zhang L (2015) Multiple kernel ensemble learning for software defect prediction. Autom Softw Eng 23:569–590

    Article  Google Scholar 

  27. Xia X, Lo D, Shihab E, Wang X, Yang X (2015) ELBlocker: Predicting blocking bugs with ensemble imbalance learning. Inf Softw Technol 61:93–106

    Article  Google Scholar 

  28. Yang X, Lo D, Xia X, Sun J (2017) TLEL: a two-layer ensemble learning approach for just-in-time defect prediction. Inf Softw Technol 87:206–20

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Somya Goyal.

Ethics declarations

Conflict of interest

Authors have no Conflicts of interest/Competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Goyal, S., Bhatia, P.K. Heterogeneous stacked ensemble classifier for software defect prediction. Multimed Tools Appl 81, 37033–37055 (2022). https://doi.org/10.1007/s11042-021-11488-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-021-11488-6

Keywords

Navigation