SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction

Balogun, Abdullateef O.; Lafenwa-Balogun, Fatimah B.; Mojeed, Hammed A.; Adeyemo, Victor E.; Akande, Oluwatobi N.; Akintola, Abimbola G.; Bajeh, Amos O.; Usman-Hamza, Fatimah E.

doi:10.1007/978-3-030-58817-5_45

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12254)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Class imbalance is a prevalent problem in machine learning that degrades the prediction performance of classification algorithms, and Software Defect Prediction (SDP) is no exception. Solutions such as data sampling and ensemble methods have been proposed to address class imbalance in SDP. This study proposes combining the Synthetic Minority Oversampling Technique (SMOTE) with homogeneous ensemble methods (Bagging and Boosting) for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) as base classifiers on defect datasets from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other methods evaluated. An accuracy of 86.8% and an area under the receiver operating characteristic curve (AUC) of 0.93 affirmed its ability to differentiate between the defective and non-defective labels without bias.
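The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, all function names (`smote`, `fit_stump`, `bagging`) are hypothetical, and a one-level decision stump stands in for the Decision Tree base classifier. The sketch oversamples the minority (defective) class by interpolating between nearest minority neighbours, then trains a bagged ensemble on the balanced set.

```python
import random
from collections import Counter

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(0) if rng is None else rng
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def fit_stump(X, y):
    """Assumed stand-in base learner: a one-level decision tree that picks
    the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                errors = sum(
                    (above if row[f] > t else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above

def bagging(X, y, n_models=11, rng=None):
    """Train n_models stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(0) if rng is None else rng
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        return Counter(m(row) for m in models).most_common(1)[0][0]

    return predict

# Toy, made-up data: 40 non-defective modules (label 0), 6 defective (label 1).
rng = random.Random(42)
clean = [[rng.gauss(10, 2), rng.gauss(2, 1)] for _ in range(40)]
defective = [[rng.gauss(30, 3), rng.gauss(8, 1)] for _ in range(6)]

# Balance the classes with SMOTE before training the ensemble.
synthetic = smote(defective, n_new=len(clean) - len(defective), k=3, rng=rng)
X = clean + defective + synthetic
y = [0] * len(clean) + [1] * (len(defective) + len(synthetic))

model = bagging(X, y, n_models=11, rng=rng)
print(Counter(y), model([30.0, 8.0]), model([10.0, 2.0]))
```

Oversampling is applied to the training data only in this sketch; in a cross-validated experiment the same step would normally be repeated inside each training fold so that synthetic points do not leak into the test fold.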

Author information

Authors and Affiliations

Department of Computer Science, University of Ilorin, Ilorin, 1515, Nigeria
Abdullateef O. Balogun, Fatimah B. Lafenwa-Balogun, Hammed A. Mojeed, Abimbola G. Akintola, Amos O. Bajeh & Fatimah E. Usman-Hamza
School of Built Environment, Engineering and Computing, Leeds Beckett University, Headingley Campus, Leeds, LS6 3QS, UK
Victor E. Adeyemo
Department of Computer Science, Landmark University, Omu-Aran, Kwara State, Nigeria
Oluwatobi N. Akande

Corresponding author

Correspondence to Oluwatobi N. Akande.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Potenza, Italy
Beniamino Murgante
Chair- Center of ICT/ICE, Covenant University, Ota, Nigeria
Sanjay Misra
University of Cagliari, Cagliari, Italy
Chiara Garau
University of Cagliari, Cagliari, Italy
Ivan Blečić
Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
David Taniar
Department of Information Science, Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
University of Minho, Braga, Portugal
Ana Maria A. C. Rocha
Polytechnic University of Bari, Bari, Italy
Eufemia Tarantino
Polytechnic University of Bari, Bari, Italy
Carmelo Maria Torre
Department of Neurology, University of Massachusetts Medical School, Worcester, MA, USA
Yeliz Karaca

About this paper

Cite this paper

Balogun, A.O. et al. (2020). SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction. In: Gervasi, O., et al. (eds.) Computational Science and Its Applications – ICCSA 2020. Lecture Notes in Computer Science, vol 12254. Springer, Cham. https://doi.org/10.1007/978-3-030-58817-5_45

DOI: https://doi.org/10.1007/978-3-030-58817-5_45
Published: 30 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58816-8
Online ISBN: 978-3-030-58817-5
eBook Packages: Computer Science, Computer Science (R0)
