Software defect prediction ensemble learning algorithm based on adaptive variable sparrow search algorithm

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Software defect prediction, which aims to build a defect prediction model from historical data, has attracted wide attention among software engineering researchers. Among the techniques used in this field, the extreme learning machine is widely adopted because of its simple structure and fast learning speed. However, its prediction performance is strongly affected by the random selection of parameters and by weak generalization ability. To improve prediction performance, researchers use swarm intelligence optimization algorithms to find optimal parameters for the model. The sparrow search algorithm is a new meta-heuristic algorithm that simulates the foraging and anti-predation behavior of a sparrow group. However, the original sparrow search algorithm is easily trapped in local optima in the later iterations. To improve its global optimization ability, this paper proposes an adaptive variable sparrow search algorithm (AVSSA) based on adaptive hyper-parameters and a variable logarithmic spiral. Experiments with AVSSA on eight benchmark functions yield satisfactory results. In traditional software defect prediction algorithms, imbalanced data distribution is also one of the main factors that degrade model performance. Therefore, this paper uses the adaptive variable sparrow search algorithm to optimize the extreme learning machine, which then serves as the base predictor for Bagging ensemble learning (AVSEB), and proposes a new software defect prediction ensemble learning model. First, the model uses the unstable cut-points algorithm to preprocess the Bagging sample set. Then, the adaptive variable sparrow search algorithm is used to optimize the extreme learning machine as the base predictor of the ensemble. Finally, the voting method is used to output the prediction results of software defects. In the experiments, the proposed algorithm significantly outperforms four other advanced comparison algorithms on the evaluation indices across 15 open software defect datasets. According to the Friedman ranking and Holm’s post hoc test, the differences between the proposed algorithm and the other advanced prediction algorithms are statistically significant.
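
The ensemble structure described in the abstract (extreme learning machines as base predictors, Bagging, and a final vote) can be illustrated with a minimal sketch. This is not the authors' AVSEB implementation. It assumes plain random hidden-layer parameters in place of the AVSSA-optimized ones, omits the unstable cut-points preprocessing, and uses synthetic toy data. The names `ELM` and `bagging_elm_predict` are hypothetical.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine for binary labels {0, 1}."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never trained.
        # Assumption: these are the parameters a swarm optimizer would tune.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights come from a least-squares solve via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def bagging_elm_predict(X_train, y_train, X_test, n_estimators=11, seed=0):
    """Train ELMs on bootstrap samples and combine them by majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test), dtype=int)
    for i in range(n_estimators):
        # Bootstrap: sample the training set with replacement.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = ELM(seed=seed + i + 1).fit(X_train[idx], y_train[idx])
        votes += model.predict(X_test)
    # Majority vote: predict "defective" (1) when more than half agree.
    return (2 * votes > n_estimators).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic, imbalanced toy data (not a real defect dataset).
    X = np.vstack([rng.normal(0.0, 1.0, size=(160, 5)),
                   rng.normal(2.0, 1.0, size=(40, 5))])
    y = np.concatenate([np.zeros(160, dtype=int), np.ones(40, dtype=int)])
    pred = bagging_elm_predict(X, y, X)
    print("training accuracy:", (pred == y).mean())
```

An odd `n_estimators` avoids ties in the vote. In the paper's setting, the random draws in `fit` would presumably be replaced by parameters found by the optimizer, which the sketch does not attempt.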

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 52074126. We would like to thank the editor and anonymous reviewers for their valuable comments and suggestions to improve the paper.

Author information

Contributions

YT: writing-original draft, conceptualization, methodology, and program writing and results collecting. QD: methodology and reviewing. MY: data curation, visualization. TD: reviewing, editing, visualization. LC: reviewing, supervision.

Corresponding author

Correspondence to Lifang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, Y., Dai, Q., Yang, M. et al. Software defect prediction ensemble learning algorithm based on adaptive variable sparrow search algorithm. Int. J. Mach. Learn. & Cyber. 14, 1967–1987 (2023). https://doi.org/10.1007/s13042-022-01740-2

  • DOI: https://doi.org/10.1007/s13042-022-01740-2
