Abstract
Accurately predicting defects in software modules helps developers and testers find defective modules quickly and frees effort for other software development activities. Most previous studies have used models based on a single machine learning technique to detect software defects. These models have produced limited results because they perform well on only some parts of the data and fail to capture all defect-causing patterns. The mixture of experts (MoE) is a combination method that employs experts specialized in different subspaces of the data. The outputs of the specialized experts are combined according to their expertise, with a gating network governing the final prediction. This paper explores the MoE method and presents implicit and explicit MoE-based models for software defect prediction. The presented models are evaluated in an experimental study on twenty-two software defect datasets collected from the AEEEM, PROMISE, and JIRA repositories. Prediction performance is measured using accuracy, F1-score, area under the ROC curve (AUC), and the Matthews correlation coefficient (MCC). The experimental results show that the presented MoE-based models outperformed various machine learning and ensemble techniques, such as Bagging and AdaBoost, and achieved state-of-the-art defect prediction performance. Additionally, the MoE models performed better than or on par with a DNN-based model in most cases. The results are consistent across all datasets, and the Wilcoxon test confirms that the presented models performed significantly better than the other techniques.
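The four metrics named above can be made concrete with small from-scratch implementations. The sketch below uses tiny synthetic labels and scores (not the paper's data) and is meant only to illustrate the definitions; in practice, library implementations such as scikit-learn's would be used.

```python
# Illustrative implementations of the four metrics used in the paper
# (accuracy, F1-score, AUC, and Matthews correlation coefficient),
# computed from scratch on a small synthetic example.
import math

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

def mcc(y_true, y_pred):
    # Matthews correlation coefficient: +1 perfect, 0 random, -1 inverse.
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    # Probability that a random defective module is ranked above a
    # random non-defective one (ties count as 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]            # 1 = defective module
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]            # hard class predictions
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred),
      mcc(y_true, y_pred), auc(y_true, scores))
```

AUC is computed from the ranking scores rather than the hard predictions, which is why it can diverge from accuracy and F1 on imbalanced defect data.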
Data availability
The datasets used during and/or analyzed during the current study are available in the online repository, https://zenodo.org/record/3362613#.YrhPAnZByM8.
References
Alsawalqah, J., Faris, H., Aljarah, I., Alnemer, L., & Alhindawi, N. (2017). In Computer Science On-line Conference (Springer, 2017) pp. 355–366.
Arora, I., Tetarwal, V., & Saha, A. (2015). Open issues in software defect prediction. Procedia Computer Science, 46, 906–912.
Assim, M., Obeidat, Q., & Hammad, M. (2020). In 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI) (2020), pp. 1–6.
Bock, A. S., & Fine, I. (2014). Anatomical and functional plasticity in early blind individuals and the mixture of experts architecture. Frontiers in human neuroscience, 8, 971.
Bowes, D., Hall, T., & Petrić, J. (2018). Software defect prediction: do different classifiers find the same defects? Software Quality Journal, 26(2), 525–552.
Canfora, G., De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., & Panichella, S. (2013). In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation (IEEE, 2013), pp. 252–261.
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189–215.
Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321–357.
D’Ambros, M., Lanza, M., & Robbes, R. (2010). In 2010 7th IEEE working conference on mining software repositories (MSR 2010) (IEEE, 2010), pp. 31–41.
D’Ambros, M., Lanza, M., & Robbes, R. (2012). Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Software Engineering, 17, 531–577.
Deep Singh, P., & Chug, A. (2017). In 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence pp. 775–781.
Di Nucci, D., Palomba, F., & De Lucia, A. (2018). In 2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE) (IEEE, 2018), pp. 48–54.
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: a methodology review. Journal of biomedical informatics, 35(5–6), 352–359.
Elmishali, A., & Kalech, M. (2023). Issues-driven features for software fault prediction. Information and Software Technology 155, 107102.
Feng, S., Keung, J., Yu, X., Xiao, Y., Bennin, K. E., Kabir, M. A., & Zhang, M. (2021). Coste: Complexity-based oversampling technique to alleviate the class imbalance problem in software defect prediction. Information and Software Technology 129, 106432.
Feng, S., Keung, J., Yu, X., Xiao, Y., & Zhang, M. (2021). Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction. Information and Software Technology, 139, 106662.
Ferrari, D., & Milioni, A. (2011). Choices and pitfalls concerning mixture-of-experts modeling. Pesquisa Operacional, 31, 95–111.
Ghosh S., Rana A., & Kansal V. (2018). A nonlinear manifold detection based model for software defect prediction. International Conference on Computational Intelligence and Data Science, Procedia Computer Science 132, 581–594.
Gormley, I. C., & Frühwirth-Schnatter, S. (2019). In Handbook of mixture analysis (Chapman and Hall/CRC, 2019) pp. 271–307.
Jović, A., Brkić, K., & Bogunović, N. (2015). In 2015 38th International convention on information and communication technology, electronics and microelectronics (MIPRO) (IEEE, 2015), pp. 1200–1205.
Jureczko, M., & Madeyski, L. (2010). In Proceedings of the 6th international conference on predictive models in software engineering pp. 1–10.
Kondratyuk, D., Tan, M., Brown, M., & Gong, B. (2020). When ensembling smaller models is more efficient than single large models. arXiv preprint arXiv:2005.00570
Komaroff, E. (2020). Relationships between p-values and pearson correlation coefficients, type 1 errors and effect size errors, under a true null hypothesis. Journal of Statistical Theory and Practice, 14(3), 1–13.
Kwak, S. K., & Kim, J. H. (2017). Statistical data preparation: Management of missing values and outliers. Korean Journal of Anesthesiology, 70(4), 407–411.
Li, L., Lessmann, S., & Baesens, B. (2019). Evaluating software defect prediction performance: an updated benchmarking study. arXiv preprint. http://arxiv.org/abs/1901.01726
Li, N., Shepperd, M., & Guo, Y. (2020). A systematic review of unsupervised learning techniques for software defect prediction. Information and Software Technology 122, 106287.
Liaw, A., Wiener, M., et al., (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
Majd, A., Vahidi-Asl, M., Khalilian, A., Poorsarvi-Tehrani, P., & Haghighi, H. (2020). Sldeep: Statement-level software defect prediction using deep-learning model on static code features. Expert Systems with Applications 147, 113156.
Masoudnia, S., & Ebrahimpour, R. (2014). Mixture of experts: a literature survey. Artificial Intelligence Review, 42(2), 275–293.
Moustafa, S., ElNainay, M. Y., El Makky, N., & Abougabal, M. S. (2018). Software bug prediction using weighted majority voting techniques. Alexandria engineering journal, 57(4), 2763–2774.
Nam, J. (2014). Survey on software defect prediction. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Tech. Rep.
Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in neurorobotics, 7, 21.
Niu, J., Li, Z., Chen, H., Dong, X., & Jing, X. Y. (2022) Data sampling and kernel manifold discriminant alignment for mixed-project heterogeneous defect prediction. Software Quality Journal pp. 1–35.
O’Doherty, J. P., Lee, S. W., Tadayonnejad, R., Cockburn, J., Iigaya, K., & Charpentier, C. J. (2021). Why and how the brain weights contributions from a mixture of experts. Neuroscience & Biobehavioral Reviews, 123, 14–23.
Pandey, S. K., Mishra, R. B., & Tripathi, A. K. (2020) BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques. Expert Systems with Applications 144, 113085.
Parsons, V. L. (2014). Stratified sampling. Wiley Stats Ref: Statistics Reference Online pp. 1–11.
Pelleg, D., Moore, A. W. et al. (2000). In ICML, vol. 1 pp. 727–734.
Priyanka, & Kumar, D. (2020). Decision tree classifier: A detailed survey. International Journal of Information and Decision Sciences, 12(3), 246–269.
Qiao, L., Li, X., Umer, Q., & Guo, P. (2020). Deep learning based software defect prediction. Neurocomputing, 385, 100–110.
Rathore, S. S., & Kumar, S. (2021). An empirical study of ensemble techniques for software fault prediction. Applied Intelligence, 51(6), 3615–3644.
Radwan, A., Kamarudin, N., Solihin, M. I., Leong, H., Rizon, M., Hazry, D., & Bin Azizan, M. A. (2020). X-means clustering for wireless sensor networks. Journal of Robotics Networking and Artificial Life 7(2), 111–115.
Rey, D., & Neuhäuser, M. (2011) In International encyclopedia of statistical science (Springer, 2011), pp. 1658–1659.
Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: Roc area, cohen’s d, and r. Law and human behavior, 29(5), 615–620.
Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.
Shao, Y., Liu, B., Wang, S., & Li, G. (2020). Software defect prediction based on correlation weighted class association rule mining. Knowledge-Based Systems 196, 105742.
Singh, P. K., Panda, R., & Sangwan, O. P. (2015). A critical analysis on software fault prediction techniques. World applied sciences journal, 33(3), 371–379.
Sotto-Mayor, B., Elmishali, A., Kalech, M., & Abreu, R. (2022). Exploring design smells for smell-based defect prediction. Engineering Applications of Artificial Intelligence 115, 105240.
Sotto-Mayor, B., & Kalech, M. (2021). Cross-project smell-based defect prediction. Soft Computing, 25(22), 14171–14181.
Tanaka, K., Monden, A., & Yücel, Z. (2019, July). Prediction of software defects using automated machine learning. In 2019 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 490-494). IEEE.
Tantithamthavorn, C. K. (2022). Large defect prediction benchmark. Zenodo. Retrieved from https://zenodo.org/record/6342328
Thota, M. K., Shajin, F. H., Rajesh, P., et al., (2020). Survey on software defect prediction techniques. International Journal of Applied Science and Engineering, 17(4), 331–344.
Wahono, R. S. (2015). A systematic literature review of software defect prediction. Journal of Software Engineering, 1(1), 1–16.
Wang, H., Zhuang, W., & Zhang, X. (2021). Software defect prediction based on gated hierarchical lstms. IEEE Transactions on Reliability, 70(2), 711–727.
Waterhouse S. R. (1998). Classification and regression using mixtures of experts. Ph.D. thesis, CiteSeer.
Woolson, R. F. (2007). Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials pp. 1–3.
Wu, J., Wu, Y., Niu, N., & Zhou, M. (2021). Mhcpdp: multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder. Software Quality Journal, 29(2), 405–430.
Xu, Z., Liu, J., Luo, X., Yang, Z., Zhang, Y., Yuan, P., Tang, Y., & Zhang, T. (2019). Software defect prediction based on kernel pca and weighted extreme learning machine. Information and Software Technology, 106, 182–200.
Yang, X., Lo, D., Xia, X., & Sun, J. (2017). Tlel: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology, 87, 206–220.
Yatish, S., Jiarpakdee, J., Thongtanunam, P., & Tantithamthavorn, C. (2019). In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (IEEE, 2019), pp. 654–665.
Yigit, H. (2013). In 2013 international conference on electronics, computer and computation (ICECCO) (IEEE, 2013), pp. 228–231.
Yuksel, S. E., Wilson, J. N., & Gader, P. D. (2012). Twenty years of mixture of experts. IEEE transactions on neural networks and learning systems, 23(8), 1177–1193.
Zeng, Z., Zhang, Y., Zhang, H., & Zhang, L. (2021). In Proceedings of the 30th ACM SIGSOFT. International Symposium on Software Testing and Analysis pp. 427–438.
Zimmermann, T., Premraj, R., & Zeller, A. (2007). In Third International Workshop on Predictor Models in Software Engineering (PROMISE’07: ICSE Workshops 2007) (IEEE, 2007), pp. 9–9.
Author information
Contributions
The authors confirm their contribution to the paper as follows: Aditya Shankar Mishra: study conception and design, data collection, experimental analysis. Santosh Singh Rathore: concept design, analysis and interpretation of results, draft manuscript. Both authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A. Comparison with other state-of-the-art works
A comparative analysis of the presented MoE-based prediction models (MIoE and MEoE) was performed against previously published SDP works. We did not reproduce their results but took them directly from the cited studies. The works listed in Table 13 use the same datasets, the same performance measures, and an experimental environment similar to that of our presented models. The table shows that, except for the work of Pandey et al. (2020), the MIoE and MEoE models produced better performance on the different performance measures than all other works. The presented work used twenty-two defect datasets from different domains. On some of these datasets, the presented MoE models performed relatively poorly, which lowered their average performance, whereas previous works considered fewer fault datasets when evaluating their methods. Even so, both the average and the highest values of the presented models exceeded those of their counterparts. Most of the works reported in Table 13 used static ensemble methods, in which the weights of the base learners are fixed at training time. Moreover, only a few works analyzed the competitiveness of the base learners for the ensemble, and their experimental analyses were limited to a few datasets.
Appendix B. Parameter details
In this study, we used the Python implementations of the machine learning techniques and ensemble methods. The parameter values set for these techniques/methods are given in Table 14.
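As an illustration only, such a setup might keep the per-technique parameters in one place and instantiate classifiers from a registry. The technique names and values below are hypothetical placeholders in scikit-learn style, not the tuned settings from Table 14.

```python
# Hypothetical parameter registry; the names and values are illustrative
# placeholders, not the settings actually listed in Table 14.
PARAMS = {
    "RandomForest": {"n_estimators": 100, "max_depth": None},
    "AdaBoost": {"n_estimators": 50, "learning_rate": 1.0},
    "Bagging": {"n_estimators": 10},
    "SVM": {"kernel": "rbf", "C": 1.0},
    "LogisticRegression": {"C": 1.0, "max_iter": 100},
}

def build(name, registry, params=PARAMS):
    """Instantiate a classifier class from `registry` with its stored
    parameters, e.g. build('SVM', {'SVM': sklearn.svm.SVC})."""
    return registry[name](**params[name])
```

Centralizing the parameters this way keeps the experimental configuration reproducible and easy to report alongside the results.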
Appendix C. Description of the used software metrics
Table 15 provides descriptions of the software metrics available in the different software defect datasets used for experimentation in this work. Detailed descriptions of these metrics can be found in Jureczko and Madeyski (2010), D’Ambros et al. (2010), Zimmermann et al. (2007), and Yatish et al. (2019).
Appendix D. Analysis of MIoE and MEoE models
The presented MIoE and MEoE models (Algorithms 1 and 2) are built on standard algorithms and techniques; their steps are analyzed as follows. The initial steps of both methods involve data cleaning, data balancing, and dimensionality reduction, which take some non-constant time. In Algorithm 1 (MIoE model), the training data is first randomly partitioned into as many subspaces as there are experts; this step takes constant time and is applied only once. Each expert is then trained on its corresponding subspace. The training time depends on the learning technique, but it is always less than that of training a single large model. Finally, the best experts obtained are applied to the testing dataset, and a gating function combines their predictions, which takes linear time. The remaining computations take constant time and are applied only once.

Similarly, in Algorithm 2 (MEoE model), the X-means clustering algorithm partitions the training data into subspaces; it is applied only once and takes linear time. The experts are then trained on their corresponding subspaces, with the same training-time considerations as above, and the gating function combines their predictions on the testing dataset in linear time. On a machine with 16 GB of RAM and an Intel i7 processor, building the MIoE model and making predictions took only 160 s, and the MEoE model took 180 s. Kondratyuk et al. (2020) studied the performance of ensemble models and found that ensembling can often be more efficient than training larger models: because the member models can execute in parallel before their outputs are combined for the final prediction, the overall time compares favorably with that of a single large model.
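The MIoE flow described above (random partition, per-subspace experts, gated combination) can be sketched in dependency-free Python. The nearest-centroid expert and the unweighted majority-vote gate below are illustrative stand-ins for the learners and the learned gating network the paper actually uses.

```python
# Minimal sketch of the implicit-MoE (MIoE) flow: randomly partition
# the training data into k subspaces, train one expert per subspace,
# then combine the experts' outputs through a simple gating function.
# The "expert" is a toy nearest-centroid classifier used only to make
# the flow concrete.
import random

class CentroidExpert:
    def fit(self, X, y):
        # One centroid per class seen in this expert's subspace.
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict_one(self, x):
        dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
        return min(self.centroids, key=lambda lb: dist(x, self.centroids[lb]))

def train_mioe(X, y, k=3, seed=0):
    # Step 1: random partition of the training set into k subspaces.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    parts = [idx[i::k] for i in range(k)]
    # Step 2: train one expert per subspace.
    return [CentroidExpert().fit([X[i] for i in p], [y[i] for i in p])
            for p in parts]

def gate_predict(experts, x):
    # Step 3: gating by (unweighted) majority vote over expert outputs;
    # the paper's gating network learns these weights instead.
    votes = [e.predict_one(x) for e in experts]
    return max(set(votes), key=votes.count)
```

Replacing the random split in `train_mioe` with cluster assignments from X-means would give the explicit (MEoE) variant, in which each expert specializes in one discovered cluster of the data.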
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shankar Mishra, A., Singh Rathore, S. Implicit and explicit mixture of experts models for software defect prediction. Software Qual J 31, 1331–1368 (2023). https://doi.org/10.1007/s11219-023-09640-6