
Implicit and explicit mixture of experts models for software defect prediction

  • Research
Software Quality Journal

Abstract

Accurately predicting defects in software modules helps developers and testers find the defective modules quickly and save effort in other aspects of software development. Most previous studies have used models based on a single machine learning technique to detect defects in software. These models have produced limited results because they perform well on only some parts of the data and fail to capture all the defect-causing patterns. The mixture of experts (MoE) is a combination method that utilizes experts specialized in given data subspaces. The outputs of the specialized experts are combined according to their specific expertise, governed by a gating network, to form the final prediction. This paper explores the MoE method and presents implicit and explicit MoE-based models for software defect prediction. The presented models are evaluated in an experimental study on twenty-two software defect datasets collected from the AEEEM, PROMISE, and JIRA repositories. Prediction performance is measured using accuracy, F1-score, area under the ROC curve (AUC), and the Matthews correlation coefficient (MCC). The experimental results show that the presented MoE-based models outperformed various machine learning and ensemble techniques, such as Bagging and AdaBoost, and achieved state-of-the-art performance for defect prediction. Additionally, we found that the MoE models performed better than, or at least on par with, the DNN-based model in most cases. These results were consistent across all the datasets. The Wilcoxon test also showed that the presented models performed significantly better than the other techniques.


Data availability

The datasets used during and/or analyzed during the current study are available in the online repository, https://zenodo.org/record/3362613#.YrhPAnZByM8.

Notes

  1. https://zenodo.org/record/3362613#.YrhPAnZByM8

References

  • Alsawalqah, H., Faris, H., Aljarah, I., Alnemer, L., & Alhindawi, N. (2017). Hybrid SMOTE-ensemble approach for software defect prediction. In Computer Science On-line Conference (pp. 355–366). Springer.

  • Arora, I., Tetarwal, V., & Saha, A. (2015). Open issues in software defect prediction. Procedia Computer Science, 46, 906–912.

  • Assim, M., Obeidat, Q., & Hammad, M. (2020). In 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI) (pp. 1–6). IEEE.

  • Bock, A. S., & Fine, I. (2014). Anatomical and functional plasticity in early blind individuals and the mixture of experts architecture. Frontiers in Human Neuroscience, 8, 971.

  • Bowes, D., Hall, T., & Petrić, J. (2018). Software defect prediction: Do different classifiers find the same defects? Software Quality Journal, 26(2), 525–552.

  • Canfora, G., De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., & Panichella, S. (2013). Multi-objective cross-project defect prediction. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation (pp. 252–261). IEEE.

  • Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189–215.

  • Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28.

  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.

  • D’Ambros, M., Lanza, M., & Robbes, R. (2010). An extensive comparison of bug prediction approaches. In 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) (pp. 31–41). IEEE.

  • D’Ambros, M., Lanza, M., & Robbes, R. (2012). Evaluating defect prediction approaches: A benchmark and an extensive comparison. Empirical Software Engineering, 17, 531–577.

  • Deep Singh, P., & Chug, A. (2017). Software defect prediction analysis using machine learning algorithms. In 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence (pp. 775–781). IEEE.

  • Di Nucci, D., Palomba, F., & De Lucia, A. (2018). In 2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE) (pp. 48–54). IEEE.

  • Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359.

  • Elmishali, A., & Kalech, M. (2023). Issues-driven features for software fault prediction. Information and Software Technology, 155, 107102.

  • Feng, S., Keung, J., Yu, X., Xiao, Y., Bennin, K. E., Kabir, M. A., & Zhang, M. (2021). COSTE: Complexity-based oversampling technique to alleviate the class imbalance problem in software defect prediction. Information and Software Technology, 129, 106432.

  • Feng, S., Keung, J., Yu, X., Xiao, Y., & Zhang, M. (2021). Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction. Information and Software Technology, 139, 106662.

  • Ferrari, D., & Milioni, A. (2011). Choices and pitfalls concerning mixture-of-experts modeling. Pesquisa Operacional, 31, 95–111.

  • Ghosh, S., Rana, A., & Kansal, V. (2018). A nonlinear manifold detection based model for software defect prediction. Procedia Computer Science, 132, 581–594.

  • Gormley, I. C., & Frühwirth-Schnatter, S. (2019). Mixture of experts models. In Handbook of Mixture Analysis (pp. 271–307). Chapman and Hall/CRC.

  • Jović, A., Brkić, K., & Bogunović, N. (2015). A review of feature selection methods with applications. In 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1200–1205). IEEE.

  • Jureczko, M., & Madeyski, L. (2010). Towards identifying software project clusters with regard to defect prediction. In Proceedings of the 6th International Conference on Predictive Models in Software Engineering (pp. 1–10).

  • Komaroff, E. (2020). Relationships between p-values and Pearson correlation coefficients, type 1 errors and effect size errors, under a true null hypothesis. Journal of Statistical Theory and Practice, 14(3), 1–13.

  • Kondratyuk, D., Tan, M., Brown, M., & Gong, B. (2020). When ensembling smaller models is more efficient than single large models. arXiv preprint arXiv:2005.00570.

  • Kwak, S. K., & Kim, J. H. (2017). Statistical data preparation: Management of missing values and outliers. Korean Journal of Anesthesiology, 70(4), 407–411.

  • Li, L., Lessmann, S., & Baesens, B. (2019). Evaluating software defect prediction performance: An updated benchmarking study. arXiv preprint arXiv:1901.01726.

  • Li, N., Shepperd, M., & Guo, Y. (2020). A systematic review of unsupervised learning techniques for software defect prediction. Information and Software Technology, 122, 106287.

  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.

  • Majd, A., Vahidi-Asl, M., Khalilian, A., Poorsarvi-Tehrani, P., & Haghighi, H. (2020). SLDeep: Statement-level software defect prediction using deep-learning model on static code features. Expert Systems with Applications, 147, 113156.

  • Masoudnia, S., & Ebrahimpour, R. (2014). Mixture of experts: A literature survey. Artificial Intelligence Review, 42(2), 275–293.

  • Moustafa, S., ElNainay, M. Y., El Makky, N., & Abougabal, M. S. (2018). Software bug prediction using weighted majority voting techniques. Alexandria Engineering Journal, 57(4), 2763–2774.

  • Nam, J. (2014). Survey on software defect prediction. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Tech. Rep.

  • Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7, 21.

  • Niu, J., Li, Z., Chen, H., Dong, X., & Jing, X. Y. (2022). Data sampling and kernel manifold discriminant alignment for mixed-project heterogeneous defect prediction. Software Quality Journal, pp. 1–35.

  • O’Doherty, J. P., Lee, S. W., Tadayonnejad, R., Cockburn, J., Iigaya, K., & Charpentier, C. J. (2021). Why and how the brain weights contributions from a mixture of experts. Neuroscience & Biobehavioral Reviews, 123, 14–23.

  • Pandey, S. K., Mishra, R. B., & Tripathi, A. K. (2020). BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques. Expert Systems with Applications, 144, 113085.

  • Parsons, V. L. (2014). Stratified sampling. In Wiley StatsRef: Statistics Reference Online (pp. 1–11).

  • Pelleg, D., & Moore, A. W. (2000). X-means: Extending k-means with efficient estimation of the number of clusters. In ICML (Vol. 1, pp. 727–734).

  • Priyanka, & Kumar, D. (2020). Decision tree classifier: A detailed survey. International Journal of Information and Decision Sciences, 12(3), 246–269.

  • Qiao, L., Li, X., Umer, Q., & Guo, P. (2020). Deep learning based software defect prediction. Neurocomputing, 385, 100–110.

  • Radwan, A., Kamarudin, N., Solihin, M. I., Leong, H., Rizon, M., Hazry, D., & Bin Azizan, M. A. (2020). X-means clustering for wireless sensor networks. Journal of Robotics, Networking and Artificial Life, 7(2), 111–115.

  • Rathore, S. S., & Kumar, S. (2021). An empirical study of ensemble techniques for software fault prediction. Applied Intelligence, 51(6), 3615–3644.

  • Rey, D., & Neuhäuser, M. (2011). Wilcoxon-signed-rank test. In International Encyclopedia of Statistical Science (pp. 1658–1659). Springer.

  • Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen’s d, and r. Law and Human Behavior, 29(5), 615–620.

  • Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.

  • Shao, Y., Liu, B., Wang, S., & Li, G. (2020). Software defect prediction based on correlation weighted class association rule mining. Knowledge-Based Systems, 196, 105742.

  • Singh, P. K., Panda, R., & Sangwan, O. P. (2015). A critical analysis on software fault prediction techniques. World Applied Sciences Journal, 33(3), 371–379.

  • Sotto-Mayor, B., Elmishali, A., Kalech, M., & Abreu, R. (2022). Exploring design smells for smell-based defect prediction. Engineering Applications of Artificial Intelligence, 115, 105240.

  • Sotto-Mayor, B., & Kalech, M. (2021). Cross-project smell-based defect prediction. Soft Computing, 25(22), 14171–14181.

  • Tanaka, K., Monden, A., & Yücel, Z. (2019). Prediction of software defects using automated machine learning. In 2019 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 490–494). IEEE.

  • Tantithamthavorn, C. K. (2022). Large defect prediction benchmark. Zenodo. https://zenodo.org/record/6342328

  • Thota, M. K., Shajin, F. H., Rajesh, P., et al. (2020). Survey on software defect prediction techniques. International Journal of Applied Science and Engineering, 17(4), 331–344.

  • Wahono, R. S. (2015). A systematic literature review of software defect prediction. Journal of Software Engineering, 1(1), 1–16.

  • Wang, H., Zhuang, W., & Zhang, X. (2021). Software defect prediction based on gated hierarchical LSTMs. IEEE Transactions on Reliability, 70(2), 711–727.

  • Waterhouse, S. R. (1998). Classification and regression using mixtures of experts. Ph.D. thesis, University of Cambridge.

  • Woolson, R. F. (2007). Wilcoxon signed-rank test. In Wiley Encyclopedia of Clinical Trials (pp. 1–3).

  • Wu, J., Wu, Y., Niu, N., & Zhou, M. (2021). MHCPDP: Multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder. Software Quality Journal, 29(2), 405–430.

  • Xu, Z., Liu, J., Luo, X., Yang, Z., Zhang, Y., Yuan, P., Tang, Y., & Zhang, T. (2019). Software defect prediction based on kernel PCA and weighted extreme learning machine. Information and Software Technology, 106, 182–200.

  • Yang, X., Lo, D., Xia, X., & Sun, J. (2017). TLEL: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology, 87, 206–220.

  • Yatish, S., Jiarpakdee, J., Thongtanunam, P., & Tantithamthavorn, C. (2019). Mining software defects: Should we consider affected releases? In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 654–665). IEEE.

  • Yigit, H. (2013). In 2013 International Conference on Electronics, Computer and Computation (ICECCO) (pp. 228–231). IEEE.

  • Yuksel, S. E., Wilson, J. N., & Gader, P. D. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193.

  • Zeng, Z., Zhang, Y., Zhang, H., & Zhang, L. (2021). Deep just-in-time defect prediction: How far are we? In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (pp. 427–438).

  • Zimmermann, T., Premraj, R., & Zeller, A. (2007). Predicting defects for Eclipse. In Third International Workshop on Predictor Models in Software Engineering (PROMISE’07: ICSE Workshops 2007) (pp. 9). IEEE.


Author information


Contributions

The authors confirm their contribution to the paper as follows: Aditya Shankar Mishra: study conception and design, data collection, experimental analysis. Santosh Singh Rathore: concept design, analysis and interpretation of results, draft manuscript. Both authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Santosh Singh Rathore.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Comparison with other state-of-the-art works

A comparative analysis of the presented MoE-based prediction models (MIoE and MEoE) was performed against previously published SDP works. We did not reproduce their results; instead, we took them directly from the cited studies. The works listed in Table 13 used the same datasets and performance measures, and an experimental environment similar to that of our presented models. The table shows that, except for the work of Pandey et al. (2020), the MIoE and MEoE models produced better performance than all other works across the different performance measures. The presented work used twenty-two defect datasets from different domains; on some of these datasets the presented MoE models performed relatively poorly, which lowered their average performance. In contrast, previous works considered fewer fault datasets when evaluating their methods. Both the average and the highest values of the presented models exceeded those of their counterparts. Most of the works reported in Table 13 used static ensemble methods, in which the weights of the base learners are fixed at training time, whereas a gating network weights the experts per input, as illustrated in the sketch below. Moreover, only a few works analyzed the competitiveness of the base learners for the ensemble, and their experimental analysis was also limited to a few datasets.
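To make this distinction concrete, the following minimal Python sketch contrasts a static ensemble, whose expert weights are fixed after training, with an input-dependent gate. The synthetic dataset, logistic-regression experts, and confidence-based gating heuristic are illustrative assumptions, not the configuration used in the paper.

```python
# Static ensemble (fixed weights) vs. gated combination (per-instance weights).
# A minimal sketch under assumed data and experts, not the authors' implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two experts trained on different parts of the training data.
experts = [LogisticRegression(max_iter=1000).fit(X_tr[:300], y_tr[:300]),
           LogisticRegression(max_iter=1000).fit(X_tr[300:], y_tr[300:])]
probas = np.stack([e.predict_proba(X_te)[:, 1] for e in experts], axis=1)

# Static ensemble: one global weight per expert, decided at training time only.
static_w = np.array([0.5, 0.5])
static_pred = (probas @ static_w > 0.5).astype(int)

# Gated combination: weights vary per test instance. Here the gate is a simple
# confidence heuristic (distance of each expert's probability from 0.5); a
# learned gating network would replace this in a full MoE.
conf = np.abs(probas - 0.5)
gate_w = conf / conf.sum(axis=1, keepdims=True)
gated_pred = ((probas * gate_w).sum(axis=1) > 0.5).astype(int)

print("static acc:", (static_pred == y_te).mean())
print("gated acc:", (gated_pred == y_te).mean())
```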

Table 13 Comparison of the presented MIoE and MEoE models with state-of-the-art works (* showing the average values)
Table 14 Parameter values of different used techniques

Appendix B. Parameter details

In this study, we used the Python implementations of the machine learning techniques and ensemble methods employed. The parameter values set for these techniques and methods are given in Table 14.
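As an illustration only, a parameter table like Table 14 can be kept as a plain dictionary and unpacked into the classifier constructors. The values below are placeholders, not the settings reported in Table 14, and scikit-learn is assumed as the Python implementation.

```python
# Hypothetical sketch: organizing per-technique parameters (cf. Table 14)
# and applying them to scikit-learn classifiers. Values are placeholders.
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

param_table = {
    "DecisionTree": {"criterion": "gini", "max_depth": 5},       # placeholder values
    "RandomForest": {"n_estimators": 100, "max_depth": None},    # placeholder values
    "Bagging":      {"n_estimators": 50},                        # placeholder values
    "AdaBoost":     {"n_estimators": 50, "learning_rate": 1.0},  # placeholder values
}

models = {
    "DecisionTree": DecisionTreeClassifier(**param_table["DecisionTree"]),
    "RandomForest": RandomForestClassifier(**param_table["RandomForest"]),
    "Bagging":      BaggingClassifier(**param_table["Bagging"]),
    "AdaBoost":     AdaBoostClassifier(**param_table["AdaBoost"]),
}
```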

Appendix C. Description of the used software metrics

Table 15 describes the software metrics available in the different software defect datasets used for experimentation in this work. Detailed descriptions of these metrics can be found in Jureczko and Madeyski (2010), D’Ambros et al. (2010), Zimmermann et al. (2007), and Yatish et al. (2019).

Appendix D. Analysis of MIoE and MEoE models

The presented MIoE and MEoE models (Algorithms 1 and 2) are built from standard algorithms and techniques, and their computational behavior can be analyzed step by step. The initial steps of both methods involve data cleaning, data balancing, and handling high data dimensionality, which take time that grows with the size of the dataset.

In Algorithm 1 (the MIoE model), the training data is first randomly partitioned into as many subspaces as there are experts. This step is inexpensive and applied only once. Each expert is then trained on its corresponding input subspace. The training time depends on the learning technique, but it is always less than training a single large model on the full dataset. Finally, the best experts obtained are applied to the testing dataset, and a gating function combines their predictions, which takes linear time in the number of test instances. All other computations take constant time and are applied only once.

Similarly, in Algorithm 2 (the MEoE model), the X-means clustering algorithm partitions the training data into subspaces. It is applied only once and takes linear computation time. Each expert is then trained on its corresponding input subspace; again, the training time depends on the learning technique but remains below that of a single large model. Finally, the best experts obtained are applied to the testing dataset, and their outputs are passed to the gating function for the final prediction, which again takes linear time. All other computations take constant time and are applied only once. On a machine with 16 GB of RAM and an Intel i7 processor, building the MIoE model and generating its predictions took only 160 s, while the MEoE model took 180 s. The two sketches below illustrate this flow.

Kondratyuk et al. (2020) studied the performance of ensemble and combinational models and found that ensembling can often be more efficient than training a single larger model. They also noted that ensembles can execute their component models in parallel and then combine the outputs for the final prediction, so the overall runtime compares favorably with that of a large single model.
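The following Python sketch mirrors the flow of Algorithm 1 as described above: a random, equal-sized partition of the training data, one expert per subspace, and a confidence-weighted gate over the experts' probability outputs. The expert type (random forest) and the softmax-style gating rule are assumptions for illustration, not the exact components of the paper's Algorithm 1.

```python
# Minimal sketch of the MIoE flow (Algorithm 1), under stated assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_mioe(X_tr, y_tr, n_experts=3, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_tr))           # random partition, applied once
    experts = []
    for part in np.array_split(idx, n_experts):
        # Each expert sees only its own subspace, so training each one is
        # cheaper than fitting a single model on the full dataset.
        # (Assumes every random partition contains both classes.)
        experts.append(RandomForestClassifier(random_state=seed)
                       .fit(X_tr[part], y_tr[part]))
    return experts

def predict_mioe(experts, X_te):
    # Gating: weight each expert per instance by how far its predicted
    # probability is from 0.5 (a simple stand-in for a gating network).
    probas = np.stack([e.predict_proba(X_te)[:, 1] for e in experts], axis=1)
    conf = np.abs(probas - 0.5)
    gate = np.exp(conf) / np.exp(conf).sum(axis=1, keepdims=True)
    return ((probas * gate).sum(axis=1) > 0.5).astype(int)
```

Algorithm 2 replaces the random partition with clustering. The paper uses X-means; as a stand-in, the sketch below selects k for ordinary k-means with the silhouette score, then routes each test instance to the expert of its nearest cluster (hard gating). All modelling choices here are illustrative assumptions.

```python
# Minimal sketch of the MEoE flow (Algorithm 2), with silhouette-selected
# k-means standing in for X-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score

def train_meoe(X_tr, y_tr, k_max=5, seed=0):
    best_km, best_score = None, -1.0
    for k in range(2, k_max + 1):              # clustering is applied only once
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_tr)
        score = silhouette_score(X_tr, km.labels_)
        if score > best_score:
            best_km, best_score = km, score
    experts = {}
    for c in range(best_km.n_clusters):        # one expert per data subspace
        mask = best_km.labels_ == c
        experts[c] = RandomForestClassifier(random_state=seed) \
            .fit(X_tr[mask], y_tr[mask])
    return best_km, experts

def predict_meoe(km, experts, X_te):
    # Hard gating: each test instance is handled by its cluster's expert.
    assignments = km.predict(X_te)
    preds = np.empty(len(X_te), dtype=int)
    for c, clf in experts.items():
        mask = assignments == c
        if mask.any():
            preds[mask] = clf.predict(X_te[mask])
    return preds
```

With balanced NumPy arrays X_tr and y_tr (e.g., after oversampling), the partitioning in both sketches is fixed once at training time, while the gating at test time runs in time linear in the number of test instances, consistent with the analysis above.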

Table 15 Description of the software metrics (Jureczko & Madeyski, 2010; D’Ambros et al., 2010; Zimmermann et al., 2007; Yatish et al., 2019)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shankar Mishra, A., Singh Rathore, S. Implicit and explicit mixture of experts models for software defect prediction. Software Qual J 31, 1331–1368 (2023). https://doi.org/10.1007/s11219-023-09640-6

