Abstract
Software defect prediction (SDP) is a very active research topic. SDP results identify defect-prone source code artefacts, enabling quality assurance teams to allocate limited resources efficiently when validating software products. SDP tools will play an increasingly significant role in supporting developers and reducing the time to market for more dependable software products. The existing literature presents many machine learning approaches for SDP that aim to enhance the performance of the software development team. However, very little work has been reported on SDP using multi-core parallel computing. In this paper, a multi-core parallel machine learning approach for SDP is proposed to classify a component as defective or non-defective. The proposed model has been built, trained and tested while varying the number of CPU cores involved in the processing. Extensive empirical studies have been conducted by applying the proposed approach to 11 software systems from NASA/PROMISE and other relevant repositories. The proposed approach has been compared with various state-of-the-art machine learning models to investigate whether it outperforms them. The experimental results indicate that the predictive performance of the proposed model improves, and execution time decreases, as more CPU cores are involved. Evaluation of the results shows that the multi-core parallel Random Forest approach gives the best predictive performance, with values of nearly 99% or 100%. Moreover, the proposed approach performs significantly better in accuracy, precision, recall, F-measure, and AUC than the other machine learning models.
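The experiment design described in the abstract (train a Random Forest, vary the number of CPU cores, report accuracy, precision, recall, F-measure and AUC) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier` with its `n_jobs` parameter as the multi-core mechanism, uses a synthetic dataset in place of the NASA/PROMISE data, and the core counts, the `evaluate` helper and all hyperparameters are hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a defect dataset (assumption): rows are software
# modules, columns are code metrics, label 1 marks a defective module.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def evaluate(n_cores):
    """Train a Random Forest with n_cores workers and score it on the test split."""
    model = RandomForestClassifier(n_estimators=100, n_jobs=n_cores,
                                   random_state=0)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of "defective"
    return {
        "cores": n_cores,
        "fit_seconds": elapsed,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f_measure": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }


# Hypothetical core counts; adjust to the machine at hand.
results = [evaluate(n) for n in (1, 2)]
for r in results:
    print(r)
```

In this sketch the forest is seeded identically for every core count, so mainly the fit time should differ between runs, and on a dataset this small the difference may be negligible. Whether more cores help in practice depends on the dataset size and the hardware.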
About this article
Cite this article
Parashar, A., Kumar Goyal, R., Kaushal, S. et al. Machine learning approach for software defect prediction using multi-core parallel computing. Autom Softw Eng 29, 44 (2022). https://doi.org/10.1007/s10515-022-00340-2
DOI: https://doi.org/10.1007/s10515-022-00340-2