Abstract
Cross-project fault prediction (CPFP) is a challenging problem in the software fault prediction (SFP) domain because the source and target datasets follow different data distributions. To address this problem, we propose an efficient, improved version of the existing regularized extreme learning machine (RELM), which we call RELM Plus. We further extend RELM Plus with the concept of matched metrics to predict the number of software faults on cross-project data; we call the resulting model RELMP-MM. For a given target dataset, RELMP-MM selects the source dataset based on the number of identical matched metrics and then predicts the number of software faults in the target dataset. We consider both within-project fault prediction (WPFP) and CPFP. The proposed model is validated on twenty-five public datasets. The experimental results, together with the statistical analysis, show that RELMP-MM performs significantly better than existing state-of-the-art models. It shows an improvement of at least 8% to 13% in Average Absolute Error (AAE) and 7% to 12% in Average Relative Error (ARE).
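As a minimal sketch of the two error measures named above, the following Python computes AAE and ARE for predicted fault counts. The formulas are assumptions, since the abstract does not define them: AAE is taken as the mean absolute difference between predicted and actual fault counts, and ARE as the mean of that difference divided by the actual count plus one, a form commonly used in the number-of-faults prediction literature. The function names and the toy values are illustrative only.

```python
def average_absolute_error(actual, predicted):
    """Mean of |predicted - actual| over all modules."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / (actual + 1) over all modules.

    Assumption: the +1 in the denominator keeps the ratio defined for
    fault-free modules (actual == 0).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Toy fault counts (made up for illustration, not taken from the paper).
actual_faults = [0, 2, 1, 4]
predicted_faults = [1, 2, 0, 2]

aae = average_absolute_error(actual_faults, predicted_faults)
are = average_relative_error(actual_faults, predicted_faults)
print(f"AAE = {aae:.3f}, ARE = {are:.3f}")
```

Lower values are better for both measures, so an improvement means a reduction in the reported error.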
Data availability statement
The datasets generated or analysed during the current study are available in the PROMISE repository, www.github.com/klainfo/DefectData.
Notes
Ordinary Least Squares.
We use "samples" and "modules" interchangeably in this paper; both terms have the same meaning.
Acknowledgements
The authors are thankful to the SERB, Government of India, for project funding under the VAJRA Scheme. We also thank the editor and the anonymous reviewers for their valuable feedback.
Ethics declarations
Conflict of interest
None.
Appendices
List of matched source and target datasets
The complete list of matched source and target datasets for the CPFP scenario, as generated by the matched metrics process of RELMP-MM, is given below:
1. Jedit 4.1 - Ant 1.3
2. Jedit 4.1 - Ant 1.4
3. Xalan 2.4 - Ant 1.5
4. Jedit 4.1 - Ant 1.6
5. Jedit 4.1 - Ant 1.7
6. Ivy 1.4 - Camel 1.0
7. Ant 1.4 - Camel 1.2
8. Ivy 1.4 - Camel 1.4
9. Ivy 1.4 - Camel 1.6
10. Ant 1.6 - Jedit 4.0
11. Ant 1.6 - Jedit 4.1
12. Ant 1.6 - Jedit 4.2
13. Ant 1.4 - Jedit 4.3
14. Log4j 1.1 - Synapse 1.0
15. Ant 1.5 - Synapse 1.1
16. Jedit 4.0 - Synapse 1.2
17. Ant 1.5 - Xalan 2.4
18. Ant 1.5 - Xalan 2.5
19. Ant 1.4 - Xalan 2.6
20. Ant 1.4 - Ivy 1.4
21. Jedit 4.0 - Ivy 2.0
22. Ant 1.4 - Log4j 1.0
23. Ant 1.4 - Log4j 1.1
24. Ivy 1.4 - Xerces 1.2
25. Ivy 1.4 - Xerces 1.3
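The selection step that produces pairings such as those listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the rule for deciding that two metrics "match" is not given in this section, so `metrics_match` below is a hypothetical stand-in that compares medians within a relative tolerance. The function names, the `tolerance` parameter, and the toy projects `proj_a` and `proj_b` are all invented for the example. Only the final step, choosing the candidate source with the largest number of matched metrics, follows the description in the abstract.

```python
from statistics import median


def metrics_match(source_values, target_values, tolerance=0.25):
    """Hypothetical matching rule (an assumption, not the paper's test):
    two metrics match when their medians differ by at most `tolerance`
    relative to the larger median."""
    s, t = median(source_values), median(target_values)
    scale = max(abs(s), abs(t))
    return scale == 0 or abs(s - t) / scale <= tolerance


def count_matched_metrics(source, target, match=metrics_match):
    """Count metrics present in both datasets whose values satisfy `match`."""
    shared = source.keys() & target.keys()
    return sum(1 for name in shared if match(source[name], target[name]))


def select_source(candidates, target, match=metrics_match):
    """Return the name of the candidate source with the most matched metrics.
    Ties go to the first candidate in iteration order."""
    return max(
        candidates,
        key=lambda name: count_matched_metrics(candidates[name], target, match),
    )


# Toy metric values (made up; each dataset maps metric name -> per-module values).
target = {"loc": [100, 120, 110], "wmc": [10, 12, 11], "cbo": [5, 6, 7]}
candidates = {
    "proj_a": {"loc": [105, 115, 100], "wmc": [30, 40, 35], "cbo": [20, 22, 21]},
    "proj_b": {"loc": [90, 100, 95], "wmc": [9, 11, 10], "cbo": [5, 5, 6]},
}

best = select_source(candidates, target)
print(best)
```

Passing the matching rule as a parameter keeps the counting and selection logic independent of whichever statistical criterion is actually used to compare metric distributions.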
About this article
Cite this article
Bal, P.R., Kumar, S. RELMP-MM: an approach to cross project fault prediction using improved regularized extreme learning machine and identical matched metrics. J Ambient Intell Human Comput 14, 13523–13542 (2023). https://doi.org/10.1007/s12652-022-03820-1
DOI: https://doi.org/10.1007/s12652-022-03820-1