RELMP-MM: an approach to cross project fault prediction using improved regularized extreme learning machine and identical matched metrics

Bal, Pravas Ranjan; Kumar, Sandeep

doi:10.1007/s12652-022-03820-1

Original Research
Published: 31 March 2022

Volume 14, pages 13523–13542 (2023)

Journal of Ambient Intelligence and Humanized Computing

Abstract

Cross project fault prediction (CPFP) is a challenging problem in the software fault prediction (SFP) domain because the source and target datasets follow different data distributions. To address it, we propose an efficient, improved version of the existing regularized extreme learning machine (RELM), which we call RELM Plus. We further extend RELM Plus with the concept of matched metrics to predict the number of software faults on cross-project data; we call the resulting model RELMP-MM. For a given target dataset, RELMP-MM selects the source dataset based on the number of identical matched metrics and then predicts the number of software faults in the target dataset. We consider both within project fault prediction (WPFP) and CPFP. The proposed model is validated on twenty-five public datasets. The experimental results, together with the statistical analysis, show that RELMP-MM performs significantly better than existing state-of-the-art models, with an improvement of 8% to 13% in terms of Average Absolute Error (AAE) and 7% to 12% in terms of Average Relative Error (ARE).
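The pipeline described in the abstract (choose a source dataset by counting identical matched metrics, then fit a regularized ELM and score it with AAE and ARE) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the learner is the standard ridge-regularized ELM, not the RELM Plus variant, whose details the abstract does not give; the function and class names, the metric names in the usage lines, and the tie-breaking rule are hypothetical; and the AAE and ARE definitions (mean absolute error, and absolute error divided by the actual fault count plus one) follow a convention common in the fault-count literature rather than anything stated here.

```python
import numpy as np


def matched_metrics(source_cols, target_cols):
    """Metric names that appear, identically named, in both datasets."""
    return sorted(set(source_cols) & set(target_cols))


def select_source(candidates, target_cols):
    """Return the candidate source sharing the most metric names with the target.

    candidates maps a dataset name to its list of metric names.
    Ties go to the first candidate in dictionary order (arbitrary choice).
    """
    return max(
        candidates,
        key=lambda name: len(matched_metrics(candidates[name], target_cols)),
    )


class RELM:
    """Standard ridge-regularized ELM regressor (assumed baseline, not RELM Plus)."""

    def __init__(self, n_hidden=20, C=1.0, seed=0):
        self.n_hidden, self.C, self.seed = n_hidden, C, seed

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Input weights and biases are random and never trained.
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = (H^T H + I / C)^(-1) H^T y
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def aae(y_true, y_pred):
    """Average Absolute Error (assumed definition: mean |pred - actual|)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def are(y_true, y_pred):
    """Average Relative Error (assumed definition: mean |pred - actual| / (actual + 1))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / (y_true + 1.0)))


# Usage on synthetic data; metric names are illustrative only.
candidates = {"src_a": ["loc", "wmc", "cbo"], "src_b": ["loc", "rfc"]}
target_cols = ["loc", "wmc", "cbo", "rfc"]
best = select_source(candidates, target_cols)
shared = matched_metrics(candidates[best], target_cols)

rng = np.random.default_rng(1)
X_src = rng.standard_normal((40, len(shared)))
y_src = rng.poisson(1.0, 40).astype(float)   # synthetic fault counts
X_tgt = rng.standard_normal((10, len(shared)))
y_tgt = rng.poisson(1.0, 10).astype(float)

model = RELM(n_hidden=15, C=10.0).fit(X_src, y_src)
pred = model.predict(X_tgt)
scores = (aae(y_tgt, pred), are(y_tgt, pred))
```

In this sketch only the metrics shared by the selected source and the target are used as features, so the model trained on the source can be applied to the target without any column mapping. How RELMP-MM handles ties, feature scaling, or class imbalance is not covered by the abstract and is left out.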

Data availability statement

The datasets generated or analysed during the current study are available in the PROMISE repository: www.github.com/klainfo/DefectData.

Notes

Ordinary Least Squares.
We have used samples and modules interchangeably in this paper. Both represent the same meaning.

Acknowledgements

The authors are thankful to the SERB, Government of India, for project funding under the VAJRA Scheme. We are thankful to the editor and the anonymous reviewers for their valuable feedback.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247667, India
Pravas Ranjan Bal & Sandeep Kumar

Corresponding author

Correspondence to Sandeep Kumar.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

List of matched source and target datasets

The complete list of matched source and target datasets for the CPFP scenario, as generated by the matched metrics process of RELMP-MM, is given below. Each entry is listed as source - target:

1. Jedit 4.1 - Ant 1.3
2. Jedit 4.1 - Ant 1.4
3. Xalan 2.4 - Ant 1.5
4. Jedit 4.1 - Ant 1.6
5. Jedit 4.1 - Ant 1.7
6. Ivy 1.4 - Camel 1.0
7. Ant 1.4 - Camel 1.2
8. Ivy 1.4 - Camel 1.4
9. Ivy 1.4 - Camel 1.6
10. Ant 1.6 - Jedit 4.0
11. Ant 1.6 - Jedit 4.1
12. Ant 1.6 - Jedit 4.2
13. Ant 1.4 - Jedit 4.3
14. Log4j 1.1 - Synapse 1.0
15. Ant 1.5 - Synapse 1.1
16. Jedit 4.0 - Synapse 1.2
17. Ant 1.5 - Xalan 2.4
18. Ant 1.5 - Xalan 2.5
19. Ant 1.4 - Xalan 2.6
20. Ant 1.4 - Ivy 1.4
21. Jedit 4.0 - Ivy 2.0
22. Ant 1.4 - Log4j 1.0
23. Ant 1.4 - Log4j 1.1
24. Ivy 1.4 - Xerces 1.2
25. Ivy 1.4 - Xerces 1.3

About this article

Cite this article

Bal, P.R., Kumar, S. RELMP-MM: an approach to cross project fault prediction using improved regularized extreme learning machine and identical matched metrics. J Ambient Intell Human Comput 14, 13523–13542 (2023). https://doi.org/10.1007/s12652-022-03820-1

Received: 22 April 2021
Accepted: 10 March 2022
Published: 31 March 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s12652-022-03820-1
