
RELMP-MM: an approach to cross project fault prediction using improved regularized extreme learning machine and identical matched metrics

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing

Abstract

Cross-project fault prediction (CPFP) is a challenging problem in the software fault prediction (SFP) domain because the source and target datasets have different data distributions. To address this issue, we propose an efficient, improved version of the existing regularized extreme learning machine (RELM), which we call RELM Plus. The RELM Plus model is further extended with the concept of matched metrics to predict the number of software faults on cross-project data; we call this extension the RELMP-MM model. For a given target dataset, RELMP-MM selects the corresponding source dataset based on the number of identical matched metrics and then predicts the number of software faults in the target dataset. In this paper, we consider both within-project fault prediction (WPFP) and CPFP. The proposed model is validated on twenty-five public datasets. The experimental results, together with the statistical analysis, show that RELMP-MM performs significantly better than existing state-of-the-art models, with an improvement of at least 8% to 13% in terms of Average Absolute Error (AAE) and 7% to 12% in terms of Average Relative Error (ARE).
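For readers unfamiliar with the reported measures, the following Python sketch illustrates AAE and ARE as they are commonly defined in fault-count prediction studies. The helper names and the +1 smoothing term in the ARE denominator (used to avoid division by zero for fault-free modules) are assumptions for illustration, not necessarily the exact formulation used in the paper.

```python
import numpy as np

def average_absolute_error(y_true, y_pred):
    """AAE: mean of |actual - predicted| fault counts over all modules."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def average_relative_error(y_true, y_pred):
    """ARE: mean of |actual - predicted| / (actual + 1) over all modules.
    The +1 is an assumed smoothing term to handle modules with zero faults."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred) / (y_true + 1.0))

# Example: five modules with actual and predicted fault counts.
actual    = [0, 2, 1, 5, 0]
predicted = [1, 2, 0, 3, 0]
print(average_absolute_error(actual, predicted))   # 0.8
print(average_relative_error(actual, predicted))   # ~0.367
```

Lower values of both measures indicate more accurate fault-count predictions.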


Data availability statement

The datasets generated and/or analysed during the current study are available in the PROMISE repository, www.github.com/klainfo/DefectData.

Notes

  1. Ordinary Least Squares.

  2. The terms "samples" and "modules" are used interchangeably in this paper; both refer to the same entities.


Acknowledgements

The authors are thankful to SERB, Government of India, for project funding under the VAJRA scheme. We are also thankful to the editor and anonymous reviewers for their valuable feedback.

Author information


Corresponding author

Correspondence to Sandeep Kumar.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

List of matched source and target datasets

The complete list of matched source and target datasets (source - target) for the CPFP scenario, as generated by the Matched Metrics process in RELMP-MM, is given below; an illustrative sketch of the matching step follows the list:

  1. Jedit 4.1 - Ant 1.3
  2. Jedit 4.1 - Ant 1.4
  3. Xalan 2.4 - Ant 1.5
  4. Jedit 4.1 - Ant 1.6
  5. Jedit 4.1 - Ant 1.7
  6. Ivy 1.4 - Camel 1.0
  7. Ant 1.4 - Camel 1.2
  8. Ivy 1.4 - Camel 1.4
  9. Ivy 1.4 - Camel 1.6
  10. Ant 1.6 - Jedit 4.0
  11. Ant 1.6 - Jedit 4.1
  12. Ant 1.6 - Jedit 4.2
  13. Ant 1.4 - Jedit 4.3
  14. Log4j 1.1 - Synapse 1.0
  15. Ant 1.5 - Synapse 1.1
  16. Jedit 4.0 - Synapse 1.2
  17. Ant 1.5 - Xalan 2.4
  18. Ant 1.5 - Xalan 2.5
  19. Ant 1.4 - Xalan 2.6
  20. Ant 1.4 - Ivy 1.4
  21. Jedit 4.0 - Ivy 2.0
  22. Ant 1.4 - Log4j 1.0
  23. Ant 1.4 - Log4j 1.1
  24. Ivy 1.4 - Xerces 1.2
  25. Ivy 1.4 - Xerces 1.3
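The pairing above is driven by counting identically named metrics shared between each candidate source project and the given target project. The following Python sketch illustrates one way such a matching step could be implemented; the function name, the use of pandas DataFrames, the label-column name `bug`, and the tie-breaking behaviour (first best candidate wins) are illustrative assumptions, not the authors' implementation.

```python
import pandas as pd
from typing import Dict

def select_source_by_matched_metrics(target: pd.DataFrame,
                                     candidates: Dict[str, pd.DataFrame],
                                     label: str = "bug") -> str:
    """Return the name of the candidate source dataset that shares the most
    identically named metrics (columns) with the target dataset, excluding
    the fault-count label column."""
    target_metrics = set(target.columns) - {label}
    best_name, best_overlap = None, -1
    for name, source in candidates.items():
        overlap = len(target_metrics & (set(source.columns) - {label}))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name

# Usage (assuming the PROMISE datasets are available as CSV files):
# candidates = {"jedit-4.1": pd.read_csv("jedit-4.1.csv"),
#               "xalan-2.4": pd.read_csv("xalan-2.4.csv")}
# best = select_source_by_matched_metrics(pd.read_csv("ant-1.3.csv"), candidates)
```

The selected source dataset is then used to train the fault-count predictor that is applied to the target dataset.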


About this article


Cite this article

Bal, P.R., Kumar, S. RELMP-MM: an approach to cross project fault prediction using improved regularized extreme learning machine and identical matched metrics. J Ambient Intell Human Comput 14, 13523–13542 (2023). https://doi.org/10.1007/s12652-022-03820-1


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-022-03820-1
