
DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique

  • Original Research
  • Published in SN Computer Science

Abstract

Cross-project fault prediction (CPFP) uses data sets from other projects to classify modules as faulty or non-faulty. Cross-project fault number estimation (CPFNE) goes one step further: it not only predicts faulty modules but also estimates the number of faults in each module. In this article, we propose a new computational architecture for CPFNE, called DBDNN-Estimator, which combines a deep belief network with a deep neural network. We investigated the effectiveness of the proposed approach on five projects and their respective versions from the PROMISE repository and compared its performance against eight existing benchmark approaches. We found that the proposed model requires only a few instances from the source project for optimal performance. Out of 23 data sets, DBDNN-Estimator significantly outperforms the baseline approaches on 19 in terms of mean absolute error (MAE) and on 14 in terms of mean squared error (MSE). The mean MAE and MSE produced by the proposed model are \(0.38\pm 0.023\) and \(2.29\pm 0.18\), respectively, the lowest among the benchmark techniques. The Kendall rank correlation and fault percentage average (FPA) of the proposed model are also significantly better than those of the baseline methods on 17 projects. DBDNN-Estimator produces optimal results for small, moderate, and large-size software projects, is stable, and tackles the class-imbalance and overfitting problems.
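The evaluation metrics named in the abstract can be sketched as follows. This is an illustrative implementation, not the authors' code (their implementation is in the linked GitHub repository); the FPA follows one common formulation, in which modules are ranked by predicted fault count and the fraction of total faults captured by each top-m prefix is averaged over all cut-offs m.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted fault counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted fault counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def kendall_tau(y_true, y_pred):
    """Kendall rank correlation (tau-a, no tie correction) between the
    actual and predicted fault-count orderings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = len(y_true)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(y_true[i] - y_true[j]) * np.sign(y_pred[i] - y_pred[j])
    return float(s / (n * (n - 1) / 2))

def fpa(y_true, y_pred):
    """Fault percentage average: rank modules by predicted fault count
    (descending) and average, over all cut-offs m, the fraction of total
    actual faults contained in the top-m modules."""
    y_true = np.asarray(y_true, float)
    order = np.argsort(np.asarray(y_pred, float))[::-1]
    faults = y_true[order]
    return float(np.mean(np.cumsum(faults) / faults.sum()))
```

Higher Kendall and FPA values indicate a better ranking of fault-prone modules, while lower MAE and MSE indicate more accurate fault counts.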


Data Availability and Access

The data sets used in our experiments can be found in the PROMISE repository (http://promise.site.uottawa.ca/SERepository/) [4] and the UCI repository (http://archive.ics.uci.edu/ml/index.php). We have also provided the code along with the data sets of our work at the GitHub repository (https://github.com/sushantkumar007007/DBDNN-Estimator).
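For reproduction, the PROMISE data sets are distributed as flat metric tables. A minimal loading sketch, using pandas and hypothetical column names (the actual schema varies per data set), might look like this, with the fault count as the regression target:

```python
import io

import pandas as pd

# Hypothetical sample in the PROMISE CK-metrics CSV layout; the module
# names, metric columns, and values below are illustrative only.
sample = io.StringIO(
    "name,version,wmc,rfc,cbo,lcom,loc,bug\n"
    "org.example.Foo,1.0,12,34,5,20,250,2\n"
    "org.example.Bar,1.0,3,8,1,0,40,0\n"
)
df = pd.read_csv(sample)

# Feature matrix: the software metrics; target: the fault count per module.
X = df.drop(columns=["name", "version", "bug"]).to_numpy(dtype=float)
y = df["bug"].to_numpy(dtype=float)
```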

Notes

  1. https://colab.research.google.com/.

  2. https://github.com/sushantkumar007007/DBDNN-Estimator.

References

  1. Pandey SK, Mishra RB, Tripathi AK. Machine learning based methods for software fault prediction: a survey. Expert Syst Appl. 2021;172: 114595.

  2. Pachouly J, Ahirrao S, Kotecha K, Selvachandran G, Abraham A. A systematic literature review on software defect prediction using artificial intelligence: datasets, data validation methods, approaches, and tools. Eng Appl Artif Intell. 2022;111: 104773.

  3. Catal C, Diri B. Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inf Sci. 2009;179(8):1040–58.

  4. Sayyad Shirabad J, Menzies T. The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada 2005. http://promise.site.uottawa.ca/SERepository

  5. Nam J, Pan SJ, Kim S. Transfer defect learning. In: 2013 35th international conference on software engineering (ICSE) 2013; p. 382–91.

  6. He Z, Shu F, Yang Y, Li M, Wang Q. An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng. 2012;19(2):167–99.

  7. Pandey SK, Tripathi AK. Bcv-predictor: a bug count vector predictor of a successive version of the software system. Knowl-Based Syst. 2020;105924.

  8. Rathore SS, Kumar S. Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems. Knowl-Based Syst. 2017;119:232–56.

  9. Rathore SS, Kumar S. Towards an ensemble based system for predicting the number of software faults. Expert Syst Appl. 2017;82:357–82.

  10. Pandey SK, Tripathi AK. Dnnattention: a deep neural network and attention based architecture for cross project defect number prediction. Knowl-Based Syst. 2021;107541.

  11. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.

  12. Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT 2009; p. 91–100.

  13. Craig RD, Jaskiel SP. Systematic software testing. Artech House; 2002.

  14. Pandey SK, Rathee D, Tripathi AK. Software defect prediction using k-pca and various kernel-based extreme learning machine: an empirical study. IET Software. 2020;14(7):768–82.

  15. Pandey SK, Tripathi AK. In: 2021 8th International Conference on Smart Computing and Communications (ICSCC) (IEEE), 2021; p. 58–63.

  16. Cartwright M, Shepperd M. An empirical investigation of an object-oriented software system. IEEE Trans Software Eng. 2000;26(8):786–96.

  17. Abdi L, Hashemi S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowl Data Eng. 2015;28(1):238–51.

  18. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.

  19. Pandey SK, Mishra RB, Tripathi AK. Bpdet: an effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst Appl. 2020;144: 113085.

  20. Wang S, Liu T, Nam J, Tan L. Deep semantic feature learning for software defect prediction. IEEE Trans Softw Eng. 2018;46(12):1267–93.

  21. Li J, He P, Zhu J, Lyu MR. Software defect prediction via convolutional neural network. In: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) 2017; p. 318–28.

  22. Chen D, Chen X, Li H, Xie J, Mu Y. Deepcpdp: deep learning based cross-project defect prediction. IEEE Access. 2019;7:184832–48.

  23. Chen X, Zhang D, Zhao Y, Cui Z, Ni C. Software defect number prediction: unsupervised vs supervised methods. Inf Softw Technol. 2019;106:161–81.

  24. Le Roux N, Bengio Y. Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 2008;20(6):1631–49.

  25. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  26. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)

  27. Shepperd M, Song Q, Sun Z, Mair C. Data quality: some comments on the Nasa software defect datasets. IEEE Trans Softw Eng. 2013;39(9):1208–15.

  28. Neal RM. Connectionist learning of belief networks. Artif Intell. 1992;56(1):71–113.

  29. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.

  30. Smolensky P. Information processing in dynamical systems: foundations of harmony theory. Tech. rep., Colorado Univ at Boulder Dept of Computer Science 1986

  31. Welling M, Rosen-Zvi M, Hinton GE. Exponential family harmoniums with an application to information retrieval. Adv Neural Inf Process Syst. 2005; 1481–8.

  32. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–81.

  33. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.

  34. Pascanu R, Mikolov T, Bengio Y. Understanding the exploding gradient problem. arXiv preprint arXiv:1211.5063 (2012)

  35. Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on Machine learning 2007; p. 791–8.

  36. Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recogn. 2005;38(12):2270–85.

  37. Eesa AS, Arabo WK. A normalization methods for backpropagation: a comparative study. Sci J Univ Zakho. 2017;5(4):319–23.

  38. Japkowicz N, Stephen S. The class imbalance problem: a systematic study. Intell Data Anal. 2002;6(5):429–49.

  39. Malhotra R. A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput. 2015;27:504–18.

  40. Pandey SK, Tripathi AK. An empirical study toward dealing with noise and class imbalance issues in software defect prediction. Soft Comput. 2021;25(21):13465–92.

  41. Charte F, Rivera AJ, del Jesus MJ, Herrera F. Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing. 2015;163:3–16.

  42. Turhan B, Menzies T, Bener AB, Di Stefano J. On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng. 2009;14(5):540–78.

  43. Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell. 2006;28(4):594–611.

  44. Shah C, Pomerantz J. Evaluating and predicting answer quality in community qa. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010; p. 411–8.

  45. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30(1/2):81–93.

  46. Yu X, Liu J, Yang Z, Jia X, Ling Q, Ye S. Learning from imbalanced data for predicting the number of software defects. In: 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE) 2017; p. 78–89.

  47. Weyuker EJ, Ostrand TJ, Bell RM. Comparing the effectiveness of several modeling methods for fault prediction. Empir Softw Eng. 2010;15(3):277–95.

  48. Ng AY. In Proceedings of the twenty-first international conference on Machine learning 2004; 78.

  49. Rathore SS, Kumar S. An approach for the prediction of number of software faults based on the dynamic selection of learning techniques. IEEE Trans Reliab. 2018;68(1):216–36.

  50. Catal C. Software fault prediction: a literature review and current trends. Expert Syst Appl. 2011;38(4):4626–36.

  51. Garner SR, et al. Weka: the Waikato environment for knowledge analysis. In: Proceedings of the New Zealand computer science research students conference 1995; p. 57–64.

  52. Woolson R. Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials 2007; p. 1–3.

  53. Cliff N. Ordinal methods for behavioral data analysis. Psychology Press; 2014.

  54. Abdi H. Bonferroni and šidák corrections for multiple comparisons. Encyclopedia Measure Stat. 2007;3:103–7.

  55. Rotman M, Wolf L. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021;35:9428–35.

  56. Sommerville I. Software engineering. 9th ed. 2011.

  57. Gonzalez J, Yu W. Non-linear system modeling using lstm neural networks. IFAC-PapersOnLine. 2018;51(13):485–9.

  58. Hosseini S, Turhan B, Gunarathna D. A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans Softw Eng. 2017;45(2):111–47.

  59. Herbold S, Trautsch A, Grabowski J. A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Software Eng. 2017;44(9):811–33.

  60. Ni C, Xia X, Lo D, Chen X, Gu Q. Revisiting supervised and unsupervised methods for effort-aware cross-project defect prediction. IEEE Trans Softw Eng. 2020.

  61. Bangash AA, Sahar H, Hindle A, Ali K. On the time-based conclusion stability of cross-project defect prediction models. Empir Softw Eng. 2020;25(6):5047–83.

  62. Hosseini S, Turhan B, Mäntylä M. A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol. 2018;95:296–312.

  63. Tabassum S, Minku LL, Feng D, Cabral GG, Song L. An investigation of cross-project learning in online just-in-time software defect prediction. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE) 2020; p. 554–65.

  64. Ma Y, Luo G, Zeng X, Chen A. Transfer learning for cross-company software defect prediction. Inf Softw Technol. 2012;54(3):248–56.

  65. Liu C, Yang D, Xia X, Yan M, Zhang X. A two-phase transfer learning model for cross-project defect prediction. Inf Softw Technol. 2019;107:125–36.

  66. Herbold S, Trautsch A, Grabowski J. In Proceedings of the 40th International Conference on Software Engineering 2018; p. 063.

  67. Li K, Xiang Z, Chen T, Tan KC. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) (IEEE), 2020; p. 573–84.

  68. Jin C. Cross-project software defect prediction based on domain adaptation learning and optimization. Expert Syst Appl. 2021;171: 114637.

  69. Sun Z, Li J, Sun H, He L. Cfps: Collaborative filtering based source projects selection for cross-project defect prediction. Appl Soft Comput. 2021;99: 106940.

  70. Amasaki S, Aman H, Yokogawa T. An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empir Softw Eng. 2022;27(2):1–29.

  71. Bal PR, Kumar S. Wr-elm: Weighted regularization extreme learning machine for imbalance learning in software fault prediction. IEEE Trans Reliab. 2020;69(4):1355–75.

  72. Panichella A, Oliveto R, De Lucia A. Cross-project defect prediction models: L’union fait la force. In: 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) 2014; p. 164–73.

  73. Xia X, Lo D, Pan SJ, Nagappan N, Wang X. Hydra: Massively compositional model for cross-project defect prediction. IEEE Trans Software Eng. 2016;42(10):977–98.

  74. Nevendra M, Singh P. Defect count prediction via metric-based convolutional neural network. Neural Comput Appl. 2021;1–26.

  75. Bai CG, Cai KY, Hu QP, Ng SH. On the trend of remaining software defect estimation. IEEE Trans Syst Man Cybern-Part A. 2008;38(5):1129–42.

  76. Huang Q, Ni C, Chen X, Gu Q, Cao K. Multi-project regression based approach for software defect number prediction. SEKE. 2019; 425–546.

  77. Rathore SS, Kumar S. An empirical study of some software fault prediction techniques for the number of faults prediction. Soft Comput. 2017;21(24):7417–34.

  78. Kumar C, Yadav DK. Software defects estimation using metrics of early phases of software development life cycle. Int J Syst Assur Eng Manage. 2017;8(4):2109–17.

  79. Bernstein A, Ekanayake J, Pinzger M. Improving defect prediction using temporal features and non linear models. Ninth international workshop on Principles of software evolution: In conjunction with the 6th ESEC/FSE joint meeting 2007; p. 11–8

  80. D’Ambros M, Lanza M, Robbes R. An extensive comparison of bug prediction approaches. 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) 2010; p. 31–41.

  81. Jiang T, Tan L, Kim S. Personalized defect prediction. In: 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2013; p. 279–89.

  82. Zimmermann T, Nagappan N. Predicting defects using network analysis on dependency graphs. 2008; p. 531–40.

  83. Koru AG, El Emam K, Zhang D, Liu H, Mathew D. Theory of relative defect proneness. Empir Softw Eng. 2008;13(5):473.

  84. Bettenburg N, Nagappan M, Hassan AE. Think locally, act globally: improving defect and effort prediction models. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR) 2012; p. 60–9.

  85. Kim S, Whitehead EJ, Zhang Y. Classifying software changes: clean or buggy? IEEE Trans Softw Eng. 2008;34(2):181–96.

  86. Zhiyi H, Haidong S, Lin J, Junsheng C, Yu Y. Transfer fault diagnosis of bearing installed in different machines using enhanced deep auto-encoder. Measurement. 2020;152: 107393.

  87. Xiao Y, Shao H, Han S, Huo Z, Wan J. Novel joint transfer network for unsupervised bearing fault diagnosis from simulation domain to experimental domain. IEEE/ASME Trans Mechatron. 2022.

  88. Liu Y, Khoshgoftaar TM, Seliya N. Evolutionary optimization of software quality modeling with multiple repositories. IEEE Trans Softw Eng. 2010;36(6):852–64.

  89. Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S. Multi-objective cross-project defect prediction. In: 2013 IEEE Sixth international conference on software testing, verification and validation 2013; p. 252–61.

  90. Wu F, Jing XY, Sun Y, Sun J, Huang L, Cui F, Sun Y. Cross-project and within-project semisupervised software defect prediction: a unified approach. IEEE Trans Reliab. 2018;67(2):581–97.

  91. Shao H, Jiang H, Li X, Liang T. Rolling bearing fault detection using continuous deep belief network with locally linear embedding. Comput Ind. 2018;96:27–39.

  92. Hua W, Chun S, Changzhen H, Zhang Y, Xiao Y, et al. Software defect prediction via deep belief network. Chin J Electron. 2019;28(5):925–32.

  93. Chen Y, Zhao X, Jia X. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Select Top Appl Earth Observ Remote Sens. 2015;8(6):2381–92.

  94. Sun X, Li T, Li Q, Huang Y, Li Y. Deep belief echo-state network and its application to time series prediction. Knowl-Based Syst. 2017;130:17–29.

  95. Zhao Z, Jiao L, Zhao J, Gu J, Zhao J. Discriminant deep belief network for high-resolution sar image classification. Pattern Recogn. 2017;61:686–701.

  96. O’Connor P, Neil D, Liu SC, Delbruck T, Pfeiffer M. Real-time classification and sensor fusion with a spiking deep belief network. Front Neurosci. 2013;7:178.

  97. Deng L, Yu D, Dahl GE. Deep belief network for large vocabulary continuous speech recognition (2015). US Patent 8,972,253

  98. Mohamed A, Dahl G, Hinton G. Deep belief networks for phone recognition. Nips workshop on deep learning for speech recognition and related applications. 2009;1(9):39.

  99. Nayak SK, Ojha AC. Data leakage detection and prevention: Review and research directions. Mach Learn Inf Proc. 2020;203–12.

Download references

Acknowledgements

This work is supported by the IIT(BHU), India. The authors would also like to thank Professor David Lo from SMU Singapore for his encouragement; his valuable suggestions and critical comments improved the quality of the manuscript.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Authors

Contributions

SKP (corresponding author): conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review & editing, and visualization. AKT: supervision, writing—review & editing, resources, formal analysis, and methodology.

Corresponding author

Correspondence to Sushant Kumar Pandey.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pandey, S.K., Tripathi, A.K. DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN COMPUT. SCI. 5, 29 (2024). https://doi.org/10.1007/s42979-023-02364-1
