DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique

Pandey, Sushant Kumar; Tripathi, Anil Kumar

doi:10.1007/s42979-023-02364-1


Original Research
Published: 20 November 2023

Volume 5, article number 29 (2024)

SN Computer Science


Abstract

Cross-project fault prediction (CPFP) uses data sets from other projects to predict whether modules are faulty or non-faulty. Cross-project fault number estimation (CPFNE) goes one step further than CPFP: it not only predicts which modules are faulty but also estimates the number of faults in each module. In this article, we propose DBDNN-Estimator, a new computational architecture for CPFNE that combines a deep belief network and a deep neural network. We investigated the effectiveness of the proposed approach on five projects and their respective versions from the PROMISE repository and compared its performance against eight existing benchmark approaches. We found that the proposed model requires only a few instances from the source project for optimal performance. Out of 23 data sets, DBDNN-Estimator significantly outperforms the baseline approaches on 19 in terms of mean absolute error (MAE) and on 14 in terms of mean squared error (MSE). The mean MAE and MSE produced by the proposed work are \(0.38\pm 0.023\) and \(2.29\pm 0.18\), respectively, the lowest among the benchmark techniques. We also found the Kendall rank correlation and Fault Percentage Average (FPA) of the proposed model to be significantly better than those of the baseline methods in 17 projects. DBDNN-Estimator produces optimal results for small, moderate, and large software projects. The model is stable and addresses the class imbalance and overfitting problems.
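The abstract names four evaluation measures but this page does not reproduce their formulas. As a minimal sketch, assuming the standard textbook definitions, the measures could be computed for one target project as follows. The function names and the toy data are illustrative and are not taken from the authors' code. Two choices here are assumptions: Kendall's coefficient is computed as the tie-aware tau-b variant, and FPA follows the fault-percentile-average formulation commonly attributed to Weyuker et al., in which modules are sorted by increasing predicted fault count.

```python
from itertools import combinations
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error between actual and predicted fault counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between actual and predicted fault counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def kendall_tau_b(actual, predicted):
    """Kendall rank correlation, tau-b variant (assumed here because
    fault counts usually contain many ties)."""
    concordant = discordant = ties_actual = ties_predicted = 0
    for (a1, p1), (a2, p2) in combinations(zip(actual, predicted), 2):
        da, dp = a1 - a2, p1 - p2
        if da == 0 and dp == 0:
            continue  # tied in both rankings: counted nowhere
        if da == 0:
            ties_actual += 1
        elif dp == 0:
            ties_predicted += 1
        elif da * dp > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_actual)
                 * (concordant + discordant + ties_predicted))
    return (concordant - discordant) / denom if denom else 0.0


def fpa(actual, predicted):
    """Fault-percentile-average (assumed formulation): sort the K modules by
    increasing predicted faults, then average, over m = 1..K, the share of
    actual faults contained in modules m..K."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i])
    ranked = [actual[i] for i in order]
    total = sum(ranked)
    if total == 0:
        return 0.0  # no faults at all: the measure is undefined, report 0
    score = 0.0
    suffix = 0
    for n in reversed(ranked):  # suffix sums for m = K, K-1, ..., 1
        suffix += n
        score += suffix / total
    return score / len(ranked)


# Toy data: actual and estimated fault counts for five hypothetical modules.
actual_faults = [0, 2, 1, 0, 3]
estimated_faults = [0.1, 1.5, 1.2, 0.4, 2.0]

print("MAE    :", round(mae(actual_faults, estimated_faults), 4))
print("MSE    :", round(mse(actual_faults, estimated_faults), 4))
print("Kendall:", round(kendall_tau_b(actual_faults, estimated_faults), 4))
print("FPA    :", round(fpa(actual_faults, estimated_faults), 4))
```

Lower MAE and MSE indicate more accurate fault-count estimates, while higher Kendall and FPA values indicate that the estimated ranking of modules agrees better with the actual ranking.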


Data Availability and Access

The data sets used in our experiments can be found in the PROMISE repository (http://promise.site.uottawa.ca/SERepository/) [4] and the UCI repository (http://archive.ics.uci.edu/ml/index.php). We have also provided the code and data sets for our work in a GitHub repository (https://github.com/sushantkumar007007/DBDNN-Estimator).


References

Pandey SK, Mishra RB, Tripathi AK. Machine learning based methods for software fault prediction: a survey. Expert Syst Appl. 2021;172: 114595.
Pachouly J, Ahirrao S, Kotecha K, Selvachandran G, Abraham A. A systematic literature review on software defect prediction using artificial intelligence: datasets, data validation methods, approaches, and tools. Eng Appl Artif Intell. 2022;111: 104773.
Catal C, Diri B. Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inf Sci. 2009;179(8):1040–58.
Sayyad Shirabad J, Menzies T. The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada 2005. http://promise.site.uottawa.ca/SERepository
Nam J, Pan SJ, Kim S. Transfer defect learning. In: 2013 35th international conference on software engineering (ICSE) 2013; p. 382–91.
He Z, Shu F, Yang Y, Li M, Wang Q. An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng. 2012;19(2):167–99.
Pandey SK, Tripathi AK. Bcv-predictor: a bug count vector predictor of a successive version of the software system. Knowl-Based Syst. 2020;105924.
Rathore SS, Kumar S. Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems. Knowl-Based Syst. 2017;119:232–56.
Rathore SS, Kumar S. Towards an ensemble based system for predicting the number of software faults. Expert Syst Appl. 2017;82:357–82.
Pandey SK, Tripathi AK. Dnnattention: a deep neural network and attention based architecture for cross project defect number prediction. Knowl-Based Syst. 2021;107541.
Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT 2009; p. 91–100.
Craig RD, Jaskiel SP. Systematic software testing. Artech House; 2002.
Pandey SK, Rathee D, Tripathi AK. Software defect prediction using k-pca and various kernel-based extreme learning machine: an empirical study. IET Software. 2020;14(7):768–82.
Pandey SK, Tripathi AK. In: 2021 8th International Conference on Smart Computing and Communications (ICSCC) (IEEE), 2021; p. 58–63.
Cartwright M, Shepperd M. An empirical investigation of an object-oriented software system. IEEE Trans Software Eng. 2000;26(8):786–96.
Abdi L, Hashemi S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowl Data Eng. 2015;28(1):238–51.
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
Pandey SK, Mishra RB, Tripathi AK. Bpdet: an effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst Appl. 2020;144: 113085.
Wang S, Liu T, Nam J, Tan L. Deep semantic feature learning for software defect prediction. IEEE Trans Softw Eng. 2018;46(12):1267–93.
Li J, He P, Zhu J, Lyu MR. Software defect prediction via convolutional neural network. In: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) 2017; p. 318–28.
Chen D, Chen X, Li H, Xie J, Mu Y. Deepcpdp: deep learning based cross-project defect prediction. IEEE Access. 2019;7:184832–48.
Chen X, Zhang D, Zhao Y, Cui Z, Ni C. Software defect number prediction: unsupervised vs supervised methods. Inf Softw Technol. 2019;106:161–81.
Le Roux N, Bengio Y. Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 2008;20(6):1631–49.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Shepperd M, Song Q, Sun Z, Mair C. Data quality: some comments on the Nasa software defect datasets. IEEE Trans Softw Eng. 2013;39(9):1208–15.
Neal RM. Connectionist learning of belief networks. Artif Intell. 1992;56(1):71–113.
Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
Smolensky P. Information processing in dynamical systems: foundations of harmony theory. Tech. rep., Colorado Univ at Boulder Dept of Computer Science 1986
Welling M, Rosen-Zvi M, Hinton GE. Exponential family harmoniums with an application to information retrieval. Adv Neural Inf Process Syst. 2005; 1481–8.
Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–81.
Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.
Pascanu R, Mikolov T, Bengio Y. Understanding the exploding gradient problem. arXiv:1211.5063 2012;2
Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on Machine learning 2007; p. 791–8.
Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recogn. 2005;38(12):2270–85.
Eesa AS, Arabo WK. A normalization methods for backpropagation: a comparative study. Sci J Univ Zakho. 2017;5(4):319–23.
Japkowicz N, Stephen S. The class imbalance problem: a systematic study. Intell Data Anal. 2002;6(5):429–49.
Malhotra R. A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput. 2015;27:504–18.
Pandey SK, Tripathi AK. An empirical study toward dealing with noise and class imbalance issues in software defect prediction. Soft Comput. 2021;25(21):13465–92.
Charte F, Rivera AJ, del Jesus MJ, Herrera F. Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing. 2015;163:3–16.
Turhan B, Menzies T, Bener AB, Di Stefano J. On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng. 2009;14(5):540–78.
Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell. 2006;28(4):594–611.
Shah C, Pomerantz J. Evaluating and predicting answer quality in community qa. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010; p. 411–8.
Kendall MG. A new measure of rank correlation. Biometrika. 1938;30(1/2):81–93.
Yu X, Liu J, Yang Z, Jia X, Ling Q, Ye S. Learning from imbalanced data for predicting the number of software defects. In: 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE) 2017; p. 78–89.
Weyuker EJ, Ostrand TJ, Bell RM. Comparing the effectiveness of several modeling methods for fault prediction. Empir Softw Eng. 2010;15(3):277–95.
Ng AY. In Proceedings of the twenty-first international conference on Machine learning 2004; 78.
Rathore SS, Kumar S. An approach for the prediction of number of software faults based on the dynamic selection of learning techniques. IEEE Trans Reliab. 2018;68(1):216–36.
Catal C. Software fault prediction: a literature review and current trends. Expert Syst Appl. 2011;38(4):4626–36.
Garner SR, et al. Weka: The Waikato environment for knowledge analysis. In: Proceedings of the New Zealand computer science research students conference 1995; p. 57–64.
Woolson R. Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials 2007; p. 1–3.
Cliff N. Ordinal methods for behavioral data analysis. Psychology Press; 2014.
Abdi H. Bonferroni and Šidák corrections for multiple comparisons. Encyclopedia Measure Stat. 2007;3:103–7.
Rotman M, Wolf L. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021;35:9428–35.
Sommerville I. Software engineering. 9th ed. 2011. ISBN-10: 0137035152.
Gonzalez J, Yu W. Non-linear system modeling using lstm neural networks. IFAC-PapersOnLine. 2018;51(13):485–9.
Hosseini S, Turhan B, Gunarathna D. A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans Softw Eng. 2017;45(2):111–47.
Herbold S, Trautsch A, Grabowski J. A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Software Eng. 2017;44(9):811–33.
Ni C, Xia X, Lo D, Chen X, Gu Q. Revisiting supervised and unsupervised methods for effort-aware cross-project defect prediction. IEEE Trans Softw Eng. 2020.
Bangash AA, Sahar H, Hindle A, Ali K. On the time-based conclusion stability of cross-project defect prediction models. Empir Softw Eng. 2020;25(6):5047–83.
Hosseini S, Turhan B, Mäntylä M. A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol. 2018;95:296–312.
Tabassum S, Minku LL, Feng D, Cabral GG, Song L. An investigation of cross-project learning in online just-in-time software defect prediction. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE) 2020; p. 554–65.
Ma Y, Luo G, Zeng X, Chen A. Transfer learning for cross-company software defect prediction. Inf Softw Technol. 2012;54(3):248–56.
Liu C, Yang D, Xia X, Yan M, Zhang X. A two-phase transfer learning model for cross-project defect prediction. Inf Softw Technol. 2019;107:125–36.
Herbold S, Trautsch A, Grabowski J. In Proceedings of the 40th International Conference on Software Engineering 2018; p. 063.
Li K, Xiang Z, Chen T, Tan KC. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) (IEEE), 2020; p. 573–84.
Jin C. Cross-project software defect prediction based on domain adaptation learning and optimization. Expert Syst Appl. 2021;171: 114637.
Sun Z, Li J, Sun H, He L. Cfps: Collaborative filtering based source projects selection for cross-project defect prediction. Appl Soft Comput. 2021;99: 106940.
Amasaki S, Aman H, Yokogawa T. An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empir Softw Eng. 2022;27(2):1–29.
Bal PR, Kumar S. Wr-elm: Weighted regularization extreme learning machine for imbalance learning in software fault prediction. IEEE Trans Reliab. 2020;69(4):1355–75.
Panichella A, Oliveto R, De Lucia A. Cross-project defect prediction models: L’union fait la force. In: 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) 2014; p. 164–73.
Xia X, Lo D, Pan SJ, Nagappan N, Wang X. Hydra: Massively compositional model for cross-project defect prediction. IEEE Trans Software Eng. 2016;42(10):977–98.
Nevendra M, Singh P. Defect count prediction via metric-based convolutional neural network. Neural Comput Appl. 2021;1–26.
Bai CG, Cai KY, Hu QP, Ng SH. On the trend of remaining software defect estimation. IEEE Trans Syst Man Cybern-Part A. 2008;38(5):1129–42.
Huang Q, Ni C, Chen X, Gu Q, Cao K. Multi-project regression based approach for software defect number prediction. SEKE. 2019; 425–546.
Rathore SS, Kumar S. An empirical study of some software fault prediction techniques for the number of faults prediction. Soft Comput. 2017;21(24):7417–34.
Kumar C, Yadav DK. Software defects estimation using metrics of early phases of software development life cycle. Int J Syst Assur Eng Manage. 2017;8(4):2109–17.
Bernstein A, Ekanayake J, Pinzger M. Improving defect prediction using temporal features and non linear models. Ninth international workshop on Principles of software evolution: In conjunction with the 6th ESEC/FSE joint meeting 2007; p. 11–8
D’Ambros M, Lanza M, Robbes R. An extensive comparison of bug prediction approaches. 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) 2010; p. 31–41.
Jiang T, Tan L, Kim S. Personalized defect prediction. In: 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2013; p. 279–89.
Zimmermann T, Nagappan N. Predicting defects using network analysis on dependency graphs. 2008; p. 531–40.
Koru AG, El Emam K, Zhang D, Liu H, Mathew D. Theory of relative defect proneness. Empir Softw Eng. 2008;13(5):473.
Bettenburg N, Nagappan M, Hassan AE. Think locally, act globally: improving defect and effort prediction models. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR) 2012; p. 60–9.
Kim S, Whitehead EJ, Zhang Y. Classifying software changes: clean or buggy? IEEE Trans Softw Eng. 2008;34(2):181–96.
Zhiyi H, Haidong S, Lin J, Junsheng C, Yu Y. Transfer fault diagnosis of bearing installed in different machines using enhanced deep auto-encoder. Measurement. 2020;152: 107393.
Xiao Y, Shao H, Han S, Huo Z, Wan J. Novel joint transfer network for unsupervised bearing fault diagnosis from simulation domain to experimental domain. IEEE/ASME Trans Mechatron. 2022.
Liu Y, Khoshgoftaar TM, Seliya N. Evolutionary optimization of software quality modeling with multiple repositories. IEEE Trans Softw Eng. 2010;36(6):852–64.
Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S. Multi-objective cross-project defect prediction. In: 2013 IEEE Sixth international conference on software testing, verification and validation 2013; p. 252–61.
Wu F, Jing XY, Sun Y, Sun J, Huang L, Cui F, Sun Y. Cross-project and within-project semisupervised software defect prediction: a unified approach. IEEE Trans Reliab. 2018;67(2):581–97.
Shao H, Jiang H, Li X, Liang T. Rolling bearing fault detection using continuous deep belief network with locally linear embedding. Comput Ind. 2018;96:27–39.
Hua W, Chun S, Changzhen H, Zhang Y, Xiao Y, et al. Software defect prediction via deep belief network. Chin J Electron. 2019;28(5):925–32.
Chen Y, Zhao X, Jia X. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Select Top Appl Earth Observ Remote Sens. 2015;8(6):2381–92.
Sun X, Li T, Li Q, Huang Y, Li Y. Deep belief echo-state network and its application to time series prediction. Knowl-Based Syst. 2017;130:17–29.
Zhao Z, Jiao L, Zhao J, Gu J, Zhao J. Discriminant deep belief network for high-resolution sar image classification. Pattern Recogn. 2017;61:686–701.
O’Connor P, Neil D, Liu SC, Delbruck T, Pfeiffer M. Real-time classification and sensor fusion with a spiking deep belief network. Front Neurosci. 2013;7:178.
Deng L, Yu D, Dahl GE. Deep belief network for large vocabulary continuous speech recognition (2015). US Patent 8,972,253
Mohamed A, Dahl G, Hinton G. Deep belief networks for phone recognition. Nips workshop on deep learning for speech recognition and related applications. 2009;1(9):39.
Nayak SK, Ojha AC. Data leakage detection and prevention: Review and research directions. Mach Learn Inf Proc. 2020;203–12.


Acknowledgements

This work is supported by IIT (BHU), India. The authors would also like to thank Professor David Lo of SMU, Singapore, for encouraging this work. His valuable suggestions and critical comments improved the quality of the manuscript.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Indian Institute of Technology (BHU), Varanasi, 221001, India
Sushant Kumar Pandey & Anil Kumar Tripathi

Authors

Sushant Kumar Pandey
Anil Kumar Tripathi

Contributions

SKP (corresponding author): conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review & editing, and visualization. AKT: supervision, writing—review & editing, resources, formal analysis, and methodology.

Corresponding author

Correspondence to Sushant Kumar Pandey.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Pandey, S.K., Tripathi, A.K. DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN COMPUT. SCI. 5, 29 (2024). https://doi.org/10.1007/s42979-023-02364-1


Received: 03 June 2023
Accepted: 25 September 2023
Published: 20 November 2023
DOI: https://doi.org/10.1007/s42979-023-02364-1

