Abstract
Cross-project fault prediction (CPFP) uses data sets from other projects to predict whether the modules of a target project are faulty or non-faulty. Cross-project fault number estimation (CPFNE) goes one step beyond CPFP: it not only predicts faulty modules but also estimates the number of faults in each module. In this article, we propose a new computational architecture for CPFNE, called DBDNN-Estimator, that combines a deep belief network and a deep neural network. We investigated the effectiveness of the proposed approach on five projects and their respective versions from the PROMISE repository and compared its performance against eight existing benchmark approaches. We found that the proposed model requires only a few instances from the source project for optimal performance. Out of 23 data sets, DBDNN-Estimator significantly outperforms the baseline approaches on 19 and 14 data sets in terms of mean absolute error (MAE) and mean squared error (MSE), respectively. The mean MAE and MSE produced by the proposed approach are \(0.38\pm 0.023\) and \(2.29\pm 0.18\), respectively, which are the lowest among the benchmark techniques. We also found the Kendall rank correlation and fault percentage average (FPA) of the proposed model to be significantly better than those of the baseline methods in 17 projects. DBDNN-Estimator produces optimal results for small, moderate, and large software projects. The model is stable and tackles the class imbalance and overfitting problems.
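The four evaluation measures named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy data are hypothetical, Kendall is taken to be the tau-b rank correlation, and FPA follows the commonly used definition (the average, over m, of the share of actual faults found in the m modules with the highest predicted fault counts).

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error between actual and predicted fault counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between actual and predicted fault counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def kendall_tau_b(actual, predicted):
    """Kendall tau-b rank correlation (tie-corrected), by pair counting."""
    n = len(actual)
    concordant = discordant = tied_actual = tied_pred = 0
    for i in range(n):
        for j in range(i + 1, n):
            da = actual[i] - actual[j]
            dp = predicted[i] - predicted[j]
            if da == 0:
                tied_actual += 1
            if dp == 0:
                tied_pred += 1
            if da * dp > 0:
                concordant += 1
            elif da * dp < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denom = sqrt((total_pairs - tied_actual) * (total_pairs - tied_pred))
    return (concordant - discordant) / denom if denom else 0.0


def fpa(actual, predicted):
    """Fault percentage average: mean share of actual faults captured by
    the top-m modules (ranked by predicted count), for m = 1..K."""
    # Rank modules from highest to lowest predicted fault count.
    ranked = [a for _, a in sorted(zip(predicted, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    total_faults = sum(ranked)
    if total_faults == 0:
        return 0.0
    running = 0
    accumulated = 0.0
    for faults in ranked:
        running += faults                      # faults in the top-m modules
        accumulated += running / total_faults  # share captured at this m
    return accumulated / len(ranked)


# Hypothetical toy data: actual and predicted fault counts for five modules.
actual = [0, 2, 1, 0, 3]
predicted = [0.2, 1.5, 0.4, 0.1, 2.0]
print(mae(actual, predicted), mse(actual, predicted))
print(kendall_tau_b(actual, predicted), fpa(actual, predicted))
```

Lower MAE and MSE indicate better estimates, whereas higher Kendall and FPA values indicate that the predicted ranking of modules agrees more closely with the actual fault distribution.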
Data Availability and Access
The data sets used in our experiments can be found in the PROMISE repository (http://promise.site.uottawa.ca/SERepository/) [4] and the UCI repository (http://archive.ics.uci.edu/ml/index.php). We have also made the code and data sets for this work available in a GitHub repository (https://github.com/sushantkumar007007/DBDNN-Estimator).
References
Pandey SK, Mishra RB, Tripathi AK. Machine learning based methods for software fault prediction: a survey. Expert Syst Appl. 2021;172: 114595.
Pachouly J, Ahirrao S, Kotecha K, Selvachandran G, Abraham A. A systematic literature review on software defect prediction using artificial intelligence: datasets, data validation methods, approaches, and tools. Eng Appl Artif Intell. 2022;111: 104773.
Catal C, Diri B. Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inf Sci. 2009;179(8):1040–58.
Sayyad Shirabad J, Menzies T. The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada 2005. http://promise.site.uottawa.ca/SERepository
Nam J, Pan SJ, Kim S. Transfer defect learning. In: 2013 35th international conference on software engineering (ICSE) 2013; p. 382–91.
He Z, Shu F, Yang Y, Li M, Wang Q. An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng. 2012;19(2):167–99.
Pandey SK, Tripathi AK. Bcv-predictor: a bug count vector predictor of a successive version of the software system. Knowl-Based Syst. 2020;105924.
Rathore SS, Kumar S. Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems. Knowl-Based Syst. 2017;119:232–56.
Rathore SS, Kumar S. Towards an ensemble based system for predicting the number of software faults. Expert Syst Appl. 2017;82:357–82.
Pandey SK, Tripathi AK. Dnnattention: a deep neural network and attention based architecture for cross project defect number prediction. Knowl-Based Syst. 2021;107541.
Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT 2009; p. 91–100.
Craig RD, Jaskiel SP. Systematic software testing. Artech House; 2002.
Pandey SK, Rathee D, Tripathi AK. Software defect prediction using k-pca and various kernel-based extreme learning machine: an empirical study. IET Software. 2020;14(7):768–82.
Pandey SK, Tripathi AK. In: 2021 8th International Conference on Smart Computing and Communications (ICSCC) (IEEE), 2021; p. 58–63.
Cartwright M, Shepperd M. An empirical investigation of an object-oriented software system. IEEE Trans Software Eng. 2000;26(8):786–96.
Abdi L, Hashemi S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowl Data Eng. 2015;28(1):238–51.
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
Pandey SK, Mishra RB, Tripathi AK. Bpdet: an effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst Appl. 2020;144: 113085.
Wang S, Liu T, Nam J, Tan L. Deep semantic feature learning for software defect prediction. IEEE Trans Softw Eng. 2018;46(12):1267–93.
Li J, He P, Zhu J, Lyu MR. Software defect prediction via convolutional neural network. In: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) 2017; p. 318–28.
Chen D, Chen X, Li H, Xie J, Mu Y. Deepcpdp: deep learning based cross-project defect prediction. IEEE Access. 2019;7:184832–48.
Chen X, Zhang D, Zhao Y, Cui Z, Ni C. Software defect number prediction: unsupervised vs supervised methods. Inf Softw Technol. 2019;106:161–81.
Le Roux N, Bengio Y. Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 2008;20(6):1631–49.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. 2014.
Shepperd M, Song Q, Sun Z, Mair C. Data quality: some comments on the Nasa software defect datasets. IEEE Trans Softw Eng. 2013;39(9):1208–15.
Neal RM. Connectionist learning of belief networks. Artif Intell. 1992;56(1):71–113.
Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
Smolensky P. Information processing in dynamical systems: foundations of harmony theory. Tech. rep., Colorado Univ at Boulder Dept of Computer Science; 1986.
Welling M, Rosen-Zvi M, Hinton GE. Exponential family harmoniums with an application to information retrieval. Adv Neural Inf Process Syst. 2005; 1481–8.
Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–81.
Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.
Pascanu R, Mikolov T, Bengio Y. Understanding the exploding gradient problem. arXiv:1211.5063 2012;2
Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on Machine learning 2007; p. 791–8.
Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recogn. 2005;38(12):2270–85.
Eesa AS, Arabo WK. A normalization methods for backpropagation: a comparative study. Sci J Univ Zakho. 2017;5(4):319–23.
Japkowicz N, Stephen S. The class imbalance problem: a systematic study. Intell Data Anal. 2002;6(5):429–49.
Malhotra R. A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput. 2015;27:504–18.
Pandey SK, Tripathi AK. An empirical study toward dealing with noise and class imbalance issues in software defect prediction. Soft Comput. 2021;25(21):13465–92.
Charte F, Rivera AJ, del Jesus MJ, Herrera F. Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing. 2015;163:3–16.
Turhan B, Menzies T, Bener AB, Di Stefano J. On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng. 2009;14(5):540–78.
Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell. 2006;28(4):594–611.
Shah C, Pomerantz J. Evaluating and predicting answer quality in community qa. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010; p. 411–8.
Kendall MG. A new measure of rank correlation. Biometrika. 1938;30(1/2):81–93.
Yu X, Liu J, Yang Z, Jia X, Ling Q, Ye S. Learning from imbalanced data for predicting the number of software defects. In: 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE) 2017; p. 78–89.
Weyuker EJ, Ostrand TJ, Bell RM. Comparing the effectiveness of several modeling methods for fault prediction. Empir Softw Eng. 2010;15(3):277–95.
Ng AY. In Proceedings of the twenty-first international conference on Machine learning 2004; 78.
Rathore SS, Kumar S. An approach for the prediction of number of software faults based on the dynamic selection of learning techniques. IEEE Trans Reliab. 2018;68(1):216–36.
Catal C. Software fault prediction: a literature review and current trends. Expert Syst Appl. 2011;38(4):4626–36.
Garner SR, et al. Weka: The Waikato environment for knowledge analysis. In: Proceedings of the New Zealand computer science research students conference 1995; p. 57–64.
Woolson R. Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials 2007; p. 1–3.
Cliff N. Ordinal methods for behavioral data analysis. Psychology Press; 2014.
Abdi H. Bonferroni and Šidák corrections for multiple comparisons. Encyclopedia Measure Stat. 2007;3:103–7.
Rotman M, Wolf L. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021;35:9428–35.
Sommerville I. Software engineering. 9th ed. 2011. ISBN-10: 0137035152.
Gonzalez J, Yu W. Non-linear system modeling using lstm neural networks. IFAC-PapersOnLine. 2018;51(13):485–9.
Hosseini S, Turhan B, Gunarathna D. A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans Softw Eng. 2017;45(2):111–47.
Herbold S, Trautsch A, Grabowski J. A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Software Eng. 2017;44(9):811–33.
Ni C, Xia X, Lo D, Chen X, Gu Q. Revisiting supervised and unsupervised methods for effort-aware cross-project defect prediction. IEEE Trans Softw Eng. 2020.
Bangash AA, Sahar H, Hindle A, Ali K. On the time-based conclusion stability of cross-project defect prediction models. Empir Softw Eng. 2020;25(6):5047–83.
Hosseini S, Turhan B, Mäntylä M. A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol. 2018;95:296–312.
Tabassum S, Minku LL, Feng D, Cabral GG, Song L. An investigation of cross-project learning in online just-in-time software defect prediction. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE) 2020; p. 554–65.
Ma Y, Luo G, Zeng X, Chen A. Transfer learning for cross-company software defect prediction. Inf Softw Technol. 2012;54(3):248–56.
Liu C, Yang D, Xia X, Yan M, Zhang X. A two-phase transfer learning model for cross-project defect prediction. Inf Softw Technol. 2019;107:125–36.
Herbold S, Trautsch A, Grabowski J. In Proceedings of the 40th International Conference on Software Engineering 2018; p. 063.
Li K, Xiang Z, Chen T, Tan KC. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) (IEEE), 2020; p. 573–84.
Jin C. Cross-project software defect prediction based on domain adaptation learning and optimization. Expert Syst Appl. 2021;171: 114637.
Sun Z, Li J, Sun H, He L. Cfps: Collaborative filtering based source projects selection for cross-project defect prediction. Appl Soft Comput. 2021;99: 106940.
Amasaki S, Aman H, Yokogawa T. An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empir Softw Eng. 2022;27(2):1–29.
Bal PR, Kumar S. Wr-elm: Weighted regularization extreme learning machine for imbalance learning in software fault prediction. IEEE Trans Reliab. 2020;69(4):1355–75.
Panichella A, Oliveto R, De Lucia A. Cross-project defect prediction models: L’union fait la force. In: 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) 2014; p. 164–73.
Xia X, Lo D, Pan SJ, Nagappan N, Wang X. Hydra: Massively compositional model for cross-project defect prediction. IEEE Trans Software Eng. 2016;42(10):977–98.
Nevendra M, Singh P. Defect count prediction via metric-based convolutional neural network. Neural Comput Appl. 2021;1–26.
Bai CG, Cai KY, Hu QP, Ng SH. On the trend of remaining software defect estimation. IEEE Trans Syst Man Cybern-Part A. 2008;38(5):1129–42.
Huang Q, Ni C, Chen X, Gu Q, Cao K. Multi-project regression based approach for software defect number prediction. SEKE. 2019; 425–546.
Rathore SS, Kumar S. An empirical study of some software fault prediction techniques for the number of faults prediction. Soft Comput. 2017;21(24):7417–34.
Kumar C, Yadav DK. Software defects estimation using metrics of early phases of software development life cycle. Int J Syst Assur Eng Manage. 2017;8(4):2109–17.
Bernstein A, Ekanayake J, Pinzger M. Improving defect prediction using temporal features and non linear models. In: Ninth international workshop on Principles of software evolution: In conjunction with the 6th ESEC/FSE joint meeting 2007; p. 11–8.
D’Ambros M, Lanza M, Robbes R. An extensive comparison of bug prediction approaches. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) 2010; p. 31–41.
Jiang T, Tan L, Kim S. Personalized defect prediction. In: 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2013; p. 279–89.
Zimmermann T, Nagappan N. Predicting defects using network analysis on dependency graphs. 2008; p. 531–40.
Koru AG, El Emam K, Zhang D, Liu H, Mathew D. Theory of relative defect proneness. Empir Softw Eng. 2008;13(5):473.
Bettenburg N, Nagappan M, Hassan AE. Think locally, act globally: improving defect and effort prediction models. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR) 2012; p. 60–9.
Kim S, Whitehead EJ, Zhang Y. Classifying software changes: clean or buggy? IEEE Trans Softw Eng. 2008;34(2):181–96.
Zhiyi H, Haidong S, Lin J, Junsheng C, Yu Y. Transfer fault diagnosis of bearing installed in different machines using enhanced deep auto-encoder. Measurement. 2020;152: 107393.
Xiao Y, Shao H, Han S, Huo Z, Wan J. Novel joint transfer network for unsupervised bearing fault diagnosis from simulation domain to experimental domain. IEEE/ASME Trans Mechatron. 2022.
Liu Y, Khoshgoftaar TM, Seliya N. Evolutionary optimization of software quality modeling with multiple repositories. IEEE Trans Softw Eng. 2010;36(6):852–64.
Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S. Multi-objective cross-project defect prediction. In: 2013 IEEE Sixth international conference on software testing, verification and validation 2013; p. 252–61.
Wu F, Jing XY, Sun Y, Sun J, Huang L, Cui F, Sun Y. Cross-project and within-project semisupervised software defect prediction: a unified approach. IEEE Trans Reliab. 2018;67(2):581–97.
Shao H, Jiang H, Li X, Liang T. Rolling bearing fault detection using continuous deep belief network with locally linear embedding. Comput Ind. 2018;96:27–39.
Hua W, Chun S, Changzhen H, Zhang Y, Xiao Y, et al. Software defect prediction via deep belief network. Chin J Electron. 2019;28(5):925–32.
Chen Y, Zhao X, Jia X. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Select Top Appl Earth Observ Remote Sens. 2015;8(6):2381–92.
Sun X, Li T, Li Q, Huang Y, Li Y. Deep belief echo-state network and its application to time series prediction. Knowl-Based Syst. 2017;130:17–29.
Zhao Z, Jiao L, Zhao J, Gu J, Zhao J. Discriminant deep belief network for high-resolution sar image classification. Pattern Recogn. 2017;61:686–701.
O’Connor P, Neil D, Liu SC, Delbruck T, Pfeiffer M. Real-time classification and sensor fusion with a spiking deep belief network. Front Neurosci. 2013;7:178.
Deng L, Yu D, Dahl GE. Deep belief network for large vocabulary continuous speech recognition. US Patent 8,972,253. 2015.
Mohamed A, Dahl G, Hinton G. Deep belief networks for phone recognition. Nips workshop on deep learning for speech recognition and related applications. 2009;1(9):39.
Nayak SK, Ojha AC. Data leakage detection and prevention: Review and research directions. Mach Learn Inf Proc. 2020;203–12.
Acknowledgements
This work is supported by IIT (BHU), India. The authors would also like to thank Professor David Lo from SMU Singapore for encouraging this work. His valuable suggestions and critical comments improved the quality of the manuscript.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
SKP (corresponding author): conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review & editing, and visualization. AKT: supervision, writing—review & editing, resources, formal analysis, and methodology.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pandey, S.K., Tripathi, A.K. DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN COMPUT. SCI. 5, 29 (2024). https://doi.org/10.1007/s42979-023-02364-1
DOI: https://doi.org/10.1007/s42979-023-02364-1