Software-defect prediction within and across projects based on improved self-organizing data mining

The Journal of Supercomputing

Abstract

This paper proposes a new software-defect prediction method based on self-organizing data mining that can establish a causal relationship between software metrics and defects. Defect-prediction models are built for both intra-project and cross-project scenarios. For intra-project prediction, a self-organizing data mining model is combined with SMOTE (synthetic minority over-sampling technique) data preprocessing to address class imbalance. For cross-project prediction, the difference between source and target projects is reduced by selecting source-project instances that have larger correlation coefficients with the target project, and a self-organizing data mining defect-prediction model is then built on the selected source-project instances. The method supports both classification and ranking prediction and is tested on public defect datasets. In the classification-prediction experiments, the method is evaluated with precision, F-measure, and AUC; in the ranking-prediction experiments, the method yields improved AAE (average absolute error) and ARE (average relative error) values. The results indicate that the method is efficient and feasible for software-defect prediction.
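Self-organizing data mining builds on the group method of data handling (GMDH), which grows a model by generating many small candidate models ("partial descriptions") and keeping only those that perform best on data not used for fitting. The authors' improved model is not reproduced in this preview, so the following is only a minimal sketch of one classic GMDH selection layer under stated assumptions: each candidate is a linear two-input model y ≈ a0 + a1·x_i + a2·x_j (classic GMDH often uses a quadratic form instead), coefficients are fitted by least squares on a training split, and candidates are ranked by mean squared error on a separate validation split (the regularity criterion). All function names are hypothetical.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_partial(X, y, i, j):
    """Least-squares fit of y ~ a0 + a1*x_i + a2*x_j via the normal equations."""
    rows = [[1.0, x[i], x[j]] for x in X]
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(3)]
    return solve(ata, aty)


def predict_partial(coef, x, i, j):
    return coef[0] + coef[1] * x[i] + coef[2] * x[j]


def gmdh_layer(X_train, y_train, X_valid, y_valid, keep=3):
    """One GMDH layer: fit every pairwise candidate on the training split and
    rank the candidates by mean squared error on the validation split."""
    candidates = []
    for i, j in combinations(range(len(X_train[0])), 2):
        try:
            coef = fit_partial(X_train, y_train, i, j)
        except ValueError:
            continue  # skip input pairs whose normal equations are singular
        err = sum((predict_partial(coef, x, i, j) - t) ** 2
                  for x, t in zip(X_valid, y_valid)) / len(y_valid)
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]  # (validation MSE, input pair, coefficients)
```

In a full multilayer GMDH, the outputs of the retained candidates become the inputs of the next layer, and layering stops once the best validation error no longer improves; that loop is omitted here for brevity.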
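Assuming the intra-project imbalance step is SMOTE (synthetic minority over-sampling technique), its core operation can be sketched in pure Python, assuming numeric metric vectors and Euclidean distance: each synthetic defective module is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This simplified variant picks base samples at random rather than cycling through every minority sample as the original algorithm does; `smote` and `balance` are hypothetical names, not the paper's code.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by squared distance to `base`.
        others = [m for idx, m in enumerate(minority) if idx != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority (defective) class until both classes are equal."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    needed = max(0, n_majority - len(minority))
    new_points = smote(minority, needed, k, seed) if needed else []
    return X + new_points, y + [minority_label] * len(new_points)
```

Only the training data should be oversampled; the test split keeps its natural class ratio so that precision and F-measure reflect realistic conditions.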
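The cross-project step keeps source-project instances that correlate strongly with the target project. The abstract does not say how instance-level correlation is computed or aggregated, so this sketch assumes Pearson correlation between a source instance's metric vector and each target instance's metric vector, averaged over all target instances, with the `top_k` highest-scoring source instances retained. `pearson`, `select_source_instances`, and `top_k` are hypothetical names.

```python
from statistics import mean


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mu, mv = mean(u), mean(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5
    return num / den if den else 0.0


def select_source_instances(source_X, source_y, target_X, top_k):
    """Keep the top_k source instances with the highest mean correlation
    to the target-project instances (aggregation by mean is an assumption)."""
    scored = []
    for x, label in zip(source_X, source_y):
        score = mean(pearson(x, t) for t in target_X)
        scored.append((score, x, label))
    scored.sort(key=lambda s: s[0], reverse=True)
    chosen = scored[:top_k]
    return [x for _, x, _ in chosen], [label for _, _, label in chosen]
```

Because metrics such as lines of code and coupling counts lie on very different scales, the metric columns would normally be normalized (for example, min-max scaled per feature) before computing instance-level correlations; otherwise the largest-valued metrics dominate the coefficient.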
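Precision, F-measure, and AUC have standard definitions; the sketch below treats label 1 as defective. F-measure is taken as the balanced F1 (harmonic mean of precision and recall), and AUC is computed as the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one, counting ties as one half, which equals the area under the ROC curve. The paper's own evaluation code is not shown; the function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, with 1 = defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P·N); for large datasets a rank-based (Mann-Whitney) formulation gives the same value in O(n log n).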
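For ranking (defect-count) prediction, AAE and ARE are commonly defined over N modules as AAE = (1/N) Σ |Ŷ_i − Y_i| and ARE = (1/N) Σ |Ŷ_i − Y_i| / (Y_i + 1), where Y_i is the actual and Ŷ_i the predicted number of defects in module i; the +1 keeps ARE defined for defect-free modules. These are the definitions used widely in the defect-count literature and are assumed, not confirmed, to match the paper's. Lower is better for both; the function names are hypothetical.

```python
def aae(actual, predicted):
    """Average absolute error between actual and predicted defect counts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def are(actual, predicted):
    """Average relative error; the +1 avoids division by zero for defect-free modules."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)
```

For example, with actual counts [0, 2, 5] and predictions [1, 2, 3], AAE is (1 + 0 + 2) / 3 = 1.0 and ARE is (1/1 + 0/3 + 2/6) / 3 = 4/9.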

Figs. 1–11 (images not reproduced)

Acknowledgements

This work was supported by the National Engineering Laboratory Project for the Safety Technology of Urban Rail Transit System (Development and Reform Office High Technology [2016] No. 583).

Author information

Corresponding author

Correspondence to Qing Zhang.

About this article

Cite this article

Zhang, Q., Ren, J. Software-defect prediction within and across projects based on improved self-organizing data mining. J Supercomput 78, 6147–6173 (2022). https://doi.org/10.1007/s11227-021-04113-8

  • DOI: https://doi.org/10.1007/s11227-021-04113-8
