Cost-sensitive Dictionary Learning for Software Defect Prediction

Published in: Neural Processing Letters

Abstract

In recent years, software defect prediction has come to be treated as a cost-sensitive learning problem. To handle the unequal misclassification losses caused by different types of classification error, several cost-sensitive dictionary learning methods have recently been proposed. These methods typically define misclassification costs to quantify the unequal losses and then minimize a cost-sensitive reconstruction loss by embedding the cost information into the reconstruction function of dictionary learning. Although they achieve promising performance, their cost-sensitive reconstruction functions are not well designed. Moreover, insufficient attention is paid to the coding coefficients, which can also help reduce the reconstruction loss. To address these issues, this paper proposes a new cost-sensitive reconstruction loss function and introduces an additional cost-sensitive discrimination regularization on the coding coefficients. The two terms are jointly optimized in a unified cost-sensitive dictionary learning framework. In this way, we minimize the reconstruction loss and obtain a more cost-sensitive dictionary for the feature encoding of test data. In the experimental part, we have conducted extensive experiments on twenty-five software projects from four benchmark datasets: NASA, AEEEM, ReLink and Jureczko. The results, in comparison with ten state-of-the-art software defect prediction methods, demonstrate the effectiveness of the learned cost-sensitive dictionary for software defect prediction.


Notes

  1. http://metricsgrimoire.github.io.

  2. http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm.

  3. https://scitools.com.

  4. http://www.socr.ucla.edu/Applets.dir/F_Table.html.

References

  1. Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13


  2. Shepperd M, Bowes D, Hall T (2014) Researcher bias: the use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616


  3. Li ZQ, Jing XY, Zhu XK (2018) Progress on approaches to software defect prediction. IET Softw 12(3):161–175


  4. Boehm BW, Basili VR (2005) Foundations of empirical software engineering: the legacy of Victor R. Basili. Springer, Berlin


  5. Boehm BW, Papaccio PN (1988) Understanding and controlling software costs. IEEE Trans Softw Eng 14(10):1462–1477


  6. McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320


  7. Halstead MH (1977) Elements of software science. Elsevier, North-Holland


  8. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493


  9. Ma Y, Zhu S, Qin K, Luo G (2014) Combining the requirement information for software defect estimation in design time. Inf Process Lett 114(9):469–474


  10. Jiang Y, Cuki B, Menzies T, Bartlow N (2008) Comparing design and code metrics for software quality prediction. In: Proceedings of the 4th international workshop on predictor models in software engineering, pp 11–18

  11. Gray D, Bowes D, Davey N, Sun Y, Christianson B (2009) Using the support vector machine as a classification method for software defect prediction with static code metrics. In: Proceedings of international conference on engineering applications of neural networks, pp 223–234

  12. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Soft 81(5):649–660


  13. Wang J, Shen B, Chen Y (2012) Compressed C4.5 models for software defect prediction. In: Proceedings of 12th international conference on quality software, pp 13–16

  14. Khoshgoftaar TM, Seliya N (2002) Tree-based software quality estimation models for fault prediction. In: Proceedings of eighth IEEE symposium on software metrics, pp 203–214

  15. Wang T, Li WH (2010) Naive Bayes software defect prediction model. In: Proceedings of 2010 international conference on computational intelligence and software engineering, pp 1–4

  16. Amasaki S, Takagi Y, Mizuno O, Kikuno T (2003) A Bayesian belief network for assessing the likelihood of fault content. In: Proceedings of 14th international symposium on software reliability engineering, pp 215–226

  17. Khoshgoftaar TM, Allen EB, Hudepohl JP, Aud SJ (1997) Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909


  18. Singh Y, Kaur A, Malhotra R (2008) Predicting software fault proneness model using neural network. In: Proceedings of international conference on product focused software process improvement, pp 204–214

  19. Liu MX, Miao LS, Zhang DQ (2014) Two-stage cost-sensitive learning for software defect prediction. IEEE Trans Reliab 63(2):676–686


  20. Yang M, Zhang L, Feng X, Zhang D (2011) Fisher discrimination dictionary learning for sparse representation. In: Proceedings of 2011 international conference on computer vision, pp 543–550

  21. Liu HD, Yang M, Gao Y, Yin YL, Chen L (2014) Bilinear discriminative dictionary learning for face recognition. Pattern Recognit 47(5):1835–1845


  22. Özakıncı R, Tarhan A (2018) Early software defect prediction: a systematic map and review. J Syst Softw 144:216–239


  23. Rathore SS, Kumar S (2019) A study on software fault prediction techniques. Artif Intell Rev 51(2):255–327


  24. Xu Z, Liu J, Luo X, Yang Z, Zhang Y, Yuan P, Zhang T (2019) Software defect prediction based on kernel PCA and weighted extreme learning machine. Inf Softw Technol 106:182–200


  25. Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Automat Softw Eng 24(1):47–69


  26. Kondo M, Bezemer CP, Kamei Y, Hassan AE, Mizuno O (2019) The impact of feature reduction techniques on defect prediction models. Empir Softw Eng 24(4):1925–1963


  27. Yang X, Lo D, Xia X, Sun J (2017) TLEL: a two-layer ensemble learning approach for just-in-time defect prediction. Inf Softw Technol 87:206–220


  28. Xu Z, Liu J, Luo X, Zhang T (2018) Cross-version defect prediction via hybrid active learning with kernel principal component analysis. In: Proceedings of IEEE 25th international conference on software analysis, evolution and reengineering (SANER), pp 209–220

  29. Jing XY, Wu F, Dong X, Qi F, Xu B (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In: Proceedings of the 10th joint meeting on foundations of software engineering, pp 496–507

  30. Bennin KE, Keung JW, Monden A (2019) On the relative value of data resampling approaches for software defect prediction. Empir Softw Eng 24(2):602–636


  31. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284


  32. Wan JW, Yang M, Chen YJ (2015) Discriminative cost sensitive Laplacian score for face recognition. Neurocomputing 152(25):333–344


  33. Wan JW, Wang HY, Yang M (2017) Cost sensitive semi-supervised canonical correlation analysis for multi-view dimensionality reduction. Neural Process Lett 45(2):411–430


  34. Khoshgoftaar TM, Geleyn E, Nguyen L, Bullard L (2002) Cost-sensitive boosting in software quality modeling. In: Proceedings of international symposium on high assurance systems engineering, pp 51–60

  35. Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl 37(6):4537–4543


  36. Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of 30th international conference on software engineering, pp 181–190

  37. Yu J, Rui Y, Tao DC (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032


  38. Yu J, Tan M, Zhang H, Tao DC, Rui Y (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2019.2932058

  39. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2008) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227


  40. Zhang GQ, Sun HJ, Ji ZX, Yuan YH, Sun QS (2016) Cost-sensitive dictionary learning for face recognition. Pattern Recognit 60:613–629


  41. Wu F, Jing XY, Yue D (2017) Multi-view discriminant dictionary learning via learning view-specific and shared structured dictionaries for image classification. Neural Process Lett 45(2):649–666


  42. Zhang Z, Sun Y, Wang Y, Zha Z, Yan SC, Wang M (2019) Convolutional dictionary pair learning network for image representation learning. arXiv:1912.12138

  43. Liu H, Guo D, Sun F (2016) Object recognition using tactile measurements: kernel sparse coding methods. IEEE Trans Instrum Meas 65(3):656–665


  44. Li Z, Zhang Z, Qin J, Zhang Z, Shao L (2019) Discriminative fisher embedding dictionary learning algorithm for object recognition. IEEE Trans Neural Netw Learn Syst 31(3):786–800


  45. Shrivastava A, Patel VM, Chellappa R (2015) Non-linear dictionary learning with partially labeled data. Pattern Recognit 48(11):3283–3292


  46. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge


  47. Jing XY, Ying S, Zhang ZW, Wu SS, Liu J (2014) Dictionary learning based software defect prediction. In: Proceedings of the 36th international conference on software engineering, pp 414–423

  48. Wu F, Jing XY, Sun Y, Sun J, Huang L, Cui F, Sun Y (2018) Cross-project and within-project semisupervised software defect prediction: a unified approach. IEEE Trans Reliab 67(2):581–597


  49. Wan JW, Yang M, Wang HY (2017) Cost sensitive matrix factorization for face recognition. In: Proceedings of intelligence data engineering and automated learning, pp 136–145

  50. Wan JW, Yang M, Gao Y, Chen YJ (2014) Pairwise costs in semisupervised discriminant analysis for face recognition. IEEE Trans Inf Forensic Secur 9(10):1569–1580


  51. Ting KM (2002) An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14(3):659–665


  52. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, Hoboken


  53. Rosasco L, Verri A, Santoro M, Mosci S, Villa S (2009) Iterative projection methods for structured sparsity regularization. MIT Technical Reports, MIT-CSAIL-TR-2009-050, CBCL-282

  54. Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based face recognition. In: Proceedings of IEEE international conference on image processing, pp 1601–1604

  55. Shepperd M, Song QB, Sun Z, Mair C (2013) Data quality: some comments on the nasa software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215


  56. D’Ambros M, Lanza M, Robbes R (2012) An extensive comparison of bug prediction approaches. In: Proceedings of IEEE working conference on mining software repositories, pp 31–41

  57. Wu R, Zhang H, Kim S, Cheung SC (2011) Relink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 15–25

  58. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering, pp 1–10

  59. Ji H, Huang S, Wu Y, Hui Z, Zheng C (2019) A new weighted naive Bayes method based on information diffusion for software defect prediction. Softw Qual J 27(3):923–968


  60. Wan JW, Wang Y (2019) Cost-sensitive label propagation for semi-supervised face recognition. IEEE Trans Inf Forensic Secur 14(7):1729–1743


  61. Xu Z, Li S, Luo X, Liu J, Zhang T, Tang Y, Keung J (2019) TSTSS: a two-stage training subset selection framework for cross version defect prediction. J Syst Soft 154:59–78


  62. Elkan C (2001) The foundations of cost-sensitive learning. In: Proceedings of international joint conference on artificial intelligence, pp 973–978

  63. Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat Theory Methods 9(6):571–595


  64. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30


  65. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701


  66. Nemenyi PB (1963) Distribution-free multiple comparisons. PhD Thesis, Princeton University, Princeton

  67. Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge



Acknowledgements

The authors would like to thank the anonymous referees and the editors for their helpful comments and suggestions.

Author information


Correspondence to Jianwu Wan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by National Natural Science Foundation of China under Grants 61502058, 61572085 and 61976028.

Appendices

Appendix A

In this section, we discuss the convexity of \(R(A^{(i)},C)\) with respect to \(A^{(i)}\). First, we define two constants: the centering matrix \(Z_i=I-\frac{1}{n_i}E_i^i\), where \(E^i_i\in {\mathbf {R}}^{n_i\times n_i}\) is the matrix with all entries equal to 1 and \(I\in {\mathbf {R}}^{n_i\times n_i}\) is the identity matrix; and the vector \({\mathbf {p}}_i=\frac{1}{n_i}{\mathbf {v}}_i\), where \({\mathbf {v}}_i=[1,\ldots ,1]^T\in {\mathbf {R}}^{n_i}\) is the vector with all entries equal to 1.

With these two constants, \(R(A^{(i)},C)\) in Eq. (7) can be rewritten as follows

$$\begin{aligned} R(A^{(i)},C)=g(i)||A^{(i)}Z_i||_F^2-\sum _{j=1}^c(C_{ij}+C_{ji})||A^{(i)}{\mathbf {p}}_i-{\mathbf {u}}^{(j)}||_2^2+\eta ||A^{(i)}||_F^2, \end{aligned}$$
(15)

where

$$\begin{aligned} \left\{ \begin{array}{l} ||A^{(i)}Z_i||_F^2=\sum _{{\mathbf {a}}_k^{(i)}\in A^{(i)}} ||{\mathbf {a}}_k^{(i)}-{\mathbf {u}}^{(i)}||_2^2\\ A^{(i)}{\mathbf {p}}_i={\mathbf {u}}^{(i)} \end{array} \right. \end{aligned}$$
(16)
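The identities in Eq. (16) are easy to verify numerically. The following NumPy sketch (with hypothetical dimensions; `A` plays the role of \(A^{(i)}\), whose columns are the coding coefficients of class \(i\)) checks that right-multiplication by the centering matrix \(Z_i\) subtracts the class mean from every column, and that \(A^{(i)}{\mathbf {p}}_i\) recovers the mean vector \({\mathbf {u}}^{(i)}\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_i = 5, 8                      # coefficient dimension, samples in class i
A = rng.normal(size=(d, n_i))      # columns: coding coefficients of class i

Z_i = np.eye(n_i) - np.ones((n_i, n_i)) / n_i   # centering matrix I - E/n_i
p_i = np.ones(n_i) / n_i                        # averaging vector v_i / n_i

u_i = A.mean(axis=1)               # class mean u^(i) of the coefficients

# A p_i equals the class mean u^(i)
assert np.allclose(A @ p_i, u_i)

# ||A Z_i||_F^2 equals the sum of squared deviations from the class mean
lhs = np.linalg.norm(A @ Z_i, "fro") ** 2
rhs = sum(np.linalg.norm(A[:, k] - u_i) ** 2 for k in range(n_i))
assert np.isclose(lhs, rhs)
```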

By the second-order condition for convexity, \(R(A^{(i)},C)\) is convex with respect to \(A^{(i)}\) if its Hessian matrix \(\nabla ^2R(A^{(i)},C)\) is positive definite [67]. Specifically, taking the second derivative of \(R(A^{(i)},C)\) in Eq. (15) with respect to \(A^{(i)}\) yields the Hessian matrix

$$\begin{aligned} \nabla ^2R(A^{(i)},C)=2g(i)Z_iZ_i^T-2\sum _{j=1}^c(C_{ij}+C_{ji}){\mathbf {p}}_i{\mathbf {p}}_i^T+2\eta I. \end{aligned}$$
(17)
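Since \(R(A^{(i)},C)\) in Eq. (15) is quadratic in \(A^{(i)}\), Eq. (17) can be validated with an exact second-difference check: for any direction \(D\), \(R(A+tD)-2R(A)+R(A-tD)=t^2\,\mathrm{tr}(DHD^T)\). The sketch below uses hypothetical values for \(g(i)\), the cost matrix \(C\), \(\eta\) and the class means:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_i, c, i = 4, 6, 2, 0                 # dims, class count, class index
g_i, eta = 1.5, 2.0                       # assumed g(i) and eta
C = np.array([[0.0, 4.0], [1.0, 0.0]])    # assumed misclassification costs
U = rng.normal(size=(d, c))               # assumed class means u^(j) as columns

Z = np.eye(n_i) - np.ones((n_i, n_i)) / n_i
p = np.ones(n_i) / n_i
cost_sum = sum(C[i, j] + C[j, i] for j in range(c))

def R(A):
    # Eq. (15): centering term - cost-weighted mean term + regularization
    val = g_i * np.linalg.norm(A @ Z, "fro") ** 2
    val -= sum((C[i, j] + C[j, i]) * np.linalg.norm(A @ p - U[:, j]) ** 2
               for j in range(c))
    val += eta * np.linalg.norm(A, "fro") ** 2
    return val

# Eq. (17)
H = 2 * g_i * Z @ Z.T - 2 * cost_sum * np.outer(p, p) + 2 * eta * np.eye(n_i)

# Central second difference is exact for a quadratic function
A = rng.normal(size=(d, n_i))
D = rng.normal(size=(d, n_i))
t = 0.5
num = (R(A + t * D) - 2 * R(A) + R(A - t * D)) / t**2
assert np.isclose(num, np.trace(D @ H @ D.T))
```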

To prove that \(\nabla ^2R(A^{(i)},C)\) is positive definite, we substitute \(Z_i=I-\frac{1}{n_i}E_i^i\) and \({\mathbf {p}}_i=\frac{1}{n_i}{\mathbf {v}}_i\) into Eq. (17). Noting that the centering matrix \(Z_i\) is symmetric and idempotent, so that \(Z_iZ_i^T=Z_i\), and that \({\mathbf {p}}_i{\mathbf {p}}_i^T=\frac{1}{n_i^2}E_i^i\), we obtain

$$\begin{aligned} \nabla ^2R(A^{(i)},C)=2\Big (g(i)+\eta \Big )I-2E_i^i\left( \frac{1}{n_i}g(i)+\frac{1}{n_i^2}\sum _{j=1}^c(C_{ij}+C_{ji})\right) . \end{aligned}$$
(18)

A matrix is positive definite if and only if all of its eigenvalues are greater than 0. Since the maximal eigenvalue of the matrix \(E_i^i\) is \(n_i\) [20], the positive definiteness of \(\nabla ^2R(A^{(i)},C)\) is guaranteed if

$$\begin{aligned} \Big (g(i)+\eta \Big )-n_i\left( \frac{1}{n_i}g(i)+\frac{1}{n_i^2}\sum _{j=1}^c(C_{ij}+C_{ji})\right) >0. \end{aligned}$$
(19)

Simple manipulation yields the condition \(\eta >\frac{1}{n_i}\sum _{j=1}^c(C_{ij}+C_{ji})\), which guarantees the positive definiteness of \(\nabla ^2R(A^{(i)},C)\) and thus the convexity of \(R(A^{(i)},C)\) with respect to \({A}^{(i)}\).
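This bound can also be checked numerically. With hypothetical values for \(g(i)\), the cost matrix and \(n_i\), the minimum eigenvalue of the Hessian in Eq. (18) is positive exactly when \(\eta\) exceeds \(\frac{1}{n_i}\sum _{j=1}^c(C_{ij}+C_{ji})\):

```python
import numpy as np

n_i, c, i = 6, 2, 0                 # class size, number of classes, class index
g_i = 1.5                           # assumed value of g(i)
C = np.array([[0.0, 4.0],           # assumed misclassification-cost matrix
              [1.0, 0.0]])
cost_sum = sum(C[i, j] + C[j, i] for j in range(c))   # sum_j (C_ij + C_ji)

E = np.ones((n_i, n_i))
I = np.eye(n_i)

def hessian(eta):
    # Eq. (18): 2(g(i)+eta) I - 2 E_i^i (g(i)/n_i + cost_sum/n_i^2)
    return 2 * (g_i + eta) * I - 2 * E * (g_i / n_i + cost_sum / n_i**2)

bound = cost_sum / n_i              # the derived threshold for eta

# eta above the bound: positive definite (all eigenvalues > 0)
assert np.linalg.eigvalsh(hessian(bound + 0.1)).min() > 0
# eta below the bound: not positive definite
assert np.linalg.eigvalsh(hessian(bound - 0.1)).min() <= 0
```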

Appendix B

See Tables 11, 12, 13 and 14.

Table 11 Indicator values on NASA dataset
Table 12 Indicator values on AEEEM dataset
Table 13 Indicator values on ReLink dataset
Table 14 Indicator values on Jureczko dataset


About this article


Cite this article

Niu, L., Wan, J., Wang, H. et al. Cost-sensitive Dictionary Learning for Software Defect Prediction. Neural Process Lett 52, 2415–2449 (2020). https://doi.org/10.1007/s11063-020-10355-z
