Cost-sensitive Dictionary Learning for Software Defect Prediction

Published in: Neural Processing Letters

Abstract

In recent years, software defect prediction has come to be treated as a cost-sensitive learning problem. To handle the unequal misclassification losses caused by different types of classification error, several cost-sensitive dictionary learning methods have recently been proposed. These methods typically define misclassification costs to quantify the unequal losses and then minimize a cost-sensitive reconstruction loss by embedding the cost information into the reconstruction function of dictionary learning. Although they achieve promising performance, their cost-sensitive reconstruction functions are not well designed. Moreover, insufficient attention is paid to the coding coefficients, which can also help reduce the reconstruction loss. To address these issues, this paper proposes a new cost-sensitive reconstruction loss function and introduces an additional cost-sensitive discrimination regularization on the coding coefficients. The two terms are jointly optimized in a unified cost-sensitive dictionary learning framework. In this way, we minimize the reconstruction loss and obtain a more cost-sensitive dictionary for the feature encoding of test data. In the experimental part, we have conducted extensive experiments on twenty-five software projects from four benchmark datasets: NASA, AEEEM, ReLink and Jureczko. The results, in comparison with ten state-of-the-art software defect prediction methods, demonstrate the effectiveness of the learned cost-sensitive dictionary for software defect prediction.


Notes

  1. http://metricsgrimoire.github.io.

  2. http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm.

  3. https://scitools.com.

  4. http://www.socr.ucla.edu/Applets.dir/F_Table.html.

References

  1. Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13


  2. Shepperd M, Bowes D, Hall T (2014) Researcher bias: the use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616


  3. Li ZQ, Jing XY, Zhu XK (2018) Progress on approaches to software defect prediction. IET Softw 12(3):161–175


  4. Boehm BW, Basili VR (2005) Foundations of empirical software engineering: the legacy of Victor R. Basili. Springer, Berlin


  5. Boehm BW, Papaccio PN (1988) Understanding and controlling software costs. IEEE Trans Softw Eng 14(10):1462–1477


  6. McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320


  7. Halstead MH (1977) Elements of software science. Elsevier, North-Holland


  8. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493


  9. Ma Y, Zhu S, Qin K, Luo G (2014) Combining the requirement information for software defect estimation in design time. Inf Process Lett 114(9):469–474


  10. Jiang Y, Cuki B, Menzies T, Bartlow N (2008) Comparing design and code metrics for software quality prediction. In: Proceedings of the 4th international workshop on predictor models in software engineering, pp 11–18

  11. Gray D, Bowes D, Davey N, Sun Y, Christianson B (2009) Using the support vector machine as a classification method for software defect prediction with static code metrics. In: Proceedings of international conference on engineering applications of neural networks, pp 223–234

  12. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Soft 81(5):649–660


  13. Wang J, Shen B, Chen Y (2012) Compressed C4.5 models for software defect prediction. In: Proceedings of 12th international conference on quality software, pp 13–16

  14. Khoshgoftaar TM, Seliya N (2002) Tree-based software quality estimation models for fault prediction. In: Proceedings of eighth IEEE symposium on software metrics, pp 203–214

  15. Wang T, Li WH (2010) Naive Bayes software defect prediction model. In: Proceedings of 2010 international conference on computational intelligence and software engineering, pp 1–4

  16. Amasaki S, Takagi Y, Mizuno O, Kikuno T (2003) A Bayesian belief network for assessing the likelihood of fault content. In: Proceedings of 14th international symposium on software reliability engineering, pp 215–226

  17. Khoshgoftaar TM, Allen EB, Hudepohl JP, Aud SJ (1997) Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909


  18. Singh Y, Kaur A, Malhotra R (2008) Predicting software fault proneness model using neural network. In: Proceedings of international conference on product focused software process improvement, pp 204–214

  19. Liu MX, Miao LS, Zhang DQ (2014) Two-stage cost-sensitive learning for software defect prediction. IEEE Trans Reliab 63(2):676–686


  20. Yang M, Zhang L, Feng X, Zhang D (2011) Fisher discrimination dictionary learning for sparse representation. In: Proceedings of 2011 international conference on computer vision, pp 543–550

  21. Liu HD, Yang M, Gao Y, Yin YL, Chen L (2014) Bilinear discriminative dictionary learning for face recognition. Pattern Recognit 47(5):1835–1845


  22. Özakıncı R, Tarhan A (2018) Early software defect prediction: a systematic map and review. J Syst Softw 144:216–239


  23. Rathore SS, Kumar S (2019) A study on software fault prediction techniques. Artif Intell Rev 51(2):255–327


  24. Xu Z, Liu J, Luo X, Yang Z, Zhang Y, Yuan P, Zhang T (2019) Software defect prediction based on kernel PCA and weighted extreme learning machine. Inf Softw Technol 106:182–200


  25. Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Automat Softw Eng 24(1):47–69


  26. Kondo M, Bezemer CP, Kamei Y, Hassan AE, Mizuno O (2019) The impact of feature reduction techniques on defect prediction models. Empir Softw Eng 24(4):1925–1963


  27. Yang X, Lo D, Xia X, Sun J (2017) TLEL: a two-layer ensemble learning approach for just-in-time defect prediction. Inf Softw Technol 87:206–220


  28. Xu Z, Liu J, Luo X, Zhang T (2018) Cross-version defect prediction via hybrid active learning with kernel principal component analysis. In: Proceedings of IEEE 25th international conference on software analysis, evolution and reengineering (SANER), pp 209–220

  29. Jing XY, Wu F, Dong X, Qi F, Xu B (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In: Proceedings of the 10th joint meeting on foundations of software engineering, pp 496–507

  30. Bennin KE, Keung JW, Monden A (2019) On the relative value of data resampling approaches for software defect prediction. Empir Softw Eng 24(2):602–636


  31. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284


  32. Wan JW, Yang M, Chen YJ (2015) Discriminative cost sensitive Laplacian score for face recognition. Neurocomputing 152(25):333–344


  33. Wan JW, Wang HY, Yang M (2017) Cost sensitive semi-supervised canonical correlation analysis for multi-view dimensionality reduction. Neural Process Lett 45(2):411–430


  34. Khoshgoftaar TM, Geleyn E, Nguyen L, Bullard L (2002) Cost-sensitive boosting in software quality modeling. In: Proceedings of international symposium on high assurance systems engineering, pp 51–60

  35. Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl 37(6):4537–4543


  36. Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of 30th international conference on software engineering, pp 181–190

  37. Yu J, Rui Y, Tao DC (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032


  38. Yu J, Tan M, Zhang H, Tao DC, Rui Y (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2019.2932058

  39. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2008) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227


  40. Zhang GQ, Sun HJ, Ji ZX, Yuan YH, Sun QS (2016) Cost-sensitive dictionary learning for face recognition. Pattern Recognit 60:613–629


  41. Wu F, Jing XY, Yue D (2017) Multi-view discriminant dictionary learning via learning view-specific and shared structured dictionaries for image classification. Neural Process Lett 45(2):649–666


  42. Zhang Z, Sun Y, Wang Y, Zha Z, Yan SC, Wang M (2019) Convolutional dictionary pair learning network for image representation learning. arXiv:1912.12138

  43. Liu H, Guo D, Sun F (2016) Object recognition using tactile measurements: kernel sparse coding methods. IEEE Trans Instrum Meas 65(3):656–665


  44. Li Z, Zhang Z, Qin J, Zhang Z, Shao L (2019) Discriminative fisher embedding dictionary learning algorithm for object recognition. IEEE Trans Neural Netw Learn Syst 31(3):786–800


  45. Shrivastava A, Patel VM, Chellappa R (2015) Non-linear dictionary learning with partially labeled data. Pattern Recognit 48(11):3283–3292


  46. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge


  47. Jing XY, Ying S, Zhang ZW, Wu SS, Liu J (2014) Dictionary learning based software defect prediction. In: Proceedings of the 36th international conference on software engineering, pp 414–423

  48. Wu F, Jing XY, Sun Y, Sun J, Huang L, Cui F, Sun Y (2018) Cross-project and within-project semisupervised software defect prediction: a unified approach. IEEE Trans Reliab 67(2):581–597


  49. Wan JW, Yang M, Wang HY (2017) Cost sensitive matrix factorization for face recognition. In: Proceedings of intelligence data engineering and automated learning, pp 136–145

  50. Wan JW, Yang M, Gao Y, Chen YJ (2014) Pairwise costs in semisupervised discriminant analysis for face recognition. IEEE Trans Inf Forensic Secur 9(10):1569–1580


  51. Ting KM (2002) An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14(3):659–665


  52. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, Hoboken


  53. Rosasco L, Verri A, Santoro M, Mosci S, Villa S (2009) Iterative projection methods for structured sparsity regularization. MIT Technical Reports, MIT-CSAIL-TR-2009-050, CBCL-282

  54. Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based face recognition. In: Proceedings of IEEE international conference on image processing, pp 1601–1604

  55. Shepperd M, Song QB, Sun Z, Mair C (2013) Data quality: some comments on the nasa software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215


  56. D’Ambros M, Lanza M, Robbes R (2012) An extensive comparison of bug prediction approaches. In: Proceedings of IEEE working conference on mining software repositories, pp 31–41

  57. Wu R, Zhang H, Kim S, Cheung SC (2011) Relink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 15–25

  58. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering, pp 1–10

  59. Ji H, Huang S, Wu Y, Hui Z, Zheng C (2019) A new weighted naive Bayes method based on information diffusion for software defect prediction. Softw Qual J 27(3):923–968


  60. Wan JW, Wang Y (2019) Cost-sensitive label propagation for semi-supervised face recognition. IEEE Trans Inf Forensic Secur 14(7):1729–1743


  61. Xu Z, Li S, Luo X, Liu J, Zhang T, Tang Y, Keung J (2019) TSTSS: a two-stage training subset selection framework for cross version defect prediction. J Syst Soft 154:59–78


  62. Elkan C (2001) The foundations of cost-sensitive learning. In: Proceedings of international joint conference on artificial intelligence, pp 973–978

  63. Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat Theory Methods 9(6):571–595


  64. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30


  65. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701


  66. Nemenyi PB (1963) Distribution-free multiple comparisons. PhD Thesis, Princeton University, Princeton

  67. Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge



Acknowledgements

The authors would like to thank the anonymous referees and the editors for their helpful comments and suggestions.

Author information


Correspondence to Jianwu Wan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by National Natural Science Foundation of China under Grants 61502058, 61572085 and 61976028.

Appendices

Appendix A

In this section, we discuss the convexity of \(R(A^{(i)},C)\) with respect to \(A^{(i)}\). First, we define two constants: the centering matrix \(Z_i=I-\frac{1}{n_i}E_i^i\), where \(E^i_i\in {\mathbf {R}}^{n_i\times n_i}\) is the matrix with all entries equal to 1 and \(I\in {\mathbf {R}}^{n_i\times n_i}\) is the identity matrix; and the vector \({\mathbf {p}}_i=\frac{1}{n_i}{\mathbf {v}}_i\), where \({\mathbf {v}}_i=[1,\ldots ,1]^T\in {\mathbf {R}}^{n_i}\) is the vector with all entries equal to 1.

With these two constants, \(R(A^{(i)},C)\) in Eq. (7) can be rewritten as follows

$$\begin{aligned} R(A^{(i)},C)=g(i)||A^{(i)}Z_i||_F^2-\sum _{j=1}^c(C_{ij}+C_{ji})||A^{(i)}{\mathbf {p}}_i-{\mathbf {u}}^{(j)}||_2^2+\eta ||A^{(i)}||_F^2, \end{aligned}$$
(15)

where

$$\begin{aligned} \left\{ \begin{array}{l} ||A^{(i)}Z_i||_F^2=\sum _{{\mathbf {a}}_k^{(i)}\in A^{(i)}} ||{\mathbf {a}}_k^{(i)}-{\mathbf {u}}^{(i)}||_2^2\\ A^{(i)}{\mathbf {p}}_i={\mathbf {u}}^{(i)} \end{array} \right. \end{aligned}$$
(16)
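The identities in Eq. (16) are easy to verify numerically. The following NumPy sketch (with hypothetical dimensions; `A` plays the role of \(A^{(i)}\), whose columns are the coding coefficients of class \(i\)) checks that right-multiplication by the centering matrix \(Z_i\) subtracts the class mean from every column, and that \(A^{(i)}{\mathbf {p}}_i\) recovers the mean vector \({\mathbf {u}}^{(i)}\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_i = 5, 8                      # coefficient dimension, samples in class i
A = rng.normal(size=(d, n_i))      # columns: coding coefficients of class i

Z_i = np.eye(n_i) - np.ones((n_i, n_i)) / n_i   # centering matrix I - E/n_i
p_i = np.ones(n_i) / n_i                        # averaging vector v_i / n_i

u_i = A.mean(axis=1)               # class mean u^(i) of the coefficients

# A p_i equals the class mean u^(i)
assert np.allclose(A @ p_i, u_i)

# ||A Z_i||_F^2 equals the sum of squared deviations from the class mean
lhs = np.linalg.norm(A @ Z_i, "fro") ** 2
rhs = sum(np.linalg.norm(A[:, k] - u_i) ** 2 for k in range(n_i))
assert np.isclose(lhs, rhs)
```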

By the second-order condition for convexity, \(R(A^{(i)},C)\) is convex with respect to \(A^{(i)}\) if its Hessian matrix \(\nabla ^2R(A^{(i)},C)\) is positive definite [67]. Specifically, taking the second derivative of \(R(A^{(i)},C)\) in Eq. (15) with respect to \(A^{(i)}\) yields the Hessian matrix

$$\begin{aligned} \nabla ^2R(A^{(i)},C)=2g(i)Z_iZ_i^T-2\sum _{j=1}^c(C_{ij}+C_{ji}){\mathbf {p}}_i{\mathbf {p}}_i^T+2\eta I. \end{aligned}$$
(17)
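Since \(R(A^{(i)},C)\) in Eq. (15) is quadratic in \(A^{(i)}\), Eq. (17) can be validated with an exact second-difference check: for any direction \(D\), \(R(A+tD)-2R(A)+R(A-tD)=t^2\,\mathrm{tr}(DHD^T)\). The sketch below uses hypothetical values for \(g(i)\), the cost matrix \(C\), \(\eta\) and the class means:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_i, c, i = 4, 6, 2, 0                 # dims, class count, class index
g_i, eta = 1.5, 2.0                       # assumed g(i) and eta
C = np.array([[0.0, 4.0], [1.0, 0.0]])    # assumed misclassification costs
U = rng.normal(size=(d, c))               # assumed class means u^(j) as columns

Z = np.eye(n_i) - np.ones((n_i, n_i)) / n_i
p = np.ones(n_i) / n_i
cost_sum = sum(C[i, j] + C[j, i] for j in range(c))

def R(A):
    # Eq. (15): centering term - cost-weighted mean term + regularization
    val = g_i * np.linalg.norm(A @ Z, "fro") ** 2
    val -= sum((C[i, j] + C[j, i]) * np.linalg.norm(A @ p - U[:, j]) ** 2
               for j in range(c))
    val += eta * np.linalg.norm(A, "fro") ** 2
    return val

# Eq. (17)
H = 2 * g_i * Z @ Z.T - 2 * cost_sum * np.outer(p, p) + 2 * eta * np.eye(n_i)

# Central second difference is exact for a quadratic function
A = rng.normal(size=(d, n_i))
D = rng.normal(size=(d, n_i))
t = 0.5
num = (R(A + t * D) - 2 * R(A) + R(A - t * D)) / t**2
assert np.isclose(num, np.trace(D @ H @ D.T))
```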

To prove that \(\nabla ^2R(A^{(i)},C)\) is positive definite, we substitute \(Z_i=I-\frac{1}{n_i}E_i^i\) and \({\mathbf {p}}_i=\frac{1}{n_i}{\mathbf {v}}_i\) into Eq. (17). Noting that the centering matrix \(Z_i\) is symmetric and idempotent, so that \(Z_iZ_i^T=Z_i\), and that \({\mathbf {p}}_i{\mathbf {p}}_i^T=\frac{1}{n_i^2}E_i^i\), we obtain

$$\begin{aligned} \nabla ^2R(A^{(i)},C)=2\Big (g(i)+\eta \Big )I-2E_i^i\left( \frac{1}{n_i}g(i)+\frac{1}{n_i^2}\sum _{j=1}^c(C_{ij}+C_{ji})\right) . \end{aligned}$$
(18)

A matrix is positive definite if and only if all of its eigenvalues are greater than 0. Since the maximal eigenvalue of the matrix \(E_i^i\) is \(n_i\) [20], the positive definiteness of \(\nabla ^2R(A^{(i)},C)\) is guaranteed if

$$\begin{aligned} \Big (g(i)+\eta \Big )-n_i\left( \frac{1}{n_i}g(i)+\frac{1}{n_i^2}\sum _{j=1}^c(C_{ij}+C_{ji})\right) >0. \end{aligned}$$
(19)

Simple manipulation yields the condition \(\eta >\frac{1}{n_i}\sum _{j=1}^c(C_{ij}+C_{ji})\), which guarantees the positive definiteness of \(\nabla ^2R(A^{(i)},C)\) and thus the convexity of \(R(A^{(i)},C)\) with respect to \({A}^{(i)}\).
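This bound can also be checked numerically. With hypothetical values for \(g(i)\), the cost matrix and \(n_i\), the minimum eigenvalue of the Hessian in Eq. (18) is positive exactly when \(\eta\) exceeds \(\frac{1}{n_i}\sum _{j=1}^c(C_{ij}+C_{ji})\):

```python
import numpy as np

n_i, c, i = 6, 2, 0                 # class size, number of classes, class index
g_i = 1.5                           # assumed value of g(i)
C = np.array([[0.0, 4.0],           # assumed misclassification-cost matrix
              [1.0, 0.0]])
cost_sum = sum(C[i, j] + C[j, i] for j in range(c))   # sum_j (C_ij + C_ji)

E = np.ones((n_i, n_i))
I = np.eye(n_i)

def hessian(eta):
    # Eq. (18): 2(g(i)+eta) I - 2 E_i^i (g(i)/n_i + cost_sum/n_i^2)
    return 2 * (g_i + eta) * I - 2 * E * (g_i / n_i + cost_sum / n_i**2)

bound = cost_sum / n_i              # the derived threshold for eta

# eta above the bound: positive definite (all eigenvalues > 0)
assert np.linalg.eigvalsh(hessian(bound + 0.1)).min() > 0
# eta below the bound: not positive definite
assert np.linalg.eigvalsh(hessian(bound - 0.1)).min() <= 0
```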

Appendix B

See Tables 11, 12, 13 and 14.

Table 11 Indicator values on NASA dataset
Table 12 Indicator values on AEEEM dataset
Table 13 Indicator values on ReLink dataset
Table 14 Indicator values on Jureczko dataset


About this article


Cite this article

Niu, L., Wan, J., Wang, H. et al. Cost-sensitive Dictionary Learning for Software Defect Prediction. Neural Process Lett 52, 2415–2449 (2020). https://doi.org/10.1007/s11063-020-10355-z
