A software defect prediction method with metric compensation based on feature selection and transfer learning

Chen, Jinfu; Wang, Xiaoli; Cai, Saihua; Xu, Jiaping; Chen, Jingyi; Chen, Haibo

doi:10.1631/FITEE.2100468

一种基于特征选择与迁移学习的度量补偿软件缺陷预测方法

Research Article
Published: 04 April 2022

Volume 23, pages 715–731, (2022)
Frontiers of Information Technology & Electronic Engineering

Jinfu Chen (陈锦富)^1,2 (ORCID: orcid.org/0000-0002-3124-5452),
Xiaoli Wang (王小丽)^1,2,
Saihua Cai (蔡赛华)^1,2 (ORCID: orcid.org/0000-0003-0743-1156),
Jiaping Xu (徐家平)^1,
Jingyi Chen (陈静怡)^1 &
Haibo Chen (陈海波)^1

Abstract

Cross-project software defect prediction solves the problem of insufficient training data in traditional defect prediction and overcomes the challenge of applying models learned from multiple different source projects to a target project. At the same time, two new problems emerge: (1) too many irrelevant and redundant features in model training reduce training efficiency and thus lower the prediction accuracy of the model; (2) the distribution of metric values varies greatly from project to project because of the development environment and other factors, which lowers prediction accuracy when the model is applied across projects. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The method introduces Pearson feature selection to address data redundancy and uses a metric-compensation-based transfer learning technique to address the large differences in data distribution between the source project and the target project. The experimental results show that the model constructed with this method achieves better results in terms of the area under the receiver operating characteristic curve (AUC) and the F1-measure.

摘要

跨项目软件缺陷预测解决了传统缺陷预测中训练数据不足的问题, 克服了将多个不同源项目中学习的模型应用于目标项目的挑战。与此同时, 出现两个新问题: (1) 模型训练过程中过多无关和冗余特征影响训练效率, 降低了模型预测精度; (2) 由于开发环境等因素, 度量值的分布因项目而异, 当模型用于跨项目预测时, 预测精度较低。本文引入皮尔逊特征选择方法解决数据冗余问题, 采用基于迁移学习的度量补偿技术解决源项目和目标项目之间数据分布差异较大的问题。提出一种基于特征选择和迁移学习的度量补偿软件缺陷预测方法。实验结果表明, 用该方法构建的模型在AUC (接收器工作特性曲线下面积) 值和F1度量指标上取得较好结果。
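The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, since the full text is not available here. The sketch assumes that (a) Pearson feature selection keeps metrics whose absolute correlation with the defect label reaches a threshold and drops metrics that are nearly collinear with an already kept one, and (b) metric compensation rescales each target-project metric by the ratio of the source mean to the target mean. The function names, the thresholds `relevance` and `redundancy`, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column carries no linear signal; treat its correlation as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, labels, relevance=0.3, redundancy=0.9):
    """Return indices of relevant, non-redundant metric columns.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    cols = list(zip(*rows))
    # Visit columns from most to least correlated with the defect label.
    ranked = sorted(range(len(cols)),
                    key=lambda j: -abs(pearson(cols[j], labels)))
    kept = []
    for j in ranked:
        if abs(pearson(cols[j], labels)) < relevance:
            continue  # irrelevant to the label
        if any(abs(pearson(cols[j], cols[k])) > redundancy for k in kept):
            continue  # nearly collinear with a stronger kept metric
        kept.append(j)
    return sorted(kept)


def compensate(target_rows, source_rows):
    """Scale each target metric by mean(source) / mean(target)."""
    src_means = [mean(c) for c in zip(*source_rows)]
    tgt_means = [mean(c) for c in zip(*target_rows)]
    return [
        [v * s / t if t != 0 else v
         for v, s, t in zip(row, src_means, tgt_means)]
        for row in target_rows
    ]


# Toy source project: column 0 tracks the label, column 1 nearly
# duplicates column 0, column 2 is uncorrelated noise.
source = [[10, 22, 5], [20, 40, 1], [30, 58, 4], [40, 80, 2]]
labels = [0, 0, 1, 1]
kept = select_features(source, labels)

# Toy target project with a much larger scale on the kept metric.
target = [[100], [300]]
source_kept = [[row[j] for j in kept] for row in source]
compensated = compensate(target, source_kept)
```

On this toy data only column 0 survives selection, and after compensation the target metric has the same mean as the source metric, so a classifier trained on the source project sees target values on a comparable scale.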

Author information

Authors and Affiliations

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, China
Jinfu Chen (陈锦富), Xiaoli Wang (王小丽), Saihua Cai (蔡赛华), Jiaping Xu (徐家平), Jingyi Chen (陈静怡) & Haibo Chen (陈海波)
Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang, 212013, China
Jinfu Chen (陈锦富), Xiaoli Wang (王小丽) & Saihua Cai (蔡赛华)

Contributions

Jinfu CHEN and Saihua CAI designed the research. Xiaoli WANG, Saihua CAI, and Jiaping XU processed the data. Jinfu CHEN, Xiaoli WANG, and Saihua CAI drafted the paper. Xiaoli WANG, Jiaping XU, Jingyi CHEN, and Haibo CHEN finished the experiments. Jingyi CHEN and Haibo CHEN helped organize the paper. Jinfu CHEN, Xiaoli WANG, and Saihua CAI revised and finalized the paper.

Corresponding author

Correspondence to Saihua Cai (蔡赛华).

Additional information

Compliance with ethics guidelines

Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, and Haibo CHEN declare that they have no conflict of interest.

Project supported by the National Natural Science Foundation of China (Nos. 62172194 and U1836116), the National Key R&D Program of China (No. 2020YFB1005500), the Leading-edge Technology Program of Jiangsu Provincial Natural Science Foundation, China (No. BK20202001), the China Postdoctoral Science Foundation (No. 2021M691310), the Postdoctoral Science Foundation of Jiangsu Province, China (No. 2021K636C), and the Future Network Scientific Research Fund Project, China (No. FNSRFP-2021-YB-50).

About this article

Cite this article

Chen, J., Wang, X., Cai, S. et al. A software defect prediction method with metric compensation based on feature selection and transfer learning. Front Inform Technol Electron Eng 23, 715–731 (2022). https://doi.org/10.1631/FITEE.2100468

Received: 30 September 2021
Accepted: 05 February 2022
Published: 04 April 2022
Issue Date: May 2022
DOI: https://doi.org/10.1631/FITEE.2100468

CLC number

TP311.5
