Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction

Zhang, Wenzhou; Li, Weiwei; Jia, Xiuyi

doi:10.1007/978-3-030-16145-3_23

Conference paper
First Online: 22 March 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11440)

Abstract

In recent years, just-in-time (JIT) defect prediction has attracted considerable interest because it enables developers to identify risky changes at check-in time. Previous studies have approached the problem from both supervised and unsupervised perspectives. Since labels for changes are hard to acquire, a prediction model that does not rely heavily on label information is more desirable in practice. However, the unsupervised models proposed in previous work perform poorly in terms of precision and F1 because they lack supervised information. To overcome this weakness, we study JIT defect prediction from the semi-supervised perspective, which requires only a small amount of labeled data for training. In this paper, we propose Effort-Aware Tri-Training (EATT), a semi-supervised model for JIT defect prediction based on sample selection. We compare EATT with state-of-the-art supervised and unsupervised models at different labeled rates. Experimental results on six open-source projects demonstrate that EATT outperforms existing supervised and unsupervised models for effort-aware JIT defect prediction.

Notes

1. ACC denotes the recall of defect-inducing changes when 20% of the entire effort is spent inspecting the top-ranked changes.
2. \(P_{opt}\) is the normalized version of the effort-aware performance indicator based on the concept of the “code-churn-based” Alberg diagram. More details can be found in Sect. 4.5.
3. The data and code used in this paper are available at https://github.com/NJUST-IDAM/EATT.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20170809) and the China Postdoctoral Science Foundation (Grant No. 2018YFB1003902).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
Wenzhou Zhang & Xiuyi Jia
College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
Weiwei Li
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Xiuyi Jia

Corresponding author

Correspondence to Xiuyi Jia.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang

Cite this paper

Zhang, W., Li, W., Jia, X. (2019). Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_23

DOI: https://doi.org/10.1007/978-3-030-16145-3_23
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16144-6
Online ISBN: 978-3-030-16145-3
eBook Packages: Computer Science, Computer Science (R0)
