Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11440)

Abstract

In recent years, just-in-time (JIT) defect prediction has gained considerable interest because it enables developers to identify risky changes at check-in time. Previous studies have approached the problem from both supervised and unsupervised perspectives. Since change labels are hard to acquire, a prediction model that does not rely heavily on label information is more desirable in practice. However, the unsupervised models proposed in previous work perform poorly in terms of precision and F1 because they lack supervised information. To overcome this weakness, we study JIT defect prediction from a semi-supervised perspective, which requires only a small amount of labeled data for training. In this paper, we propose Effort-Aware Tri-Training (EATT), a semi-supervised model for JIT defect prediction based on sample selection. We compare EATT with state-of-the-art supervised and unsupervised models at different labeled rates. Experimental results on six open-source projects demonstrate that EATT performs better than existing supervised and unsupervised models for effort-aware JIT defect prediction.

Notes

  1. ACC denotes the recall of defect-inducing changes when using 20% of the entire effort to inspect the top-ranked changes.

  2. \(P_{opt}\) is the normalized version of the effort-aware performance indicator based on the concept of the “code-churn-based” Alberg diagram. More details can be found in Sect. 4.5.

  3. The data and code used in this paper are available at https://github.com/NJUST-IDAM/EATT.
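
The two indicators defined in these notes can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes that effort is measured by code churn per change, that ACC stops before the first change that would exceed the 20% budget (other conventions exist), and that \(P_{opt}\) follows the definition commonly used in the effort-aware literature, \(1 - (A_{optimal} - A_{model}) / (A_{optimal} - A_{worst})\), where each \(A\) is the area under a cumulative-effort versus cumulative-recall curve. The function names `acc_at_effort`, `p_opt` and `_area` are hypothetical.

```python
def acc_at_effort(scores, efforts, labels, budget=0.2):
    """Recall of defect-inducing changes found within `budget` of total effort.

    Assumption: inspection stops before the first change that would
    push the spent effort over the budget.
    """
    total_effort = sum(efforts)
    total_defects = sum(labels)
    if total_effort <= 0 or total_defects == 0:
        return 0.0
    # Inspect changes from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    spent, found = 0.0, 0
    for i in order:
        if spent + efforts[i] > budget * total_effort:
            break
        spent += efforts[i]
        found += labels[i]
    return found / total_defects


def _area(order, efforts, labels):
    """Trapezoidal area under the cumulative effort vs. cumulative recall curve."""
    total_effort = sum(efforts)
    total_defects = sum(labels)
    area = 0.0
    x_prev = y_prev = 0.0
    cum_effort = cum_defects = 0
    for i in order:
        cum_effort += efforts[i]
        cum_defects += labels[i]
        x, y = cum_effort / total_effort, cum_defects / total_defects
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


def p_opt(scores, efforts, labels):
    """Normalized area of the model curve between the worst and optimal curves."""
    if sum(efforts) <= 0 or sum(labels) == 0:
        raise ValueError("need positive total effort and at least one defect")
    idx = range(len(scores))
    model = sorted(idx, key=lambda i: scores[i], reverse=True)
    # Optimal ranking: defective changes first, cheapest first.
    # The worst ranking is taken as its exact reverse.
    optimal = sorted(idx, key=lambda i: (-labels[i], efforts[i]))
    worst = optimal[::-1]
    a_model = _area(model, efforts, labels)
    a_optimal = _area(optimal, efforts, labels)
    a_worst = _area(worst, efforts, labels)
    return 1.0 - (a_optimal - a_model) / (a_optimal - a_worst)


# Toy data (made up for illustration): four changes, two of them defective.
example_scores = [0.9, 0.8, 0.3, 0.1]   # predicted risk per change
example_efforts = [10, 5, 35, 50]       # code churn per change
example_labels = [1, 0, 1, 0]           # 1 = defect-inducing
print(acc_at_effort(example_scores, example_efforts, example_labels))  # 0.5
print(p_opt(example_scores, example_efforts, example_labels))
```

On the toy data, the 20% budget covers the first two ranked changes, which contain one of the two defect-inducing changes, so ACC is 0.5. A ranking identical to the optimal one yields a \(P_{opt}\) of 1, and the reverse ranking yields 0.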

Acknowledgment

This paper is supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20170809) and the China Postdoctoral Science Foundation (Grant No. 2018YFB1003902).

Corresponding author

Correspondence to Xiuyi Jia.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, W., Li, W., Jia, X. (2019). Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_23

  • DOI: https://doi.org/10.1007/978-3-030-16145-3_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16144-6

  • Online ISBN: 978-3-030-16145-3

  • eBook Packages: Computer Science, Computer Science (R0)
