Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction

Published: 18 April 2024

Abstract

Just-In-Time (JIT) defect prediction aims to identify defect-prone software changes in a timely manner, thereby improving development efficiency and helping to ensure software quality. Identifying the changes that introduce bugs is a critical task in JIT defect prediction, and researchers have proposed the SZZ approach and its variants to label these changes. However, it has been shown that the different SZZ algorithms introduce some label noise into the dataset, which may reduce the predictive performance of the model. To address this limitation, we propose the Confident Learning Imbalance (CLI) model. The model estimates the joint distribution of noisy labels and true labels, uses it to identify and exclude samples whose labels may be corrupted, and thereby mitigates the impact of noisy data on the performance of the prediction model. CLI consists of two components: the Imbalanced Data Probabilistic Prediction (IDPP) component, which generates a predicted probability matrix for imbalanced data, and the Confident Learning (CL) component, which identifies noisy data. The IDPP component generates precise predicted probabilities for each instance in the training set, while the CL component uses the resulting predicted probability matrix and the noisy labels to remove the noise and build a classification model. We evaluate our model through extensive experiments on a total of 126,526 changes from ten Apache open source projects, and the results show that it outperforms the baseline methods.
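
To make the noise-identification step concrete, the following is a minimal sketch of the general confident-learning idea for a binary clean/buggy labeling. It is not the authors' CLI implementation: the function name `confident_joint`, the toy labels, and the toy probabilities are all assumptions made for illustration, and the predicted probability matrix is taken as given rather than produced by an IDPP-style component. The sketch counts, for each noisy label, how many changes confidently look like each class, and flags the changes that fall off the diagonal of that count matrix.

```python
import numpy as np


def confident_joint(noisy_labels, pred_probs):
    """Count matrix of (given label, confidently likely label) plus a suspect mask.

    noisy_labels: length-n integer labels (here 0 = clean, 1 = buggy), possibly wrong.
    pred_probs:   n x k matrix of predicted class probabilities (assumed out-of-sample).
    """
    noisy_labels = np.asarray(noisy_labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    k = pred_probs.shape[1]

    # Per-class threshold: mean predicted probability of class j
    # over the examples whose given label is j.
    thresholds = np.array(
        [pred_probs[noisy_labels == j, j].mean() for j in range(k)]
    )

    # A class is a candidate for an example if its probability reaches the threshold.
    above = pred_probs >= thresholds
    # Among candidate classes pick the most probable one; -1 means "no confident class".
    likely = np.where(above, pred_probs, -np.inf).argmax(axis=1)
    likely[~above.any(axis=1)] = -1

    joint = np.zeros((k, k), dtype=int)
    for given, guess in zip(noisy_labels, likely):
        if guess >= 0:
            joint[given, guess] += 1

    # Off-diagonal entries: the given label disagrees with the confident class.
    suspect = (likely >= 0) & (likely != noisy_labels)
    return joint, suspect


# Toy data (hypothetical): six changes, labels as an SZZ-style tool might assign them.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # labeled clean, but looks buggy
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # labeled buggy, but looks clean
]

joint, suspect = confident_joint(labels, probs)
print(joint.tolist())    # [[2, 1], [1, 2]]
print(suspect.tolist())  # [False, False, True, False, False, True]

# Changes kept for training once the suspected label errors are excluded.
kept_indices = np.flatnonzero(~suspect)
```

Normalizing the count matrix by the number of examples gives a rough estimate of the joint distribution of noisy and true labels. A classifier would then be trained only on the rows in `kept_indices`.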

Cited By

  • (2024) An Empirical Study on Just-in-time Conformal Defect Prediction. Proceedings of the 21st International Conference on Mining Software Repositories, 88-99. DOI: 10.1145/3643991.3644928. Online publication date: 15-Apr-2024

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 4
    May 2024
    940 pages
    EISSN:1557-7392
    DOI:10.1145/3613665
    • Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 April 2024
    Online AM: 11 December 2023
    Accepted: 01 December 2023
    Revised: 29 August 2023
    Received: 14 February 2022
    Published in TOSEM Volume 33, Issue 4

    Author Tags

    1. Just-in-time defect prediction
    2. SZZ tools
    3. confident learning
    4. imbalance

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Dalian Excellent Young Project
    • Postgraduate Education Reform Project of Liaoning Province
    • Fundamental Research Funds for the Central Universities
    • Dalian Maritime University Teacher Development Project
    • China Higher Education Association 2023 Higher Education Scientific Research
