DOI: 10.1145/3475716.3475790

Continuous Software Bug Prediction

Published: 11 October 2021

ABSTRACT

Background: Many software bug prediction models have been proposed and evaluated on a set of well-known benchmark datasets. We conducted pilot studies on the widely used benchmark datasets and observed issues common to them. Specifically, most existing benchmark datasets consist of randomly selected historical versions of software projects, which poses non-trivial threats to the validity of existing bug prediction studies, since real-world software projects evolve continuously. Yet how to conduct software bug prediction in real-world continuous software development scenarios is not well studied.

Aims: In this paper, to bridge the gap between current software bug prediction practice and real-world continuous software development, we propose new approaches to conducting bug prediction in continuous software development that cover model building, updating, and evaluation.

Method: For model building, we propose ConBuild, which leverages the distributional characteristics of bug prediction data to guide training version selection. For model updating, we propose ConUpdate, which leverages the evolution of these distributional characteristics across versions to decide whether to reuse or update a bug prediction model in continuous software development. For model evaluation, we propose ConEA, which leverages the evolution of files' buggy probabilities between versions to conduct effort-aware evaluation.
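To make the version-selection idea concrete, here is a minimal sketch, not the authors' implementation, of how distributional characteristics of file-level metrics might guide training version selection in the spirit of ConBuild. The `distribution_profile` summary (per-metric mean, variance, skewness, kurtosis) and the Euclidean similarity rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of distribution-guided training version selection.
# Helper names and the similarity rule are assumptions for illustration.
import numpy as np
from scipy import stats
from scipy.spatial.distance import euclidean


def distribution_profile(metrics: np.ndarray) -> np.ndarray:
    """Summarize a version's file-by-metric matrix with per-metric
    mean, variance, skewness, and kurtosis."""
    return np.concatenate([
        metrics.mean(axis=0),
        metrics.var(axis=0),
        stats.skew(metrics, axis=0),
        stats.kurtosis(metrics, axis=0),
    ])


def select_training_version(candidates: dict, target: np.ndarray) -> str:
    """Return the historical version whose metric distribution is
    closest (Euclidean distance between profiles) to the target's."""
    target_profile = distribution_profile(target)
    return min(
        candidates,
        key=lambda v: euclidean(distribution_profile(candidates[v]),
                                target_profile),
    )


# Example: pick among three historical releases for a new target release.
rng = np.random.default_rng(0)
versions = {f"v{i}": rng.lognormal(size=(200, 10)) for i in range(1, 4)}
target = rng.lognormal(size=(220, 10))
print(select_training_version(versions, target))
```

Under the same assumptions, a ConUpdate-style rule could compare the profiles of consecutive versions and trigger model retraining only when their distance exceeds a threshold, reusing the existing model otherwise.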

Results: Experiments on 120 continuously released versions spanning six large-scale open-source software systems show the practical value of our approaches.

Conclusions: This paper provides new insights and guidelines for conducting software bug prediction in the context of continuous software development.


Published in

ESEM '21: Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
October 2021, 368 pages
ISBN: 9781450386654
DOI: 10.1145/3475716
Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 11 October 2021
