ABSTRACT
Background: Many software bug prediction models have been proposed and evaluated on a set of well-known benchmark datasets. We conducted pilot studies on these widely used benchmark datasets and observed common issues among them. Specifically, most existing benchmark datasets consist of randomly selected historical versions of software projects, which poses non-trivial threats to the validity of existing bug prediction studies, since real-world software projects often evolve continuously. Yet how to conduct software bug prediction in real-world continuous software development scenarios is not well studied.
Aims: In this paper, to bridge the gap between current software bug prediction practice and real-world continuous software development, we propose new approaches to conducting bug prediction in continuous software development, covering model building, updating, and evaluation.
Method: For model building, we propose ConBuild, which leverages the distributional characteristics of bug prediction data to guide training version selection. For model updating, we propose ConUpdate, which leverages the evolution of these distributional characteristics between versions to decide whether to reuse or update bug prediction models in continuous software development. For model evaluation, we propose ConEA, which leverages the evolution of the buggy probability of files between versions to conduct effort-aware evaluation. (An illustrative sketch of these ideas follows the abstract.)
Results: Experiments on 120 continuous release versions spanning six large-scale open-source software systems show the practical value of our approaches.
Conclusions: This paper provides new insights and guidelines for conducting software bug prediction in the context of continuous software development.
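The abstract does not give the authors' formulations of ConBuild, ConUpdate, or ConEA, so the following is only a minimal sketch of the general ideas it describes. All concrete choices here are assumptions for illustration, not the paper's actual design: representing a version by per-metric distribution moments, comparing versions with Euclidean distance, gating retraining on a fixed drift threshold, and approximating inspection effort by lines of code.

```python
import numpy as np
from scipy import stats

def version_signature(metric_matrix):
    # Summarize one release's bug prediction data (a files x metrics
    # matrix) by per-metric distribution moments. The choice of
    # moments is an illustrative assumption.
    return np.concatenate([
        metric_matrix.mean(axis=0),
        metric_matrix.var(axis=0),
        stats.skew(metric_matrix, axis=0),
        stats.kurtosis(metric_matrix, axis=0),
    ])

def select_training_version(candidate_versions, target_version):
    # ConBuild-style idea: among earlier releases, pick the one whose
    # distributional signature is closest to the target release's.
    target_sig = version_signature(target_version)
    distances = [np.linalg.norm(version_signature(v) - target_sig)
                 for v in candidate_versions]
    return int(np.argmin(distances))

def should_update_model(prev_version, new_version, drift_threshold=0.5):
    # ConUpdate-style idea: retrain only when the data distribution
    # has drifted beyond a tolerance; otherwise reuse the old model.
    drift = np.linalg.norm(version_signature(new_version)
                           - version_signature(prev_version))
    return drift > drift_threshold

def effort_aware_ranking(prob_prev, prob_new, loc):
    # ConEA-style idea: prioritize files whose predicted buggy
    # probability grew most between versions, per unit of inspection
    # effort (approximated here by lines of code).
    growth = np.asarray(prob_new) - np.asarray(prob_prev)
    return np.argsort(-(growth / np.asarray(loc)))
```

In a continuous-development pipeline, select_training_version would run over the releases preceding the target, should_update_model would gate retraining at each new release, and effort_aware_ranking would order files for inspection when evaluating a model's cost-effectiveness.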
Index Terms
- Continuous Software Bug Prediction