ABSTRACT
Background: Many software bug prediction models have been proposed and evaluated on a set of well-known benchmark datasets. We conducted pilot studies on these widely used benchmark datasets and observed common issues among them. Specifically, most existing benchmark datasets consist of randomly selected historical versions of software projects, which poses non-trivial threats to the validity of existing bug prediction studies, since real-world software projects often evolve continuously. Yet how to conduct software bug prediction in real-world continuous software development scenarios is not well studied.
Aims: In this paper, to bridge the gap between current software bug prediction practice and real-world continuous software development, we propose new approaches to conducting bug prediction in continuous software development, covering model building, updating, and evaluation.
Method: For model building, we propose ConBuild, which leverages the distributional characteristics of bug prediction data to guide training version selection. For model updating, we propose ConUpdate, which leverages the evolution of these distributional characteristics between versions to decide whether to reuse or update bug prediction models in continuous software development. For model evaluation, we propose ConEA, which leverages the evolution of the buggy probability of files between versions to conduct effort-aware evaluation. (An illustrative sketch of these ideas follows the abstract.)
Results: Experiments on 120 continuous release versions spanning six large-scale open-source software systems show the practical value of our approaches.
Conclusions: This paper provides new insights and guidelines for conducting software bug prediction in the context of continuous software development.
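The abstract does not give the authors' formulations of ConBuild, ConUpdate, or ConEA, so the following is only a minimal sketch of the general ideas it describes. All concrete choices here are assumptions for illustration, not the paper's actual design: representing a version by per-metric distribution moments, comparing versions with Euclidean distance, gating retraining on a fixed drift threshold, and approximating inspection effort by lines of code.

```python
import numpy as np
from scipy import stats

def version_signature(metric_matrix):
    # Summarize one release's bug prediction data (a files x metrics
    # matrix) by per-metric distribution moments. The choice of
    # moments is an illustrative assumption.
    return np.concatenate([
        metric_matrix.mean(axis=0),
        metric_matrix.var(axis=0),
        stats.skew(metric_matrix, axis=0),
        stats.kurtosis(metric_matrix, axis=0),
    ])

def select_training_version(candidate_versions, target_version):
    # ConBuild-style idea: among earlier releases, pick the one whose
    # distributional signature is closest to the target release's.
    target_sig = version_signature(target_version)
    distances = [np.linalg.norm(version_signature(v) - target_sig)
                 for v in candidate_versions]
    return int(np.argmin(distances))

def should_update_model(prev_version, new_version, drift_threshold=0.5):
    # ConUpdate-style idea: retrain only when the data distribution
    # has drifted beyond a tolerance; otherwise reuse the old model.
    drift = np.linalg.norm(version_signature(new_version)
                           - version_signature(prev_version))
    return drift > drift_threshold

def effort_aware_ranking(prob_prev, prob_new, loc):
    # ConEA-style idea: prioritize files whose predicted buggy
    # probability grew most between versions, per unit of inspection
    # effort (approximated here by lines of code).
    growth = np.asarray(prob_new) - np.asarray(prob_prev)
    return np.argsort(-(growth / np.asarray(loc)))
```

In a continuous-development pipeline, select_training_version would run over the releases preceding the target, should_update_model would gate retraining at each new release, and effort_aware_ranking would order files for inspection when evaluating a model's cost-effectiveness.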
Index Terms
- Continuous Software Bug Prediction