ABSTRACT
Over the years, a plethora of works have proposed increasingly sophisticated machine learning techniques to improve fault prediction models. However, past studies using product metrics from closed-source projects found a ceiling effect in the performance of fault prediction models. Meanwhile, other studies have shown that process metrics significantly outperform product metrics for fault prediction. Therefore, in our case study, we build models that combine product and process metrics. We find that the ceiling effect reported in prior studies persists even when we include process metrics. We then qualitatively investigate the bug reports, source code files, and commit information for the bugs in files that are false negatives in our fault prediction models trained on product and process metrics. Surprisingly, our qualitative analysis shows that the bugs in false-negative files and true-positive files are similar in terms of root causes, impact, and affected components; such similarities might therefore be exploited to enhance fault prediction models.
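To make the setting concrete, the following is a minimal sketch of a file-level fault prediction pipeline of the kind discussed above, and of how the false-negative files that our qualitative analysis focuses on are identified. The paper's exact features, dataset, and classifier are not given in this abstract; the sketch assumes a random-forest classifier (common in the cited literature), synthetic data, and illustrative metric names (product metrics such as LOC and complexity; process metrics such as churn and prior fixes).

```python
# Hedged sketch: all metric names and data are illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 1000  # number of files

# Product metrics (synthetic): lines of code, cyclomatic complexity.
loc = rng.integers(10, 2000, n)
complexity = rng.integers(1, 50, n)
# Process metrics (synthetic): code churn, number of prior bug fixes.
churn = rng.integers(0, 500, n)
prior_fixes = rng.integers(0, 20, n)
X = np.column_stack([loc, complexity, churn, prior_fixes])

# Synthetic ground truth: faulty files correlate with churn and complexity.
p = 1.0 / (1.0 + np.exp(-(0.004 * churn + 0.05 * complexity - 3.0)))
y = (rng.random(n) < p).astype(int)  # 1 = faulty file

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
# False negatives are the actually-faulty files the model misses --
# the set that the qualitative analysis in the paper examines.
false_negative_idx = np.where((y_te == 1) & (pred == 0))[0]
print(f"TP={tp} FN={fn} recall={tp / (tp + fn):.2f}")
```

In practice each row would be one file in a release, the labels would come from linking bug-fix commits back to the files they touched, and the files indexed by `false_negative_idx` would be the starting point for inspecting bug reports and commits.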
Index Terms
- The Characteristics of False-Negatives in File-level Fault Prediction