
The Characteristics of False-Negatives in File-level Fault Prediction

Published: 08 November 2017
DOI: 10.1145/3127005.3127013

ABSTRACT

Over the years, a plethora of studies has proposed increasingly sophisticated machine learning techniques to improve fault prediction models. However, past studies using product metrics from closed-source projects found a ceiling effect in the performance of fault prediction models. Other studies, in turn, have shown that process metrics are significantly better than product metrics for fault prediction. In our case study, therefore, we build models that include both product and process metrics taken together. We find that the ceiling effect reported in prior studies persists even when we consider process metrics. We then qualitatively investigate the bug reports, source code files, and commit information for the bugs in the files that are false negatives in our fault prediction models trained on product and process metrics. Surprisingly, our qualitative analysis shows that bugs in false-negative files and true-positive files are similar in terms of root causes, impact, and affected components; consequently, such similarities might be exploited to enhance fault prediction models.
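To make the setup described in the abstract concrete, the sketch below shows one plausible way to train a file-level fault predictor on combined product and process metrics and then extract the false-negative files for qualitative inspection. This is a minimal illustration in Python with scikit-learn, not the authors' actual pipeline; the file name metrics.csv and all column names are hypothetical.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical per-file dataset: one row per source file, with its
    # metrics and a label saying whether the file later turned out buggy.
    df = pd.read_csv("metrics.csv")

    product = ["loc", "cyclomatic_complexity", "num_methods"]  # product (code) metrics
    process = ["churn", "num_commits", "num_authors"]          # process (history) metrics
    X, y = df[product + process], df["buggy"]                  # buggy: 1 = post-release bug

    # Hold out a test set, keeping the file names aligned with the rows.
    X_train, X_test, y_train, y_test, files_train, files_test = train_test_split(
        X, y, df["file"], test_size=0.3, random_state=0, stratify=y)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    # False negatives: files that actually had bugs but were predicted clean.
    # These are the files whose bug reports and commits would be inspected.
    false_negative_files = files_test[(y_test == 1) & (pred == 0)]
    print(false_negative_files.tolist())

The false-negative file list produced at the end is the starting point for the kind of qualitative analysis the abstract describes: linking each such file back to its bug reports and fixing commits.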


Published in

PROMISE: Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering
November 2017, 120 pages
ISBN: 978-1-4503-5305-2
DOI: 10.1145/3127005
Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

