skip to main content
10.1145/3127005.3127013acmotherconferencesArticle/Chapter ViewAbstractPublication PagespromiseConference Proceedingsconference-collections
research-article

The Characteristics of False-Negatives in File-level Fault Prediction

Published: 08 November 2017 Publication History

Abstract

Over the years, a plethora of works has proposed more and more sophisticated machine learning techniques to improve fault prediction models. However, past studies using product metrics from closed-source projects, found a ceiling effect in the performance of fault prediction models. On the other hand, other studies have shown that process metrics are significantly better than product metrics for fault prediction. In our case study therefore we build models that include both product and process metrics taken together. We find that the ceiling effect found in prior studies exists even when we consider process metrics. We then qualitatively investigate the bug reports, source code files, and commit information for the bugs in the files that are false-negative in our fault prediction models trained using product and process metrics. Surprisingly, our qualitative analysis shows that bugs related to false-negative files and true-positive files are similar in terms of root causes, impact and affected components, and consequently such similarities might be exploited to enhance fault prediction models.

References

[1]
Erik Arisholm, Lionel C. Briand, and Eivind B. Johannessen. 2010. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. Journal of Systems and Software 83, 1 (1 2010), 2--17.
[2]
N. Bettenburg, M. Nagappan, and A. E. Hassan. 2012. Think locally, act globally: Improving defect and effort prediction models. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, 60--69.
[3]
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. 2009. Putting it all together: Using socio-technical networks to predict failures. In Proceedings - International Symposium on Software Reliability Engineering, ISSRE. IEEE, 109--119.
[4]
Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5--32.
[5]
Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC press.
[6]
G. Carrozza, D. Cotroneo, R. Natella, R. Pietrantuono, and S. Russo. 2013. Analysis and Prediction of Mandelbugs in an Industrial Software System. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation. 262--271.
[7]
Cagatay Catal and Banu Diri. 2009. A systematic review of software fault prediction studies. Expert Systems with Applications 36, 4 (5 2009), 7346--7354.
[8]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research (2002), 321--357.
[9]
Tse-Hsun Chen, Meiyappan Nagappan, Emad Shihab, and Ahmed E Hassan. 2014. An empirical study of dormant bugs. In Proceedings of the 11th Working Conference on Mining Software Repositories, Vol. undefined. ACM Press, New York, New York, USA, 82--91.
[10]
Domenico Cotroneo, Roberto Natella, and Roberto Pietrantuono. 2013. Predicting Aging-related Bugs Using Software Complexity Metrics. Perform. Eval. 70, 3 (March 2013), 163--178.
[11]
Marco DâĂŹAmbros, Michele Lanza, and Romain Robbes. 2011. Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Software Engineering 17, 4--5 (8 2011), 531--577.
[12]
L. Erlikh. 2000. Leveraging legacy system dollars for e-business. IT Professional 2, 3(0 2000), 17--23.
[13]
Wei Fu, Tim Menzies, and Xipeng Shen. 2016. Tuning for Software Analytics: is it Really Necessary? Information and Software Technology (4 2016).
[14]
Baljinder Ghotra, Shane McIntosh, and Ahmed E. Hassan. 2015. Revisiting the impact of classification techniques on the performance of defect prediction models. (5 2015), 789--800.
[15]
Emanuel Giger, Marco D'Ambros, Martin Pinzger, and Harald C. Gall. 2012. Method-level bug prediction. In Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement - ESEM '12. ACM Press, New York, New York, USA, 171.
[16]
Emanuel Giger, Martin Pinzger, and Harald C. Gall. 2011. Comparing fine-grained source code changes and code churn for bug prediction. In Proceeding of the 8th working conference on Mining software repositories - MSR '11. ACM Press, New York, New York, USA, 83.
[17]
T.L. Graves, A.F. Karr, J.S. Marron, and H. Siy. 2000. Predicting fault incidence using software change history. IEEE Transactions on Software Engineering 26, 7 (7 2000), 653--661.
[18]
Lan Guo, Yan Ma, Bojan Cukic, and Harshinder Singh. 2004. Robust Prediction of Fault-Proneness by Random Forests. In Proceedings - International Symposium on Software Reliability Engineering, ISSRE. IEEE, 417--428.
[19]
T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell. 2012. A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Transactions on Software Engineering 38, 6 (11 2012), 1276--1304.
[20]
A.E. Hassan and R.C. Holt. 2005. The top ten list: dynamic fault prediction. In 21st IEEE International Conference on Software Maintenance (ICSM'05). IEEE, 263--272.
[21]
Ahmed E. Hassan. 2009. Predicting faults using the complexity of code changes. In Proceedings - International Conference on Software Engineering. IEEE, 78--88.
[22]
Kim Herzig, Sascha Just, Andreas Rau, and Andreas Zeller. 2013. Predicting defects using change genealogies. In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 118--127.
[23]
Kim Sebastian Herzig. 2013. Mining and untangling change genealogies. Ph.D. Dissertation. SaarbruÌĹcken, UniversitaÌĹt des Saarlandes, Diss., 2013.
[24]
N. Jovanovic, C. Kruegel, and E. Kirda. 2006. Pixy: a static analysis tool for detecting Web application vulnerabilities. In 2006 IEEE Symposium on Security and Privacy (S&P'06). IEEE, 6 pp. 263.
[25]
Yasutaka Kamei, Shinsuke Matsumoto, Akito Monden, Ken-ichi Matsumoto, Bram Adams, and Ahmed E. Hassan. 2010. Revisiting common bug prediction findings using effort-aware models. In 2010 IEEE International Conference on Software Maintenance. IEEE, 1--10.
[26]
Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi. 2013. A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39, 6 (6 2013), 757--773.
[27]
Ji-Hyun Kim. 2009. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics & Data Analysis 53, 11 (9 2009), 3735--3745.
[28]
Sunghun Kim, E. James Whitehead, and Yi Zhang. 2008. Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering 34, 2 (3 2008), 181--196.
[29]
Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., and Andreas Zeller. 2007. Predicting Faults from Cached History. In 29th International Conference on Software Engineering (ICSE'07). IEEE, 489--498.
[30]
Max Kuhn. 2014. Futility Analysis in the Cross-Validation of Machine Learning Models. (5 2014), 22.
[31]
Max Kuhn. 2016. Caret package. (2016). http://cran.r-project.org/package=caret
[32]
S. Lessmann, B. Baesens, C. Mues, and S. Pietsch. 2008. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Transactions on Software Engineering 34, 4 (7 2008), 485--496.
[33]
Yi Liu, Taghi M. Khoshgoftaar, and Naeem Seliya. 2010. Evolutionary Optimization of Software Quality Modeling with Multiple Repositories. IEEE Transactions on Software Engineering 36, 6 (11 2010), 852--864.
[34]
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca a. Popa, and Yuanyuan Zhou. 2007. MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. ACM SIGOPS Operating Systems Review 41, 6 (10 2007), 103.
[35]
Wanwangying Ma, Lin Chen, Yibiao Yang, Yuming Zhou, and Baowen Xu. 2015. Empirical analysis of network measures for effort-aware fault-proneness prediction. Information and Software Technology (9 2015).
[36]
Thilo Mende. 2010. Replication of defect prediction studies. In Proceedings of the 6th International Conference on Predictive Models in Software Engineering - PROMISE '10. ACM Press, New York, New York, USA, 1.
[37]
Thilo Mende and Rainer Koschke. 2009. Revisiting the evaluation of defect prediction models. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering - PROMISE '09. ACM Press, New York, New York, USA, 1.
[38]
T Mende and R Koschke. 2010. Effort-Aware Defect Prediction Models. In 2010 14th European Conference on Software Maintenance and Reengineering. IEEE, 107--116.
[39]
Thilo Mende, Rainer Koschke, and Marek Leszak. 2009. Evaluating Defect Prediction Models for a Large Evolving Software System. In 2009 13th European Conference on Software Maintenance and Reengineering. IEEE, 247--250.
[40]
Tim Menzies, Jeremy Greenwald, and Art Frank. 2007. Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering 33, 1 (1 2007), 2--13.
[41]
Tim Menzies, Zach Milton, Burak Turhan, Bojan Cukic, Yue Jiang, and Ayŧe Bener. 2010. Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering 17, 4 (5 2010), 375--407.
[42]
Tim Menzies, Burak Turhan, Ayse Bener, Gregory Gay, Bojan Cukic, and Yue Jiang. 2008. Implications of ceiling effects in defect predictors. In Proceedings of the 4th international workshop on Predictor models in software engineering - PROMISE '08. ACM Press, New York, New York, USA, 47.
[43]
Audris Mockus and David M. Weiss. 2000. Predicting risk of software changes. Bell Labs Technical Journal 5, 2(8 2000), 169--180.
[44]
Andreas Moser, Christopher Kruegel, and Engin Kirda. 2007. Limits of Static Analysis for Malware Detection. In Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007). IEEE, 421--430.
[45]
Raimund Moser, Witold Pedrycz, and Giancarlo Succi. 2008. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In Proceedings of the 13th international conference on Software engineering - ICSE '08. ACM Press, New York, New York, USA, 181.
[46]
Nachiappan. Nagappan and Thomas. Ball. 2005. Static analysis tools as early indicators of pre-release defect density. In Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005. ACM Press, New York, New York, USA, 580.
[47]
N. Nagappan and T. Ball. 2005. Use of relative code churn measures to predict system defect density. In Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005. IEEe, 284--292.
[48]
Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. 2006. Mining metrics to predict component failures. In Proceeding of the 28th international conference on Software engineering - ICSE '06. ACM Press, New York, New York, USA, 452.
[49]
Rahul Premraj and Kim Herzig. 2011. Network Versus Code Metrics to Predict Defects: A Replication Study. In 2011 International Symposium on Empirical Software Engineering and Measurement. IEEE, 215--224.
[50]
Foyzur Rahman and Premkumar Devanbu. 2013. How, and why, process metrics are better. In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 432--441.
[51]
Foyzur Rahman, Sameer Khatri, Earl T. Barr, and Premkumar Devanbu. 2014. Comparing static bug finders and statistical prediction. In Proceedings of the 36th International Conference on Software Engineering - ICSE 2014. ACM Press, New York, New York, USA, 424--434.
[52]
Foyzur Rahman, Daryl Posnett, and Premkumar Devanbu. 2012. Recalling the "imprecision" of cross-project defect prediction. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE '12. ACM Press, New York, New York, USA, 1.
[53]
Foyzur Rahman, Daryl Posnett, Israel Herraiz, and Premkumar Devanbu. 2013. Sample size vs. bias in defect prediction. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2013. ACM Press, New York, New York, USA, 147.
[54]
Emad Shihab, Ahmed E. Hassan, Bram Adams, and Zhen Ming Jiang. 2012. An industrial study on the risk of software changes. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE '12. ACM Press, New York, New York, USA, 1.
[55]
Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan. 2011. High-impact defects: a study of breakage and surprise defects. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM Press, New York, New York, USA, 300.
[56]
Rainer Storn and Kenneth Price. 1997. Differential Evolution - A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization 11, 4 (1997), 341--359.
[57]
Lin Tan, Chen Liu, Zhenmin Li, Xuanhui Wang, Yuanyuan Zhou, and Chengxiang Zhai. 2013. Bug characteristics in open source software. Empirical Software Engineering 19, 6 (6 2013), 1665--1705.
[58]
Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E Hassan, and Kenichi Matsumoto. 2016. Automated Parameter Optimization of Classification techniques for Defect Prediction Models. In Proc. of the International Conference on Software Engineering (ICSE). To appear.
[59]
Luis Torgo. 2013. Package DMwR. (2013). https://cran.r-project.org/package=DMwR
[60]
Ayse Tosun and Ayse Bener. 2009. Reducing false alarms in software defect prediction by decision threshold optimization. In 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE, 477--480.
[61]
Harold Valdivia-Garcia and Meiyappan Nagappan. 2017. The Characteristics of False-Negatives in File-level Fault Prediction. (2017). https://github.com/harold-valdivia-garcia/fp-in-bug-pred/blob/master/false-negative-dp-appx.pdf
[62]
Xin Xia, David Lo, Sinno Jialin Pan, Nachiappan Nagappan, and Xinyu Wang. 2016. HYDRA: Massively Compositional Model for Cross-Project Defect Prediction. IEEE Transactions on Software Engineering 42, 10 (2016), 977--998.
[63]
Xinli Yang, David Lo, Xin Xia, and Jianling Sun. 2017. TLEL: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology 87 (2017), 206 - 220.
[64]
Shahed Zaman, Bram Adams, and Ahmed E. Hassan. 2011. Security versus performance bugs. In Proceeding of the 8th working conference on Mining software repositories - MSR '11. ACM Press, New York, New York, USA, 93.
[65]
Thomas Zimmermann and Nachiappan Nagappan. 2008. Predicting defects using network analysis on dependency graphs. In Proceedings of the 13th international conference on Software engineering - ICSE '08. ACM Press, New York, New York, USA, 531.
[66]
Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, and Brendan Murphy. 2009. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. ACM Press, New York, New York, USA, 91.
[67]
Thomas Zimmermann, Rahul Premraj, and Andreas Zeller. 2007. Predicting Defects for Eclipse. In Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007). LEEE, 9--9.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
PROMISE: Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering
November 2017
120 pages
ISBN:9781450353052
DOI:10.1145/3127005
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 November 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Code Metrics
  2. Post-release Defects
  3. Process Metrics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PROMISE

Acceptance Rates

PROMISE Paper Acceptance Rate 12 of 25 submissions, 48%;
Overall Acceptance Rate 98 of 213 submissions, 46%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 104
    Total Downloads
  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Jan 2025

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media