Do bugs foreshadow vulnerabilities? An in-depth study of the chromium project

Empirical Software Engineering

Abstract

As developers face an ever-increasing pressure to engineer secure software, researchers are building an understanding of security-sensitive bugs (i.e. vulnerabilities). Research into mining software repositories has greatly increased our understanding of software quality via empirical study of bugs. Conceptually, however, vulnerabilities differ from bugs: they represent an abuse of functionality as opposed to insufficient functionality commonly associated with traditional, non-security bugs. We performed an in-depth analysis of the Chromium project to empirically examine the relationship between bugs and vulnerabilities. We mined 374,686 bugs and 703 post-release vulnerabilities over five Chromium releases that span six years of development. We used logistic regression analysis, ranking analysis, bug type classifications, developer experience, and vulnerability severity metrics to examine the overarching question: are bugs and vulnerabilities in the same files? While we found statistically significant correlations between pre-release bugs and post-release vulnerabilities, we found the association to be weak. Number of features, source lines of code, and pre-release security bugs are, in general, more closely associated with post-release vulnerabilities than any of our non-security bug categories. In further analysis, we examined sub-types of bugs, such as stability-related bugs, and the associations did not improve. Even the files with the most severe vulnerabilities (by measure of CVSS or bounty payouts) did not show strong correlations with number of bugs. These results indicate that bugs and vulnerabilities are empirically dissimilar groups, motivating the need for security engineering research to target vulnerabilities specifically.
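To make the notion of a statistically weak association concrete, the sketch below computes a Spearman rank correlation between per-file bug counts and per-file vulnerability counts. It is a minimal illustration only: rank correlation is one common way to quantify such an association, not necessarily the exact procedure used in the study, and the function names (`average_ranks`, `spearman_rho`) and the toy per-file counts are hypothetical, not data from Chromium.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data: one entry per source file (not real measurements).
    pre_release_bugs = [0, 3, 1, 7, 2, 5]
    post_release_vulnerabilities = [0, 0, 1, 0, 0, 1]
    rho = spearman_rho(pre_release_bugs, post_release_vulnerabilities)
    print(f"Spearman rho = {rho:.3f}")
```

A coefficient near 0 would indicate that files ranked high by bug count are not consistently the files ranked high by vulnerability count, which is the kind of weak association the abstract describes.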

Notes

  1. https://nvd.nist.gov/

  2. https://cve.mitre.org/

  3. http://googlechromereleases.blogspot.com

Acknowledgments

This research is supported by the National Science Foundation (grant CCF-1441444). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We thank the Software Archeology group at RIT for their valuable contributions to this work.

Author information

Corresponding author

Correspondence to Nuthan Munaiah.

Additional information

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei.

Appendix:


Table 18 Model goodness of fit metrics for base, reference, and bug category group models built with data from five Chromium releases
Table 19 Model performance metrics for base, reference, and bug category group models built with data from four Chromium releases
Table 20 MWW test results for review experience metrics in five Chromium releases
Table 21 Model goodness of fit metrics for review experience models built with data from five Chromium releases
Table 22 Model performance comparison between bugs and review experience models built with data from four Chromium releases

About this article

Cite this article

Munaiah, N., Camilo, F., Wigham, W. et al. Do bugs foreshadow vulnerabilities? An in-depth study of the chromium project. Empir Software Eng 22, 1305–1347 (2017). https://doi.org/10.1007/s10664-016-9447-3

  • DOI: https://doi.org/10.1007/s10664-016-9447-3
