Abstract
Designing effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study software bug characteristics by sampling 2,060 real-world bugs in three large, representative open-source projects—the Linux kernel, Mozilla, and Apache. We manually study these bugs along three dimensions—root causes, impacts, and components. We further study the correlation between categories in different dimensions, and the trends of different types of bugs. The findings include: (1) semantic bugs are the dominant root cause; as software evolves, semantic bugs increase while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel operating system (OS) has more concurrency bugs than its non-OS counterparts, suggesting that more effort be put into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting more support to help developers diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort of building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.








References
ASF bugzilla (2010) http://issues.apache.org/bugzilla. Accessed September 2010
Bayes Net (2013) http://www.cs.waikato.ac.nz/~remco/weka_bn/
Coverity (2013) Automated error prevention and source code analysis. http://www.coverity.com
Debugging Memory Leaks (2013) https://wiki.mozilla.org/Performance:Leak_Tools
Kernel Bug Tracker (2010) http://bugzilla.kernel.org/. Accessed February 2010
Mozilla.org Bugzilla (2010) https://bugzilla.mozilla.org. Accessed September 2010
National Vulnerability Database (2011) http://nvd.nist.gov. Accessed 31 December 2011
Nvd Common Vulnerability Scoring System (2013) http://nvd.nist.gov/cvss.cfm?version=2
Support vector machines using sequential minimal optimization (2013) http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html
Valgrind (2013) http://www.valgrind.org/
Lamkanfi A, Demeyer S, Soetens QD, Verdonck T (2011) Comparing mining algorithms for predicting the severity of a reported bug. In: Proceedings of the 15th European conference on software maintenance and reengineering, pp 249–258
Michail A, Xie T (2005) Helping users avoid bugs in GUI applications. In: Proceedings of the 27th international conference on software engineering, pp 107–116
Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering, pp 361–370
Aranda J, Venolia G (2009) The secret life of bugs: Going past the errors and omissions in software repositories. In: Proceedings of the 31st international conference on software engineering. IEEE, pp 298–308
Avizienis A, Laprie JC, Randell B, Landwehr CE (2004) Basic concepts and taxonomy of dependable and secure computing. IEEE Trans Dependable Secur Comput 1(1):11–33
Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun. ACM 27(1):42–52
Beizer B (1990) Software testing techniques, 2nd edn. Van Nostrand Reinhold Co., New York, NY, USA
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ESEC/FSE ’09, pp 121–130
Bougie G, Treude C, German DM, Storey MA (2010) A comparative exploration of FreeBSD bug lifetimes. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010). IEEE, pp 106–109
Canfora G, Cerulo L, Cimitile M, Di Penta M (2011) Social interactions around cross-system bug fixings: the case of FreeBSD and OpenBSD. In: Proceedings of the 8th working conference on mining software repositories, pp 143–152
Chang CC, Lin CJ (2001) Libsvm—a library for support vector machines. The Weka classifier works with version 2.82 of LIBSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Chillarege R, Kao WL, Condit RG (1991) Defect type and its impact on the growth curve. In: Proceedings of the 13th international conference on software engineering, pp 246–255
Chou A, Yang J, Chelf B, Hallem S, Engler DR (2001) An empirical study of operating system errors. In: Proceedings of the 18th ACM symposium on operating systems principles, pp 73–88
Compton BT, Withrow C (1990) Prediction and control of ADA software defects. J Syst Softw 12(3):199–207
Cowan C (2003) Software security for open-source systems. IEEE Secur Priv 1(1):38–45
Cowan C, Wagle P, Pu C, Beattie S, Walpole J (2000) Buffer overflows: attacks and defenses for the vulnerability of the decade. In: Proceedings of the DARPA information survivability conference and exposition
Cubranic D, Murphy GC (2004) Automatic bug triage using text categorization. In: Proceedings of the 16th international conference on software engineering and knowledge engineering, pp 92–97
Dallmeier V, Zimmermann T (2007) Extraction of bug localization benchmarks from history. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 433–436
D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4–5):531–577
Edwards A, Tucker S, Demsky B (2010) Afid: an automated approach to collecting software faults. Autom Softw Eng 17(3):347–372
Shihab E, Ihara A, Kamei Y, Ibrahim WM, Ohira M, Adams B, Hassan AE, Matsumoto K (2010) Predicting re-opened bugs: a case study on the Eclipse project. In: Proceedings of the 17th working conference on reverse engineering, pp 249–258
Endres A (1975) An analysis of errors and their causes in system programs. In: Proceedings of the international conference on reliable software, pp 327–336
Engler D, Chen DY, Chou A (2001) Bugs as deviant behavior: a general approach to inferring errors in systems code. In: Proceedings of the 18th ACM symposium on operating systems principles, pp 57–72
Ernst MD, Cockrell J, Griswold WG, Notkin D (2001) Dynamically discovering likely program invariants to support program evolution. IEEE Trans Softw Eng 27(2):99–123
Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814
Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: an industrial case study. In: Proceedings of the 7th working conference on mining software repositories, pp 11–20
Germán DM (2006) An empirical study of fine-grained software modifications. Empir Softw Eng 11(3):369–393
Glass R (1981) Persistent software errors. IEEE Trans Softw Eng 7(2):162–168
Graves T, Karr A, Marron J, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661
Gu W, Kalbarczyk Z, Iyer RK, Yang ZY (2003) Characterization of Linux kernel behavior under errors. In: Proceedings of the 2003 international conference on dependable systems and networks, pp 459–468
Guo L, Ma Y, Cukic B, Singh H (2004) Robust prediction of fault-proneness by random forests. In: 15th international symposium on software reliability engineering, pp 417–428
Guo PJ, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE ’10, vol 1, pp 495–504
Hafiz M (2010) Security on demand. Ph.D. thesis
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1)
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers
Hastings R, Joyce B (1992) Purify: fast detection of memory leaks and access errors. In: Proceedings of the winter USENIX conference, pp 125–136
Herraiz I, German DM, Gonzalez-Barahona JM, Robles G (2008) Towards a simplification of the bug report form in eclipse. In: Proceedings of the 2008 international workshop on mining software repositories, MSR ’08. ACM Press, pp 145–148
Hooimeijer P, Weimer W (2007) Modeling bug report quality. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 34–43
Joachims T (2002) Learning to classify text using support vector machines. Kluwer Academic Publishers
Kaâniche M, Kanoun K, Cukier M, de Bastos Martini MR (1994) Software reliability analysis of three successive generations of a switching system. In: Proceedings of the 1st European dependable computing conference on dependable computing, pp 473–490
Kim S, Zimmermann T, Pan K, Whitehead E Jr (2006) Automatic identification of bug-introducing changes. In: 21st IEEE/ACM international conference on automated software engineering (ASE’06), pp 81–90
Kpodjedo S, Ricca F, Galinier P, Guéhéneuc YG, Antoniol G (2011) Design evolution metrics for defect prediction in object oriented systems. Empir Softw Eng 16(1):141–175
Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: 6th symposium on operating systems design and implementation, pp 289–302
Li Z, Tan L, Wang X, Lu S, Zhou Y, Zhai C (2006) Have things changed now? An empirical study of bug characteristics in modern open source software. In: Proceedings of the 1st workshop on architectural and system support for improving software dependability, ASID ’06
Lu S (2008) Understanding, detecting and exposing concurrency bugs. Ph.D. thesis
Lu S, Li Z, Qin F, Tan L, Zhou P, Zhou Y (2005) BugBench: a benchmark for evaluating bug detection tools. In: Workshop on the evaluation of software defect detection tools
Lu S, Park S, Seo E, Zhou Y (2008) Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In: Proceedings of the 13th international conference on architectural support for programming languages and operating systems, pp 329–339
Massacci F, Neuhaus S, Nguyen VH (2011) After-life vulnerabilities: a study on firefox evolution, its vulnerabilities, and fixes. In: Proceedings of the 3rd international conference on engineering secure software and systems, pp 195–208
Matter D, Kuhn A, Nierstrasz O (2009) Assigning bug reports using a vocabulary-based expertise model of developers. In: Proceedings of the 2009 6th IEEE international working conference on mining Software repositories, pp 131–140
Memon AM (2002) GUI testing: pitfalls and process. Computer 35(8):87–88
Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol (TOSEM) 11(3):309–346
Moller KH, Paulish DJ (1993) An empirical investigation of software fault distribution. In: Proceedings of the 1st international software metrics symposium, pp 82–90
Neuhaus S, Zimmermann T (2010) Security trend analysis with CVE topic models. In: Proceedings of the 2010 IEEE 21st international symposium on software reliability engineering, pp 111–120
Ostrand T, Weyuker E (2002) The distribution of faults in a large industrial software system. In: Proceedings of the 2002 ACM SIGSOFT international symposium on software testing and analysis, pp 55–64
Ostrand TJ, Weyuker EJ (1984) Collecting and categorizing software error data in an industrial environment. J Syst Softw 4(4):289–300
Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355
Ozment A, Schechter SE (2006) Milk or wine: does software security improve with age? In: Proceedings of the 15th conference on USENIX security symposium, vol 15
Bhattacharya P, Neamtiu I (2010) Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: Proceedings of the 2010 IEEE international conference on software maintenance
Bhattacharya P, Neamtiu I (2011) Bug-fix time prediction models: can we do better? In: Proceedings of the 8th working conference on mining software repositories
Panjer LD (2007) Predicting Eclipse bug lifetimes. In: Proceedings of the Fourth International Workshop on mining software repositories, pp 29–32
Park JW, Lee MW, Kim J, Hwang SW, Kim S (2011) CosTriage: a cost-aware triage algorithm for bug reporting systems. In: Proceedings of the 25th conference on artificial intelligence
Park S, Lu S, Zhou Y (2009a) CTrigger: exposing atomicity violation bugs from their hiding places. In: Proceedings of the 14th international conference on architectural support for programming languages and operating systems, pp 25–36
Park S, Zhou Y, Xiong W, Yin Z, Kaushik R, Lee KH, Lu S (2009b) PRES: probabilistic replay with execution sketching on multiprocessors. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, pp 177–192
Payne C (2002) On the security of open source software. Inf Syst J 12(1):61–78
Schugerl P, Rilling J, Charland P (2008) Mining bug repositories—a quality assessment. In: Proceedings of the 2008 international conference on computational intelligence for modelling control and automation
Pighin M, Marzona A (2003) An empirical analysis of fault persistence through software releases. In: Proceedings of the international symposium on empirical software engineering, p 206
Podgurski A, Leon D, Francis P, Masri W, Minch M, Sun J, Wang B (2003) Automated support for classifying software failure reports. In: Proceedings of the 25th international conference on software engineering, pp 465–475
Quinlan R (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA
Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th international conference on software engineering, pp 499–510
Sahoo SK, Criswell J, Adve V (2010) An empirical study of reported bugs in server software with implications for automated bug diagnosis. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering—ICSE ’10, vol 1. ACM Press, p 485
Shin Y, Bell RM, Ostrand TJ, Weyuker EJ (2012) On the use of calling structure information to improve fault prediction. Empir Softw Eng 17(4–5):390–423
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: MSR ’05: Proceedings of the 2005 international workshop on mining software repositories, pp 1–5
Sullivan M, Chillarege R (1991) Software defects and their impact on system availability—a study of field failures in operating systems. In: 21st int. symp. on fault-tolerant computing, pp 2–9
Sullivan M, Chillarege R (1992) A comparison of software defects in database management systems and operating systems. In: 22nd annual international symposium on fault-tolerant computing, pp 475–484
Sun C, Lo D, Wang X, Jiang J, Khoo SC (2010) A discriminative model approach for accurate duplicate bug report retrieval. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, pp 45–54
Swift MM, Bershad BN, Levy HM (2003) Improving the reliability of commodity operating systems. In: Proceedings of the ACM SIGOPS symposium on operating systems principles, pp 207–222
Ahsan SN, Wotawa F (2010) Impact analysis of SCRs using single and multi-label machine learning classification. In: Proceedings of the 2010 ACM-IEEE international symposium on empirical software engineering and measurement
Tan L, Yuan D, Krishna G, Zhou Y (2007) /* iComment: bugs or bad comments? */. In: Proceedings of the 21st ACM symposium on operating systems principles
Tan L, Zhou Y, Padioleau Y (2011) aComment: mining annotations from comments and code to detect interrupt-related concurrency bugs. In: Proceedings of the 33rd international conference on software engineering
Tang Y, Gao Q, Qin F (2008) LeakSurvivor: towards safely tolerating memory leaks for garbage-collected languages. In: USENIX 2008 annual technical conference, pp 307–320
Tian Y, Lawall J, Lo D (2008) Identifying Linux bug fixing patches. In: Proceedings of the international conference on software engineering
Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470
Weyuker EJ, Ostrand TJ, Bell RM (2010) Comparing the effectiveness of several modeling methods for fault prediction. Empir Softw Eng 15(3):277–295
Wu R, Zhang H, Kim S, Cheung SC (2011) ReLink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 15–25
Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011) How do fixes become bugs? In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 26–36
Ying ATT, Murphy GC, Ng RT, Chu-Carroll M (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586
Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories, pp 93–102
Acknowledgements
We thank Luyang Wang and Yaoqiang Li for classifying some bug reports. We thank Shan Lu for the early discussion and feedback. The work is partially supported by the Natural Sciences and Engineering Research Council of Canada, the United States National Science Foundation, the United States Department of Energy, a Google gift grant, and an Intel gift grant.
Communicated by: Thomas Zimmermann
Appendices
A Combining Two Data Sets
To leverage the randomly sampled bug reports studied in our prior work in 2005 (Li et al. 2006), each of the two bug report samples from the Mozilla and Apache Bugzilla databases is combined from two random samples. The combination is performed in a way that preserves the randomness of the sampling: the goal is to ensure that the combined set of fixed bug reports is indistinguishable from a random sample of fixed bug reports drawn from the entire Bugzilla database today.
Figure 9 illustrates the combination approach. We randomly sampled 2X% of the fixed bug reports in each Bugzilla database by the cutoff date of our prior work, referred to as Date1. Now we randomly select half of the 2X% of fixed bug reports, referred to as Set1; the other half is discarded. Note that Set1 is a random sample of X% of the bug reports fixed by Date1. On our new sampling date (Table 1), denoted as Date2, we sample another X% of the fixed bug reports that were opened after Date1 and before Date2, denoted as Set2. We keep only half of the bug reports fixed by Date1 so that the sampled bug reports before Date1 and after Date1 are in proportion to the bug reports belonging to the two time ranges.
Fig. 9 Combining two data sets. Not Fixed denotes any status other than Fixed, e.g., reopened or invalid. Old Data Set is the data set used in our previous work (Li et al. 2006), and New Data Set is the data set used in this paper
The status of a bug report may have changed since Date1 in two ways: (1) a bug report fixed by Date1 is no longer fixed by Date2; or (2) an unfixed bug report by Date1 is fixed by Date2. To account for these two scenarios, we identify all the bug reports in Set1 that are no longer marked as fixed on Date2, denoted as Set3. Bug reports in Set3 should not be included in our sample, because a random sample of fixed bug reports taken on Date2 would not contain them, as they are not fixed. From the bug reports that are unfixed by Date1 but fixed by Date2, we randomly sample X% of them, denoted as Set4. Our final random sample is the union of Set1, Set2, and Set4, with Set3 excluded.
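The set construction above can be sketched in code. The following is a minimal illustration under simplified assumptions: bugs are modeled as records carrying their Fixed status at each sampling date, and the names `Bug` and `combine_samples` are ours for illustration, not from the study.

```python
import random


class Bug:
    """Simplified bug report: whether it was opened before Date1, and its
    Fixed status at each of the two sampling dates (modeling assumptions)."""
    def __init__(self, bug_id, opened_before_date1, fixed_at_date1, fixed_at_date2):
        self.bug_id = bug_id
        self.opened_before_date1 = opened_before_date1
        self.fixed_at_date1 = fixed_at_date1
        self.fixed_at_date2 = fixed_at_date2


def combine_samples(bugs, x, rng):
    """Return the combined sample (Set1 + Set2 + Set4) - Set3."""
    # Prior work sampled a 2x fraction of the bugs fixed by Date1; keeping
    # a random half (Set1) leaves a random x-fraction sample of those bugs.
    fixed_d1 = [b for b in bugs if b.opened_before_date1 and b.fixed_at_date1]
    old_sample = rng.sample(fixed_d1, round(2 * x * len(fixed_d1)))
    set1 = rng.sample(old_sample, len(old_sample) // 2)
    # Set2: x-fraction of bugs opened after Date1 and fixed by Date2.
    opened_later = [b for b in bugs if not b.opened_before_date1 and b.fixed_at_date2]
    set2 = rng.sample(opened_later, round(x * len(opened_later)))
    # Set3: Set1 bugs whose status changed, i.e. no longer Fixed at Date2.
    set3_ids = {b.bug_id for b in set1 if not b.fixed_at_date2}
    # Set4: x-fraction of bugs unfixed at Date1 but fixed by Date2.
    late_fixed = [b for b in bugs
                  if b.opened_before_date1 and not b.fixed_at_date1 and b.fixed_at_date2]
    set4 = rng.sample(late_fixed, round(x * len(late_fixed)))
    return [b for b in set1 + set2 + set4 if b.bug_id not in set3_ids]
```

Because each disjoint stratum (fixed by Date1 and still fixed, fixed after becoming unfixed candidates, opened after Date1) is sampled at the same rate x, the union behaves like a single x-fraction sample of all bugs fixed by Date2.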
Table 16 lists the sizes of the four data sets for the three software projects. No combining is needed for the Linux kernel, since it was sampled in 2010.
There is no difference between our combined sample and a sample randomly drawn from the fixed bug reports on Date2, as either is a random X% sample of the population. Therefore, the combined sample is representative of the fixed bug reports in the Bugzilla database. In addition, our results show that the distributions of these sets are similar, meaning that the variance across data sets is small, which increases the confidence and reproducibility of our results.
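This equivalence can also be checked empirically: under the procedure described above, every bug fixed by Date2 should end up in the combined sample with probability X, exactly as in a direct random sample taken on Date2. The following self-contained Monte-Carlo simulation (a toy population of our own, not the study's data) estimates the per-bug inclusion probability:

```python
import random

rng = random.Random(7)
X = 0.2          # sampling fraction
TRIALS = 2000    # Monte-Carlo repetitions

# Toy population of 4-tuples: (id, opened_before_date1, fixed_at_date1, fixed_at_date2)
pop = [(True, True, True)] * 50 + [(True, True, False)] * 10 \
    + [(True, False, True)] * 20 + [(False, False, True)] * 20
pop = [(i,) + b for i, b in enumerate(pop)]

hits = [0] * len(pop)
for _ in range(TRIALS):
    # Set1: half of a 2X sample of bugs fixed by Date1.
    fixed_d1 = [b for b in pop if b[1] and b[2]]
    old = rng.sample(fixed_d1, round(2 * X * len(fixed_d1)))
    set1 = rng.sample(old, len(old) // 2)
    # Set2: X sample of bugs opened after Date1 and fixed by Date2.
    set2_pool = [b for b in pop if not b[1] and b[3]]
    set2 = rng.sample(set2_pool, round(X * len(set2_pool)))
    # Set4: X sample of bugs unfixed at Date1 but fixed by Date2.
    set4_pool = [b for b in pop if b[1] and not b[2] and b[3]]
    set4 = rng.sample(set4_pool, round(X * len(set4_pool)))
    # Dropping Set3 = Set1 bugs no longer fixed at Date2.
    combined = [b for b in set1 if b[3]] + set2 + set4
    for b in combined:
        hits[b[0]] += 1

# Every bug fixed at Date2 is included about X of the time.
for b in pop:
    if b[3]:
        assert abs(hits[b[0]] / TRIALS - X) < 0.05
```

The assertion holds for every stratum: bugs still fixed since Date1 are selected via Set1 at rate X, and the two later-fixed strata are each directly sampled at rate X.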
Developers may update bug reports after the bugs are fixed. Therefore, we check all bugs in Set1 to find out whether these later activities affect our classifications of the bugs. Fortunately, only 5 of the bug reports in Mozilla and none in Apache have activities after Date1. We manually read these 5 bug reports again and find that the activities change the product, the QA contact, or the component of the bug reports, but do not change our original classifications. The component field used in the bug reports is finer-grained than the definition of component in Table 3; therefore, these finer-grained component changes do not affect the higher-level components used in this paper.
B Bug Examples
C Detailed Numbers for the Figures
Tan, L., Liu, C., Li, Z. et al. Bug characteristics in open source software. Empir Software Eng 19, 1665–1705 (2014). https://doi.org/10.1007/s10664-013-9258-8