
Bug characteristics in open source software

Empirical Software Engineering

Abstract

Designing effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study software bug characteristics by sampling 2,060 real-world bugs in three large, representative open-source projects—the Linux kernel, Mozilla, and Apache. We manually study these bugs in three dimensions—root causes, impacts, and components. We further study the correlation between categories in different dimensions, and the trend of different types of bugs. The findings include: (1) semantic bugs are the dominant root cause. As software evolves, semantic bugs increase, while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel operating system (OS) has more concurrency bugs than its non-OS counterparts, suggesting more effort into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting more support to help developers diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort in building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.
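The automatic classification step is done in the paper with Weka's machine learning algorithms (e.g., SMO support vector machines) over bug-report text. As a rough illustration of the idea only, here is a minimal bag-of-words nearest-centroid classifier in plain Python; the category labels and toy training sentences are invented for illustration and are not the paper's actual features or data.

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term frequencies over lowercase whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled):
    # One summed bag-of-words "centroid" per root-cause category.
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids

def classify(text, centroids):
    # Assign the report to the category with the most similar centroid.
    vec = vectorize(text)
    return max(centroids, key=lambda lbl: cosine(vec, centroids[lbl]))

# Toy training data (invented); a real study trains on thousands of reports.
training = [
    ("null pointer dereference crashes the parser", "memory"),
    ("buffer overflow in string copy leaks heap memory", "memory"),
    ("deadlock when two threads lock mutexes in opposite order", "concurrency"),
    ("race condition on shared counter under load", "concurrency"),
    ("wrong error code returned for missing config option", "semantic"),
    ("function returns stale value after cache invalidation", "semantic"),
]
centroids = train_centroids(training)
```

A discriminative classifier such as an SVM would replace the centroid comparison with a learned decision boundary, but the feature extraction stage is essentially the same.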



References

  • ASF bugzilla (2010) http://issues.apache.org/bugzilla. Accessed September 2010

  • Bayes Net (2013) http://www.cs.waikato.ac.nz/~remco/weka_bn/

  • Coverity (2013) Automated error prevention and source code analysis. http://www.coverity.com

  • Debugging Memory Leaks (2013) https://wiki.mozilla.org/Performance:Leak_Tools

  • Kernel Bug Tracker (2010) http://bugzilla.kernel.org/. Accessed February 2010

  • Mozilla.org Bugzilla (2010) https://bugzilla.mozilla.org. Accessed September 2010

  • National Vulnerability Database (2011) http://nvd.nist.gov. Accessed 31 December 2011

  • Nvd Common Vulnerability Scoring System (2013) http://nvd.nist.gov/cvss.cfm?version=2

  • Support vector machines using sequential minimal optimization (2013) http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html

  • Valgrind (2013) http://www.valgrind.org/

  • Lamkanfi A, Demeyer S, Soetens QD, Verdonck T (2011) Comparing mining algorithms for predicting the severity of a reported bug. In: Proceedings of the 15th European conference on software maintenance and reengineering, pp 249–258

  • Memon AM, Xie T (2005) Helping users avoid bugs in GUI applications. In: Proceedings of the 27th international conference on software engineering, pp 107–116

  • Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering, pp 361–370

  • Aranda J, Venolia G (2009) The secret life of bugs: Going past the errors and omissions in software repositories. In: Proceedings of the 31st international conference on software engineering. IEEE, pp 298–308

  • Avizienis A, Laprie JC, Randell B, Landwehr CE (2004) Basic concepts and taxonomy of dependable and secure computing. IEEE Trans Dependable Secur Comput 1(1):11–33

  • Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun. ACM 27(1):42–52

  • Beizer B (1990) Software testing techniques, 2nd edn. Van Nostrand Reinhold Co., New York, NY, USA

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ESEC/FSE ’09, pp 121–130

  • Bougie G, Treude C, German DM, Storey MA (2010) A comparative exploration of FreeBSD bug lifetimes. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010). IEEE, pp 106–109

  • Canfora G, Cerulo L, Cimitile M, Di Penta M (2011) Social interactions around cross-system bug fixings: the case of freebsd and openbsd. In: Proceedings of the 8th working conference on mining software repositories, pp 143–152

  • Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. The Weka classifier works with version 2.82 of LIBSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  • Chillarege R, Kao WL, Condit RG (1991) Defect type and its impact on the growth curve. In: Proceedings of the 13th international conference on software engineering, pp 246–255

  • Chou A, Yang J, Chelf B, Hallem S, Engler DR (2001) An empirical study of operating system errors. In: Proceedings of the 18th ACM symposium on operating systems principles, pp 73–88

  • Compton BT, Withrow C (1990) Prediction and control of ADA software defects. J Syst Softw 12(3):199–207

  • Cowan C (2003) Software security for open-source systems. IEEE Secur Priv 1(1):38–45

  • Cowan C, Wagle P, Pu C, Beattie S, Walpole J (2000) Buffer overflows: attacks and defenses for the vulnerability of the decade. In: Proceedings of the DARPA information survivability conference and exposition

  • Cubranic D, Murphy GC (2004) Automatic bug triage using text categorization. In: Proceedings of the 16th international conference on software engineering and knowledge engineering, pp 92–97

  • Dallmeier V, Zimmermann T (2007) Extraction of bug localization benchmarks from history. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 433–436

  • D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4–5):531–577

  • Edwards A, Tucker S, Demsky B (2010) Afid: an automated approach to collecting software faults. Autom Softw Eng 17(3):347–372

  • Shihab E, Ihara A, Kamei Y, Ibrahim WM, Ohira M, Adams B, Hassan AE, Matsumoto K (2010) Predicting re-opened bugs: a case study on the Eclipse project. In: Proceedings of the 2010 17th working conference on reverse engineering, pp 249–258

  • Endres A (1975) An analysis of errors and their causes in system programs. In: Proceedings of the international conference on reliable software, pp 327–336

  • Engler D, Chen DY, Chou A (2001) Bugs as deviant behavior: a general approach to inferring errors in systems code. In: Proceedings of the 18th ACM symposium on operating systems principles, pp 57–72

  • Ernst MD, Cockrell J, Griswold WG, Notkin D (2001) Dynamically discovering likely program invariants to support program evolution. IEEE Trans Softw Eng 27(2):99–123

  • Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814

  • Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: an industrial case study. In: Proceedings of the 7th working conference on mining software repositories, pp 11–20

  • Germán DM (2006) An empirical study of fine-grained software modifications. Empir Softw Eng 11(3):369–393

  • Glass R (1981) Persistent software errors. IEEE Trans Softw Eng 7(2):162–168

  • Graves T, Karr A, Marron J, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661

  • Gu W, Kalbarczyk Z, Iyer RK, Yang ZY (2003) Characterization of Linux kernel behavior under errors. In: Proceedings of the 2003 international conference on dependable systems and networks, pp 459–468

  • Guo L, Ma Y, Cukic B, Singh H (2004) Robust prediction of fault-proneness by random forests. In: 15th international symposium on software reliability engineering, pp 417–428

  • Guo PJ, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE ’10, vol 1, pp 495–504

  • Hafiz M (2010) Security on demand. Ph.D. thesis

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explorations, vol 11, issue 1

  • Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers

  • Hastings R, Joyce B (1992) Purify: fast detection of memory leaks and access errors. In: Proceedings of the winter USENIX conference, pp 125–136

  • Herraiz I, German DM, Gonzalez-Barahona JM, Robles G (2008) Towards a simplification of the bug report form in eclipse. In: Proceedings of the 2008 international workshop on mining software repositories, MSR ’08. ACM Press, pp 145–148

  • Hooimeijer P, Weimer W (2007) Modeling bug report quality. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 34–43

  • Joachims T (2002) Learning to classify text using support vector machines. Kluwer Academic Publishers

  • Kaâniche M, Kanoun K, Cukier M, de Bastos Martini MR (1994) Software reliability analysis of three successive generations of a switching system. In: Proceedings of the 1st European dependable computing conference on dependable computing, pp 473–490

  • Kim S, Zimmermann T, Pan K, Whitehead E Jr (2006) Automatic identification of bug-introducing changes. In: 21st IEEE/ACM international conference on automated software engineering (ASE’06), pp 81–90

  • Kpodjedo S, Ricca F, Galinier P, Guéhéneuc YG, Antoniol G (2011) Design evolution metrics for defect prediction in object oriented systems. Empir Softw Eng 16(1):141–175

  • Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: 6th symposium on operating systems design and implementation, pp 289–302

  • Li Z, Tan L, Wang X, Lu S, Zhou Y, Zhai C (2006) Have things changed now? An empirical study of bug characteristics in modern open source software. In: Proceedings of the 1st workshop on architectural and system support for improving software dependability, ASID ’06

  • Lu S (2008) Understanding, detecting and exposing concurrency bugs. Ph.D. thesis

  • Lu S, Li Z, Qin F, Tan L, Zhou P, Zhou Y (2005) BugBench: a benchmark for evaluating bug detection tools. In: Workshop on the evaluation of software defect detection tools

  • Lu S, Park S, Seo E, Zhou Y (2008) Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In: Proceedings of the 13th international conference on architectural support for programming languages and operating systems, pp 329–339

  • Massacci F, Neuhaus S, Nguyen VH (2011) After-life vulnerabilities: a study on firefox evolution, its vulnerabilities, and fixes. In: Proceedings of the 3rd international conference on engineering secure software and systems, pp 195–208

  • Matter D, Kuhn A, Nierstrasz O (2009) Assigning bug reports using a vocabulary-based expertise model of developers. In: Proceedings of the 2009 6th IEEE international working conference on mining Software repositories, pp 131–140

  • Memon AM (2002) GUI testing: pitfalls and process. Computer 35(8):87–88

  • Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol (TOSEM) 11(3):309–346

  • Moller KH, Paulish DJ (1993) An empirical investigation of software fault distribution. In: Proceedings of the 1st international software metrics symposium, pp 82–90

  • Neuhaus S, Zimmermann T (2010) Security trend analysis with cve topic models. In: Proceedings of the 2010 IEEE 21st international symposium on software reliability engineering, pp 111–120

  • Ostrand T, Weyuker E (2002) The distribution of faults in a large industrial software system. In: Proceedings of the 2002 ACM SIGSOFT international symposium on software testing and analysis, pp 55–64

  • Ostrand TJ, Weyuker EJ (1984) Collecting and categorizing software error data in an industrial environment. J Syst Softw 4(4):289–300

  • Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

  • Ozment A, Schechter SE (2006) Milk or wine: does software security improve with age? In: Proceedings of the 15th conference on USENIX security symposium, vol 15

  • Bhattacharya P, Neamtiu I (2010) Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: Proceedings of the 2010 IEEE international conference on software maintenance

  • Bhattacharya P, Neamtiu I (2011) Bug-fix time prediction models: can we do better? In: Proceedings of the 8th working conference on mining software repositories

  • Panjer LD (2007) Predicting Eclipse bug lifetimes. In: Proceedings of the Fourth International Workshop on mining software repositories, pp 29–32

  • Park JW, Lee MW, Kim J, Hwang SW, Kim S (2011) CosTriage: a cost-aware triage algorithm for bug reporting systems. In: Proceedings of the 25th conference on artificial intelligence

  • Park S, Lu S, Zhou Y (2009a) CTrigger: exposing atomicity violation bugs from their hiding places. In: Proceedings of the 14th international conference on architectural support for programming languages and operating systems, pp 25–36

  • Park S, Zhou Y, Xiong W, Yin Z, Kaushik R, Lee KH, Lu S (2009b) PRES: probabilistic replay with execution sketching on multiprocessors. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, pp 177–192

  • Payne C (2002) On the security of open source software. Inf Syst J 12(1):61–78

  • Schugerl P, Rilling J, Charland P (2008) Mining bug repositories—a quality assessment. In: Proceedings of the 2008 international conference on computational intelligence for modelling control and automation

  • Pighin M, Marzona A (2003) An empirical analysis of fault persistence through software releases. In: Proceedings of the international symposium on empirical software engineering, p 206

  • Podgurski A, Leon D, Francis P, Masri W, Minch M, Sun J, Wang B (2003) Automated support for classifying software failure reports. In: Proceedings of the 23rd international conference on software engineering, pp 465–475

  • Quinlan R (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th international conference on software engineering, pp 499–510

  • Sahoo SK, Criswell J, Adve V (2010) An empirical study of reported bugs in server software with implications for automated bug diagnosis. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering—ICSE ’10, vol 1. ACM Press, p 485

  • Shin Y, Bell RM, Ostrand TJ, Weyuker EJ (2012) On the use of calling structure information to improve fault prediction. Empir Softw Eng 17(4–5):390–423

  • Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: MSR ’05: Proceedings of the 2005 international workshop on mining software repositories, pp 1–5

  • Sullivan M, Chillarege R (1991) Software defects and their impact on system availability—a study of field failures in operating systems. In: 21st international symposium on fault-tolerant computing, pp 2–9

  • Sullivan M, Chillarege R (1992) A comparison of software defects in database management systems and operating systems. In: 22nd annual international symposium on fault-tolerant computing, pp 475–484

  • Sun C, Lo D, Wang X, Jiang J, Khoo SC (2010) A discriminative model approach for accurate duplicate bug report retrieval. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, pp 45–54

  • Swift MM, Bershad BN, Levy HM (2003) Improving the reliability of commodity operating systems. In: Proceedings of the ACM SIGOPS symposium on operating systems principles, pp 207–222

  • Ahsan SN, Wotawa F (2010) Impact analysis of SCRs using single and multi-label machine learning classification. In: Proceedings of the 2010 ACM-IEEE international symposium on empirical software engineering and measurement

  • Tan L, Yuan D, Krishna G, Zhou Y (2007) /* iComment: bugs or bad comments? */. In: Proceedings of the 21st ACM symposium on operating systems principles

  • Tan L, Zhou Y, Padioleau Y (2011) aComment: mining annotations from comments and code to detect interrupt-related concurrency bugs. In: Proceedings of the 33rd international conference on software engineering

  • Tang Y, Gao Q, Qin F (2008) LeakSurvivor: towards safely tolerating memory leaks for garbage-collected languages. In: USENIX 2008 annual technical conference, pp 307–320

  • Tian Y, Lawall J, Lo D (2008) Identifying Linux bug fixing patches. In: Proceedings of the international conference on software engineering

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470

  • Weyuker EJ, Ostrand TJ, Bell RM (2010) Comparing the effectiveness of several modeling methods for fault prediction. Empir Softw Eng 15(3):277–295

  • Wu R, Zhang H, Kim S, Cheung SC (2011) ReLink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 15–25

  • Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011) How do fixes become bugs? In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, pp 26–36

  • Ying ATT, Murphy GC, Ng RT, Chu-Carroll M (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586

  • Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories, pp 93–102


Acknowledgements

We thank Luyang Wang and Yaoqiang Li for classifying some bug reports. We thank Shan Lu for the early discussion and feedback. The work is partially supported by the National Science and Engineering Research Council of Canada, the United States National Science Foundation, the United States Department of Energy, a Google gift grant, and an Intel gift grant.

Author information


Correspondence to Lin Tan.

Additional information

Communicated by: Thomas Zimmermann

Appendices

1.1 A Combining Two Data Sets

To leverage the randomly sampled bug reports studied in our prior work in 2005 (Li et al. 2006), each of the two bug report samples from the Mozilla and Apache Bugzilla databases is combined from two random samples. The combination is performed so as to preserve the pure randomness of the sampling: the goal is to ensure that the combined set of fixed bug reports is indistinguishable from a random sample of fixed bug reports drawn from the entire Bugzilla database at the new sampling date.

Figure 9 illustrates the combination approach. We randomly sampled 2X% of the fixed bug reports in one Bugzilla database by the cutoff date of our prior work, referred to as Date1. Now we randomly select half of the 2X% of fixed bug reports, referred to as Set1; the other half is discarded. Note that Set1 is a random sample of X% of the bug reports fixed by Date1. On our new sampling date (Table 1), denoted as Date2, we sample another X% of the fixed bug reports that were opened after Date1 and before Date2, denoted as Set2. We keep only half of the bug reports fixed by Date1 so that the sampled bug reports before and after Date1 are in proportion to the bug reports belonging to the two time ranges.

Fig. 9 Combining Two Data Sets. Not Fixed denotes any status other than Fixed, e.g., reopened, invalid, etc. Old Data Set is the data set used in our previous work (Li et al. 2006) and New Data Set is the data set used in this paper

The status of a bug report may have changed since Date1 in the following two ways: (1) a bug report fixed by Date1 is no longer fixed by Date2; or (2) an unfixed bug report by Date1 is fixed by Date2. To compensate for these two scenarios, we identify all the fixed bug reports in Set1 that are no longer marked as fixed on Date2, denoted as Set3. Bug reports in Set3 should not be included in our sample, because if we took a random sample of fixed bug reports on Date2, those bug reports would not be sampled, as they are no longer fixed. From the bug reports that were unfixed by Date1 but are fixed by Date2, we randomly sample X% of them, denoted as Set4. Our final random sample is the union of Set1, Set2, and Set4, with Set3 excluded.
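The set construction described above can be sketched in a few lines of Python. This is a minimal sketch of the sampling logic only; the report field names (opened_after_date1, fixed_by_date1, fixed_by_date2) and the helper names are invented for illustration, not taken from the paper's tooling.

```python
import random

def combine_samples(reports, x, seed=0):
    """Combine an old 2X% sample (cut off at Date1) with new samples so the
    result behaves like a fresh X% random sample of reports fixed by Date2.
    Each report is a dict with hypothetical boolean status fields."""
    rng = random.Random(seed)

    def frac(pool, p):
        # Random p-fraction of a pool of bug reports.
        return rng.sample(pool, round(p * len(pool)))

    # Prior study: 2X% of reports fixed by Date1; keep a random half (Set1),
    # itself a random X% sample of the reports fixed by Date1.
    old_sample = frac([r for r in reports if r["fixed_by_date1"]], 2 * x)
    set1 = rng.sample(old_sample, len(old_sample) // 2)

    # Set2: X% of reports opened after Date1 and fixed by Date2.
    set2 = frac([r for r in reports
                 if r["opened_after_date1"] and r["fixed_by_date2"]], x)

    # Set3: Set1 reports whose status changed -- no longer fixed at Date2.
    set3_ids = {id(r) for r in set1 if not r["fixed_by_date2"]}

    # Set4: X% of reports that were unfixed by Date1 but are fixed by Date2.
    set4 = frac([r for r in reports
                 if not r["opened_after_date1"] and not r["fixed_by_date1"]
                 and r["fixed_by_date2"]], x)

    # Final sample: (Set1 + Set2 + Set4) with Set3 excluded.
    return [r for r in set1 + set2 + set4 if id(r) not in set3_ids]
```

Because Set1, Set2, and Set4 are drawn from disjoint populations (fixed by Date1; opened after Date1; unfixed by Date1 but fixed by Date2), every report fixed by Date2 had the same X% chance of ending up in the result.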

Table 16 lists the sizes of the four data sets for the three software projects. No combining is needed for the Linux kernel since it was sampled in 2010.

Table 16 Number of bug reports in Set1–4

There is no difference between our combined sample and a sample randomly drawn from the fixed bug reports on Date2, as either is a random X% sample of the population. Therefore, the combined sample is representative of the fixed bug reports in the Bugzilla database. In addition, our results show that the distributions of these sets are similar, meaning that the variance across the data sets is small, which increases the confidence and reproducibility of our results.

Developers may update bug reports after the bugs are fixed. Therefore, we check all bugs in Set1 to find out whether such later activities affect our classifications of the bugs. Fortunately, only 5 of those bug reports in Mozilla and none in Apache have activities after Date1. We manually read these 5 bug reports again and find that the activities change the product, the QA contact, or the component of the bug reports, and do not change our original classifications. The component field used in the bug reports is finer-grained than the definition of component in Table 3; therefore, the finer-grained component changes in bug reports do not affect the higher-level components used in this paper.

1.2 B Bug Examples

Table 17 Bug examples for each category

1.3 C Detailed Numbers for the Figures

Table 18 Distribution of root causes with impacts (BugSet1)
Table 19 Distribution of root causes with impacts (BugSet1)
Table 20 Distribution of root causes with impacts (BugSet1)
Table 21 Distribution of bug impacts (BugSet1)
Table 22 Distribution of Mozilla and Apache bugs in software components (BugSet1)
Table 23 Distribution of Linux bugs in software components (BugSet1)
Table 24 Distributions of causes and impacts of security related bugs in NVD (BugSet2)
Table 25 Distributions of causes and impacts of security related bugs in NVD (BugSet2)

Cite this article

Tan, L., Liu, C., Li, Z. et al. Bug characteristics in open source software. Empir Software Eng 19, 1665–1705 (2014). https://doi.org/10.1007/s10664-013-9258-8
