Improving bug management using correlations in crash reports

Empirical Software Engineering

Abstract

Many software organizations rely on automatic problem reporting tools to collect crash reports directly from users’ environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91 % and a recall of 87 % for Firefox and a precision of 76 % and a recall of 61 % for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62 % and a precision of 42 % for Firefox, and a recall of 52 % and a precision of 50 % for Eclipse. On the top 10 buggy file candidates, the recall increases to 92 % for Firefox and 90 % for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50 % and a precision of 55 % on Firefox, and a recall of 47 % and a precision of 35 % on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.
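
The definition of a crash correlation group given above can be illustrated with a minimal sketch. It is not the paper's rule-based identification method, which works from stack trace information. It only shows how crash types could be grouped once their bug reports and the correlations between those bugs are already known. The function name, the input shapes, the crash signatures and the bug ids are all hypothetical, and the sketch assumes that grouping is transitive, so that crash types linked through a chain of shared or correlated bugs end up in one group.

```python
from collections import defaultdict


def crash_correlation_groups(crash_to_bugs, correlated_bugs):
    """Group crash types related to identical or correlated bug reports.

    crash_to_bugs: dict mapping a crash type to the set of bug ids filed for it
                   (hypothetical input shape).
    correlated_bugs: iterable of (bug_a, bug_b) pairs assumed to be correlated.
    Returns a sorted list of groups, each a sorted list of crash types.
    """
    parent = {}

    def find(bug):
        # Union-find lookup with path halving.
        parent.setdefault(bug, bug)
        while parent[bug] != bug:
            parent[bug] = parent[parent[bug]]
            bug = parent[bug]
        return bug

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Correlated bugs belong to the same component.
    for a, b in correlated_bugs:
        union(a, b)

    # Assumption: all bugs filed for one crash type are also merged.
    for bugs in crash_to_bugs.values():
        ordered = sorted(bugs)
        for other in ordered[1:]:
            union(ordered[0], other)

    # Crash types whose bugs share a component form one group.
    groups = defaultdict(set)
    for crash, bugs in crash_to_bugs.items():
        if bugs:  # crash types without a bug report are left ungrouped
            groups[find(next(iter(bugs)))].add(crash)
    return sorted(sorted(group) for group in groups.values())


# Made-up example: sigA and sigB share bug 101, and bug 101 is
# correlated with bug 202, which was filed for sigC.
example = crash_correlation_groups(
    {"sigA": {101}, "sigB": {101}, "sigC": {202}, "sigD": {303}},
    [(101, 202)],
)
print(example)
```

A union-find structure is used here only because it keeps the transitive merging short; any connected-components computation over the same links would serve equally well.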

Acknowledgements

The authors would like to thank Tejinder Dhaliwal and Feng Zhang, of Queen’s University, for their help during data collection and for their many useful comments on this work.

Author information

Corresponding author

Correspondence to Shaohua Wang.

Additional information

Communicated by Massimiliano Di Penta and Sung Kim

About this article

Cite this article

Wang, S., Khomh, F. & Zou, Y. Improving bug management using correlations in crash reports. Empir Software Eng 21, 337–367 (2016). https://doi.org/10.1007/s10664-014-9333-9

  • DOI: https://doi.org/10.1007/s10664-014-9333-9
