Abstract
Many software organizations rely on automatic problem reporting tools to collect crash reports directly from users’ environments. These crash reports are later grouped into crash types. Developers usually prioritize crash types by their number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91 % and a recall of 87 % for Firefox and a precision of 76 % and a recall of 61 % for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62 % and a precision of 42 % for Firefox, and a recall of 52 % and a precision of 50 % for Eclipse. On the top 10 buggy file candidates, the recall increases to 92 % for Firefox and 90 % for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50 % and a precision of 55 % on Firefox, and a recall of 47 % and a precision of 35 % on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.
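To make the top-N figures above concrete, the following is a minimal sketch of how precision and recall over the top k buggy file candidates could be computed for a single ranked list. The abstract does not give the evaluation protocol, so the set-based definitions, the function name `precision_recall_at_k`, and the file names are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_files: Sequence[str], buggy_files: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall over the top-k entries of a ranked candidate list.

    Assumed definitions (not taken from the paper):
      precision = truly buggy files among the top k / number of candidates kept
      recall    = truly buggy files among the top k / number of truly buggy files
    """
    top_k = list(ranked_files[:k])
    hits = sum(1 for name in top_k if name in buggy_files)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(buggy_files) if buggy_files else 0.0
    return precision, recall


# Hypothetical ranking produced for one crash correlation group.
ranked = ["a.cpp", "b.cpp", "c.cpp", "d.cpp", "e.cpp"]
# Hypothetical ground truth: files actually changed by the fix.
buggy = {"b.cpp", "e.cpp"}

p3, r3 = precision_recall_at_k(ranked, buggy, 3)
p5, r5 = precision_recall_at_k(ranked, buggy, 5)
print(f"top 3: precision={p3:.2f} recall={r3:.2f}")
print(f"top 5: precision={p5:.2f} recall={r5:.2f}")
```

In this toy case, widening the cutoff from three to five candidates raises recall while precision stays low, which mirrors the direction of the trade-off reported between the top three and top 10 candidates.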
Acknowledgements
The authors would like to thank Tejinder Dhaliwal and Feng Zhang, of Queen’s University, for their help during data collection and for their many useful comments on this work.
Additional information
Communicated by Massimiliano Di Penta and Sung Kim
About this article
Cite this article
Wang, S., Khomh, F. & Zou, Y. Improving bug management using correlations in crash reports. Empir Software Eng 21, 337–367 (2016). https://doi.org/10.1007/s10664-014-9333-9
DOI: https://doi.org/10.1007/s10664-014-9333-9