ChangeLocator: locate crash-inducing changes based on crash reports

Wu, Rongxin; Wen, Ming; Cheung, Shing-Chi; Zhang, Hongyu

doi:10.1007/s10664-017-9567-4

Published: 11 November 2017

Volume 23, pages 2866–2900, (2018)
Empirical Software Engineering

Abstract

Software crashes are severe manifestations of software bugs, and debugging crashing bugs is tedious and time-consuming. Understanding the software changes that induce a crashing bug provides useful context for fixing it and is in high demand among developers. Locating bug-inducing changes is also useful for automatic program repair, since it narrows down the root causes and reduces the search space of bug fix locations. However, there are currently no systematic studies on locating the changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study characterizing the bug-inducing changes for crashing bugs (denoted as crash-inducing changes). We then propose ChangeLocator, a method to automatically locate crash-inducing changes for a given bucket of crash reports. Our approach is based on a learning model that uses features derived from our empirical study, and we train the model using data from historical fixed crashes. We evaluated ChangeLocator on six release versions of the Netbeans project. The results show that it can locate the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs by examining only the top 1, 5, and 10 changes in the recommended list, respectively. It significantly outperforms the existing state-of-the-art approach.
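The top-1, top-5, and top-10 figures above are commonly read as a recall-at-N measure: the fraction of crash buckets for which at least one true crash-inducing change appears among the first N recommended changes. The following Python sketch illustrates that reading only. The abstract does not give the paper's exact metric definition, and the function name, bucket identifiers, and change identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_n(
    rankings: Dict[str, List[str]],
    inducing: Dict[str, Set[str]],
    n: int,
) -> float:
    """Fraction of buckets with a true inducing change in the top n.

    rankings: bucket id -> recommended change ids, best first (hypothetical data).
    inducing: bucket id -> set of known crash-inducing change ids.
    """
    if not rankings:
        return 0.0
    hits = 0
    for bucket_id, ranked_changes in rankings.items():
        truth = inducing.get(bucket_id, set())
        # A bucket counts as located if any of its top-n changes is a true one.
        if any(change in truth for change in ranked_changes[:n]):
            hits += 1
    return hits / len(rankings)


# Illustrative, made-up data: four crash buckets with ranked candidate changes.
rankings = {
    "bucket-1": ["c3", "c1", "c7"],
    "bucket-2": ["c9", "c4", "c2"],
    "bucket-3": ["c5", "c6", "c8"],
    "bucket-4": ["c0", "c11", "c12"],
}
inducing = {
    "bucket-1": {"c3"},
    "bucket-2": {"c2"},
    "bucket-3": {"c6"},
    "bucket-4": {"c99"},  # the true change was never recommended
}

for n in (1, 2, 3):
    print(f"recall@{n} = {recall_at_n(rankings, inducing, n):.2f}")
```

Under this reading, a reported value of 44.7% at top 1 would mean that the first recommended change is a true crash-inducing change for 44.7% of the evaluated bugs.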

Acknowledgments

We thank the anonymous reviewers for their insightful comments. This research is supported by Hong Kong SAR RGC/GRF grant 16202917, NSFC grant 61272089, and the 2016 Microsoft Research Asia Collaborative Research Program.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Rongxin Wu, Ming Wen & Shing-Chi Cheung
School of Electrical Engineering and Computing, The University of Newcastle, Newcastle, Australia
Hongyu Zhang

Corresponding author

Correspondence to Rongxin Wu.

Additional information

Communicated by: Martin Monperrus and Westley Weimer

About this article

Cite this article

Wu, R., Wen, M., Cheung, SC. et al. ChangeLocator: locate crash-inducing changes based on crash reports. Empir Software Eng 23, 2866–2900 (2018). https://doi.org/10.1007/s10664-017-9567-4

Issue Date: October 2018
