ChangeLocator: locate crash-inducing changes based on crash reports

Empirical Software Engineering

Abstract

Software crashes are severe manifestations of software bugs, and debugging crashing bugs is tedious and time-consuming. Identifying the software changes that induced a crashing bug provides useful contextual information for fixing it and is in high demand among developers. Locating these bug-inducing changes also benefits automatic program repair, because it narrows down the root causes and shrinks the search space of candidate fix locations. However, there is currently no systematic study on locating the changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study to characterize the bug-inducing changes for crashing bugs (denoted as crash-inducing changes). We then propose ChangeLocator, a method that automatically locates crash-inducing changes for a given bucket of crash reports. Our approach is based on a learning model that uses features derived from our empirical study and is trained on data from previously fixed crashes. We evaluated ChangeLocator on six release versions of the NetBeans project. The results show that it locates the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs when examining only the top 1, 5, and 10 changes in the recommended list, respectively, and that it significantly outperforms the existing state-of-the-art approach.
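The abstract describes the approach only at a high level: a model trained on previously fixed crashes scores the candidate changes for a crash bucket, and effectiveness is reported as the share of bugs whose crash-inducing change appears among the top 1, 5, or 10 recommendations. The Python sketch below illustrates that learn-rank-evaluate pipeline under stated assumptions. The three features, the plain logistic-regression learner, and all function names (train_logistic, score, rank_changes, top_n) are hypothetical stand-ins, not ChangeLocator's actual features or model; counting a bucket as a hit when any of its crash-inducing changes is in the top N is likewise an assumption.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical per-change feature vector (NOT ChangeLocator's real features):
#   [0] 1.0 if a function touched by the change appears in the bucket's crash stacks
#   [1] normalized stack depth of that function (0.0 = top frame, 1.0 = absent/deep)
#   [2] normalized size of the change (lines added + deleted)
Features = Sequence[float]


def _logit(w: List[float], x: Features) -> float:
    # The last weight is the bias term; zip stops after len(x) feature weights.
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))


def score(w: List[float], x: Features) -> float:
    """Predicted probability that a change is crash-inducing."""
    return 1.0 / (1.0 + math.exp(-_logit(w, x)))


def train_logistic(X: List[Features], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent.

    X holds features of changes linked to historical fixed crashes;
    y[i] is 1 if change i was crash-inducing and 0 otherwise.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def rank_changes(w: List[float], candidates: Dict[str, Features]) -> List[str]:
    """Order candidate change IDs from most to least suspicious."""
    return sorted(candidates, key=lambda cid: score(w, candidates[cid]), reverse=True)


def top_n(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]], n: int) -> float:
    """Fraction of buckets whose top-n list contains a crash-inducing change."""
    hits = sum(1 for bucket, ranked in rankings.items()
               if truth[bucket] & set(ranked[:n]))
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy training data standing in for historical fixed crashes.
    X = [[1.0, 0.0, 0.2], [1.0, 0.1, 0.5], [0.0, 1.0, 0.1], [0.0, 1.0, 0.9]]
    y = [1, 1, 0, 0]
    w = train_logistic(X, y)

    # Candidate changes for two new crash buckets (toy values).
    buckets = {
        "bucket-A": {"c1": [1.0, 0.0, 0.3], "c2": [0.0, 1.0, 0.3]},
        "bucket-B": {"c3": [0.0, 0.8, 0.6], "c4": [1.0, 0.2, 0.6]},
    }
    rankings = {b: rank_changes(w, cands) for b, cands in buckets.items()}
    truth = {"bucket-A": {"c1"}, "bucket-B": {"c4"}}
    for n in (1, 5, 10):
        print(f"Top-{n}: {top_n(rankings, truth, n):.1%}")
```

Logistic regression appears here only because it fits in a few lines of the standard library. A realistic setup would likely also need to handle the strong class imbalance between the few crash-inducing changes and the many clean candidate changes in each bucket.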

Acknowledgments

We thank the anonymous reviewers for their insightful comments. This research is supported by Hong Kong SAR RGC/GRF grant 16202917, NSFC grant 61272089, and the 2016 Microsoft Research Asia Collaborative Research Program.

Author information

Correspondence to Rongxin Wu.

Additional information

Communicated by: Martin Monperrus and Westley Weimer

About this article

Cite this article

Wu, R., Wen, M., Cheung, SC. et al. ChangeLocator: locate crash-inducing changes based on crash reports. Empir Software Eng 23, 2866–2900 (2018). https://doi.org/10.1007/s10664-017-9567-4

  • DOI: https://doi.org/10.1007/s10664-017-9567-4