Structured information in bug report descriptions—influence on IR-based bug localization and developers

Software Quality Journal

Abstract

Numerous information retrieval (IR)-based bug localization techniques have been proposed in recent years. These approaches rely on the textual similarity between a bug report description and the source code files, under the basic assumption that such descriptions are well suited to querying the code base. However, bug reports often contain structured information such as stack traces and source code alongside natural language, which may undermine this assumption. In this paper, we systematically analyze the influence of structured information on IR-based techniques. To this end, we carried out an empirical study on 7334 bug reports, more than 30% of which contain structured information. Based on the results, we conducted a follow-up user study focusing on source code fragments found in bug reports. Our results show that stack traces tend to negatively affect IR-based bug localization performance and require special handling. Compared to natural language–only reports, reports containing source code are beneficial for IR-based algorithms and also help developers identify false positives in bug localization results.
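To make the idea of querying a code base with a bug report concrete, the following is a minimal sketch of similarity-based ranking. It is not the technique evaluated in the paper. It assumes a plain TF-IDF model with cosine similarity, and the names `strip_stack_frames`, `tokenize`, `rank_files`, and the `STACK_FRAME` pattern are hypothetical helpers introduced only for illustration. Dropping stack-trace frames before querying is one possible form of the "special handling" mentioned above, not necessarily the one the authors recommend.

```python
import math
import re
from collections import Counter

# Assumed pattern for Java stack-trace frames, e.g. "at org.foo.Bar.baz(Bar.java:42)".
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$<>]+\(.*\)\s*$")


def strip_stack_frames(text):
    """Drop lines that look like Java stack-trace frames."""
    return "\n".join(
        line for line in text.splitlines() if not STACK_FRAME.match(line)
    )


def tokenize(text):
    """Extract alphabetic words, split camelCase identifiers, and lowercase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        tokens.extend(part.lower() for part in parts)
    return tokens


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report, files):
    """Rank source files by TF-IDF cosine similarity to the bug report text."""
    term_counts = {name: Counter(tokenize(src)) for name, src in files.items()}
    n_docs = len(term_counts)
    # Document frequency: number of files in which each term occurs.
    doc_freq = Counter(t for counts in term_counts.values() for t in counts)
    idf = {t: math.log(1 + n_docs / doc_freq[t]) for t in doc_freq}

    def weigh(counts):
        # Terms that never occur in the code base carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    query = weigh(Counter(tokenize(report)))
    scores = {name: cosine(query, weigh(c)) for name, c in term_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for this example.
files = {
    "CacheLoader.java": "class CacheLoader { void loadCache() { cache.load(key); } }",
    "Logger.java": "class Logger { void logError(String message) { print(message); } }",
}
report = (
    "Cache fails to load entries\n"
    "  at org.example.Logger.logError(Logger.java:10)\n"
    "  at org.example.Logger.logError(Logger.java:10)"
)

ranking_raw = rank_files(report, files)
ranking_clean = rank_files(strip_stack_frames(report), files)
```

In this toy setup, the stack frames give `Logger.java` a nonzero score even though the natural language part of the report only concerns the cache. After the frames are stripped, only `CacheLoader.java` shares terms with the query.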

Notes

  1. https://issues.jboss.org/browse/JBSEAM-4132

  2. https://bit.ly/2UW6a58

  3. https://issues.jboss.org/browse/JBRULES-805

  4. https://issues.apache.org/jira/browse/GROOVY-5888

  5. https://issues.jboss.org/browse/ISPN-552

  6. https://issues.jboss.org/browse/JBSEAM-1918

  7. https://bit.ly/2GG8nh2

  8. https://issues.apache.org/jira/browse/GROOVY-580

  9. CTRL-f is a common keyboard shortcut to trigger built-in search functionality of software systems.

Acknowledgements

We thank Mihaela Todorova Tomova and Mario Janke for their assistance in conducting the user study.

Funding

Our work is funded by the BMBF grant: 01IS16003B, DFG grant: MA 5030/3–1, the EU EFRE/TAB grant: 2015FE9033, and DLR grant: D/943/67258261.

Author information

Corresponding author

Correspondence to Michael Rath.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Rath, M., Mäder, P. Structured information in bug report descriptions—influence on IR-based bug localization and developers. Software Qual J 27, 1315–1337 (2019). https://doi.org/10.1007/s11219-019-09445-6

  • DOI: https://doi.org/10.1007/s11219-019-09445-6