Are datasets for information retrieval-based bug localization techniques trustworthy?

Kim, Misoo; Lee, Eunseok

doi:10.1007/s10664-021-09946-8

Impact analysis of bug types on IRBL

Published: 19 March 2021

Volume 26, article number 35, (2021)

Empirical Software Engineering

Abstract

Various evaluation datasets are used to evaluate the performance of information retrieval-based bug localization (IRBL) techniques. To evaluate IRBL accurately and improve its performance further, the validity of these datasets must be analyzed in advance. To this end, we surveyed 50 previous studies, collected 41,754 bug reports, and identified critical problems that affect the validity of performance evaluation results. These problems lie in both the ground truth and the search space, and they arise from using different bug types without clearly distinguishing them. We divided the bugs into production- and test-related bugs. Based on this distinction, we investigated and analyzed the impact of the bug type on IRBL performance evaluation. Approximately 18.6% of the bug reports were linked to non-buggy files as the ground truth. Up to 58.5% of the source files in the search space introduced noise into the localization of a specific bug type. In our experiments, the average precision changed in approximately 90% of the bug reports linked with an incorrect ground truth, and specifying a suitable search space changed the average precision in at least half of the bug reports. Further, we showed that these problems can alter the relative ranks of the IRBL techniques. Our large-scale analysis demonstrated that a significant amount of noise occurs, which can compromise the evaluation results. An important finding of this study is that bug types must be considered to improve the accuracy of the performance evaluation.
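
To make the effect of an incorrect ground truth concrete, the following minimal sketch computes average precision for a single bug report under two ground truths. It assumes the standard information retrieval definition of average precision (the mean of precision at each rank where a ground-truth file appears); the exact formula used in the paper is not given in this excerpt, and the file names and ranking are hypothetical.

```python
from typing import Iterable, Sequence


def average_precision(ranked_files: Sequence[str], ground_truth: Iterable[str]) -> float:
    """Standard IR average precision for one query (here, one bug report)."""
    relevant = set(ground_truth)
    if not relevant:
        raise ValueError("ground truth must contain at least one file")
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked_files, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Hypothetical ranking returned by an IRBL technique for a production bug.
ranking = ["FooTest.java", "Foo.java", "Bar.java", "Baz.java"]

# Ground truth that wrongly includes a test file changed in the same fix.
noisy_truth = {"Foo.java", "FooTest.java"}
# Ground truth restricted to the file where the bug is actually located.
clean_truth = {"Foo.java"}

ap_noisy = average_precision(ranking, noisy_truth)
ap_clean = average_precision(ranking, clean_truth)
print(ap_noisy, ap_clean)
```

In this toy case the same ranking scores 1.0 against the noisy ground truth and 0.5 against the cleaned one, which illustrates how a non-buggy file in the ground truth can shift the reported score.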

References

Ali N, Sabane A, Gueheneuc Y G, Antoniol G (2012) Improving bug location using binary class relationships. In: Proceedings of the international working conference on source code analysis and manipulation (SCAM). IEEE, pp 174–183
Almhana R, Mkaouer W, Kessentini M, Ouni A (2016) Recommending relevant classes for bug reports using multi-objective search. In: Proceedings of the international conference on automated software engineering (ASE). ACM, pp 286–295
Beck K (2003) Test-driven development: by example. Addison-Wesley Professional
Beizer B (2003) Software testing techniques. Dreamtech
Catolino G, Palomba F, Zaidman A, Ferrucci F (2019) Not all bugs are the same: Understanding, characterizing, and classifying bug types. Journal of Systems and Software (JSS)
Chaparro O, Florez J M, Marcus A (2017) Using observed behavior to reformulate queries during text retrieval-based bug localization. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 376–387
Chaparro O, Florez J M, Marcus A (2019) Using bug descriptions to reformulate queries during text-retrieval-based bug localization. Empirical Software Engineering (ESE), pp 1–61
Charette R N (2005) Why software fails [software failure]. Spectrum 42(9):42–49
Corder G W, Foreman DI (2014) Nonparametric statistics: A step-by-step approach. John Wiley & Sons
Dallmeier V, Zimmermann T (2007) Extraction of bug localization benchmarks from history. In: Proceedings of the international conference on automated software engineering (ASE). ACM, pp 433–436
Davies S, Roper M (2013) Bug localisation through diverse sources of information. In: Proceedings of the international symposium on software reliability engineering workshops (ISSREW). IEEE, pp 126–131
Davies S, Roper M, Wood M (2012) Using bug report similarity to enhance bug localisation. In: Proceedings of the working conference on the reverse engineering (WCRE). IEEE, pp 125–134
Dillman DA, Smyth JD, Christian LM (2014) Internet, phone, mail, and mixed-mode surveys: the tailored design method. John Wiley & Sons
Dilshener T, Wermelinger M, Yu Y (2016) Locating bugs without looking back. In: Proceedings of the international conference on mining software repositories (MSR). ACM, pp 286–290
Dit B, Revelle M, Gethers M, Poshyvanyk D (2013) Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process 25(1):53–95
Garnier M, Ferreira I, Garcia A (2017) On the influence of program constructs on bug localization effectiveness. J Softw Eng Res Dev 5(1):6
Garousi V, Küçük B (2018) Smells in software test code: A survey of knowledge in industry and academia. Journal of Systems and Software (JSS) 138:52–81
Garousi V, Kucuk B, Felderer M (2018) What we know about smells in software test code. Software
Grottke M, Trivedi K S (2005) A classification of software faults. J Reliab Eng Assoc Japan 27(7):425–438
Grottke M, Trivedi K S (2007) Fighting bugs: Remove, retry, replicate, and rejuvenate. Computer 40(2)
Grottke M, Nikora A P, Trivedi KS (2010) An empirical investigation of fault types in space mission system software. In: Proceedings of the international conference on dependable systems and networks (DSN). IEEE/IFIP, pp 447–456
Haiduc S, Bavota G, Marcus A, Oliveto R, De Lucia A, Menzies T (2013) Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE). IEEE Press, pp 842–851
Herzig K, Zeller A (2013) The impact of tangled code changes. In: Proceedings of the working conference on mining software repositories (MSR). IEEE, pp 121–130
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of the international conference on software engineering (ICSE). IEEE Press, pp 392–401
Hill E, Rao S, Kak A (2012) On the use of stemming for concern location and bug localization in java. In: Proceedings of the international working conference on source code analysis and manipulation (SCAM). IEEE, pp 184–193
Huo X, Thung F, Li M, Lo D, Shi ST (2019) Deep transfer bug localization. IEEE Transactions on software engineering
Khatiwada S, Tushev M, Mahmoud A (2018) Just enough semantics: an information theoretic approach for ir-based software bug localization. Information and Software Technology (IST) 93:45–57
Kim D, Zeller A, Tao Y, Kim S (2013) Where should we fix this bug?: A two-phase recommendation model. Trans Softw Eng (TSE) 99(1):1
Kim M, Lee E (2018) Are information retrieval-based bug localization techniques trustworthy?. In: Proceedings of the international conference on software engineering: Companion Proceedings (ICSE-C), ACM, ICSE ’18, pp 248–249
Kim M, Lee E (2019) A novel approach to automatic query reformulation for ir-based bug localization. In: Proceedings of the symposium on applied computing (SAC). ACM, pp 1752–1759
Kim M, Lee E (2020) Manq: Many-objective optimization-based automatic query reduction for ir-based bug localization. Information and Software Technology, pp 106334
Kochhar P S, Tian Y, Lo D (2014) Potential biases in bug localization: Do they matter?. In: Proceedings of the international conference on automated software engineering (ASE). ACM, pp 803–814
Kochhar P S, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the international symposium on software testing and analysis (ISSTA). ACM, pp 165–176
Koyuncu A, Bissyandé TF, Kim D, Liu K, Klein J, Monperrus M, Traon YL (2019) D&c: A divide-and-conquer approach to ir-based bug localization. arXiv:1902.02703
Labuschagne A, Inozemtseva L, Holmes R (2017) Measuring the cost of regression testing in practice: a study of java projects using continuous integration. In: Proceedings of the joint meeting on foundations of software engineering (FSE). ACM, pp 821–830
Lam A N, Nguyen A T, Nguyen H A, Nguyen TN (2015) Combining deep learning with information retrieval to localize buggy files for bug reports (n). In: Proceedings of the international conference on automated software engineering (ASE). IEEE, pp 476–481
Lam A N, Nguyen A T, Nguyen H A, Nguyen TN (2017) Bug localization with combination of deep learning and information retrieval. In: Proceedings of the international conference on program comprehension (ICPC). IEEE, pp 218–229
Larson R R (2010) Introduction to information retrieval. J. Am. Soc. Inf. Sci. Technol. 61(4):852–853
Lawrie D, Binkley D (2018) On the value of bug reports for retrieval-based bug localization. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 524–528
Le T D B, Oentaryo R J, Lo D (2015) Information retrieval and spectrum based bug localization: better together. In: Proceedings of the joint meeting on foundations of software engineering (FSE). ACM, pp 579–590
Lee J, Kim D, Bissyandé T F, Jung W, Le Traon Y (2018) Bench4bl: Reproducibility study on the performance of ir-based bug localization. In: Proceedings of the international symposium on software testing and analysis (ISSTA), ACM, ISSTA 2018, pp 61–72
Liang H, Sun L, Wang M, Yang Y (2019) Deep learning with customized abstract syntax tree for bug localization. IEEE Access 7:116309–116320
Lu S, Li Z, Qin F, Tan L, Zhou P, Zhou Y (2005) Bugbench: Benchmarks for evaluating bug detection tools. In: Proceedings of the workshop on the evaluation of software defect detection tools
Lukins S K, Kraft N A, Etzkorn L H (2010) Bug localization using latent dirichlet allocation. Information and Software Technology (IST) 52(9):972–990
Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? on automatically classifying app reviews. In: Proceedings of the international requirements engineering conference (RE). IEEE, pp 116–125
Marsavina C, Romano D, Zaidman A (2014) Studying fine-grained co-evolution patterns of production and test code. In: Proceedings of the international working conference on source code analysis and manipulation (SCAM). IEEE, pp 195–204
Meszaros G (2007) xUnit test patterns: Refactoring test code. Pearson Education
Moreno L, Treadway J J, Marcus A, Shen W (2014) On the use of stack traces to improve text retrieval-based bug localization. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 151–160
Moreno L, Bavota G, Haiduc S, Di Penta M, Oliveto R, Russo B, Marcus A (2015) Query-based configuration of text retrieval solutions for software engineering tasks. In: Proceedings of the joint meeting on foundations of software engineering (FSE). ACM, pp 567–578
Nguyen H A, Nguyen A T, Nguyen TN (2013) Filtering noise in mixed-purpose fixing commits to improve defect prediction and localization. In: Proceedings of the international symposium on software reliability engineering (ISSRE). IEEE, pp 138–147
Palomba F, Zaidman A (2019) The smell of fear: on the relation between test smells and flaky tests. Empirical Software Engineering (ESE), pp 1–40
Palomba F, Zaidman A, De Lucia A (2018) Automatic test smell detection using information retrieval techniques. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 311–322
Poshyvanyk D, Gueheneuc Y G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. Trans Softw Eng (TSE) 33(6)
Rahman M M, Roy CK (2018) Improving ir-based bug localization with context-aware query reformulation. In: Proceedings of the joint meeting on European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE). ACM, pp 621–632
Rao S, Medeiros H, Kak A (2013) An incremental update framework for efficient retrieval from software libraries for bug localization. In: Proceedings of the working conference on reverse engineering (WCRE). IEEE, pp 62–71
Rath M, Mäder P (2019) Structured information in bug report descriptions—influence on ir-based bug localization and developers. Softw. Qual. J. 27(3):1315–1337
Saha R K, Lease M, Khurshid S, Perry DE (2013) Improving bug localization using structured information retrieval. In: Proceedings of the international conference on automated software engineering (ASE). IEEE, pp 345–355
Saha R K, Lawall J, Khurshid S, Perry DE (2014) On the effectiveness of information retrieval based bug localization for c programs. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 161–170
Shi Z, Keung J, Bennin K E, Zhang X (2018) Comparing learning to rank techniques in hybrid bug localization. Appl. Soft Comput. 62:636–648
Sisman B, Kak AC (2012) Incorporating version histories in information retrieval based bug localization. In: Proceedings of the working conference on mining software repositories (MSR). IEEE, pp 50–59
Sisman B, Kak AC (2013) Assisting code search with automatic query reformulation for bug localization. In: Proceedings of the working conference on mining software repositories (MSR). IEEE Press, pp 309–318
Sisman B, Akbar S A, Kak A C (2017) Exploiting spatial code proximity and order for improved source code retrieval for bug localization. J Softw Evol Process 29(1):e1805
Sun X, Zhou W, Li B, Ni Z, Lu J (2019) Bug localization for version issues with defect patterns. IEEE Access 7:18811–18820
Tan L, Liu C, Li Z, Wang X, Zhou Y, Zhai C (2014) Bug characteristics in open source software. Empirical Software Engineering (ESE) 19(6):1665–1705
Tantithamthavorn C, Teekavanich R, Ihara A, Matsumoto K (2013) Mining a change history to quickly identify bug locations: A case study of the eclipse project. In: Proceedings of the international symposium on software reliability engineering workshops (ISSREW). IEEE, pp 108–113
Tantithamthavorn C, Abebe S L, Hassan A E, Ihara A, Matsumoto K (2018) The impact of ir-based classifier configuration on the performance and the effort of method-level bug localization. Information and Software Technology (IST) 102:160–174
Thomas S W, Nagappan M, Blostein D, Hassan A E (2013) The impact of classifier configuration and classifier combination on bug localization. Trans Softw Eng (TSE) 39(10):1427–1443
Vahabzadeh A, Fard A M, Mesbah A (2015) An empirical study of bugs in test code. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 101–110
Vahabzadeh Sefiddarbon A (2016) A study of bugs in test code and a test model for analyzing tests. PhD thesis, University of British Columbia
Wang B, Xu L, Yan M, Liu C, Liu L (2020a) Multi-dimension convolutional neural network for bug localization. IEEE Transactions on Services Computing
Wang S, Lo D (2014) Version history, similar report, and structure: Putting them together for improved bug localization. In: Proceedings of the international conference on program comprehension (ICPC). ACM, pp 53–63
Wang S, Lo D (2016) Amalgam+: Composing rich information sources for accurate bug localization. J Softw Evol Process 28(10):921–942
Wang S, Lo D, Lawall J (2014) Compositional vector space models for improved bug localization. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 171–180
Wang Y, Yao Y, Tong H, Huo X, Li M, Xu F, Lu J (2020b) Enhancing supervised bug localization with metadata and stack-trace. Knowledge and Information Systems, pp 1–24
Wen M, Wu R, Cheung S C (2016) Locus: Locating bugs from software changes. In: Proceedings of the international conference on automated software engineering (ASE). IEEE, pp 262–273
Wong C P, Xiong Y, Zhang H, Hao D, Zhang L, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, pp 181–190
Xiao Y, Keung J, Bennin K E, Mi Q (2018) Machine translation-based bug localization technique for bridging lexical gap. Information and Software Technology (IST) 99:58–61
Xiao Y, Keung J, Bennin K E, Mi Q (2019) Improving bug localization with word embedding and enhanced convolutional neural networks. Information and Software Technology (IST) 105:17–29
Yang G, Min K, Lee B (2020) Applying deep learning algorithm to automatic bug localization and repair. In: Proceedings of the 35th Annual ACM symposium on applied computing, pp 1634–1641
Ye X, Bunescu R, Liu C (2014) Learning to rank relevant files for bug reports using domain knowledge. In: Proceedings of the international symposium on foundations of software engineering (FSE). ACM, pp 689–699
Ye X, Bunescu R, Liu C (2016a) Mapping bug reports to relevant files: A ranking model, a fine-grained benchmark, and feature evaluation. Trans Softw Eng (TSE) 42(4):379–402
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016b) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE). ACM, pp 404–415
Youm K C, Ahn J, Kim J, Lee E (2015) Bug localization based on code change histories and bug reports. In: Proceedings of the Asia-Pacific software engineering conference (APSEC). IEEE, pp 190–197
Youm K C, Ahn J, Lee E (2017) Improved bug localization based on code change histories and bug reports. Information and Software Technology (IST) 82:177–192
Zhang W, Li Z, Wang Q, Li J (2019) Finelocator: a novel approach to method-level fine-grained bug localization by query expansion. Information and Software Technology (IST)
Zhao F, Tang Y, Yang Y, Lu H, Zhou Y, Xu B (2015) Is learning-to-rank cost-effective in recommending relevant files for bug localization?. In: Proceedings of the international conference on software quality, reliability and security (QRS). IEEE, pp 298–303
Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In: Proceedings of the international conference on software engineering (ICSE). IEEE, pp 14–24
Zimmermann T, Premraj R, Bettenburg N, Just S, Schroter A, Weiss C (2010) What makes a good bug report? Trans Softw Eng (TSE) 36(5):618–643

Acknowledgments

This research was supported by the Technology and Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT, and Future Planning (2016R1D1A1B03934610, 2017M3C4A7068179, 2018R1D1A1B07050073, 2019R1A2C2006411). Also, we are most grateful to Dit et al., Zhou et al., Rao et al., Sisman et al., Moreno et al., Ye et al., and Lee et al. for access to their datasets on their web sites.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Republic of Korea
Misoo Kim
College of Computing, Sungkyunkwan University, Suwon, Republic of Korea
Eunseok Lee

Corresponding author

Correspondence to Eunseok Lee.

Additional information

Communicated by: Denys Poshyvanyk

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Survey and Investigation Results

Table 18 Survey results for the evaluation datasets used in previous research

Appendix B: Investigation for the Representative Datasets

Table 19 Investigation results for the six representative datasets

Appendix C: Experimental Results Based on VSM-based IRBL

Table 20 Project-wise evaluation results of buggy file localization with two search spaces: all source files (ALL) and suitable source files (SUIT)

Table 21 Project-wise evaluation results of buggy method localization with two search spaces: all source files (ALL) and suitable source files (SUIT)

Appendix D: Experimental Results Based on Existing IRBL Techniques

Table 22 Project-wise evaluation results of BugLocator

Table 23 Project-wise evaluation results of BRTracer

Table 24 Project-wise evaluation results of BLUiR

Table 25 Project-wise evaluation results of AmaLgam

Table 26 Project-wise evaluation results of BLIA

Table 27 Project-wise evaluation results of Locus

Glossary

– Bug type: The bug type is the class of a bug, as determined by its characteristics. In this paper, we classify bugs as either production or test bugs based on the type of source files changed to fix the bug.
– Production bug: A production bug is a production-code-related bug. This type of bug exists in the production source files (or production files) executed for software usage and production.
– Test bug: A test bug is a test-code-related bug. This type of bug exists in the test source files (or test files) written and executed for automatic software testing.
– Ground-truth file: The ground-truth file is the answer file for the evaluation of IRBL. A correct ground-truth file is a buggy file where a bug is located; the file should be modified to resolve the bug. An incorrect ground-truth file is a non-buggy file; changing it does not fix the bug.
– Search space: A search space is the set of source files from which files are retrieved when evaluating the performance of IRBL. A suitable search space contains only the source files that are relevant to the bug type. An unsuitable search space also contains source files that are unrelated to the bug type, and these files act as noise in the retrieval results.
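
The glossary terms can be illustrated with a small sketch that labels a bug by the files changed in its fix and then restricts the search space to files matching that bug type. Everything here is an assumption for illustration: the path-based `is_test_file` heuristic, the rule that a fix touching any production file counts as a production bug, and the file paths are hypothetical and are not taken from the paper.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: test code lives under a test directory
    or has a file name that starts or ends with 'Test'."""
    p = PurePosixPath(path)
    directories = [part.lower() for part in p.parts[:-1]]
    stem = p.stem
    return (
        "test" in directories
        or "tests" in directories
        or stem.startswith("Test")
        or stem.endswith("Test")
    )


def classify_bug(changed_files: Sequence[str]) -> str:
    """Label a bug 'test' only if every changed file is a test file (assumed rule)."""
    if not changed_files:
        raise ValueError("a bug fix must change at least one file")
    return "test" if all(is_test_file(f) for f in changed_files) else "production"


def suitable_search_space(all_files: Sequence[str], bug_type: str) -> List[str]:
    """Keep only the source files whose kind matches the bug type."""
    want_test = bug_type == "test"
    return [f for f in all_files if is_test_file(f) == want_test]


# Hypothetical project snapshot and bug fix.
files = [
    "src/main/java/Foo.java",
    "src/main/java/Bar.java",
    "src/test/java/FooTest.java",
]
fix = ["src/main/java/Foo.java", "src/test/java/FooTest.java"]

bug_type = classify_bug(fix)
print(bug_type, suitable_search_space(files, bug_type))
```

Under these assumptions the example fix is labeled a production bug, so the test file is excluded from the search space for that bug report.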

About this article

Cite this article

Kim, M., Lee, E. Are datasets for information retrieval-based bug localization techniques trustworthy?. Empir Software Eng 26, 35 (2021). https://doi.org/10.1007/s10664-021-09946-8

Accepted: 22 January 2021
Published: 19 March 2021
DOI: https://doi.org/10.1007/s10664-021-09946-8
