research-article

A novel approach to automatic query reformulation for IR-based bug localization

Authors:

Eunseok Lee

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

Pages 1752 - 1759

https://doi.org/10.1145/3297280.3297451

Published: 08 April 2019

Abstract

Automatic query reformulation techniques for Information Retrieval-based Bug Localization (IRBL) have been proposed to improve query quality and, in turn, IRBL performance. Recently proposed techniques judge the quality of a query from the bug report's description and reformulate it using important terms from the top-N source files retrieved by the initial query. However, a bug report's description may not contain enough information about the bug, and the retrieved top-N files do not always provide important terms. In this paper, we propose a novel automatic query reformulation approach that improves IRBL performance beyond that of a recent technique. Our method expands bug reports using their attachments and reformulates queries by reducing the noisy terms in them. We experimented with 1,546 bug reports. According to our results, the quality of 70 reports was wrongly determined, and for these reports our method improved IRBL performance by up to 118%. Moreover, compared with a state-of-the-art technique, our method achieved improvements of approximately 17% in Top-1, 11% in MRR@10, and 10% in MAP@10.
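The results above are reported as Top-1, MRR@10 and MAP@10, together with relative improvements. The following is a minimal sketch of one common way these IRBL metrics are computed; it is not taken from the paper. It assumes each bug report yields a ranked list of file names plus a set of known buggy files (the `Result` type and all function names are hypothetical), that AP@10 is normalized by the total number of buggy files (some studies instead divide by the number of buggy files found in the top 10, or by min(|R|, 10)), and that "improvement" means the relative change (new - old) / old.

```python
from typing import List, Sequence, Set, Tuple

# One evaluated bug report: (ranked list of source files, set of known buggy files).
# This representation is a hypothetical choice for illustration.
Result = Tuple[Sequence[str], Set[str]]


def first_hit_rank(ranked: Sequence[str], buggy: Set[str], k: int) -> int:
    """1-based rank of the first buggy file within the top-k, or 0 if there is none."""
    for rank, path in enumerate(ranked[:k], start=1):
        if path in buggy:
            return rank
    return 0


def top_n(results: List[Result], n: int) -> float:
    """Top-N: fraction of bug reports with at least one buggy file in the top N."""
    if not results:
        return 0.0
    hits = sum(1 for ranked, buggy in results if first_hit_rank(ranked, buggy, n) > 0)
    return hits / len(results)


def mrr_at_k(results: List[Result], k: int = 10) -> float:
    """MRR@k: mean reciprocal rank of the first buggy file; a miss in the top-k scores 0."""
    if not results:
        return 0.0
    total = 0.0
    for ranked, buggy in results:
        rank = first_hit_rank(ranked, buggy, k)
        total += 1.0 / rank if rank else 0.0
    return total / len(results)


def average_precision_at_k(ranked: Sequence[str], buggy: Set[str], k: int = 10) -> float:
    """AP@k, normalized by the total number of buggy files (an assumed convention)."""
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked[:k], start=1):
        if path in buggy:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / len(buggy)


def map_at_k(results: List[Result], k: int = 10) -> float:
    """MAP@k: mean of AP@k over all bug reports."""
    if not results:
        return 0.0
    return sum(average_precision_at_k(r, b, k) for r, b in results) / len(results)


def relative_improvement(new: float, old: float) -> float:
    """Relative improvement in percent, e.g. 0.55 vs 0.50 gives about 10%."""
    return (new - old) / old * 100.0
```

For example, with three bug reports whose first buggy files sit at rank 1, rank 2, and outside the list, MRR@10 is (1 + 1/2 + 0) / 3 = 0.5. Because the normalization conventions vary between studies, values from a sketch like this are not directly comparable with the figures reported in the paper.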

Index Terms

Software and its engineering → Software creation and management → Software post-development issues → Software evolution


Information

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

April 2019

2682 pages

ISBN:9781450359337

DOI:10.1145/3297280

Conference Chairs:
Chih-Cheng Hung, Kennesaw State University, Marietta, Georgia
George A. Papadopoulos, University of Cyprus, Nicosia, Cyprus

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC '19

Sponsor:

SIGAPP

SAC '19: The 34th ACM/SIGAPP Symposium on Applied Computing

April 8 - 12, 2019

Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 13

Total Downloads: 332

Downloads (Last 12 months): 26

Downloads (Last 6 weeks): 5

Reflects downloads up to 05 Mar 2025

Cited By

Yang G, Ji J, Kim E (2025). LSTM Attention-Driven Similarity Learning for Effective Bug Localization. Applied Sciences 15:3 (1582). Online publication date: 4-Feb-2025.
https://doi.org/10.3390/app15031582
Kim M, Kim Y, Lee E (2025). Production and test bug report classification based on transfer learning. Information and Software Technology 181 (107685). Online publication date: May-2025.
https://doi.org/10.1016/j.infsof.2025.107685
Xu H, Wang Z, Zou W (2025). A more accurate bug localization technique for bugs with multiple buggy code files. Information and Software Technology 181 (107675). Online publication date: May-2025.
https://doi.org/10.1016/j.infsof.2025.107675
Kinikar M, Saleena B (2024). Automatic Query Generation Based on Adaptive Naked Mole-Rate Algorithm. Multimedia Tools and Applications. Online publication date: 27-Jun-2024.
https://doi.org/10.1007/s11042-024-19492-2
Rahman M, Roy C (2023). A Systematic Review of Automated Query Reformulations in Source Code Search. ACM Transactions on Software Engineering and Methodology 32:6 (1-79). Online publication date: 28-Sep-2023.
https://dl.acm.org/doi/10.1145/3607179
Hirsch T, Hofer B, Bissyandé T, Klein J, Bird C, Sarro F (2023). The MAP Metric in Information Retrieval Fault Localization. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (1480-1491). Online publication date: 11-Nov-2023.
https://dl.acm.org/doi/10.1109/ASE56229.2023.00041
Kim M, Kim Y, Lee E, Hong J, Bures M, Park J, Cerny T (2022). How does the first buggy file work well for iterative IR-based bug localization? Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing (1509-1516). Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3477314.3507034
Kim M, Kim Y, Lee E (2022). An Empirical Study of IR-based Bug Localization for Deep Learning-based Software. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST) (128-139). Online publication date: Apr-2022.
https://doi.org/10.1109/ICST53961.2022.00024
Yang G, Lee B (2021). Utilizing Topic-Based Similar Commit Information and CNN-LSTM Algorithm for Bug Localization. Symmetry 13:3 (406). Online publication date: 2-Mar-2021.
https://doi.org/10.3390/sym13030406
Kim M, Kim Y, Lee E (2021). A Novel Automatic Query Expansion with Word Embedding for IR-based Bug Localization. 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) (276-287). Online publication date: Oct-2021.
https://doi.org/10.1109/ISSRE52982.2021.00038
