DOI: 10.1145/3297280.3297451
research-article

A novel approach to automatic query reformulation for IR-based bug localization

Published: 08 April 2019

Abstract

Automatic query reformulation techniques for Information Retrieval based Bug Localization (IRBL) have been proposed to improve the quality of queries and IRBL performance. Recently proposed techniques determine the quality of queries via the bugs' description and reformulate them using important terms in the top-N source files retrieved by the initial query. However, the bugs' description may not contain enough information about the bugs, and the retrieved top-N files may not always provide important terms. In this paper, we propose a novel automatic query reformulation approach to improve IRBL performance beyond that of a recent technique. Our method expands bug reports using attachments and expands queries by reducing the noisy terms in them. We experimented with 1,546 bug reports. According to our results, we found that the quality of 70 reports was wrongly determined, and our method improved IRBL performance by up to 118% for these reports. Moreover, compared with a state-of-the-art technique, our method resulted in improvements of approximately 17% in Top-1, 11% in MRR@10, and 10% in MAP@10.
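Top-1, MRR@10, and MAP@10, the measures reported in the abstract's last sentence, are standard ranked-retrieval metrics. The following is a minimal sketch of how they are commonly computed, assuming each bug report yields a ranked list of source files and a set of files actually changed by the fix. The function names, the sample file names, and the choice to normalise average precision by the number of buggy files found within the cutoff are illustrative assumptions; the abstract does not state which variant the authors used.

```python
from typing import Dict, List, Set, Tuple


def top_k_hit(ranked: List[str], buggy: Set[str], k: int = 1) -> bool:
    """Return True if at least one buggy file is among the first k results."""
    return any(f in buggy for f in ranked[:k])


def reciprocal_rank_at(ranked: List[str], buggy: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first buggy file within the top k, else 0.0."""
    for rank, f in enumerate(ranked[:k], start=1):
        if f in buggy:
            return 1.0 / rank
    return 0.0


def average_precision_at(ranked: List[str], buggy: Set[str], k: int = 10) -> float:
    """Average the precision values at each buggy file found in the top k.

    Assumption: the sum is divided by the number of buggy files retrieved
    within the cutoff, not by the total number of buggy files.
    """
    hits = 0
    precision_sum = 0.0
    for rank, f in enumerate(ranked[:k], start=1):
        if f in buggy:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(results: List[Tuple[List[str], Set[str]]], k: int = 10) -> Dict[str, float]:
    """Average the three metrics over all (ranked list, buggy set) pairs."""
    if not results:
        raise ValueError("results must contain at least one bug report")
    n = len(results)
    return {
        "top1": sum(top_k_hit(r, b, 1) for r, b in results) / n,
        "mrr": sum(reciprocal_rank_at(r, b, k) for r, b in results) / n,
        "map": sum(average_precision_at(r, b, k) for r, b in results) / n,
    }


# Hypothetical rankings for two bug reports (file names are made up).
sample = [
    (["A.java", "B.java", "C.java"], {"A.java", "C.java"}),
    (["D.java", "E.java"], {"E.java"}),
]
print(evaluate(sample))
```

In the sample, the first report has a buggy file at rank 1 and the second at rank 2, so Top-1 is 0.5 and MRR@10 is 0.75. A relative improvement such as the abstract's "approximately 17% in Top-1" would then be the ratio of two such scores minus one.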

    Information & Contributors

    Information

    Published In

    SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
    April 2019
    2682 pages
    ISBN:9781450359337
    DOI:10.1145/3297280
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic debugging
    2. automatic query reformulation
    3. bug report
    4. information retrieval-based bug localization
    5. test file

    Qualifiers

    • Research-article

    Conference

    SAC '19

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 26
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 05 Mar 2025

    Citations

    Cited By

    • (2025) LSTM Attention-Driven Similarity Learning for Effective Bug Localization. Applied Sciences 15:3 (1582). DOI: 10.3390/app15031582. Online publication date: 4-Feb-2025
    • (2025) Production and test bug report classification based on transfer learning. Information and Software Technology 181 (107685). DOI: 10.1016/j.infsof.2025.107685. Online publication date: May-2025
    • (2025) A more accurate bug localization technique for bugs with multiple buggy code files. Information and Software Technology 181 (107675). DOI: 10.1016/j.infsof.2025.107675. Online publication date: May-2025
    • (2024) Automatic Query Generation Based on Adaptive Naked Mole-Rate Algorithm. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19492-2. Online publication date: 27-Jun-2024
    • (2023) A Systematic Review of Automated Query Reformulations in Source Code Search. ACM Transactions on Software Engineering and Methodology 32:6 (1-79). DOI: 10.1145/3607179. Online publication date: 28-Sep-2023
    • (2023) The MAP Metric in Information Retrieval Fault Localization. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (1480-1491). DOI: 10.1109/ASE56229.2023.00041. Online publication date: 11-Nov-2023
    • (2022) How does the first buggy file work well for iterative IR-based bug localization? Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing (1509-1516). DOI: 10.1145/3477314.3507034. Online publication date: 25-Apr-2022
    • (2022) An Empirical Study of IR-based Bug Localization for Deep Learning-based Software. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST) (128-139). DOI: 10.1109/ICST53961.2022.00024. Online publication date: Apr-2022
    • (2021) Utilizing Topic-Based Similar Commit Information and CNN-LSTM Algorithm for Bug Localization. Symmetry 13:3 (406). DOI: 10.3390/sym13030406. Online publication date: 2-Mar-2021
    • (2021) A Novel Automatic Query Expansion with Word Embedding for IR-based Bug Localization. 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) (276-287). DOI: 10.1109/ISSRE52982.2021.00038. Online publication date: Oct-2021
