
Modeling function-level interactions for file-level bug localization

Published in: Empirical Software Engineering

Abstract

Automatic bug localization, i.e., automatically locating potentially buggy source files given a bug report, plays an essential role in software engineering because it helps developers fix bugs quickly. Information retrieval (IR)-based bug localization methods are simple and easy to understand, but they struggle to bridge the lexical gap between bug reports and programs and to capture the rich structural information in programs. Deep learning-based bug localization (DLBL) methods can exploit the structural information of a program, but they cannot handle long code sequences well: CNNs fail to capture long-range interactions between code elements, while RNNs (such as LSTM and GRU) are vulnerable to vanishing or exploding gradients on long code sequences. Additionally, DLBL methods fail to model metadata features such as bug-fixing recency and frequency. In this paper, we investigate how to locate buggy files by learning function-level features. Specifically, we propose a new framework called FLIM that extracts semantic features of a program at the function level and then computes the relevance between natural language and programming language by aggregating function-level interactions. We leverage a fine-tuned language model to treat bug localization as a code retrieval task, and use a learning-to-rank model to fuse the function-level semantic features with IR features into a final relevance score. We evaluate FLIM through extensive experiments on six widely used software projects. The results demonstrate that FLIM outperforms six state-of-the-art bug localization methods.
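The pipeline described above can be sketched in a minimal, self-contained form. This is an illustrative assumption, not the paper's actual implementation: a toy lexical `interaction` function stands in for the semantic relevance that FLIM's fine-tuned language model produces, file scores aggregate function-level interactions with a max, and hand-set weights stand in for the learned learning-to-rank fusion with IR features such as bug-fixing recency and frequency. All names, scores, and weights here are hypothetical.

```python
import re

def tokens(text):
    """Tokenize a bug report or function body into a set of lowercase words."""
    return set(re.findall(r"[a-zA-Z]+", text.lower()))

def interaction(report_toks, func_toks):
    """Jaccard overlap between report and function tokens -- a lexical
    stand-in for the semantic relevance a fine-tuned language model
    (e.g., a CodeBERT-style encoder) would produce."""
    if not report_toks or not func_toks:
        return 0.0
    return len(report_toks & func_toks) / len(report_toks | func_toks)

def file_score(report, functions, ir_features, weights):
    """Score a file against a bug report: aggregate function-level
    interactions (here: max over the file's functions), then fuse the
    semantic score with IR features (e.g., bug-fixing recency and
    frequency) via a linear learning-to-rank combination."""
    r = tokens(report)
    semantic = max((interaction(r, tokens(f)) for f in functions), default=0.0)
    feats = [semantic] + list(ir_features)
    return sum(w * x for w, x in zip(weights, feats))

# Illustrative usage: the file whose function shares terms with the
# bug report outranks an unrelated file.
report = "NullPointerException when parsing an empty config file"
buggy_file = ["void parseConfig(File f) { /* parsing config file */ }",
              "void render() { /* draw frame */ }"]
other_file = ["int add(int a, int b) { return a + b; }"]
weights = [1.0, 0.3, 0.3]  # would be learned in practice
s_buggy = file_score(report, buggy_file, [0.5, 0.2], weights)
s_other = file_score(report, other_file, [0.1, 0.0], weights)
assert s_buggy > s_other
```

The max aggregation reflects the intuition that a file is relevant if any one of its functions strongly matches the report; other aggregations (sum, attention-weighted) are equally plausible under this sketch.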


Data Availability

The datasets generated and/or analysed during the current study are available in the GitHub repository at https://github.com/hongliangliang/flim.

Notes

  1. https://github.com/hongliangliang/flim


Acknowledgments

We would like to thank the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Hongliang Liang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Denys Poshyvanyk

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liang, H., Hang, D. & Li, X. Modeling function-level interactions for file-level bug localization. Empir Software Eng 27, 186 (2022). https://doi.org/10.1007/s10664-022-10237-z
