research-article

Bug Localization by Learning to Rank and Represent Bug Inducing Changes

Authors:
Pablo Loyola, IBM Research, Tokyo, Japan
Kugamoorthy Gajananan, IBM Research, Tokyo, Japan
Fumiko Satoh, IBM Research, Tokyo, Japan

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, Pages 657–665, https://doi.org/10.1145/3269206.3271811

Published: 17 October 2018

ABSTRACT

In software development, bug localization is the process of finding the portions of source code associated with a submitted bug report. This task has typically been modeled as an information retrieval problem at the source code file level, where the bug report serves as the query. In this work, we propose a model that, instead of working at the file level, learns feature representations of source code changes extracted from the project history, from both a syntactic and a code change dependency perspective, to support bug localization.
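As a point of reference for the information retrieval framing described above, the following Python sketch ranks candidate source code changes against a bug report used as the query. It is not the model proposed in this work, which learns its representations; it substitutes a plain TF-IDF vector space with cosine similarity, a common IR baseline. The tokenizer, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_changes`) and the example change texts are hypothetical, and each change is treated as flat text rather than through the syntactic and dependency views mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split identifiers on camelCase boundaries and non-letters, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build TF-IDF weight vectors (smoothed IDF) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_changes(report: str, changes: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate changes (id -> diff text) by similarity to the bug report (the query)."""
    ids = list(changes)
    docs = [tokenize(report)] + [tokenize(changes[i]) for i in ids]
    vecs = tfidf_vectors(docs)
    query, candidates = vecs[0], vecs[1:]
    scored = [(cid, cosine(query, v)) for cid, v in zip(ids, candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical bug report and candidate changes.
report = "NullPointerException when parsing an empty config file"
changes = {
    "c1": "+ if (configFile.isEmpty()) return defaultConfig;",
    "c2": "+ renderButton(color); updateTheme();",
}
for change_id, sim in rank_changes(report, changes):
    print(change_id, round(sim, 3))  # c1 is listed first; c2 shares no terms and scores 0.0
```

A lexical baseline like this misses reports and changes that describe the same defect with different vocabulary, which is one motivation for learning representations instead.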

To that end, we designed an end-to-end architecture that integrates feature learning with ranking between sets of bug reports and source code changes.
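To illustrate what ranking between bug reports and source code changes can mean in a learning-to-rank setting, here is a minimal pairwise sketch, assuming each report and change has already been mapped to a fixed-length feature vector. In the proposed architecture those representations are learned end to end; here they are hand-written toy vectors, and the diagonal bilinear score, the margin (hinge) loss and all function names are illustrative assumptions, not the authors' exact formulation. Training pushes a bug-inducing change above an unrelated change for the same report.

```python
import random


def score(w, report_vec, change_vec):
    """Hypothetical relevance score: weighted elementwise product of report and change features."""
    return sum(wi * ri * ci for wi, ri, ci in zip(w, report_vec, change_vec))


def pairwise_hinge_step(w, report_vec, pos_vec, neg_vec, margin=1.0, lr=0.1):
    """One SGD step on max(0, margin - s(r, c+) + s(r, c-)); returns (new_w, loss)."""
    loss = margin - score(w, report_vec, pos_vec) + score(w, report_vec, neg_vec)
    if loss <= 0:
        return w, 0.0  # pair already ordered with enough margin: no update
    # Gradient of the active hinge w.r.t. w_i is r_i * (c-_i - c+_i).
    new_w = [wi - lr * ri * (ni - pi) for wi, ri, pi, ni in zip(w, report_vec, pos_vec, neg_vec)]
    return new_w, loss


def train(triples, dim, epochs=50, seed=0):
    """Train on (report, bug-inducing change, unrelated change) feature triples."""
    rng = random.Random(seed)
    triples = list(triples)  # copy so shuffling does not mutate the caller's list
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(triples)
        for r, pos, neg in triples:
            w, _ = pairwise_hinge_step(w, r, pos, neg)
    return w


# Toy 3-dimensional features (hypothetical, not learned).
report = [1.0, 1.0, 0.0]
inducing_change = [1.0, 0.0, 1.0]
unrelated_change = [0.0, 1.0, 1.0]

w = train([(report, inducing_change, unrelated_change)], dim=3)
print(score(w, report, inducing_change) > score(w, report, unrelated_change))  # True
```

A logistic pairwise loss (as in RankNet) could replace the hinge; the hinge is used here only because its gradient is simple to write out by hand.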

We evaluated our model against state-of-the-art bug localization approaches on several real-world software projects, obtaining competitive results in both intra-project and cross-project settings. Beyond the positive results in terms of model accuracy, because our approach gives the developer not only the location of the bug associated with the report but also the change that introduced it, we believe it could provide broader context to support bug-fixing tasks.
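The abstract does not name the accuracy measures used. Top-k accuracy and mean reciprocal rank (MRR) are widely used in IR-based bug localization, so the following sketch, which assumes those metrics and uses hypothetical report and change identifiers, shows how they would be computed over ranked lists of candidate changes.

```python
def top_k_accuracy(rankings, relevant, k):
    """Fraction of reports whose top-k ranked changes contain at least one relevant change."""
    hits = sum(1 for q, ranked in rankings.items() if set(ranked[:k]) & relevant[q])
    return hits / len(rankings)


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant change per report (0 if none is retrieved)."""
    total = 0.0
    for q, ranked in rankings.items():
        for rank, change_id in enumerate(ranked, start=1):
            if change_id in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical ranked output of a model and ground-truth bug-inducing changes.
rankings = {"bug-1": ["c3", "c1", "c2"], "bug-2": ["c5", "c4"]}
relevant = {"bug-1": {"c1"}, "bug-2": {"c5"}}

print(top_k_accuracy(rankings, relevant, k=1))  # 0.5
print(top_k_accuracy(rankings, relevant, k=2))  # 1.0
print(mean_reciprocal_rank(rankings, relevant))  # 0.75
```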

Index Terms

Bug Localization by Learning to Rank and Represent Bug Inducing Changes
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Language models
      2. Learning to rank
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software
      2. Software evolution

Recommendations

Bug localization via searching crowd-contributed code
Internetware '14: Proceedings of the 6th Asia-Pacific Symposium on Internetware

Bug localization, i.e., locating bugs in code snippets, is a frequent task in software development. Although static bug-finding tools are available to reduce manual effort in bug localization, these tools typically detect bugs with known project-...
Multi-level reranking approach for bug localization

Bug fixing has a key role in software quality evaluation. Bug fixing starts with the bug localization step, in which developers use textual bug information to find location of source codes which have the bug. Bug localization is a tedious and time ...
A preliminary study on using code smells to improve bug localization
ICPC '18: Proceedings of the 26th Conference on Program Comprehension

Bug localization is a technique that has been proposed to support the process of identifying the locations of bugs specified in a bug report. A traditional approach such as information retrieval (IR)-based bug localization calculates the similarity ...

Published in
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206
General Chair:
Alfredo Cuzzocrea, University of Trieste, Italy
Program Chairs:
James Allan, University of Massachusetts, USA
Norman Paton, University of Manchester, United Kingdom
Divesh Srivastava, AT&T Labs Research, USA
Rakesh Agrawal, Data Insights Lab, USA
Andrei Broder, Google Research, USA
Mohammed Zaki, Rensselaer Polytechnic Institute, USA
Selcuk Candan, Arizona State University, USA
Alexandros Labrinidis, University of Pittsburgh, USA
Assaf Schuster, Technion, Israel
Haixun Wang, Google Research, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bug localization
information retrieval
source code changes
Acceptance Rates
CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- 9 Total Citations
- 460 Total Downloads
- Downloads (last 12 months): 35
- Downloads (last 6 weeks): 9