DOI: 10.1145/3269206.3271811
research-article

Bug Localization by Learning to Rank and Represent Bug Inducing Changes

Published: 17 October 2018

ABSTRACT

In software development, bug localization is the process of finding the portions of source code associated with a submitted bug report. This task has been modeled as an information retrieval task at the source code file level, where the report is the query. In this work, we propose a model that, instead of working at the file level, learns feature representations from source changes extracted from the project history, from both syntactic and code change dependency perspectives, to support bug localization.

To that end, we designed an end-to-end architecture that integrates feature learning and ranking between sets of bug reports and source code changes.

We evaluated our model against the state of the art in bug localization on several real-world software projects, obtaining competitive results in both intra-project and cross-project settings. Beyond the positive results in terms of model accuracy, because we give the developer not only the location of the bug associated with the report but also the change that introduced it, we believe this could provide a broader context for supporting fixing tasks.
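As a rough illustration of what ranking source code changes against a bug report can look like, the sketch below scores candidate changes by cosine similarity to a report vector and computes a pairwise hinge loss of the kind commonly used in learning to rank. It is not the architecture proposed in the paper: the fixed vectors, the function names (`cosine`, `rank_changes`, `pairwise_hinge_loss`), the margin value, and the choice of a hinge loss are all assumptions made for illustration. In the described model the representations would be learned rather than given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_changes(report_vec, change_vecs):
    """Return change ids sorted by descending similarity to the bug report vector."""
    scores = {cid: cosine(report_vec, vec) for cid, vec in change_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def pairwise_hinge_loss(report_vec, inducing_vec, other_vec, margin=0.5):
    """Loss is zero once the bug-inducing change outscores the other change by `margin`."""
    pos = cosine(report_vec, inducing_vec)
    neg = cosine(report_vec, other_vec)
    return max(0.0, margin - pos + neg)


# Hypothetical toy vectors standing in for learned representations.
report = [1.0, 0.0, 1.0]
changes = {
    "c1": [1.0, 0.0, 1.0],  # assumed to be the bug-inducing change
    "c2": [0.0, 1.0, 0.0],
    "c3": [1.0, 1.0, 0.0],
}

ranking = rank_changes(report, changes)
loss_ok = pairwise_hinge_loss(report, changes["c1"], changes["c2"])
loss_bad = pairwise_hinge_loss(report, changes["c2"], changes["c1"])
```

During training, a loss like this would be minimized over many (report, inducing change, other change) triples so that bug-inducing changes move toward the top of the ranking.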


Published in

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206

Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '18 paper acceptance rate: 147 of 826 submissions, 18%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
