ABSTRACT
In software development, bug localization is the process finding portions of source code associated to a submitted bug report. This task has been modeled as an information retrieval task at source code file, where the report is the query. In this work, we propose a model that, instead of working at file level, learns feature representations from source changes extracted from the project history at both syntactic and code change dependency perspectives to support bug localization.
To that end, we structured an end-to-end architecture able to integrate feature learning and ranking between sets of bug reports and source code changes.
We evaluated our model against the state of the art of bug localization on several real world software projects obtaining competitive results in both intra-project and cross-project settings. Besides the positive results in terms of model accuracy, as we are giving the developer not only the location of the bug associated to the report, but also the change that introduced, we believe this could give a broader context for supporting fixing tasks.
- Tien-Duy B. Le, David Lo, Claire Le Goues, and Lars Grunske. 2016. A learning-to-rank based fault localization approach using likely invariants. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 177--188. Google ScholarDigital Library
- Irina Ioana Brudaru and Andreas Zeller. 2008. What is the long-term impact of changes? In Proceedings of the 2008 international workshop on Recommendation systems for software engineering. ACM, 30--32. Google ScholarDigital Library
- Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd international conference on Machine learning. ACM, 89--96. Google ScholarDigital Library
- Raymond P. L. Buse and Westley R. Weimer. 2010. Automatically documenting program changes. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 33--42. Google ScholarDigital Library
- Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).Google Scholar
- Luis Fernando Cortés-Coy, Mario Linares-Vásquez, Jairo Aponte, and Denys Poshyvanyk. 2014. On automatically generating commit messages via summarization of source code changes. In Source Code Analysis and Manipulation (SCAM), 2014 IEEE 14th International Working Conference on. IEEE, 275--284. Google ScholarDigital Library
- Daniel Alencar da Costa, Shane McIntosh, Weiyi Shang, Uirá Kulesza, Roberta Coelho, and Ahmed E. Hassan. 2017. A framework for evaluating the results of the szz approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, Vol. 43, 7 (2017), 641--657.Google ScholarDigital Library
- Valentin Dallmeier and Thomas Zimmermann. 2007. Extraction of bug localization benchmarks from history. In Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering. ACM, 433--436. Google ScholarDigital Library
- Wei Fu and Tim Menzies. 2017. Easy over Hard: A Case Study on Deep Learning. arXiv preprint arXiv:1703.00133 (2017). Google ScholarDigital Library
- Vincent J. Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 763--773. Google ScholarDigital Library
- Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the naturalness of software. In Software Engineering (ICSE), 2012 34th International Conference on. IEEE, 837--847. Google ScholarDigital Library
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780. Google ScholarDigital Library
- Xuan Huo and Ming Li. {n. d.}. Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code. ({n. d.}).Google Scholar
- Xuan Huo, Ming Li, and Zhi-Hua Zhou. 2016. Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code. In IJCAI. 1606--1612. Google ScholarDigital Library
- Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In ACL (1).Google Scholar
- Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 135--146. Google ScholarDigital Library
- Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 133--142. Google ScholarDigital Library
- James C. King. 1976. Symbolic execution and program testing. Commun. ACM, Vol. 19, 7 (1976), 385--394. Google ScholarDigital Library
- Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).Google Scholar
- An Ngoc Lam, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2017. Bug localization with combination of deep learning and information retrieval. In Proceedings of the 25th International Conference on Program Comprehension. IEEE Press, 218--229. Google ScholarDigital Library
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016).Google Scholar
- Mario Linares-Vásquez, Luis Fernando Cortés-Coy, Jairo Aponte, and Denys Poshyvanyk. 2015. Changescribe: A tool for automatically generating commit messages. In Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE International Conference on, Vol. 2. IEEE, 709--712. Google ScholarDigital Library
- Pablo Loyola, Edison Marrese-Taylor, and Yutaka Matsuo. 2017. A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vol. 2. 287--292.Google ScholarCross Ref
- Pablo Loyola and Yutaka Matsuo. 2017. Learning graph representations for defect prediction. In Proceedings of the 39th International Conference on Software Engineering Companion. IEEE Press, 265--267. Google ScholarDigital Library
- Stacy K. Lukins, Nicholas A. Kraft, and Letha H. Etzkorn. 2010. Bug localization using latent dirichlet allocation. Information and Software Technology, Vol. 52, 9 (2010), 972--990. Google ScholarDigital Library
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).Google Scholar
- Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In AAAI. 1287--1293. Google ScholarDigital Library
- Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin. 2015. Discriminative neural sentence modeling by tree-based convolution. arXiv preprint arXiv:1504.01106 (2015).Google Scholar
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532--1543.Google ScholarCross Ref
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701--710. Google ScholarDigital Library
- Ripon K. Saha, Matthew Lease, Sarfraz Khurshid, and Dewayne E. Perry. 2013. Improving bug localization using structured information retrieval. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on. IEEE, 345--355. Google ScholarDigital Library
- Jacek śliwerski, Thomas Zimmermann, and Andreas Zeller. 2005. When Do Changes Induce Fixes? In Proceedings of the 2005 International Workshop on Mining Software Repositories (MSR '05). ACM, New York, NY, USA, 1--5. Google ScholarDigital Library
- Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, Vol. 4, 2 (2012), 26--31.Google Scholar
- Shaowei Wang and David Lo. 2014. Version history, similar report, and structure: Putting them together for improved bug localization. In Proceedings of the 22nd International Conference on Program Comprehension. ACM, 53--63. Google ScholarDigital Library
- Shaowei Wang and David Lo. 2016. AmaLgam+: Composing Rich Information Sources for Accurate Bug Localization. Journal of Software: Evolution and Process, Vol. 28, 10 (2016), 921--942. Google ScholarDigital Library
- Ming Wen, Rongxin Wu, and Shing-Chi Cheung. 2016. Locus: Locating bugs from software changes. In Automated Software Engineering (ASE), 2016 31st IEEE/ACM International Conference on. IEEE, 262--273. Google ScholarDigital Library
- Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, and Ruslan Salakhutdinov. 2016. Words or characters? fine-grained gating for reading comprehension. arXiv preprint arXiv:1611.01724 (2016).Google Scholar
- Xin Ye, Razvan Bunescu, and Chang Liu. 2014. Learning to rank relevant files for bug reports using domain knowledge. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 689--699. Google ScholarDigital Library
- Xin Ye, Razvan Bunescu, and Chang Liu. 2016. Mapping bug reports to relevant files: A ranking model, a fine-grained benchmark, and feature evaluation. IEEE Transactions on Software Engineering, Vol. 42, 4 (2016), 379--402. Google ScholarDigital Library
- Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, and Chang Liu. 2016. From word embeddings to document similarities for improved information retrieval in software engineering. In Proceedings of the 38th International Conference on Software Engineering. ACM, 404--415. Google ScholarDigital Library
- Jian Zhou, Hongyu Zhang, and David Lo. 2012. Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In Proceedings of the 34th International Conference on Software Engineering. IEEE Press, 14--24. Google ScholarDigital Library
Index Terms
- Bug Localization by Learning to Rank and Represent Bug Inducing Changes
Recommendations
Bug localization via searching crowd-contributed code
Internetware '14: Proceedings of the 6th Asia-Pacific Symposium on InternetwareBug localization, i.e., locating bugs in code snippets, is a frequent task in software development. Although static bug-finding tools are available to reduce manual effort in bug localization, these tools typically detect bugs with known project-...
Multi-level reranking approach for bug localization
Bug fixing has a key role in software quality evaluation. Bug fixing starts with the bug localization step, in which developers use textual bug information to find location of source codes which have the bug. Bug localization is a tedious and time ...
A preliminary study on using code smells to improve bug localization
ICPC '18: Proceedings of the 26th Conference on Program ComprehensionBug localization is a technique that has been proposed to support the process of identifying the locations of bugs specified in a bug report. A traditional approach such as information retrieval (IR)-based bug localization calculates the similarity ...
Comments