ABSTRACT
Developers often do not record the links between bug reports in an issue-tracking system and the corresponding fixing changes in a version repository. Such linking information is crucial for mining-software-repositories research that measures software defects and maintenance effort. However, state-of-the-art bug-to-fix link recovery approaches still rely heavily on textual matching between bug reports and commit/change logs, and they handle poorly the cases where the two are not textually similar.
This paper introduces MLink, a multi-layered approach that takes into account not only textual features but also source code features of the changed code corresponding to the commit logs. From the established bug-to-fix links, MLink also learns the association relations between the terms in bug reports and the names of entities/components in the changed source code of the commits, and it uses these relations to recover links between reports and commits that share little similar text. Our empirical evaluation on real-world projects shows that MLink improves on the state-of-the-art bug-to-fix link recovery methods by 11–18%, 13–17%, and 8–17% in F-score, recall, and precision, respectively.
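As an illustration of what learning term-to-entity associations from established links could look like, the following is a minimal sketch and not MLink's actual algorithm. It assumes a simple co-occurrence estimate: the weight of an entity for a term is the fraction of linked reports containing the term whose fixing commit changed that entity. The function names `tokenize`, `learn_associations`, and `association_score`, and the scoring rule itself, are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def learn_associations(linked_pairs):
    """Estimate term-to-entity weights from established bug-to-fix links.

    linked_pairs: iterable of (report_text, entity_names), one per known link.
    Assumption (not from the paper): weight = co-occurrence count divided by
    the number of linked reports that contain the term.
    """
    term_count = Counter()
    pair_count = defaultdict(Counter)
    for report_text, entities in linked_pairs:
        for term in set(tokenize(report_text)):
            term_count[term] += 1
            for entity in set(entities):
                pair_count[term][entity] += 1
    return {
        term: {e: c / term_count[term] for e, c in ents.items()}
        for term, ents in pair_count.items()
    }


def association_score(assoc, report_text, entities):
    """Average, over report terms, of the strongest weight to any changed entity."""
    terms = set(tokenize(report_text))
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        weights = assoc.get(term, {})
        total += max((weights.get(e, 0.0) for e in entities), default=0.0)
    return total / len(terms)


# Hypothetical training links: report text paired with changed entity names.
known_links = [
    ("crash when decoding QR image", ["QRDecoder", "BitMatrix"]),
    ("decoding fails on rotated barcode", ["QRDecoder"]),
]
assoc = learn_associations(known_links)
score = association_score(assoc, "decoding error", ["QRDecoder"])
```

In this sketch a report and a commit can receive a non-zero score even when the commit log shares no words with the report, as long as the report's terms have previously co-occurred with the entities the commit changes. A real system would combine such a score with the textual and code-level features described above.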
Index Terms
- Multi-layered approach for recovering links between bug reports and fixes