DOI: 10.1145/2393596.2393671
research-article

Multi-layered approach for recovering links between bug reports and fixes

Published: 11 November 2012

ABSTRACT

The links between bug reports in an issue-tracking system and the corresponding fixing changes in a version repository are often not recorded by developers. Such linking information is crucial for research in mining software repositories, for example in measuring software defects and maintenance effort. However, state-of-the-art bug-to-fix link recovery approaches still rely heavily on textual matching between bug reports and commit/change logs, and they handle poorly the cases where the two are not textually similar.
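To make this limitation concrete, the following is a minimal Python sketch of purely textual matching, assuming a TF-IDF weighting with a smoothed IDF and cosine similarity. The tokenizer, function names, and example texts are hypothetical illustrations, not the actual implementation of any existing link recovery tool. A bug report and a commit log that share no terms score exactly zero, however closely the fix relates to the bug.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that terms shared by every document keep a nonzero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


report = "Crash when parsing empty QR code image"
log_similar = "Fix crash parsing empty QR code"
log_dissimilar = "Guard against null bitmap in decoder"
vecs = tfidf_vectors([report, log_similar, log_dissimilar])
print(round(cosine(vecs[0], vecs[1]), 3))  # high: the texts share five terms
print(cosine(vecs[0], vecs[2]))            # no shared terms, so the score is 0.0
```

Whatever weighting is used, a score built only on shared vocabulary cannot link the third text to the report, which is the gap the approach below targets.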

This paper introduces MLink, a multi-layered approach that considers not only textual features but also source code features of the changed code corresponding to the commit logs. MLink also learns, from already established bug-to-fix links, association relations between the terms in bug reports and the names of entities/components in the changed source code of the commits, and uses these relations to recover links between reports and commits that share little similar text. Our empirical evaluation on real-world projects shows that MLink improves on state-of-the-art bug-to-fix link recovery methods by 11–18% in F-score, 13–17% in recall, and 8–17% in precision.
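The idea of learning term-to-entity associations from established links can be illustrated with a toy Python sketch. It assumes that association strength is a simple conditional co-occurrence frequency computed over known links and that the textual and association layers are merged by a linear weight `alpha`; every function name, the scoring rule, and the example data are hypothetical and do not reproduce MLink's actual model.

```python
from collections import Counter, defaultdict


def learn_associations(links):
    """Learn term -> entity association strengths from known bug-to-fix links.

    `links` is a list of (report_terms, changed_entity_names) pairs. The strength
    assoc[term][entity] is the fraction of linked reports containing `term`
    whose fixing commit changed `entity` (a hypothetical scoring rule).
    """
    term_count = Counter()
    pair_count = defaultdict(Counter)
    for terms, entities in links:
        for t in set(terms):
            term_count[t] += 1
            for e in set(entities):
                pair_count[t][e] += 1
    return {t: {e: c / term_count[t] for e, c in ents.items()}
            for t, ents in pair_count.items()}


def association_score(assoc, report_terms, entities):
    """Mean, over report terms seen in training, of each term's strongest
    association with any entity changed by the candidate commit."""
    known = [t for t in set(report_terms) if t in assoc]
    if not known:
        return 0.0
    total = sum(max((assoc[t].get(e, 0.0) for e in entities), default=0.0)
                for t in known)
    return total / len(known)


def combined_score(text_sim, assoc_sim, alpha=0.5):
    # Hypothetical linear merge of the textual layer and the association layer.
    return alpha * text_sim + (1 - alpha) * assoc_sim


history = [
    ({"barcode", "scan", "fails"}, {"DecodeHandler", "CameraManager"}),
    ({"barcode", "blurry"}, {"CameraManager"}),
    ({"contacts", "export"}, {"VCardWriter"}),
]
assoc = learn_associations(history)
# New report vs. the entities changed by a commit whose log shares no words with it.
score = association_score(assoc, {"barcode", "crash"}, {"CameraManager"})
print(score)                       # 1.0
print(combined_score(0.0, score))  # 0.5
```

Here a new report mentioning "barcode" still receives a nonzero combined score against a commit that changed `CameraManager`, even though the textual layer contributes 0.0, because that term-entity pair was observed in earlier fixes.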


      Published in

        FSE '12: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
        November 2012
        494 pages
        ISBN: 9781450316149
        DOI: 10.1145/2393596

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

        Overall Acceptance Rate: 17 of 128 submissions, 13%
