DOI: 10.1145/2593801.2593810
Article

Mining issue tracking systems using topic models for trend analysis, corpus exploration, and understanding evolution

Published: 03 June 2014

ABSTRACT

Issue tracking systems (ITS) such as Google Code Hosting and Bugzilla facilitate software maintenance activities through bug reporting, archiving, and fixing. The large number of bug reports and their unstructured text make it impractical for developers to manually extract actionable intelligence to expedite bug fixing. In this paper, we present an application of mining bug report descriptions and threaded discussion comments using Latent Dirichlet Allocation (LDA), a topic modeling technique. We apply LDA to the bug archives of the open-source Chromium Browser Project to extract topics (sets of semantically related terms) and the latent semantic relationships between documents (bug reports) and the extracted topics, for corpus exploration, trend analysis, and understanding evolution in the maintenance domain. We conduct a series of experiments to uncover latent topics potentially useful to developers and testers, based on bug metadata such as time, priority, type, category, and status. The analysis of reopened and duplicate bugs, in particular, has important implications for developers and can help in applications such as expertise modeling, resource allocation, and knowledge management.
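
As a rough illustration of the kind of analysis the abstract describes, the following Python sketch fits LDA to a handful of invented bug-report snippets using a plain collapsed Gibbs sampler. The abstract does not say which LDA implementation, inference method, or preprocessing the authors used, so the sampler, the hyperparameters `alpha` and `beta`, the function name `fit_lda`, and the toy reports are all assumptions made for this example. No stop-word removal or stemming is applied here, which a real pipeline would likely need.

```python
import random

# Toy bug reports, invented for illustration (not taken from the Chromium archive).
BUG_REPORTS = [
    "browser crash when opening new tab",
    "tab crash after page reload",
    "crash on startup with many tabs open",
    "memory leak in renderer process",
    "renderer memory usage grows over time",
    "high memory usage after long session",
]


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    tokens = [doc.split() for doc in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in tokens]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Assign every token to a random initial topic.
    z = []
    for d, doc in enumerate(tokens):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(tokens)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + vocab_size * beta) for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


doc_topic, topic_word, vocab = fit_lda(BUG_REPORTS)

# Topics: the highest-probability terms under each topic.
for k, dist in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:5]
    print(f"topic {k}:", [vocab[v] for v in top])

# Document-topic relationship: the topic mixture of each bug report.
for report, theta in zip(BUG_REPORTS, doc_topic):
    print([round(p, 2) for p in theta], report)
```

The two returned tables correspond to the two outputs named in the abstract: `topic_word` gives the groups of related terms, and `doc_topic` gives each report's mixture over topics. Grouping the rows of `doc_topic` by metadata such as priority or status would be one plausible way to approach the trend analysis mentioned above, though the abstract does not specify how the authors did it.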

