ABSTRACT
Issue tracking systems (ITS) such as Google Code Hosting and Bugzilla facilitate software maintenance activities through bug reporting, archiving, and fixing. The large number of bug reports and their unstructured text make it impractical for developers to manually extract actionable intelligence to expedite bug fixing. In this paper, we present an application of mining bug report descriptions and threaded discussion comments using Latent Dirichlet Allocation (LDA), a topic modeling technique. We apply LDA to the bug archives of the open-source Chromium browser project to extract topics (sets of semantically related terms) and the latent semantic relationships between documents (bug reports) and the extracted topics, for corpus exploration, trend analysis, and understanding evolution in the maintenance domain. We conduct a series of experiments to uncover latent topics potentially useful for developers and testers, based on bug metadata such as time, priority, type, category, and status. The analysis of reopened and duplicate bugs, in particular, has important implications for developers and can support applications such as expertise modeling, resource allocation, and knowledge management.
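As a rough illustration of the kind of topic extraction described above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a handful of bug-report-like strings. It is not the pipeline used in the paper: the sample reports, the `tokenize` and `fit_lda` helpers, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical bug report texts (not taken from the Chromium archives).
reports = [
    "browser crash when opening new tab",
    "tab crash after restoring session",
    "crash on startup with many tabs open",
    "font rendering blurry on linux",
    "blurry text rendering in settings page",
    "page rendering uses wrong font size",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop very short tokens."""
    return [w for w in text.lower().split() if len(w) > 2]

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # occurrences of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assign = []                                          # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

docs = [tokenize(r) for r in reports]
doc_topic, topic_word = fit_lda(docs)

# Show the most frequent terms per topic and the topic counts per report.
for k, counts in enumerate(topic_word):
    top = [w for w, c in counts.most_common(4) if c > 0]
    print(f"topic {k}: {top}")
for report, row in zip(reports, doc_topic):
    print(row, report)
```

The per-document topic counts in `doc_topic` correspond to the document-topic relationship mentioned in the abstract; grouping those rows by metadata such as priority or status would be one way to approximate the trend analyses it describes. A production study would more likely rely on an established LDA implementation and proper preprocessing (stop-word removal, stemming) than on this toy sampler.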
Mining issue tracking systems using topic models for trend analysis, corpus exploration, and understanding evolution