ABSTRACT
Bug-reports are valuable sources of information. However, studying their natural-language content demands tedious manual interpretation. This difficulty limits the scale of empirical studies that rely on the interpretation and categorization of bug-reports. In this work, we investigate the effectiveness of Labeled Latent Dirichlet Allocation (LLDA) in automatically classifying bug-reports into a predefined set of categories.
Index Terms
- On the effectiveness of labeled latent dirichlet allocation in automatic bug-report categorization