Abstract
During the software development process, problems that occur are collected and managed as bug reports in bug tracking systems. Usually, a bug report is specified by a title, a more detailed description, and additional categorical information, e.g., the affected component or the reporter. It is the task of the triage owner to assign open bug reports to developers with the skills required to fix them. However, bug assignment is time-consuming, especially in large software projects with many involved developers. This observation motivates the use of (semi-)automatic algorithms for assigning bugs to developers. Various approaches have been developed that rely on a machine learning model trained on historical bug reports. In these approaches, the textual components are mostly modeled using topic models, predominantly Latent Dirichlet Allocation (LDA). Although different variants, inference techniques, and libraries for LDA exist and various hyperparameters can be specified, most works treat topic models as a black box without exploring them in detail. In this work, we extend a study by Atzberger and Schneider et al. on the use of the Author-Topic Model (ATM) for bug triaging tasks. We demonstrate the influence of the underlying topic model, the library and inference technique used, and the hyperparameters on the bug triaging results. The results of our experiments on a dataset from the Mozilla Firefox project provide guidelines for applying LDA effectively to bug triaging tasks.
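To make the triaging setup described above more concrete, the following is a minimal, dependency-free Python sketch of the Author-Topic Model trained with collapsed Gibbs sampling, in which the developers associated with a bug report play the role of its authors. It shows where the Dirichlet hyperparameters `alpha` and `beta` and the number of topics enter the model. The toy vocabulary and reports, the hyperparameter values, the hypothetical helpers `train_atm` and `rank_developers`, and the log-likelihood ranking rule are illustrative assumptions, not the implementation or configuration evaluated in this work.

```python
import math
import random


def train_atm(docs, authors, vocab_size, num_topics,
              alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for the Author-Topic Model (illustrative sketch).

    docs:    list of bug reports, each a list of word ids in [0, vocab_size).
    authors: one list of developer ids per bug report (the "authors").
    Returns (theta, phi): developer-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    num_authors = 1 + max(a for doc_authors in authors for a in doc_authors)
    n_at = [[0] * num_topics for _ in range(num_authors)]  # developer-topic counts
    n_a = [0] * num_authors                                 # tokens per developer
    n_tw = [[0] * vocab_size for _ in range(num_topics)]    # topic-word counts
    n_t = [0] * num_topics                                  # tokens per topic

    def update(a, t, w, delta):
        n_at[a][t] += delta
        n_a[a] += delta
        n_tw[t][w] += delta
        n_t[t] += delta

    # Random initial (developer, topic) assignment for every token.
    assignments = []
    for doc, doc_authors in zip(docs, authors):
        doc_assign = []
        for w in doc:
            a, t = rng.choice(doc_authors), rng.randrange(num_topics)
            update(a, t, w, +1)
            doc_assign.append((a, t))
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, (doc, doc_authors) in enumerate(zip(docs, authors)):
            for i, w in enumerate(doc):
                a_old, t_old = assignments[d][i]
                update(a_old, t_old, w, -1)  # remove the current token
                # Unnormalised joint conditional p(x_i = a, z_i = t | rest).
                pairs, weights = [], []
                for a in doc_authors:
                    for t in range(num_topics):
                        p_word = (n_tw[t][w] + beta) / (n_t[t] + vocab_size * beta)
                        p_topic = (n_at[a][t] + alpha) / (n_a[a] + num_topics * alpha)
                        pairs.append((a, t))
                        weights.append(p_word * p_topic)
                a, t = rng.choices(pairs, weights=weights)[0]
                update(a, t, w, +1)
                assignments[d][i] = (a, t)

    theta = [[(n_at[a][t] + alpha) / (n_a[a] + num_topics * alpha)
              for t in range(num_topics)] for a in range(num_authors)]
    phi = [[(n_tw[t][w] + beta) / (n_t[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(num_topics)]
    return theta, phi


def rank_developers(bug_words, theta, phi):
    """Rank developers by sum_w log sum_t theta[a][t] * phi[t][w] (assumed rule)."""
    scores = [(a, sum(math.log(sum(th[t] * phi[t][w] for t in range(len(phi))))
                      for w in bug_words))
              for a, th in enumerate(theta)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: developer 0 fixes rendering bugs, developer 1 network bugs.
vocab = ["crash", "render", "gpu", "tab", "network", "proxy", "dns", "timeout"]
word_id = {w: i for i, w in enumerate(vocab)}
reports = [
    (["crash", "render", "gpu", "render"], [0]),
    (["gpu", "render", "tab", "crash"], [0]),
    (["network", "proxy", "timeout", "dns"], [1]),
    (["dns", "timeout", "network", "network"], [1]),
    (["tab", "crash", "network", "timeout"], [0, 1]),
]
docs = [[word_id[w] for w in words] for words, _ in reports]
authors = [devs for _, devs in reports]
theta, phi = train_atm(docs, authors, len(vocab), num_topics=2)
ranking = rank_developers([word_id[w] for w in ["proxy", "dns", "timeout"]], theta, phi)
print(ranking)
```

Because the sampler is stochastic, the learned distributions depend on the seed as well as on `alpha`, `beta`, and the number of iterations. Library implementations also differ in their inference technique: Gensim's `AuthorTopicModel`, for example, uses variational inference rather than Gibbs sampling.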
The first two authors contributed equally to this work. This work is largely based on a previous publication by the two lead authors and their co-authors, and on the master's thesis of the second author.
References
Aktas, E.U., Yilmaz, C.: Automated issue assignment: results and insights from an industrial case. Empir. Softw. Eng. 25(5), 3544–3589 (2020). https://doi.org/10.1007/s10664-020-09846-3
Atzberger, D., Schneider, J., Scheibel, W., Limberger, D., Trapp, M., Döllner, J.: Mining developer expertise from bug tracking systems using the author-topic model. In: Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2022, pp. 107–118. INSTICC, SciTePress (2022). https://doi.org/10.5220/0011045100003176
Banitaan, S., Alenezi, M.: TRAM: an approach for assigning bug reports using their metadata. In: Proceedings 3rd International Conference on Communications and Information Technology, ICCIT 2013, pp. 215–219. IEEE (2013). https://doi.org/10.1109/ICCITechnology.2013.6579552
Bhattacharya, P., Neamtiu, I.: Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: Proceedings International Conference on Software Maintenance, ICSM 2010, pp. 1–10. IEEE (2010). https://doi.org/10.1109/ICSM.2010.5609736
Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Chen, T.-H., Thomas, S.W., Hassan, A.E.: A survey on the use of topic models when mining software repositories. Empir. Softw. Eng. 21(5), 1843–1919 (2015). https://doi.org/10.1007/s10664-015-9402-8
Dedik, V., Rossi, B.: Automated bug triaging in an industrial context. In: Proceedings 42nd Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2016, pp. 363–367. IEEE (2016). https://doi.org/10.1109/SEAA.2016.20
Geigle, C.: Inference methods for Latent Dirichlet allocation (course notes in CS 598 CXZ: advanced topics in information retrieval). Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign (2016)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. 101, 5228–5235 (2004). https://doi.org/10.1073/pnas.0307752101
Hoffman, M., Bach, F., Blei, D.: Online learning for Latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, NIPS 2010, vol. 23, pp. 856–864. Curran Associates Inc. (2010)
Hu, H., Zhang, H., Xuan, J., Sun, W.: Effective bug triage based on historical bug-fix information. In: Proceedings 25th International Symposium on Software Reliability Engineering, ISSRE 2014, pp. 122–132. IEEE (2014). https://doi.org/10.1109/ISSRE.2014.17
Jonsson, L., Borg, M., Broman, D., Sandahl, K., Eldh, S., Runeson, P.: Automated bug assignment: ensemble-based machine learning in large scale industrial contexts. Empir. Softw. Eng. 21(4), 1533–1578 (2015). https://doi.org/10.1007/s10664-015-9401-9
Kagdi, H., Gethers, M., Poshyvanyk, D., Hammad, M.: Assigning change requests to software developers. J. Softw. Evol. Process 24(1), 3–33 (2012). https://doi.org/10.1002/smr.530
Khatun, A., Sakib, K.: A bug assignment technique based on bug fixing expertise and source commit recency of developers. In: Proceedings 19th International Conference on Computer and Information Technology, ICCIT 2016, pp. 592–597. IEEE (2016). https://doi.org/10.1109/ICCITECHN.2016.7860265
Linstead, E., Rigor, P., Bajracharya, S., Lopes, C., Baldi, P.: Mining Eclipse developer contributions via author-topic models. In: Proceedings 4th International Workshop on Mining Software Repositories, MSR 2007, pp. 1–4. IEEE (2007). https://doi.org/10.1109/MSR.2007.20
Mani, S., Sankaran, A., Aralikatte, R.: DeepTriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings India Joint International Conference on Data Science and Management of Data, pp. 171–179. ACM (2019). https://doi.org/10.1145/3297001.3297023
Matter, D., Kuhn, A., Nierstrasz, O.: Assigning bug reports using a vocabulary-based expertise model of developers. In: Proceedings 6th International Working Conference on Mining Software Repositories, MSR 2009, pp. 131–140. IEEE (2009). https://doi.org/10.1109/MSR.2009.5069491
McCallum, A.K.: MALLET: a machine learning for language toolkit (2002). http://www.cs.umass.edu/%7Emccallum/mallet
Mortensen, O.: The author-topic model. Master’s thesis, Technical University of Denmark, Department of Applied Mathematics and Computer Science (2017)
Naguib, H., Narayan, N., Brügge, B., Helal, D.: Bug report assignee recommendation using activity profiles. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 22–30. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623999
Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10, 1801–1828 (2009)
Nguyen, T.T., Nguyen, A.T., Nguyen, T.N.: Topic-based, time-aware bug assignment. SIGSOFT Softw. Eng. Notes 39(1), 1–4 (2014). https://doi.org/10.1145/2557833.2560585
Ramage, D., Rosen, E., Chuang, J., Manning, C.D., McFarland, D.A.: Topic modeling for the social sciences. In: Proceedings Workshop on Applications for Topic Models: Text and Beyond, pp. 23:1–4 (2009)
Rehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA (2010)
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 487–494. AUAI Press (2004)
Sajedi-Badashian, A., Hindle, A., Stroulia, E.: Crowdsourced bug triaging. In: Proceedings International Conference on Software Maintenance and Evolution, ICSME 2015, pp. 506–510. IEEE (2015). https://doi.org/10.1109/ICSM.2015.7332503
Sajedi-Badashian, A., Stroulia, E.: Guidelines for evaluating bug-assignment research. J. Softw. Evol. Process 32(9) (2020). https://doi.org/10.1002/smr.2250
Schofield, A., Magnusson, M., Mimno, D.: Pulling out the stops: rethinking stopword removal for topic models. In: Proceedings 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, pp. 432–436. ACL (2017). https://doi.org/10.18653/v1/E17-2069
Schofield, A., Mimno, D.: Comparing apples to apple: the effects of stemmers on topic models. Trans. Assoc. Comput. Linguist. 4, 287–300 (2016). https://doi.org/10.1162/tacl_a_00099
Schofield, A., Thompson, L., Mimno, D.: Quantifying the effects of text duplication on semantic models. In: Proceedings Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pp. 2737–2747. ACL (2017). https://doi.org/10.18653/v1/D17-1290
Shokripour, R., Anvik, J., Kasirun, Z., Zamani, S.: Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 2–11. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623997
Syed, S., Spruit, M.: Exploring symmetrical and asymmetrical Dirichlet priors for latent Dirichlet allocation. Int. J. Semant. Comput. 12(3), 399–423 (2018). https://doi.org/10.1142/S1793351X18400184
Tamrawi, A., Nguyen, T.T., Al-Kofahi, J.M., Nguyen, T.N.: Fuzzy set and cache-based approach for bug triaging. In: Proceedings 19th SIGSOFT Symposium on Foundations of Software Engineering, FSE/ESEC 2011, pp. 365–375. ACM (2011). https://doi.org/10.1145/2025113.2025163
Wallach, H.M.: Structured topic models for language. Ph.D. thesis, Newnham College, University of Cambridge (2008)
Wallach, H.M., Mimno, D., McCallum, A.K.: Rethinking LDA: why priors matter. In: Proceedings 22nd International Conference on Neural Information Processing Systems, NIPS 2009, pp. 1973–1981. Curran Associates, Inc. (2009)
Wu, W., Zhang, W., Yang, Y., Wang, Q.: DREX: developer recommendation with k-nearest-neighbor search and expertise ranking. In: Proceedings 18th Asia-Pacific Software Engineering Conference, APSEC 2011, pp. 389–396. IEEE (2011). https://doi.org/10.1109/APSEC.2011.15
Xia, X., Lo, D., Ding, Y., Al-Kofahi, J.M., Nguyen, T.N., Wang, X.: Improving automated bug triaging with specialized topic model. Trans. Softw. Eng. 43(3), 272–297 (2016). https://doi.org/10.1109/TSE.2016.2576454
Xia, X., Lo, D., Wang, X., Zhou, B.: Accurate developer recommendation for bug resolution. In: Proceedings 20th Working Conference on Reverse Engineering, WCRE 2013, pp. 72–81. IEEE (2013). https://doi.org/10.1109/WCRE.2013.6671282
Xie, X., Zhang, W., Yang, Y., Wang, Q.: DRETOM: developer recommendation based on topic models for bug resolution. In: Proceedings 8th International Conference on Predictive Models in Software Engineering, PROMISE 2012, pp. 19–28. ACM (2012). https://doi.org/10.1145/2365324.2365329
Yang, G., Zhang, T., Lee, B.: Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports. In: Proceedings 38th Annual Computer Software and Applications Conference, COMPSAC 2014, pp. 97–106. IEEE (2014). https://doi.org/10.1109/COMPSAC.2014.16
Yao, L., Mimno, D., McCallum, A.K.: Efficient methods for topic model inference on streaming document collections. In: Proceedings SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 937–945. ACM (2009). https://doi.org/10.1145/1557019.1557121
Zhang, T., Chen, J., Yang, G., Lee, B., Luo, X.: Towards more accurate severity prediction and fixer recommendation of software bugs. J. Syst. Softw. 117, 166–184 (2016). https://doi.org/10.1016/j.jss.2016.02.034
Zhang, T., Yang, G., Lee, B., Lua, E.K.: A novel developer ranking algorithm for automatic bug triage using topic model and developer relations. In: Proceedings 21st Asia-Pacific Software Engineering Conference, APSEC 2014, pp. 223–230. IEEE (2014). https://doi.org/10.1109/APSEC.2014.43
Zhang, W., Han, G., Wang, Q.: BUTTER: an approach to bug triage with topic modeling and heterogeneous network analysis. In: Proceedings International Conference on Cloud Computing and Big Data, CCBD 2014, pp. 62–69. IEEE (2014). https://doi.org/10.1109/CCBD.2014.14
Zou, W., Lo, D., Chen, Z., Xia, X., Feng, Y., Xu, B.: How practitioners perceive automated bug report management techniques. Trans. Softw. Eng. 46(8), 836–862 (2020). https://doi.org/10.1109/TSE.2018.2870414
Acknowledgements
This work is part of the “Software-DNA” project, which is funded by the European Regional Development Fund (ERDF, or EFRE in German) and the State of Brandenburg (ILB). It is also part of the KMU project “KnowhowAnalyzer” (grant no. 01IS20088B), which is funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Atzberger, D., Schneider, J., Scheibel, W., Trapp, M., Döllner, J. (2023). Evaluating Probabilistic Topic Models for Bug Triaging Tasks. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_3
DOI: https://doi.org/10.1007/978-3-031-36597-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36596-6
Online ISBN: 978-3-031-36597-3
eBook Packages: Computer Science, Computer Science (R0)