Evaluating Probabilistic Topic Models for Bug Triaging Tasks

Atzberger, Daniel; Schneider, Jonathan; Scheibel, Willy; Trapp, Matthias; Döllner, Jürgen

doi:10.1007/978-3-031-36597-3_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1829)

Included in the following conference series:

International Conference on Evaluation of Novel Approaches to Software Engineering

Abstract

During software development, problems are collected and managed as bug reports in bug tracking systems. A bug report usually consists of a title, a more detailed description, and additional categorical information, e.g., the affected component or the reporter. The triage owner's task is to assign open bug reports to developers with the skills required to fix them. However, bug assignment is time-consuming, especially in large software projects with many developers. This observation motivates the use of (semi-)automatic algorithms for assigning bugs to developers. Various approaches have been developed that rely on a machine learning model trained on historical bug reports. In these approaches, the textual components are mostly modeled with topic models, predominantly Latent Dirichlet Allocation (LDA). Although different variants, inference techniques, and libraries for LDA exist and various hyperparameters can be specified, most works treat topic models as a black box without exploring them in detail. In this work, we extend a study of Atzberger and Schneider et al. on the use of the Author-Topic Model (ATM) for bug triaging tasks. We demonstrate how the underlying topic model, the library and inference technique used, and the hyperparameters influence the bug triaging results. The results of our experiments on a dataset from the Mozilla Firefox project provide guidelines for applying LDA effectively to bug triaging tasks.
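To make the general idea of topic-based bug assignment concrete, the following is a minimal sketch and not the method evaluated in the chapter. It assumes that a topic model such as LDA or ATM has already produced a topic distribution for a new bug report and one for each developer, and it ranks developers by Jensen-Shannon divergence. The developer names, the three-topic distributions, and the choice of divergence are all hypothetical and chosen only for illustration.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero numerator contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_developers(bug_topics, developer_topics):
    """Order developer names from most to least similar topic profile."""
    return sorted(
        developer_topics,
        key=lambda dev: jensen_shannon(bug_topics, developer_topics[dev]),
    )


# Hypothetical topic distributions over three topics (assumed, not from the paper).
developer_topics = {
    "alice": [0.8, 0.1, 0.1],
    "bob": [0.1, 0.8, 0.1],
    "carol": [0.3, 0.3, 0.4],
}
bug_topics = [0.7, 0.2, 0.1]

ranking = rank_developers(bug_topics, developer_topics)
print(ranking)
```

In such a setup, the quality of the ranking depends entirely on the topic distributions fed into it, which is why the choice of topic model, library, inference technique, and hyperparameters matters.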

The first two authors contributed equally to this work. This work is mainly based on a former publication of the two main authors and their co-authors and on the master's thesis of the second author.

References

Aktas, E.U., Yilmaz, C.: Automated issue assignment: results and insights from an industrial case. Empir. Softw. Eng. 25(5), 3544–3589 (2020). https://doi.org/10.1007/s10664-020-09846-3
Atzberger, D., Schneider, J., Scheibel, W., Limberger, D., Trapp, M., Döllner, J.: Mining developer expertise from bug tracking systems using the author-topic model. In: Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2022, pp. 107–118. INSTICC, SciTePress (2022). https://doi.org/10.5220/0011045100003176
Banitaan, S., Alenezi, M.: TRAM: an approach for assigning bug reports using their metadata. In: Proceedings 3rd International Conference on Communications and Information Technology, ICCIT 2013, pp. 215–219. IEEE (2013). https://doi.org/10.1109/ICCITechnology.2013.6579552
Bhattacharya, P., Neamtiu, I.: Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: Proceedings International Conference on Software Maintenance, ICSM 2010, pp. 1–10. IEEE (2010). https://doi.org/10.1109/ICSM.2010.5609736
Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Chen, T.-H., Thomas, S.W., Hassan, A.E.: A survey on the use of topic models when mining software repositories. Empir. Softw. Eng. 21(5), 1843–1919 (2015). https://doi.org/10.1007/s10664-015-9402-8
Dedik, V., Rossi, B.: Automated bug triaging in an industrial context. In: Proceedings 42nd Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2016, pp. 363–367. IEEE (2016). https://doi.org/10.1109/SEAA.2016.20
Geigle, C.: Inference methods for Latent Dirichlet allocation (course notes in CS 598 CXZ: advanced topics in information retrieval). Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign (2016)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Nat. Acad. Sci. 101, 5228–5235 (2004). https://doi.org/10.1073/pnas.0307752101
Hoffman, M., Bach, F., Blei, D.: Online learning for Latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, NIPS 2010, vol. 23, pp. 856–864. Curran Associates Inc. (2010)
Hu, H., Zhang, H., Xuan, J., Sun, W.: Effective bug triage based on historical bug-fix information. In: Proceedings 25th International Symposium on Software Reliability Engineering, ISSRE 2014, pp. 122–132. IEEE (2014). https://doi.org/10.1109/ISSRE.2014.17
Jonsson, L., Borg, M., Broman, D., Sandahl, K., Eldh, S., Runeson, P.: Automated bug assignment: ensemble-based machine learning in large scale industrial contexts. Empir. Softw. Eng. 21(4), 1533–1578 (2015). https://doi.org/10.1007/s10664-015-9401-9
Kagdi, H., Gethers, M., Poshyvanyk, D., Hammad, M.: Assigning change requests to software developers. J. Softw. Evol. Process 24(1), 3–33 (2012). https://doi.org/10.1002/smr.530
Khatun, A., Sakib, K.: A bug assignment technique based on bug fixing expertise and source commit recency of developers. In: Proceedings 19th International Conference on Computer and Information Technology, ICCIT 2016, pp. 592–597. IEEE (2016). https://doi.org/10.1109/ICCITECHN.2016.7860265
Linstead, E., Rigor, P., Bajracharya, S., Lopes, C., Baldi, P.: Mining eclipse developer contributions via author-topic models. In: Proceedings 4th International Workshop on Mining Software Repositories, MSR 2007, pp. 1–4. IEEE (2007). https://doi.org/10.1109/MSR.2007.20
Mani, S., Sankaran, A., Aralikatte, R.: DeepTriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings India Joint International Conference on Data Science and Management of Data, pp. 171–179. ACM (2019). https://doi.org/10.1145/3297001.3297023
Matter, D., Kuhn, A., Nierstrasz, O.: Assigning bug reports using a vocabulary-based expertise model of developers. In: Proceedings 6th International Working Conference on Mining Software Repositories, MSR 2009, pp. 131–140. IEEE (2009). https://doi.org/10.1109/MSR.2009.5069491
McCallum, A.K.: MALLET: a machine learning for language toolkit (2002). http://www.cs.umass.edu/%7Emccallum/mallet
Mortensen, O.: The author-topic model. Master’s thesis, Technical University of Denmark, Department of Applied Mathematics and Computer Science (2017)
Naguib, H., Narayan, N., Brügge, B., Helal, D.: Bug report assignee recommendation using activity profiles. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 22–30. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623999
Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10, 1801–1828 (2009)
Nguyen, T.T., Nguyen, A.T., Nguyen, T.N.: Topic-based, time-aware bug assignment. SIGSOFT Softw. Eng. Notes 39(1), 1–4 (2014). https://doi.org/10.1145/2557833.2560585
Ramage, D., Rosen, E., Chuang, J., Manning, C.D., McFarland, D.A.: Topic modeling for the social sciences. In: Proceedings Workshop on Applications for Topic Models: Text and Beyond, pp. 23:1–4 (2009)
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA (2010)
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 487–494. AUAI Press (2004)
Sajedi-Badashian, A., Hindle, A., Stroulia, E.: Crowdsourced bug triaging. In: Proceedings International Conference on Software Maintenance and Evolution, ICSME 2015, pp. 506–510. IEEE (2015). https://doi.org/10.1109/ICSM.2015.7332503
Sajedi-Badashian, A., Stroulia, E.: Guidelines for evaluating bug-assignment research. J. Softw. Evol. Process 32(9) (2020). https://doi.org/10.1002/smr.2250
Schofield, A., Magnusson, M., Mimno, D.: Pulling out the stops: rethinking stopword removal for topic models. In: Proceedings 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, pp. 432–436. ACL (2017). https://doi.org/10.18653/v1/E17-2069
Schofield, A., Mimno, D.: Comparing apples to apple: the effects of stemmers on topic models. Trans. Assoc. Comput. Linguist. 4, 287–300 (2016). https://doi.org/10.1162/tacl_a_00099
Schofield, A., Thompson, L., Mimno, D.: Quantifying the effects of text duplication on semantic models. In: Proceedings Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pp. 2737–2747. ACL (2017). https://doi.org/10.18653/v1/D17-1290
Shokripour, R., Anvik, J., Kasirun, Z., Zamani, S.: Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 2–11. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623997
Syed, S., Spruit, M.: Exploring symmetrical and asymmetrical Dirichlet priors for latent Dirichlet allocation. Int. J. Semant. Comput. 12(3), 399–423 (2018). https://doi.org/10.1142/S1793351X18400184
Tamrawi, A., Nguyen, T.T., Al-Kofahi, J.M., Nguyen, T.N.: Fuzzy set and cache-based approach for bug triaging. In: Proceedings 19th SIGSOFT Symposium on Foundations of Software Engineering, FSE/ESEC 2011, pp. 365–375. ACM (2011). https://doi.org/10.1145/2025113.2025163
Wallach, H.M.: Structured topic models for language. Ph.D. thesis, Newnham College, University of Cambridge (2008)
Wallach, H.M., Mimno, D., McCallum, A.K.: Rethinking LDA: why priors matter. In: Proceedings 22nd International Conference on Neural Information Processing Systems, NIPS 2009, pp. 1973–1981. Curran Associates, Inc. (2009)
Wu, W., Zhang, W., Yang, Y., Wang, Q.: DREX: developer recommendation with k-nearest-neighbor search and expertise ranking. In: Proceedings 18th Asia-Pacific Software Engineering Conference, APSEC 2011, pp. 389–396. IEEE (2011). https://doi.org/10.1109/APSEC.2011.15
Xia, X., Lo, D., Ding, Y., Al-Kofahi, J.M., Nguyen, T.N., Wang, X.: Improving automated bug triaging with specialized topic model. Trans. Softw. Eng. 43(3), 272–297 (2016). https://doi.org/10.1109/TSE.2016.2576454
Xia, X., Lo, D., Wang, X., Zhou, B.: Accurate developer recommendation for bug resolution. In: Proceedings 20th Working Conference on Reverse Engineering, WCRE 2013, pp. 72–81. IEEE (2013). https://doi.org/10.1109/WCRE.2013.6671282
Xie, X., Zhang, W., Yang, Y., Wang, Q.: DRETOM: developer recommendation based on topic models for bug resolution. In: Proceedings 8th International Conference on Predictive Models in Software Engineering, PROMISE 2012, pp. 19–28. ACM (2012). https://doi.org/10.1145/2365324.2365329
Yang, G., Zhang, T., Lee, B.: Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports. In: Proceedings 38th Annual Computer Software and Applications Conference, COMPSAC 2014, pp. 97–106. IEEE (2014). https://doi.org/10.1109/COMPSAC.2014.16
Yao, L., Mimno, D., McCallum, A.K.: Efficient methods for topic model inference on streaming document collections. In: Proceedings SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 937–945. ACM (2009). https://doi.org/10.1145/1557019.1557121
Zhang, T., Chen, J., Yang, G., Lee, B., Luo, X.: Towards more accurate severity prediction and fixer recommendation of software bugs. J. Syst. Softw. 117, 166–184 (2016). https://doi.org/10.1016/j.jss.2016.02.034
Zhang, T., Yang, G., Lee, B., Lua, E.K.: A novel developer ranking algorithm for automatic bug triage using topic model and developer relations. In: Proceedings 21st Asia-Pacific Software Engineering Conference, APSEC 2014, pp. 223–230. IEEE (2014). https://doi.org/10.1109/APSEC.2014.43
Zhang, W., Han, G., Wang, Q.: BUTTER: an approach to bug triage with topic modeling and heterogeneous network analysis. In: Proceedings International Conference on Cloud Computing and Big Data, CCBD 2014, pp. 62–69. IEEE (2014). https://doi.org/10.1109/CCBD.2014.14
Zou, W., Lo, D., Chen, Z., Xia, X., Feng, Y., Xu, B.: How practitioners perceive automated bug report management techniques. Trans. Softw. Eng. 46(8), 836–862 (2020). https://doi.org/10.1109/TSE.2018.2870414

Acknowledgements

This work is part of the “Software-DNA” project, which is funded by the European Regional Development Fund (ERDF or EFRE in German) and the State of Brandenburg (ILB). This work is part of the KMU project “KnowhowAnalyzer” (Förderkennzeichen 01IS20088B), which is funded by the German Ministry for Education and Research (Bundesministerium für Bildung und Forschung).

Author information

Authors and Affiliations

Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
Daniel Atzberger, Jonathan Schneider, Willy Scheibel, Matthias Trapp & Jürgen Döllner

Corresponding author

Correspondence to Daniel Atzberger.

Editor information

Editors and Affiliations

TU Wien, Vienna, Austria
Hermann Kaindl
Glasgow Caledonian University, Glasgow, UK
Mike Mannion
Wroclaw University of Economics, Wroclaw, Poland
Leszek A. Maciaszek

Cite this paper

Atzberger, D., Schneider, J., Scheibel, W., Trapp, M., Döllner, J. (2023). Evaluating Probabilistic Topic Models for Bug Triaging Tasks. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_3

DOI: https://doi.org/10.1007/978-3-031-36597-3_3
Published: 08 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36596-6
Online ISBN: 978-3-031-36597-3
eBook Packages: Computer Science, Computer Science (R0)
