Evaluating Probabilistic Topic Models for Bug Triaging Tasks

  • Conference paper
Evaluation of Novel Approaches to Software Engineering (ENASE 2022)

Abstract

During software development, problems that arise are collected and managed as bug reports in bug tracking systems. A bug report usually consists of a title, a more detailed description, and additional categorical information, e.g., the affected component or the reporter. The triage owner's task is to assign open bug reports to developers who have the skills required to fix them. However, bug assignment is time-consuming, especially in large software projects with many developers involved. This observation motivates (semi-)automatic algorithms for assigning bugs to developers. Various approaches have been developed that rely on a machine learning model trained on historical bug reports. In these approaches, the textual components are mostly modeled with topic models, most commonly Latent Dirichlet Allocation (LDA). Although different variants, inference techniques, and libraries for LDA exist and various hyperparameters can be specified, most works treat topic models as a black box without exploring them in detail. In this work, we extend a study by Atzberger and Schneider et al. on the use of the Author-Topic Model (ATM) for bug triaging tasks. We demonstrate the influence of the underlying topic model, the library and inference technique used, and the hyperparameters on bug triaging results. The results of our experiments on a dataset from the Mozilla Firefox project provide guidelines for applying LDA effectively to bug triaging tasks.
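The abstract frames bug triaging as matching the topics of a new report against topics learned from historical reports and their fixers. The Python sketch below illustrates that idea under stated assumptions. It fits plain LDA with the collapsed Gibbs sampler of Griffiths and Steyvers, not the Author-Topic Model used in the study. Everything else is a hypothetical illustration rather than the authors' pipeline: the function names, averaging a developer's report-level topic proportions into a profile, ranking by cosine similarity, the EM-style fold-in for unseen reports, the toy reports, the developers `alice` and `bob`, and the default `alpha = 0.1`, `beta = 0.01`.

```python
import math
import random
from collections import defaultdict


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (vocab, phi, theta)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens per topic
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def infer_theta(doc, vocab, phi, alpha=0.1, iterations=50):
    """EM-style fold-in of an unseen report with phi held fixed (alpha as pseudo-count)."""
    K = len(phi)
    ids = [vocab[w] for w in doc if w in vocab]   # out-of-vocabulary words are ignored
    theta = [1.0 / K] * K
    for _ in range(iterations):
        counts = [alpha] * K
        for v in ids:
            resp = [theta[k] * phi[k][v] for k in range(K)]
            total = sum(resp)
            for k in range(K):
                counts[k] += resp[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_developers(new_doc, docs, assignees, num_topics=2, seed=0):
    """Rank developers by similarity of the new report to their mean topic profile."""
    vocab, phi, theta = train_lda(docs, num_topics, seed=seed)
    sums = defaultdict(lambda: [0.0] * num_topics)
    n_fixed = defaultdict(int)
    for dev, th in zip(assignees, theta):
        sums[dev] = [s + t for s, t in zip(sums[dev], th)]
        n_fixed[dev] += 1
    query = infer_theta(new_doc, vocab, phi)
    scores = {dev: cosine(query, [s / n_fixed[dev] for s in vec]) for dev, vec in sums.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy history: preprocessed reports and the developer who fixed each one.
reports = [
    "gpu crash render canvas".split(),
    "render canvas gpu freeze".split(),
    "canvas freeze crash gpu".split(),
    "login cookie session password".split(),
    "password session token login".split(),
    "cookie token session login".split(),
]
fixers = ["alice", "alice", "alice", "bob", "bob", "bob"]
print(rank_developers("gpu render crash".split(), reports, fixers))
```

The Dirichlet priors `alpha` and `beta`, the number of topics, and the number of sampling iterations are exactly the kinds of settings whose influence the study examines; on real Firefox reports they would need to be tuned rather than taken from these toy defaults.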

The first two authors contributed equally to this work. This work is mainly based on an earlier publication by the two lead authors and their co-authors, and on the master's thesis of the second author.

Notes

  1. http://www.nltk.org/nltk_data/.

  2. https://hunch.net/?p=309.

  3. https://stackoverflow.com/questions/47310137/mallet-hyperparameter-optimization.

References

  1. Aktas, E.U., Yilmaz, C.: Automated issue assignment: results and insights from an industrial case. Empir. Softw. Eng. 25(5), 3544–3589 (2020). https://doi.org/10.1007/s10664-020-09846-3

  2. Atzberger, D., Schneider, J., Scheibel, W., Limberger, D., Trapp, M., Döllner, J.: Mining developer expertise from bug tracking systems using the author-topic model. In: Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2022, pp. 107–118. INSTICC, SciTePress (2022). https://doi.org/10.5220/0011045100003176

  3. Banitaan, S., Alenezi, M.: TRAM: an approach for assigning bug reports using their metadata. In: Proceedings 3rd International Conference on Communications and Information Technology, ICCIT 2013, pp. 215–219. IEEE (2013). https://doi.org/10.1109/ICCITechnology.2013.6579552

  4. Bhattacharya, P., Neamtiu, I.: Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: Proceedings International Conference on Software Maintenance, ICSM 2010, pp. 1–10. IEEE (2010). https://doi.org/10.1109/ICSM.2010.5609736

  5. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  6. Chen, T.-H., Thomas, S.W., Hassan, A.E.: A survey on the use of topic models when mining software repositories. Empir. Softw. Eng. 21(5), 1843–1919 (2015). https://doi.org/10.1007/s10664-015-9402-8

  7. Dedik, V., Rossi, B.: Automated bug triaging in an industrial context. In: Proceedings 42nd Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2016, pp. 363–367. IEEE (2016). https://doi.org/10.1109/SEAA.2016.20

  8. Geigle, C.: Inference methods for Latent Dirichlet allocation (course notes in CS 598 CXZ: advanced topics in information retrieval). Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign (2016)

  9. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Nat. Acad. Sci. 101, 5228–5235 (2004). https://doi.org/10.1073/pnas.0307752101

  10. Hoffman, M., Bach, F., Blei, D.: Online learning for Latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, NIPS 2010, vol. 23, pp. 856–864. Curran Associates Inc. (2010)

  11. Hu, H., Zhang, H., Xuan, J., Sun, W.: Effective bug triage based on historical bug-fix information. In: Proceedings 25th International Symposium on Software Reliability Engineering, ISSRE 2014, pp. 122–132. IEEE (2014). https://doi.org/10.1109/ISSRE.2014.17

  12. Jonsson, L., Borg, M., Broman, D., Sandahl, K., Eldh, S., Runeson, P.: Automated bug assignment: ensemble-based machine learning in large scale industrial contexts. Empir. Softw. Eng. 21(4), 1533–1578 (2015). https://doi.org/10.1007/s10664-015-9401-9

  13. Kagdi, H., Gethers, M., Poshyvanyk, D., Hammad, M.: Assigning change requests to software developers. J. Softw. Evol. Process 24(1), 3–33 (2012). https://doi.org/10.1002/smr.530

  14. Khatun, A., Sakib, K.: A bug assignment technique based on bug fixing expertise and source commit recency of developers. In: Proceedings 19th International Conference on Computer and Information Technology, ICCIT 2016, pp. 592–597. IEEE (2016). https://doi.org/10.1109/ICCITECHN.2016.7860265

  15. Linstead, E., Rigor, P., Bajracharya, S., Lopes, C., Baldi, P.: Mining Eclipse developer contributions via author-topic models. In: Proceedings 4th International Workshop on Mining Software Repositories, MSR 2007, pp. 1–4. IEEE (2007). https://doi.org/10.1109/MSR.2007.20

  16. Mani, S., Sankaran, A., Aralikatte, R.: DeepTriage: exploring the effectiveness of deep learning for bug triaging. In: Proceedings India Joint International Conference on Data Science and Management of Data, pp. 171–179. ACM (2019). https://doi.org/10.1145/3297001.3297023

  17. Matter, D., Kuhn, A., Nierstrasz, O.: Assigning bug reports using a vocabulary-based expertise model of developers. In: Proceedings 6th International Working Conference on Mining Software Repositories, MSR 2009, pp. 131–140. IEEE (2009). https://doi.org/10.1109/MSR.2009.5069491

  18. McCallum, A.K.: MALLET: a machine learning for language toolkit (2002). http://www.cs.umass.edu/%7Emccallum/mallet

  19. Mortensen, O.: The author-topic model. Master’s thesis, Technical University of Denmark, Department of Applied Mathematics and Computer Science (2017)

  20. Naguib, H., Narayan, N., Brügge, B., Helal, D.: Bug report assignee recommendation using activity profiles. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 22–30. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623999

  21. Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10, 1801–1828 (2009)

  22. Nguyen, T.T., Nguyen, A.T., Nguyen, T.N.: Topic-based, time-aware bug assignment. SIGSOFT Softw. Eng. Notes 39(1), 1–4 (2014). https://doi.org/10.1145/2557833.2560585

  23. Ramage, D., Rosen, E., Chuang, J., Manning, C.D., McFarland, D.A.: Topic modeling for the social sciences. In: Proceedings Workshop on Applications for Topic Models: Text and Beyond, pp. 23:1–4 (2009)

  24. Rehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA (2010)

  25. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 487–494. AUAI Press (2004)

  26. Sajedi-Badashian, A., Hindle, A., Stroulia, E.: Crowdsourced bug triaging. In: Proceedings International Conference on Software Maintenance and Evolution, ICSME 2015, pp. 506–510. IEEE (2015). https://doi.org/10.1109/ICSM.2015.7332503

  27. Sajedi-Badashian, A., Stroulia, E.: Guidelines for evaluating bug-assignment research. J. Softw. Evol. Process 32(9) (2020). https://doi.org/10.1002/smr.2250

  28. Schofield, A., Magnusson, M., Mimno, D.: Pulling out the stops: rethinking stopword removal for topic models. In: Proceedings 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, pp. 432–436. ACL (2017). https://doi.org/10.18653/v1/E17-2069

  29. Schofield, A., Mimno, D.: Comparing apples to apple: the effects of stemmers on topic models. Trans. Assoc. Comput. Linguist. 4, 287–300 (2016). https://doi.org/10.1162/tacl_a_00099

  30. Schofield, A., Thompson, L., Mimno, D.: Quantifying the effects of text duplication on semantic models. In: Proceedings Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pp. 2737–2747. ACL (2017). https://doi.org/10.18653/v1/D17-1290

  31. Shokripour, R., Anvik, J., Kasirun, Z., Zamani, S.: Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 2–11. IEEE (2013). https://doi.org/10.1109/MSR.2013.6623997

  32. Syed, S., Spruit, M.: Exploring symmetrical and asymmetrical Dirichlet priors for latent Dirichlet allocation. Int. J. Semant. Comput. 12(3), 399–423 (2018). https://doi.org/10.1142/S1793351X18400184

  33. Tamrawi, A., Nguyen, T.T., Al-Kofahi, J.M., Nguyen, T.N.: Fuzzy set and cache-based approach for bug triaging. In: Proceedings 19th SIGSOFT Symposium on Foundations of Software Engineering, FSE/ESEC 2011, pp. 365–375. ACM (2011). https://doi.org/10.1145/2025113.2025163

  34. Wallach, H.M.: Structured topic models for language. Ph.D. thesis, Newnham College, University of Cambridge (2008)

  35. Wallach, H.M., Mimno, D., McCallum, A.K.: Rethinking LDA: why priors matter. In: Proceedings 22nd International Conference on Neural Information Processing Systems, NIPS 2009, pp. 1973–1981. Curran Associates, Inc. (2009)

  36. Wu, W., Zhang, W., Yang, Y., Wang, Q.: DREX: developer recommendation with k-nearest-neighbor search and expertise ranking. In: Proceedings 18th Asia-Pacific Software Engineering Conference, APSEC 2011, pp. 389–396. IEEE (2011). https://doi.org/10.1109/APSEC.2011.15

  37. Xia, X., Lo, D., Ding, Y., Al-Kofahi, J.M., Nguyen, T.N., Wang, X.: Improving automated bug triaging with specialized topic model. Trans. Softw. Eng. 43(3), 272–297 (2016). https://doi.org/10.1109/TSE.2016.2576454

  38. Xia, X., Lo, D., Wang, X., Zhou, B.: Accurate developer recommendation for bug resolution. In: Proceedings 20th Working Conference on Reverse Engineering, WCRE 2013, pp. 72–81. IEEE (2013). https://doi.org/10.1109/WCRE.2013.6671282

  39. Xie, X., Zhang, W., Yang, Y., Wang, Q.: DRETOM: developer recommendation based on topic models for bug resolution. In: Proceedings 8th International Conference on Predictive Models in Software Engineering, PROMISE 2012, pp. 19–28. ACM (2012). https://doi.org/10.1145/2365324.2365329

  40. Yang, G., Zhang, T., Lee, B.: Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports. In: Proceedings 38th Annual Computer Software and Applications Conference, COMPSAC 2014, pp. 97–106. IEEE (2014). https://doi.org/10.1109/COMPSAC.2014.16

  41. Yao, L., Mimno, D., McCallum, A.K.: Efficient methods for topic model inference on streaming document collections. In: Proceedings SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 937–945. ACM (2009). https://doi.org/10.1145/1557019.1557121

  42. Zhang, T., Chen, J., Yang, G., Lee, B., Luo, X.: Towards more accurate severity prediction and fixer recommendation of software bugs. J. Syst. Softw. 117, 166–184 (2016). https://doi.org/10.1016/j.jss.2016.02.034

  43. Zhang, T., Yang, G., Lee, B., Lua, E.K.: A novel developer ranking algorithm for automatic bug triage using topic model and developer relations. In: Proceedings 21st Asia-Pacific Software Engineering Conference, APSEC 2014, pp. 223–230. IEEE (2014). https://doi.org/10.1109/APSEC.2014.43

  44. Zhang, W., Han, G., Wang, Q.: BUTTER: an approach to bug triage with topic modeling and heterogeneous network analysis. In: Proceedings International Conference on Cloud Computing and Big Data, CCBD 2014, pp. 62–69. IEEE (2014). https://doi.org/10.1109/CCBD.2014.14

  45. Zou, W., Lo, D., Chen, Z., Xia, X., Feng, Y., Xu, B.: How practitioners perceive automated bug report management techniques. Trans. Softw. Eng. 46(8), 836–862 (2020). https://doi.org/10.1109/TSE.2018.2870414

Acknowledgements

This work is part of the “Software-DNA” project, which is funded by the European Regional Development Fund (ERDF, or EFRE in German) and the State of Brandenburg (ILB). It is also part of the KMU project “KnowhowAnalyzer” (Förderkennzeichen 01IS20088B), which is funded by the German Ministry for Education and Research (Bundesministerium für Bildung und Forschung).

Author information

Corresponding author

Correspondence to Daniel Atzberger.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Atzberger, D., Schneider, J., Scheibel, W., Trapp, M., Döllner, J. (2023). Evaluating Probabilistic Topic Models for Bug Triaging Tasks. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_3

  • DOI: https://doi.org/10.1007/978-3-031-36597-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36596-6

  • Online ISBN: 978-3-031-36597-3

  • eBook Packages: Computer Science, Computer Science (R0)