Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates

  • Conference paper
Evaluation of Novel Approaches to Software Engineering (ENASE 2022)

Abstract

The expected bug-fixing time is one of the most important factors in bug triage: an accurate prediction of the fixing time of newly submitted bugs supports both resource allocation and the triage process. Our approach treats bug-fixing time estimation as a text categorization problem. To address it, we used the Latent Dirichlet Allocation (LDA) model, a hierarchical statistical model based on what are called topics. Formally, a topic is a probability distribution over the terms in a vocabulary. Such topic models provide useful descriptive statistics for a collection of documents, which facilitates tasks such as classification. Here we build a classification model on LDA. In LDA, we treat the topic proportions for a bug report as a draw from a Dirichlet distribution. We obtain the words in the bug report by repeatedly choosing a topic assignment from those proportions and then drawing a word from the corresponding topic. In supervised latent Dirichlet allocation (SLDA), we add to LDA a response variable associated with each document. Finally, we consider the supervised latent Dirichlet allocation with covariates (SLDAX) model, a generalization of SLDA that incorporates both manifest variables and latent topics as predictors of an outcome. We evaluated the proposed approach on a large dataset gathered from the defect tracking systems of five well-known open-source systems. The results show that SLDAX achieves better recall than LDA-based topic models.
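The generative process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vocabulary, the two hand-written topics, the Dirichlet parameter, the regression coefficients, and all function names are hypothetical. The last function assumes that the outcome depends linearly on the empirical topic frequencies of a report plus its manifest covariates, which is one common way to write an SLDAX-style predictor.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw topic proportions from a Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_report(topics, vocab, alpha, n_words, rng):
    """Generate one synthetic bug report following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this report
    assignments, words = [], []
    for _ in range(n_words):
        # Choose a topic assignment from the report's topic proportions.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Draw a word from the term distribution of the chosen topic.
        w = rng.choices(vocab, weights=topics[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


def empirical_topic_freq(assignments, n_topics):
    """Fraction of words in the report assigned to each topic."""
    return [assignments.count(k) / len(assignments) for k in range(n_topics)]


def linear_predictor(z_bar, covariates, eta_topics, eta_covariates):
    """Assumed SLDAX-style predictor: latent topics plus manifest covariates."""
    topic_part = sum(e * z for e, z in zip(eta_topics, z_bar))
    covariate_part = sum(e * x for e, x in zip(eta_covariates, covariates))
    return topic_part + covariate_part


# Hypothetical toy setup: two topics over a four-term vocabulary.
rng = random.Random(0)
vocab = ["crash", "null", "button", "layout"]
topics = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]

theta, assignments, words = generate_report(topics, vocab, [0.5, 0.5], 20, rng)
z_bar = empirical_topic_freq(assignments, len(topics))
# One hypothetical manifest covariate (for example, a severity score of 1.0).
score = linear_predictor(z_bar, [1.0], [0.8, -0.3], [0.5])
```

Dropping the covariate term from `linear_predictor` leaves a predictor built on topic frequencies alone, which corresponds to the SLDA setting; plain LDA has no response variable at all. For a categorical outcome such as a fast or slow fix, the score would additionally pass through a link function, which this sketch leaves out.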



Author information

Corresponding author

Correspondence to Pasquale Ardimento.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ardimento, P., Boffoli, N. (2023). Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_7

  • DOI: https://doi.org/10.1007/978-3-031-36597-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36596-6

  • Online ISBN: 978-3-031-36597-3

  • eBook Packages: Computer Science, Computer Science (R0)
