Abstract
The expected bug-fixing time is one of the most important factors in bug triage: an accurate prediction of the fixing time of newly submitted bugs supports both resource allocation and the triage process. Our approach treats bug-fix time estimation as a text categorization problem. To address it, we use the Latent Dirichlet Allocation (LDA) model, a hierarchical statistical model based on topics. Formally, a topic is a probability distribution over the terms in a vocabulary. Topic models provide useful descriptive statistics for a collection of documents, which facilitates tasks such as classification. We build our classification model on LDA. In LDA, the topic proportions of a bug report are treated as a draw from a Dirichlet distribution, and the words in the bug report are obtained by repeatedly choosing a topic assignment from those proportions and then drawing a word from the corresponding topic. Supervised latent Dirichlet allocation (SLDA) adds to LDA a response variable associated with each document. Finally, we consider the supervised latent Dirichlet allocation with covariates (SLDAX) model, a generalization of SLDA that incorporates both manifest variables and latent topics as predictors of an outcome. We evaluated the proposed approach on a large dataset gathered from the defect tracking systems of five well-known open-source systems. Results show that SLDAX achieves better recall than the LDA-based topic models.
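The generative story summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy topics, and the coefficient vectors `eta` and `gamma` are hypothetical, and the predictor is assumed to be linear in the empirical topic frequencies, as in the usual supervised topic model formulation. The abstract does not specify the link function or noise model, so none is included.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_bug_report(topics, alpha, n_words, rng):
    """LDA generative process for one bug report.

    topics: list of K distributions over the vocabulary (hypothetical input).
    Returns the topic proportions, per-word topic assignments, and word ids.
    """
    theta = sample_dirichlet(alpha, rng)              # topic proportions
    assignments, words = [], []
    for _ in range(n_words):
        k = sample_categorical(theta, rng)            # choose a topic
        assignments.append(k)
        words.append(sample_categorical(topics[k], rng))  # draw a word from it
    return theta, assignments, words


def empirical_topic_frequencies(assignments, n_topics):
    """Fraction of words assigned to each topic in one report."""
    counts = [0] * n_topics
    for k in assignments:
        counts[k] += 1
    return [c / len(assignments) for c in counts]


def linear_predictor(z_bar, eta, covariates=(), gamma=()):
    """SLDA-style predictor eta . z_bar; covariates add gamma . x (SLDAX-style).

    Assumption: a linear predictor; the response distribution is left out.
    """
    topic_part = sum(e * z for e, z in zip(eta, z_bar))
    covariate_part = sum(g * x for g, x in zip(gamma, covariates))
    return topic_part + covariate_part


if __name__ == "__main__":
    rng = random.Random(0)
    toy_topics = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]   # 2 topics, 3 terms
    theta, z, w = generate_bug_report(toy_topics, [0.5, 0.5], 20, rng)
    z_bar = empirical_topic_frequencies(z, n_topics=2)
    print(linear_predictor(z_bar, eta=[2.0, 4.0], covariates=[1.0], gamma=[0.5]))
```

With no covariates the predictor reduces to the SLDA case; passing manifest variables (for example, a numeric severity code, which is an illustrative choice here) gives the SLDAX-style combination of topics and covariates described above.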
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ardimento, P., Boffoli, N. (2023). Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_7
DOI: https://doi.org/10.1007/978-3-031-36597-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36596-6
Online ISBN: 978-3-031-36597-3
eBook Packages: Computer Science, Computer Science (R0)