Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates

Ardimento, Pasquale; Boffoli, Nicola

doi:10.1007/978-3-031-36597-3_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1829)

Included in the following conference series:

International Conference on Evaluation of Novel Approaches to Software Engineering

Abstract

The expected bug-fixing time is one of the most important factors in bug triage: an accurate prediction of the fixing time of newly submitted bugs supports both resource allocation and the triage process. Our approach treats bug-fix time estimation as a text categorization problem. To address this problem, we used the Latent Dirichlet Allocation (LDA) model, a hierarchical statistical model based on what are called topics. Formally, a topic is a probability distribution over terms in a vocabulary. Such topic models provide useful descriptive statistics for a collection, which facilitates tasks like classification. Here we build a classification model on LDA. In LDA, we treat the topic proportions for a bug report as a draw from a Dirichlet distribution. We obtain the words in the bug report by repeatedly choosing a topic assignment from those proportions, then drawing a word from the corresponding topic. In supervised latent Dirichlet allocation (SLDA), we add to LDA a response variable associated with each document. Finally, we consider the supervised latent Dirichlet allocation with covariates (SLDAX) model, a generalization of SLDA that incorporates manifest variables and latent topics as predictors of an outcome. We evaluated the proposed approach on a large dataset composed of data gathered from the defect tracking systems of five well-known open-source systems. Results show that SLDAX provides better recall than LDA-based topic models.
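
The generative story described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy topics, the vocabulary, and every function name are hypothetical, and the linear form of the predictor is an assumption, since the abstract does not state how the response depends on topics and covariates. The code draws topic proportions from a Dirichlet distribution, generates words by first choosing a topic assignment and then a term, and finally computes an SLDA-style predictor from empirical topic frequencies, optionally extended with manifest covariates in the spirit of SLDAX.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_bug_report(topics, alpha, n_words, rng):
    """Generate one synthetic bug report following the LDA generative story.

    topics: list of dicts mapping term -> probability (one dict per topic).
    Returns (theta, assignments, words).
    """
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this report
    topic_ids = list(range(len(topics)))
    assignments, words = [], []
    for _ in range(n_words):
        # Choose a topic assignment from the proportions ...
        z = rng.choices(topic_ids, weights=theta)[0]
        # ... then draw a word from the corresponding topic.
        terms = list(topics[z])
        w = rng.choices(terms, weights=[topics[z][t] for t in terms])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


def linear_predictor(assignments, n_topics, eta, covariates=(), beta=()):
    """Assumed linear predictor: eta . z_bar + beta . x.

    z_bar holds the empirical topic frequencies of one document (SLDA part);
    covariates x are manifest variables added alongside topics (SLDAX part).
    """
    z_bar = [assignments.count(k) / len(assignments) for k in range(n_topics)]
    topic_part = sum(e * z for e, z in zip(eta, z_bar))
    covariate_part = sum(b * x for b, x in zip(beta, covariates))
    return topic_part + covariate_part


# Hypothetical toy topics: one about crashes, one about user-interface issues.
TOPICS = [
    {"crash": 0.6, "null": 0.3, "button": 0.1},
    {"button": 0.5, "layout": 0.4, "crash": 0.1},
]

rng = random.Random(0)
theta, assignments, words = generate_bug_report(TOPICS, [0.5, 0.5], 20, rng)
score = linear_predictor(assignments, len(TOPICS), eta=[1.0, 3.0])
```

In a classification setting such as the one the abstract describes, a score like this would still need to be mapped to a class (for example, fast versus slow fixes); that mapping is left out here because the abstract does not specify it.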

Author information

Authors and Affiliations

Department of Informatics, University of Bari Aldo Moro, Via Orabona 4, Bari, Italy
Pasquale Ardimento & Nicola Boffoli

Corresponding author

Correspondence to Pasquale Ardimento.

Editor information

Editors and Affiliations

TU Wien, Vienna, Austria
Hermann Kaindl
Glasgow Caledonian University, Glasgow, UK
Mike Mannion
Wroclaw University of Economics, Wroclaw, Poland
Leszek A. Maciaszek

About this paper

Cite this paper

Ardimento, P., Boffoli, N. (2023). Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates. In: Kaindl, H., Mannion, M., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2022. Communications in Computer and Information Science, vol 1829. Springer, Cham. https://doi.org/10.1007/978-3-031-36597-3_7

DOI: https://doi.org/10.1007/978-3-031-36597-3_7
Published: 08 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36596-6
Online ISBN: 978-3-031-36597-3
eBook Packages: Computer Science, Computer Science (R0)
