Tab: template-aware bug report title generation via two-phase fine-tuned models

Automated Software Engineering

Abstract

Bug reports play a critical role in the software development lifecycle by helping developers identify and resolve defects efficiently. However, the quality of bug report titles, particularly in open-source communities, varies significantly, which complicates bug triage and resolution. Existing approaches, such as iTAPE, treat title generation as a one-sentence summarization task using sequence-to-sequence models. While these methods show promise, they face two major limitations: (1) they do not consider the distinct components of bug reports, treating the entire report as a homogeneous input, and (2) they struggle to handle the variability between template-based and non-template-based reports, often resulting in suboptimal titles. To address these limitations, we propose TAB, a hybrid framework that combines a Document Component Analyzer (DCA) based on a pre-trained BERT model with a Title Generation Model based on CodeT5. TAB addresses the first limitation by segmenting bug reports into four components (Description, Reproduction, Expected Behavior, and Others) to ensure better alignment between input and output. For the second limitation, TAB takes a two-branch approach: for template-based reports, titles are generated directly, while for non-template-based reports, the DCA first extracts key components to improve title relevance and clarity. We evaluate TAB on both template-based and non-template-based bug reports, demonstrating that it significantly outperforms existing methods. Specifically, on template-based reports, TAB achieves average improvements of 170.4–389.5% in METEOR, 67.8–190.0% in ROUGE-L, and 65.7–124.5% in chrF(AF) compared to baseline approaches. On non-template-based reports, TAB shows an average improvement of 64% in METEOR, 3.6% in ROUGE-L, and 14.8% in chrF(AF) over the state of the art. These results confirm the robustness of TAB in generating high-quality titles across diverse bug report formats.
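
The two-branch routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the heading-based template check, the keyword classifier standing in for the BERT-based DCA, the choice to drop the Others component, and all function names are assumptions made for illustration. The title generator is passed in as a plain callable, where a fine-tuned CodeT5 model would sit in the real system.

```python
import re
from typing import Callable, Dict, List

# The four components named in the abstract.
COMPONENTS = ("Description", "Reproduction", "Expected Behavior", "Others")

# Assumption: a report counts as template-based when it contains at least
# two markdown headings (e.g. "### Steps to reproduce").
HEADING = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)


def is_template_based(report: str) -> bool:
    return len(HEADING.findall(report)) >= 2


def keyword_classifier(sentence: str) -> str:
    """Toy stand-in for the BERT-based DCA (hypothetical keyword rules)."""
    s = sentence.lower()
    if re.search(r"\b(expected|expect|should)\b", s):
        return "Expected Behavior"
    if re.search(r"\b(steps?|reproduce|reproduced)\b", s):
        return "Reproduction"
    if re.search(r"\b(error|crash(es|ed)?|fails?|bug|exception)\b", s):
        return "Description"
    return "Others"


def segment(report: str, classify: Callable[[str], str]) -> Dict[str, List[str]]:
    """Split a report into sentences and group them by predicted component."""
    parts: Dict[str, List[str]] = {name: [] for name in COMPONENTS}
    for sentence in re.split(r"(?<=[.!?])\s+|\n+", report):
        sentence = sentence.strip()
        if sentence:
            parts[classify(sentence)].append(sentence)
    return parts


def build_model_input(report: str,
                      classify: Callable[[str], str] = keyword_classifier) -> str:
    """Template-based reports pass through; others are reduced to key components."""
    if is_template_based(report):
        return report.strip()
    parts = segment(report, classify)
    # Assumption: "Others" is treated as noise and left out of the input.
    kept = [f"{name}: {' '.join(parts[name])}"
            for name in COMPONENTS[:3] if parts[name]]
    return "\n".join(kept) if kept else report.strip()


def generate_title(report: str, generator: Callable[[str], str]) -> str:
    """`generator` is a placeholder for the fine-tuned title generation model."""
    return generator(build_model_input(report))
```

With this sketch, a free-form report has its greeting and other unclassified sentences filtered out before generation, while a report that already follows a template reaches the generator unchanged.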


Notes

  1. https://github.com/Maluuba/nlg-eval.

  2. https://pypi.org/project/rouge/

  3. https://github.com/m-popovic/chrF.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62372071), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (No. KJQN202300547), the Chongqing Municipal Construction Science and Technology Plan Project (Chengke Zi 2024 No. 8-7), the State Key Laboratory of Intelligent Vehicle Safety Technology (No. IVSTSKL-202412) and the Natural Science Foundation of Chongqing (No. CSTB2023NSCQ-MSX0914).

Author information

Corresponding authors

Correspondence to Weifeng Sun or Meng Yan.


About this article

Cite this article

Liu, X., Xu, Y., Sun, W. et al. Tab: template-aware bug report title generation via two-phase fine-tuned models. Autom Softw Eng 32, 32 (2025). https://doi.org/10.1007/s10515-025-00505-9

  • DOI: https://doi.org/10.1007/s10515-025-00505-9