Improving the quality of software issue report descriptions in Turkish: An industrial case study at Softtech

Aktas, Ethem Utku; Cakmak, Ebru; Inan, Mete Cihad; Yilmaz, Cemal

doi:10.1007/s10664-023-10434-4

Published: 12 February 2024

Empirical Software Engineering, Volume 29, article number 43 (2024)

Abstract

Issue reports are an important part of the software development process. They help developers identify and fix problems in their code. However, the problems described in these reports often lack important information, such as the Observed Behavior (OB), Expected Behavior (EB), and Steps to Reproduce (S2R). As a result, valuable developer time can be wasted on gathering the missing information. This study aims to address this issue by developing a tool that guides reporters in providing the necessary information in an industrial setting. The study is conducted at Softtech, a software subsidiary of the largest private bank in Turkey. The proposed approach is developed specifically for issue reports written in Turkish. It is motivated by the need for issue report classification tools that can handle the unique characteristics of the Turkish language, such as the presence of many compound words. We first manually analyze and label 1,041 issue reports for the existence of OB, S2R, and EB, and then present the specific patterns we found describing this information. Next, we use morphological analysis to extract keywords and suffixes, and then use them as features for classification with a machine learning-based approach. In addition, we conduct a feasibility study to assess the potential of using large language models for issue report classification tasks as a direction for future research. The results indicate that the tool using the machine learning-based approach can guide reporters in improving the quality of issue reports at Softtech, thereby saving valuable developer time.
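
To make the idea of suffix and keyword cues concrete, the following is a minimal sketch of how the presence of OB, EB, and S2R might be checked in a Turkish description. It is only an illustration under loud assumptions: the cue lists, function names, and example sentences are hypothetical and are not the patterns reported in the study, plain string suffix matching stands in for real morphological analysis, and a simple rule lookup stands in for the machine learning classifier.

```python
# Hypothetical cue lists for illustration only; NOT the patterns from the study.
SUFFIX_CUES = {
    # Negated present tense, e.g. "açılmıyor" (does not open) -> observed behavior
    "OB": ("mıyor", "miyor", "muyor", "müyor"),
    # Necessity mood, e.g. "açılmalı" (should open) -> expected behavior
    "EB": ("malı", "meli"),
    # "Upon doing" converb, e.g. "tıklayınca" (upon clicking) -> reproduction step
    "S2R": ("ınca", "ince", "unca", "ünce"),
}
KEYWORD_CUES = {
    "OB": {"hata"},  # "hata" = error
}
ALL_COMPONENTS = {"OB", "EB", "S2R"}


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation.

    Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı"),
    so a real pipeline would need locale-aware casing.
    """
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def detect_components(text):
    """Return the set of components whose cues appear in the text."""
    found = set()
    for tok in tokenize(text):
        for label, suffixes in SUFFIX_CUES.items():
            if tok.endswith(suffixes):  # endswith accepts a tuple of suffixes
                found.add(label)
        for label, words in KEYWORD_CUES.items():
            if tok in words:
                found.add(label)
    return found


def missing_components(text):
    """Return the components a reporter could be prompted to add."""
    return ALL_COMPONENTS - detect_components(text)


# "Upon clicking the button, the page does not open."
print(sorted(missing_components("Butona tıklayınca sayfa açılmıyor.")))  # ['EB']
```

In a tool of the kind the abstract describes, such cues would more plausibly serve as input features to a trained classifier than as hard rules, since naive suffix matching also fires on unrelated words that happen to share an ending.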

Data Availability

Due to commercial and legal restrictions, supporting data is not available.

Author information

Ebru Cakmak contributed to the study while working at Softtech.

Authors and Affiliations

Softtech Inc., Research and Development Center, 34947, Istanbul, Turkey
Ethem Utku Aktas & Mete Cihad Inan
Microsoft EMEA, Istanbul, Turkey
Ebru Cakmak
Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Istanbul, Turkey
Cemal Yilmaz

Corresponding author

Correspondence to Ethem Utku Aktas.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Bibi Stamatia, Bowen Xu, Xiaofei Xie, Maxime Cordy

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aktas, E.U., Cakmak, E., Inan, M.C. et al. Improving the quality of software issue report descriptions in Turkish: An industrial case study at Softtech. Empir Software Eng 29, 43 (2024). https://doi.org/10.1007/s10664-023-10434-4

Accepted: 10 December 2023
Published: 12 February 2024
DOI: https://doi.org/10.1007/s10664-023-10434-4
