From Reinvention to Reuse: An Empirical Example Study on Technical Debt Dataset

Rantala, Leevi; Mäntylä, Mika V.; Sridharan, Murali

doi:10.1007/978-3-031-78386-9_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15452)

Included in the following conference series:

International Conference on Product-Focused Software Process Improvement

Abstract

Self-Admitted Technical Debt (SATD) is a subset of Technical Debt (TD) in which the developer leaves a comment in the source code, thus marking the place where debt has been taken on. Previous research on SATD relies on either creating new datasets or reusing existing ones. One seminal SATD dataset, published by Maldonado et al. [14], contains over 4,000 SATD comments classified into five TD categories. The drawback of the dataset is that it lacks any other information, e.g. static analysis results, which seriously limits its possible use cases. We remedy this situation by reforming the dataset: we combine the original comments with contextual information and static analysis results from the source code and recreate the dataset as an SQLite database. Our reformed dataset contains over 13,000 files, nearly 14,000 classes, almost 100,000 methods, and over 650,000 code violation instances. The reformed dataset allows varied and detailed analyses in the future, which we demonstrate by examining the relationship between SATD comments and code violations. The results show that on the method level, the most important predictors are the total number of code violations and the number of violations labelled as Priority 3 or belonging to the Documentation Rule Set. On the file level, LOC is an important predictor, alongside the number of violations that belong to the Documentation Rule Set or have a Priority 2 classification. Overall, our example study demonstrates the potential that reforming existing datasets can have.
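
To make the idea of a reformed, queryable dataset concrete, the following is a minimal sketch of how method-level violation counts of the kind named in the abstract (total, Priority 3, Documentation Rule Set) could be derived from an SQLite database. The table names, column names, and demo rows are hypothetical and chosen only for illustration; they are not the schema or contents of the published dataset.

```python
import sqlite3

# Hypothetical schema: these table and column names are illustrative
# assumptions, not the layout of the published dataset.
SCHEMA = """
CREATE TABLE method (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    has_satd INTEGER NOT NULL  -- 1 if a SATD comment is attached to the method
);
CREATE TABLE violation (
    id INTEGER PRIMARY KEY,
    method_id INTEGER NOT NULL REFERENCES method(id),
    rule_set TEXT NOT NULL,
    priority INTEGER NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO method (id, name, has_satd) VALUES (?, ?, ?)",
        [(1, "parse", 1), (2, "render", 0), (3, "close", 0)],
    )
    conn.executemany(
        "INSERT INTO violation (method_id, rule_set, priority) VALUES (?, ?, ?)",
        [
            (1, "Documentation", 3),
            (1, "Design", 3),
            (1, "Documentation", 2),
            (2, "Documentation", 3),
        ],
    )
    conn.commit()
    return conn


def violation_features(conn):
    """Return one row per method: name, SATD flag, and three violation counts."""
    query = """
    SELECT m.name,
           m.has_satd,
           COUNT(v.id) AS total,
           COALESCE(SUM(v.priority = 3), 0) AS priority_3,
           COALESCE(SUM(v.rule_set = 'Documentation'), 0) AS documentation
    FROM method AS m
    LEFT JOIN violation AS v ON v.method_id = m.id
    GROUP BY m.id
    ORDER BY m.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in violation_features(build_demo_db()):
        print(row)
```

The LEFT JOIN keeps methods that have no violations, so they appear with zero counts instead of being dropped. Rows of this shape could then serve as input features for a model that predicts the SATD flag.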

References

1. AlOmar, E.A., et al.: SATDBailiff-mining and tracking self-admitted technical debt. Sci. Comput. Program. 213, 102693 (2022)
2. Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press (2011)
3. Avgeriou, P., Kruchten, P., Ozkaya, I., Seaman, C.: Managing technical debt in software engineering (Dagstuhl seminar 16162). In: Dagstuhl Reports, vol. 6. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2016)
4. Broder, A.Z.: Identifying and filtering near-duplicate documents. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 1–10. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45123-4_1
5. Huang, Q., Shihab, E., Xia, X., Lo, D., Li, S.: Identifying self-admitted technical debt in open source projects using text mining. Empir. Softw. Eng. 23, 418–451 (2018)
6. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998)
7. Li, Z., Avgeriou, P., Liang, P.: A systematic mapping study on technical debt and its management. J. Syst. Softw. 101, 193–220 (2015)
8. da Maldonado, E.S., Shihab, E.: Detecting and quantifying different types of self-admitted technical debt. In: 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD), pp. 9–15. IEEE (2015)
9. Potdar, A., Shihab, E.: An exploratory study on self-admitted technical debt. In: 2014 IEEE International Conference on Software Maintenance and Evolution, pp. 91–100. IEEE (2014)
10. Rantala, L.: From reuse to reform: a sample study on technical debt dataset, final dataset (2024). https://doi.org/10.6084/m9.figshare.22778606
11. Rantala, L.: From reuse to reform: a sample study on technical debt dataset, replication package (2024). https://doi.org/10.6084/m9.figshare.21959882
12. Ren, X., Xing, Z., Xia, X., Lo, D., Wang, X., Grundy, J.: Neural network-based detection of self-admitted technical debt: from performance to explainability. ACM Trans. Softw. Eng. Methodol. (TOSEM) 28(3), 1–45 (2019)
13. Rice, W.R.: Analyzing tables of statistical tests. Evolution 43(1), 223–225 (1989)
14. da Silva Maldonado, E., Shihab, E., Tsantalis, N.: Using natural language processing to automatically detect self-admitted technical debt. IEEE Trans. Software Eng. 43(11), 1044–1062 (2017)
15. Singh, D., Sekar, V.R., Stolee, K.T., Johnson, B.: Evaluating how static analysis tools can reduce code review effort. In: 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 101–105. IEEE (2017)
16. Skryseth, D., Shivashankar, K., Pilán, I., Martini, A.: Technical debt classification in issue trackers using natural language processing based on transformers. In: 2023 ACM/IEEE International Conference on Technical Debt (TechDebt), pp. 92–101 (2023). https://doi.org/10.1109/TechDebt59074.2023.00017
17. Sridharan, M., Mäntylä, M., Claes, M., Rantala, L.: SoCCMiner: a source code-comments and comment-context miner. In: Proceedings of the 19th International Conference on Mining Software Repositories, pp. 242–246 (2022)
18. Stol, K.J., Fitzgerald, B.: The ABC of software engineering research. ACM Trans. Softw. Eng. Methodol. (TOSEM) 27(3), 1–51 (2018)
19. Sutoyo, E., Capiluppi, A.: SATDAUG–a balanced and augmented dataset for detecting self-admitted technical debt. In: Proceedings of the 21st International Conference on Mining Software Repositories (2024)
20. Trautsch, A., Herbold, S., Grabowski, J.: A longitudinal study of static analysis warning evolution and the effects of PMD on software quality in Apache open source projects. Empir. Softw. Eng. 25(6), 5137–5192 (2020)
21. Xiao, T., Zeng, Z., Wang, D., Hata, H., McIntosh, S., Matsumoto, K.: Quantifying and characterizing clones of self-admitted technical debt in build systems. Empir. Softw. Eng. 29(2), 1–31 (2024)
22. Yan, M., Xia, X., Shihab, E., Lo, D., Yin, J., Yang, X.: Automating change-level self-admitted technical debt determination. IEEE Trans. Software Eng. 45(12), 1211–1229 (2018)
23. Yu, J., Zhao, K., Liu, J., Liu, X., Xu, Z., Wang, X.: Exploiting gated graph neural network for detecting and explaining self-admitted technical debts. J. Syst. Softw. 187, 111219 (2022)
24. Zhu, K., Yin, M., Zhu, D., Zhang, X., Gao, C., Jiang, J.: SCGRU: a general approach for identifying multiple classes of self-admitted technical debt with text generation oversampling. J. Syst. Softw. 195, 111514 (2023)

Acknowledgments

The authors have been supported by the Academy of Finland (grant number 328058).

Author information

Authors and Affiliations

University of Oulu, Oulu, Finland
Leevi Rantala & Murali Sridharan
University of Helsinki, Helsinki, Finland
Mika V. Mäntylä

Corresponding author

Correspondence to Leevi Rantala.

Editor information

Editors and Affiliations

University of Tartu, Tartu, Estonia
Dietmar Pfahl
Blekinge Institute of Technology, Karlskrona, Sweden
Javier Gonzalez Huerta
Leibniz Universität Hannover, Hannover, Germany
Jil Klünder
University of Tartu, Tartu, Estonia
Hina Anwar

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

About this paper

Cite this paper

Rantala, L., Mäntylä, M.V., Sridharan, M. (2025). From Reinvention to Reuse: An Empirical Example Study on Technical Debt Dataset. In: Pfahl, D., Gonzalez Huerta, J., Klünder, J., Anwar, H. (eds) Product-Focused Software Process Improvement. PROFES 2024. Lecture Notes in Computer Science, vol 15452. Springer, Cham. https://doi.org/10.1007/978-3-031-78386-9_8

DOI: https://doi.org/10.1007/978-3-031-78386-9_8
Published: 27 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78385-2
Online ISBN: 978-3-031-78386-9
eBook Packages: Computer Science, Computer Science (R0)
