A two-stage approach for identifying and interpreting self-admitted technical debt

Yin, Ming; Wang, Jiaze; Zhu, Dan; Gao, Cunzhi

doi:10.1007/s10489-023-04941-6

Published: 25 August 2023

Applied Intelligence, volume 53, pages 26592–26602 (2023)

Ming Yin¹,
Jiaze Wang¹ (ORCID: orcid.org/0000-0001-9338-0249),
Dan Zhu² &
Cunzhi Gao¹

Abstract

A major current focus in software quality research is how to identify and interpret self-admitted technical debt (SATD). Many methods have been proposed to identify SATD, but they are neither interpretable nor generic, so there remains a need for an efficient method that can also interpret SATD. In this paper, we propose a two-stage approach that identifies and interprets SATD using interpretable methods. In the first stage, decision tree models are combined into an integrated model to identify SATD more accurately. In the second stage, we apply SHAP, LIME, and Anchors to interpret the results. Experiments on 10 projects show that our method not only detects and explains SATD effectively in both within-project and cross-project settings, but also gives good explanations for self-generated data outside the dataset.
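The abstract gives no algorithmic detail, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' implementation. Stage 1 is approximated by averaging one depth-1 decision tree (stump) per vocabulary word, reading "integrated model" as an ensemble. Stage 2 uses a simple leave-one-word-out attribution as a crude stand-in for SHAP, LIME, and Anchors; it is none of those methods. The toy comments and the names `TRAIN`, `fit_stumps`, `satd_score`, and `explain` are hypothetical.

```python
# Toy labelled comments (hypothetical, not the paper's data): 1 = SATD, 0 = not SATD.
TRAIN = [
    ("todo fix this hack", 1),
    ("fixme temporary workaround", 1),
    ("hack remove this later", 1),
    ("todo clean up ugly code", 1),
    ("return the item count", 0),
    ("open the connection", 0),
    ("parse the header fields", 0),
    ("compute the buffer checksum", 0),
]


def tokenize(comment):
    """Lower-case whitespace tokenisation; returns the set of words."""
    return set(comment.lower().split())


def fit_stumps(train):
    """Stage 1 (assumed form): one depth-1 decision tree per vocabulary word.

    Each stump splits on "word present?" and stores the SATD rate of the
    training comments that fall into each of its two leaves.
    """
    docs = [(tokenize(text), label) for text, label in train]
    vocab = sorted(set().union(*(words for words, _ in docs)))
    stumps = {}
    for word in vocab:
        present = [label for words, label in docs if word in words]
        absent = [label for words, label in docs if word not in words]
        stumps[word] = (
            sum(present) / len(present),
            # Fall back to 0.5 if a word occurs in every training comment.
            sum(absent) / len(absent) if absent else 0.5,
        )
    return stumps


def satd_score(stumps, words):
    """Ensemble output: the mean leaf value over all stumps."""
    leaves = [p_in if word in words else p_out
              for word, (p_in, p_out) in stumps.items()]
    return sum(leaves) / len(leaves)


def explain(stumps, comment):
    """Stage 2 (stand-in): leave-one-word-out attribution.

    A word's contribution is how much the ensemble score drops when that
    word is removed from the comment. Returns (word, contribution) pairs,
    largest contribution first.
    """
    words = tokenize(comment)
    full = satd_score(stumps, words)
    contributions = {w: full - satd_score(stumps, words - {w})
                     for w in sorted(words)}
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)


stumps = fit_stumps(TRAIN)
comment = "todo hack around the parser"
score = satd_score(stumps, tokenize(comment))
print("SATD" if score > 0.5 else "not SATD", round(score, 3))
for word, contribution in explain(stumps, comment):
    print(f"{word:>8} {contribution:+.3f}")
```

In this toy setting, words such as "todo" and "hack" push the score towards SATD while "the" pushes it away, which is the kind of per-word explanation the second stage is meant to provide. A realistic pipeline would replace the stumps with a trained tree ensemble and the attribution with an established explainer.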

Notes

Code: https://github.com/Wxxxxx2023/SATD-code.git

Acknowledgements

The work reported here was supported by the Natural Science Basic Research Program of Shaanxi (Program No. 2023-JC-YB-615) and the Shaanxi Social Science Fund in China (Program No. 2023R102). The authors thank all the participants in the experiment and the professors for their advice. The authors would also like to thank the anonymous reviewers for their valuable suggestions and constructive comments.

Author information

Authors and Affiliations

School of Software, Northwestern Polytechnical University, 127 Youyi St. West, Xi’an, 710072, Shaanxi, People’s Republic of China
Ming Yin, Jiaze Wang & Cunzhi Gao
Debbie and Jerry Ivy College of Business, Iowa State University, 2200 Gerdin Building 2167 Union Drive, Ames, IA, USA
Dan Zhu

Corresponding author

Correspondence to Ming Yin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ming Yin and Jiaze Wang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yin, M., Wang, J., Zhu, D. et al. A two-stage approach for identifying and interpreting self-admitted technical debt. Appl Intell 53, 26592–26602 (2023). https://doi.org/10.1007/s10489-023-04941-6

Accepted: 03 August 2023
Published: 25 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04941-6
