Abstract
A major current focus in software quality research is how to identify and interpret self-admitted technical debt (SATD). Although many methods have been proposed to identify SATD, they are neither interpretable nor generic, and there remains a need for an efficient method that can also interpret SATD. In this paper, we propose a two-stage approach that identifies SATD and then interprets the result. In the first stage, decision tree models are combined into an integrated (ensemble) model to improve SATD identification. In the second stage, we apply SHAP, LIME, and Anchors to interpret the results. Experiments on 10 projects show that our method effectively detects and explains SATD in both within-project and cross-project settings, and that it also gives good explanations for self-generated data from outside the dataset.
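To make the two-stage idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy stand-in for each stage: stage one averages three one-token decision stumps chosen by training accuracy (instead of a trained tree ensemble such as XGBoost or LightGBM), and stage two enumerates exact Shapley values over the tokens of one comment (the quantity that SHAP is built on, rather than a call into the SHAP, LIME, or Anchors libraries). The training comments and all function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
import re

# Made-up toy comments (1 = SATD, 0 = not SATD); NOT the paper's dataset.
TRAIN = [
    ("TODO fix this hack later", 1),
    ("FIXME temporary workaround for parser bug", 1),
    ("hack: remove once upstream is patched", 1),
    ("TODO ugly workaround, should be refactored", 1),
    ("returns the number of open connections", 0),
    ("initialize the cache with default values", 0),
    ("compute the checksum of the buffer", 0),
    ("close the stream and release resources", 0),
]


def tokens(text):
    """Lower-case word tokens, de-duplicated, in order of first appearance."""
    return list(dict.fromkeys(re.findall(r"[a-z]+", text.lower())))


def fit_stumps(samples, k=3):
    """Stage 1: keep the k stumps ("token present -> SATD") that classify
    the most training comments correctly; ties are broken alphabetically."""
    docs = [(set(tokens(text)), label) for text, label in samples]
    vocab = sorted(set().union(*(toks for toks, _ in docs)))

    def n_correct(tok):
        return sum((tok in toks) == bool(label) for toks, label in docs)

    return sorted(vocab, key=lambda tok: (-n_correct(tok), tok))[:k]


def score(stumps, present):
    """Ensemble output: fraction of stumps that vote SATD for a token set."""
    return Fraction(sum(tok in present for tok in stumps), len(stumps))


def shapley(stumps, text):
    """Stage 2: exact Shapley value of each token for the ensemble score,
    computed by enumerating every subset of the other tokens."""
    toks = tokens(text)
    n = len(toks)
    phi = {}
    for tok in toks:
        others = [t for t in toks if t != tok]
        total = Fraction(0)
        for size in range(n):
            weight = Fraction(factorial(size) * factorial(n - size - 1),
                              factorial(n))
            for subset in combinations(others, size):
                without = set(subset)
                gain = score(stumps, without | {tok}) - score(stumps, without)
                total += weight * gain
        phi[tok] = total
    return phi


STUMPS = fit_stumps(TRAIN)
COMMENT = "TODO: remove this hack"
SCORE = score(STUMPS, set(tokens(COMMENT)))
ATTRIBUTION = shapley(STUMPS, COMMENT)
print(STUMPS, SCORE, ATTRIBUTION)
```

On this toy data the selected stumps are the tokens "hack", "todo", and "workaround". The example comment receives a score of 2/3, and the attribution assigns 1/3 each to "todo" and "hack" and 0 to the remaining tokens, so the attributions sum to the score. Exact enumeration is exponential in the number of tokens, which is why it is only practical here for short comments.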
Acknowledgements
The work reported here was supported by the Natural Science Basic Research Program of Shaanxi (Program No. 2023-JC-YB-615) and the Shaanxi Social Science Fund in China (Program No. 2023R102). The authors thank all the participants in the experiment and the professors for their advice. The authors would also like to thank the anonymous reviewers for their valuable suggestions and constructive comments.
Additional information
Ming Yin and Jiaze Wang contributed equally to this work.
About this article
Cite this article
Yin, M., Wang, J., Zhu, D. et al. A two-stage approach for identifying and interpreting self-admitted technical debt. Appl Intell 53, 26592–26602 (2023). https://doi.org/10.1007/s10489-023-04941-6
DOI: https://doi.org/10.1007/s10489-023-04941-6