Skip to main content
Log in

A two-stage approach for identifying and interpreting self-admitted technical debt

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

A major current focus in software quality is how to identify and interpret Self-admitted technical debt(SATD). While many methods have been proposed to identify SATD, these methods are neither interpretable nor generic. There remains a need for an efficient method that can interpret SATD. In this paper, we propose a two-stage approach to identify and interpret SATD using interpretable methods. In the first stage, the decision tree model is combined into an integrated model to identify SATD better. We apply SHAP, LIME, and Anchors models in the second stage to interpret the result. The experiments of 10 projects show that our method not only can effectively detect and explain SATD both in within-project and cross-project experiments, but also has a good explanation for self-generated data outside the dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. code:https://github.com/Wxxxxx2023/SATD-code.git

References

  1. Charbuty B, Abdulazeez A (2021) Classification based on decision tree algorithm for machine learning. J Appl Sci Technol Trends 2(01):20–28

    Article  Google Scholar 

  2. Chawla NV, Bowyer KW, Hall LO et al (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Educ 16:321–357

    MATH  Google Scholar 

  3. Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp 785–794

  4. Cunningham W (1992) The wycash portfolio management system. ACM SIGPLAN OOPS Messenger 4(2):29–30

    Article  Google Scholar 

  5. Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics 28(2):337–407

    Article  MathSciNet  MATH  Google Scholar 

  6. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 1189–1232

  7. Guo Y, Seaman C (2011) A portfolio approach to technical debt management. In: Proceedings of the 2nd workshop on managing technical debt, pp 31–34

  8. Han H, Wang WY, Mao BH (2005) Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing, Springer, pp 878–887

  9. Huang Q, Shihab E, Xia X et al (2018) Identifying self-admitted technical debt in open source projects using text mining. Empir Softw Eng 23(1):418–451

    Article  Google Scholar 

  10. Jalilifard A, Caridá VF, Mansano AF, et al (2021) Semantic sensitive tf-idf to determine word relevance in documents. In: Advances in computing and network communications. Springer, pp 327–337

  11. Ke G, Meng Q, Finley T et al (2017) Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems 30:3146–3154

    Google Scholar 

  12. Khoshgoftaar TM, Fazelpour A, Dittman DJ, et al (2015) Ensemble vs. data sampling: Which option is best suited to improve classification performance of imbalanced bioinformatics data? In: 2015 IEEE 27th international conference on tools with artificial intelligence (ICTAI), IEEE, pp 705–712

  13. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp 1746–1751

  14. Lisboa PJ (2013) Interpretability in machine learning–principles and practice. In: International workshop on fuzzy logic and applications, Springer, pp 15–21

  15. Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems, pp 4768–4777

  16. Lundberg SM, Erion GG, Lee SI (2018) Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888

  17. Maldonado EdS, Shihab E (2015) Detecting and quantifying different types of self-admitted technical debt. In: 2015 IEEE 7Th international workshop on managing technical debt (MTD), IEEE, pp 9–15

  18. Martens D, Vanthienen J, Verbeke W et al (2011) Performance of classification models from a user perspective. Decis Support Syst 51(4):782–793

    Article  Google Scholar 

  19. Mehrolia S, Alagarsamy S, Solaikutty VM (2021) Customers response to online food delivery services during covid-19 outbreak using binary logistic regression. Int J Consum Stud 45(3):396–408

    Article  Google Scholar 

  20. Miller T (2019) Explanation in artificial intelligence: Insights from the social sciences. Artif Intell 267:1–38

    Article  MathSciNet  MATH  Google Scholar 

  21. Mosavi A, Hosseini FS, Choubin B et al (2021) Ensemble boosting and bagging based machine learning models for groundwater potential prediction. Water Resour Manag 35(1):23–37

    Article  Google Scholar 

  22. Pecorelli F, Di Nucci D, De Roover C, et al (2019) On the role of data balancing for machine learning-based code smell detection. In: Proceedings of the 3rd ACM SIGSOFT international workshop on machine learning techniques for software quality evaluation, pp 19–24

  23. Potdar A, Shihab E (2014) An exploratory study on self-admitted technical debt. In: 2014 IEEE international conference on software maintenance and evolution, IEEE, pp 91–100

  24. Ren X, Xing Z, Xia X et al (2019) Neural network-based detection of self-admitted technical debt: From performance to explainability. ACM Trans Softw Eng Methodol (TOSEM) 28(3):1–45

  25. Ribeiro MT, Singh S, Guestrin C (2016) " why should i trust you?" explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144

  26. Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386

  27. Ribeiro MT, Singh S, Guestrin C (2018) Anchors: High-precision model-agnostic explanations. In: Proceedings of the AAAI conference on artificial intelligence

  28. Rish I, et al (2001) An empirical study of the naive bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, pp 41–46

  29. Rutkowski L, Jaworski M, Pietruczuk L et al (2014) The cart decision tree for mining data streams. Inform Sci 266:1–15

    Article  MATH  Google Scholar 

  30. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics 21(3):660–674

    Article  MathSciNet  Google Scholar 

  31. Seaman C, Guo Y, Zazworka N, et al (2012) Using technical debt data in decision making: Potential decision approaches. In: 2012 third international workshop on managing technical debt (MTD), IEEE, pp 45–48

  32. da Silva Maldonado E, Shihab E, Tsantalis N (2017) Using natural language processing to automatically detect self-admitted technical debt. IEEE Trans Softw Eng 43(11):1044–1062

    Article  Google Scholar 

  33. Smiti S, Soui M (2020) Bankruptcy prediction using deep learning approach based on borderline smote. Inf Syst Front 22:1067–1083

    Article  Google Scholar 

  34. Soltanzadeh P, Hashemzadeh M (2021) Rcsmote: range-controlled synthetic minority over-sampling technique for handling the class imbalance problem. Inform Sci 542:92–111

    Article  MathSciNet  MATH  Google Scholar 

  35. Sterling C (2010) Managing Software Debt: Building for Inevitable Change (Adobe Reader). Addison-Wesley Professional

  36. Tang S, Ghorbani A, Yamashita R et al (2021) Data valuation for medical imaging using shapley value and application to a large-scale chest x-ray dataset. Scientific reports 11(1):1–9

    Google Scholar 

  37. Wang X, Liu J, Li L, et al (2020) Detecting and explaining self-admitted technical debts with attention-based neural networks. In: Proceedings of the 35th IEEE/ACM international conference on automated software engineering, pp 871–882

  38. Wehaibi S, Shihab E, Guerrouj L (2016) Examining the impact of self-admitted technical debt on software quality. In: 2016 IEEE 23Rd international conference on software analysis, evolution, and reengineering (SANER), IEEE, pp 179–188

  39. Yu J, Zhou X, Liu X et al (2023) Detecting multi-type self-admitted technical debt with generative adversarial network-based neural networks. Inf Softw Technol 158(107):190

    Google Scholar 

  40. Zazworka N, Shaw MA, Shull F, et al (2011) Investigating the impact of design debt on software quality. In: Proceedings of the 2nd workshop on managing technical debt, pp 17–23

  41. Zhang H (2005) Exploring conditions for the optimality of naive bayes. Int J Pattern Recognit Artif Intell 19(02):183–198

    Article  Google Scholar 

Download references

Acknowledgements

The work reported here was supported by Natural Science Basic Research Program of Shaanxi (Program No.2023-JC-YB-615) and Shaanxi Social Science Fund in China (Program No.2023R102). The authors thank all the participants in the experiment and the professors for their advice. The authors would like to thank the anonymous reviewers for their valuable suggestion and constructive comments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ming Yin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ming Yin and Jiaze Wang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yin, M., Wang, J., Zhu, D. et al. A two-stage approach for identifying and interpreting self-admitted technical debt. Appl Intell 53, 26592–26602 (2023). https://doi.org/10.1007/s10489-023-04941-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04941-6

Keywords

Navigation