Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt

  • Research Article
  • Frontiers of Computer Science

Abstract

Technical debt is a metaphor for seeking short-term gains at the expense of long-term code quality. Previous studies have shown that self-admitted technical debt, which is introduced intentionally, has a strong negative impact on software development and incurs high maintenance overhead. To help developers identify self-admitted technical debt, researchers have proposed many state-of-the-art methods. However, the effectiveness of current methods can still be improved, because self-admitted technical debt comments vary in length, make up a low proportion of all comments, and are diverse in style. Therefore, in this paper, we propose a novel approach based on bidirectional long short-term memory (BiLSTM) networks with an attention mechanism to automatically detect self-admitted technical debt by leveraging source code comments. When training the BiLSTM, we use a balanced cross-entropy loss function to overcome the class imbalance problem. We experimentally investigate the performance of our approach on a public dataset of 62,566 code comments from ten open source projects. Experimental results show that our approach achieves, on average, 81.75% precision, 72.24% recall and 75.86% F1-score, and outperforms the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.
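
The abstract does not give the exact form of the balanced cross-entropy loss, so the following is only a minimal sketch of one common class-balanced formulation, in which each class is weighted by the share of the opposite class in the batch. The function name `balanced_cross_entropy`, the weight `beta` and the clamping constant `eps` are illustrative assumptions, not taken from the paper.

```python
import math


def balanced_cross_entropy(probs, labels, eps=1e-12):
    """Class-balanced binary cross entropy averaged over a batch.

    probs  -- predicted probabilities of the debt class, each in [0, 1]
    labels -- gold labels: 1 for a self-admitted technical debt comment, else 0
    NOTE: this weighting scheme is an assumption, not the paper's stated loss.
    """
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equal in length")
    n = len(labels)
    n_pos = sum(labels)
    # beta is the fraction of negative examples, so the rare positive
    # class receives the larger weight.
    beta = (n - n_pos) / n
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / n


# One debt comment among four: missing it costs more than one false alarm.
labels = [1, 0, 0, 0]
missed_debt = balanced_cross_entropy([0.1, 0.1, 0.1, 0.1], labels)
false_alarm = balanced_cross_entropy([0.9, 0.9, 0.1, 0.1], labels)
print(missed_debt > false_alarm)
```

With this weighting, the few debt comments contribute as much to the loss as the many ordinary comments, which is the effect a balanced loss is meant to have on a dataset where the positive class is rare.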



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grants Nos. 61100043, 61902096 and 61702144) and Key Project of Science and Technology of Zhejiang Province (2017C01010).

Author information

Corresponding author

Correspondence to Dongjin Yu.

Additional information

Dongjin Yu is currently a professor at Hangzhou Dianzi University, China. His research interests include intelligent software engineering, data engineering and service computing. He is the director of the Big Data Institute and the director of the Computer Software Institute of Hangzhou Dianzi University. He is a member of the IEEE and a senior member of the China Computer Federation (CCF). He is also a member of the Technical Committee of Software Engineering CCF (TCSE CCF) and a member of the Technical Committee of Service Computing CCF (TCSC CCF).

Lin Wang received the bachelor's degree in 2017 from the School of Computer Science, Hangzhou Dianzi University, China. She is currently a graduate student at Hangzhou Dianzi University, China. Her current research interests mainly include mining software repositories and software maintenance.

Xin Chen received the PhD degree in software engineering in 2018 from the School of Software, Dalian University of Technology, China. He is currently a lecturer at Hangzhou Dianzi University, China. His research interests include mining software repositories, search-based software engineering, and evolutionary computation. He is a member of the CCF and the ACM.

Jie Chen is an assistant professor in the College of Computer Science at Hangzhou Dianzi University, China. She received the PhD degree from the Lab of Internet Software Technologies, Institute of Software, Chinese Academy of Sciences (ISCAS), China in 2016. She was a visiting scholar in the Department of Computer Science, University of Massachusetts Amherst, USA from September 2012 to September 2013. Her research interests are in software process simulation, resource scheduling and code analysis.

About this article

Cite this article

Yu, D., Wang, L., Chen, X. et al. Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt. Front. Comput. Sci. 15, 154208 (2021). https://doi.org/10.1007/s11704-020-9281-z

  • DOI: https://doi.org/10.1007/s11704-020-9281-z
