
A Clustering Approach Towards Cross-Project Technical Debt Forecasting

  • Original Research
  • SN Computer Science

Abstract

Technical debt (TD) describes quality compromises that can yield short-term benefits but may negatively affect the quality of software products in the long run. A wide range of tools and techniques has been introduced over the years to help developers identify and manage TD. However, predicting its future evolution is equally important, as it helps to avoid TD accumulation and, in turn, the unlikely event of the project becoming unmaintainable. Although recent research has demonstrated the feasibility of building accurate project-specific TD forecasting models, cross-project TD forecasting remains largely unexplored. Cross-project TD forecasting is of practical importance, since it would allow pre-existing forecasting models to be applied to previously unseen software projects, especially new projects whose commit history is too short to support the construction of project-specific models. To this end, in the present paper, we focus on cross-project TD forecasting and examine whether taking similarities between software projects into account could be the key to more accurate forecasting. More specifically, we propose an approach based on data clustering: a relatively large repository of software projects is divided into clusters of projects that are similar with respect to their TD aspects, and a dedicated TD forecasting model is built for each cluster using regression algorithms. A previously unseen software project is then assigned to one of the defined clusters, and the corresponding cluster-specific model is applied to predict its future TD values. The approach was evaluated through several experiments on real-world applications. The results suggest that the proposed approach constitutes a promising solution for accurate cross-project TD forecasting.
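The abstract does not specify the project features, the clustering algorithm or the regression models, so the following Python sketch only illustrates the general cluster-then-regress idea under loudly stated assumptions. It assumes plain k-means (Lloyd's algorithm) over two hypothetical TD-profile features (mean TD level and mean per-commit change) and a pooled lag-1 linear regression per cluster, using only the standard library. The helper names (`project_features`, `kmeans`, `fit_lag1`, `ClusterForecaster`) and the toy data are hypothetical and do not come from the paper, whose actual choices may differ.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


def project_features(series):
    """Hypothetical TD profile of a project: mean TD level and mean per-commit change.

    Requires at least two TD measurements.
    """
    deltas = [b - a for a, b in zip(series, series[1:])]
    return (mean(series), mean(deltas))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means, seeded with the first k points so runs are deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(tuple(map(mean, zip(*members))) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def fit_lag1(series_list):
    """Least-squares fit of TD[t+1] = a * TD[t] + b, pooled over one cluster's projects."""
    xs = [x for s in series_list for x in s[:-1]]
    ys = [y for s in series_list for y in s[1:]]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 1.0  # fall back to a pure-offset model for flat inputs
    return (a, my - a * mx)


def forecast(model, last_value, horizon):
    """Roll the lag-1 model forward for `horizon` steps."""
    a, b = model
    values, value = [], last_value
    for _ in range(horizon):
        value = a * value + b
        values.append(value)
    return values


@dataclass
class ClusterForecaster:
    centroids: list
    models: list  # one (a, b) pair per cluster, or None for an empty cluster

    @classmethod
    def train(cls, histories, k):
        """Cluster the training projects, then fit one regression model per cluster."""
        centroids, labels = kmeans([project_features(s) for s in histories], k)
        models = []
        for c in range(k):
            members = [s for s, lab in zip(histories, labels) if lab == c]
            models.append(fit_lag1(members) if members else None)
        return cls(centroids, models)

    def predict(self, history, horizon):
        """Assign an unseen project to its nearest cluster and apply that cluster's model."""
        features = project_features(history)
        usable = [c for c, m in enumerate(self.models) if m is not None]
        cluster = min(usable, key=lambda c: dist(features, self.centroids[c]))
        return cluster, forecast(self.models[cluster], history[-1], horizon)


if __name__ == "__main__":
    # Toy TD histories, one value per commit, invented purely for illustration.
    histories = [
        [100, 102, 104, 106, 108],
        [110, 111, 113, 114, 116],
        [500, 520, 540, 560, 580],
        [520, 545, 565, 590, 610],
    ]
    model = ClusterForecaster.train(histories, k=2)
    cluster, preds = model.predict([480, 500, 521, 540], horizon=3)
    print(cluster, preds)  # → 1 [561.25, 582.5, 603.75]
```

In this toy run the unseen project resembles the two fast-growing histories, so it is assigned to their cluster and forecast with that cluster's model rather than with one global model. Because the two features are on very different scales, the TD level dominates the distance here; a fuller treatment would standardize the features (keeping the scaling parameters so that new projects are transformed consistently) before clustering.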

Figs. 1–16

Notes

  1. https://sdk4ed.eu/.

  2. https://github.com/clowee/The-Technical-Debt-Dataset.

  3. https://www.sonarqube.org/.

  4. https://scikit-learn.org/stable/.


Acknowledgements

This work is funded by the European Union’s Horizon 2020 Research and Innovation Programme through SDK4ED project under Grant Agreement no. 780572.

Author information

Corresponding author

Correspondence to Dimitrios Tsoukalas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection Interaction between Energy Consumption, Quality of Service, Reliability and Security, Maintainability of Computer Systems and Network guest edited by Erol Gelenbe.


About this article

Cite this article

Tsoukalas, D., Mathioudaki, M., Siavvas, M. et al. A Clustering Approach Towards Cross-Project Technical Debt Forecasting. SN COMPUT. SCI. 2, 22 (2021). https://doi.org/10.1007/s42979-020-00408-4


  • DOI: https://doi.org/10.1007/s42979-020-00408-4
