Detecting Semantic Clones in Microservices Using Components

  • Original Research
SN Computer Science

Abstract

The growing popularity of enterprise technologies for decentralized systems leads to commonalities in how components are used. This trend, however, poses new challenges for code clone detection: approaches can no longer rely on low-level code but must deal with higher-level component semantics. Yet few works have addressed this trend. One quality issue that can be identified in large systems is duplicated behavior expressed through different syntactic structures, referred to as a semantic clone. Detecting such clones is crucial for enterprises whose codebases grow and evolve and whose maintenance costs rise significantly. Detecting semantic clones requires semantic information about the given program. Unfortunately, although many code clone detection techniques have been proposed, few solutions explicitly target enterprise systems, and even fewer are dedicated to semantic clones. To reason about semantic clones, we compare pairs of component call-graphs in the system. Because enterprise systems commonly use distinct component types, we can use targeted enterprise metadata to ensure that only relevant fragments are matched. When applied to an established system benchmark, our method shows high accuracy in detecting semantic clones. We also assessed different system versions to examine the method’s applicability to decentralized system evolution.
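
As a rough illustration of the idea of comparing pairs of component call-graphs while matching only fragments of the same component type, the following minimal sketch may help. It is not the authors' implementation: the flattened one-method-per-type graph model, the function names, the example data, and the 0.5 threshold are all hypothetical, and a simple token-overlap (Jaccard) score on method names is swapped in as a stand-in for a real semantic similarity measure.

```python
import re
from itertools import combinations

# Hypothetical, simplified model: each call-graph is flattened to one method
# name per component type (e.g. controller -> service -> repository).

def tokens(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*", name)}

def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def graph_similarity(g1, g2):
    """Average name similarity over component types. Only nodes of the same
    type are compared; a type missing from either graph contributes 0."""
    types = set(g1) | set(g2)
    if not types:
        return 0.0
    matched = sum(jaccard(tokens(g1[t]), tokens(g2[t]))
                  for t in types if t in g1 and t in g2)
    return matched / len(types)

def candidate_clones(graphs, threshold=0.5):
    """Return (id1, id2, score) for each pair scoring at or above threshold.
    The threshold value is an assumption chosen for illustration."""
    result = []
    for (id1, g1), (id2, g2) in combinations(graphs.items(), 2):
        score = graph_similarity(g1, g2)
        if score >= threshold:
            result.append((id1, id2, score))
    return result

# Invented example data; the identifiers do not come from any benchmark.
EXAMPLE = {
    "order/getOrder": {"controller": "getOrderById",
                       "service": "findOrder",
                       "repository": "findById"},
    "ticket/queryOrder": {"controller": "queryOrderById",
                          "service": "findOrder",
                          "repository": "findById"},
    "user/register": {"controller": "createUser",
                      "service": "saveUser",
                      "repository": "save"},
}

if __name__ == "__main__":
    for id1, id2, score in candidate_clones(EXAMPLE):
        print(id1, id2, round(score, 3))
```

Restricting comparison to nodes of the same component type keeps a controller from being matched against a repository, which is one plausible reading of "only relevant fragments are matched"; a fuller approach would compare whole graphs and richer metadata rather than method names alone.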

Data availability

The dataset generated in this work is available at https://doi.org/10.5281/zenodo.7632839 and https://doi.org/10.5281/zenodo.7632842. Our prototype tools are available on GitHub as open source: Semantic Clone Detector: https://github.com/cloudhubs/Distributed-Systems-Semantic-Clone-Detector; Gradle Plugin: https://github.com/cloudhubs/prophet-gradle-plugin; Interactive Tool: https://github.com/svacina/prophet.

Notes

  1. Train-Ticket benchmark: https://github.com/FudanSELab/train-ticket, accessed on 2/5/2023.

  2. Wanxin benchmark: https://github.com/mikuhuyo/wanxin-p2p, accessed on 2/5/2023.

  3. Swarm benchmark: https://github.com/macrozheng/mall-swarm, accessed on 2/5/2023.

  4. Syntactic Clone results from [22]: https://microservicedata.github.io, accessed on 2/5/2023.

  5. Note that while the examples and implementation demonstrations of our method are specific to the Java platform, the method is not limited to this platform.

  6. The domain of this binary output is usually modeled in one of two ways: \(\{1,0\}\) or \(\{1,-1\}\) for positive and negative classes respectively. We assume the \(\{1,0\}\) model for the purposes of the explanation.

  7. Our Prototype: https://github.com/cloudhubs/Distributed-Systems-Semantic-Clone-Detector, accessed on 2/5/2023.

  8. WS4J: https://github.com/Sciss/ws4j, accessed on 2/5/2023.

  9. Our Semantic Clone Dataset (V1): https://zenodo.org/record/7632839, accessed on 2/11/2023.

  10. Our Semantic Clone Dataset (V2): https://zenodo.org/record/7632842, accessed on 2/11/2023.

  11. SonarQube: https://www.sonarqube.org, accessed on 2/5/2023.

  12. Our Interactive Tool: https://github.com/svacina/prophet, accessed on 2/5/2023.

  13. VSCode IDE: https://code.visualstudio.com, accessed on 2/5/2023.

  14. Our Plugin: https://github.com/cloudhubs/prophet-gradle-plugin, accessed on 2/5/2023.

References

  1. Besker T, Martini A, Bosch J. Technical debt cripples software developer productivity: a longitudinal study on developers’ daily software development work. In: Proceedings of the 2018 International Conference on Technical Debt (TechDebt ’18). New York, NY, USA: Association for Computing Machinery; 2018. p. 105–114. https://doi.org/10.1145/3194164.3194178

  2. Ain QU, Butt WH, Anwar MW, Azam F, Maqbool B. A systematic review on code clone detection. IEEE Access. 2019;7:86121–44.

  3. Baker BS. On finding duplication and near-duplication in large software systems. In: Proceedings of 2nd working conference on reverse engineering. IEEE; 1995. p. 86–95. https://doi.org/10.1109/WCRE.1995.514697

  4. Ducasse S, Rieger M, Demeyer S. A language independent approach for detecting duplicated code. In: Proceedings IEEE international conference on software maintenance-1999 (ICSM’99). ‘Software maintenance for business change’ (Cat. No. 99CB36360). IEEE; 1999. p. 109–118. https://doi.org/10.1109/ICSM.1999.792593

  5. Higo Y, Kusumoto S, Inoue K. A metric-based approach to identifying refactoring opportunities for merging code clones in a java software system. J Softw Maint Evol: Res Pract. 2008;20(6):435–61. https://doi.org/10.1002/smr.394.

  6. Kumar A, Yadav R, Kumar K. A systematic review of semantic clone detection techniques in software systems. In: IOP conference series: materials science and engineering, vol. 1022. IOP Publishing; 2021. p. 012074. https://doi.org/10.1088/1757-899X/1022/1/012074

  7. Vislavski T, Rakić G, Cardozo N, Budimac Z. Licca: A tool for cross-language clone detection. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE; 2018. p. 512–516. https://doi.org/10.1109/SANER.2018.8330250

  8. Saini V, Farmahinifarahani F, Lu Y, Baldi P, Lopes CV. Oreo: Detection of clones in the twilight zone. In: Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering; 2018. p. 354–365. https://doi.org/10.5281/zenodo.1317760

  9. Svacina J, Bushong V, Das D, Cerny T. Semantic code clone detection method for distributed enterprise systems. In: CLOSER; 2022. p. 27–37. https://doi.org/10.5220/0011032200003200

  10. Roy CK, Cordy JR. A survey on software clone detection research. Queen’s Sch Comput TR. 2007;541(115):64–8.

  11. Svajlenko J, Roy CK. Evaluating clone detection tools with bigclonebench. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE; 2015. p. 131–140. https://doi.org/10.1109/ICSM.2015.7332459

  12. Nasirloo H, Azimzadeh F. Semantic code clone detection using abstract memory states and program dependency graphs. In: 2018 4th international conference on web research (ICWR). IEEE; 2018. p. 19–27. https://doi.org/10.1109/ICWR.2018.8387232

  13. Wu Y, Zou D, Dou S, Yang S, Yang W, Cheng F, Liang H, Jin H. Scdetector: software functional clone detection based on semantic tokens analysis. In: Proceedings of the 35th IEEE/ACM international conference on automated software engineering; 2020. p. 821–833. https://doi.org/10.1145/3324884.3416562

  14. Vislavski T, Rakić G, Cardozo N, Budimac Z. Licca: A tool for cross-language clone detection. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE; 2018. p. 512–516. https://doi.org/10.1109/SANER.2018.8330250

  15. Alomari HW, Stephan M. Clone detection through srcclone: a program slicing based approach. J Syst Softw. 2022;184:111115. https://doi.org/10.1016/j.jss.2021.111115.

  16. Juergens E, Deissenboeck F, Hummel B. Code similarities beyond copy & paste. In: 2010 14th European conference on software maintenance and reengineering. IEEE; 2010. p. 78–87. https://doi.org/10.1109/CSMR.2010.33

  17. Sheneamer A, Kalita J. A survey of software clone detection techniques. Int J Comput Appl. 2016;137(10):1–21.

  18. Marcus A, Maletic JI. Identification of high-level concept clones in source code. In: Proceedings 16th annual international conference on automated software engineering (ASE 2001). IEEE; 2001. p. 107–114. https://doi.org/10.1109/ASE.2001.989796

  19. Sheneamer A, Roy S, Kalita J. A detection framework for semantic code clones and obfuscated code. Expert Syst Appl. 2018;97:405–20. https://doi.org/10.1016/j.eswa.2017.12.040.

  20. Fang C, Liu Z, Shi Y, Huang J, Shi Q. Functional code clone detection with syntax and semantics fusion learning. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis; 2020. p. 516–527. https://doi.org/10.1145/3395363.3397362

  21. Alrabaee S, Wang L, Debbabi M. Bingold: towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (sfgs). Digit Investig. 2016;18:11–22. https://doi.org/10.1016/j.diin.2016.04.002.

  22. Zhao Y, Mo R, Zhang Y, Zhang S, Xiong P. Exploring and understanding cross-service code clones in microservice projects. In: 2022 IEEE/ACM 30th international conference on program comprehension (ICPC). IEEE; 2022. p. 449–459. https://doi.org/10.1145/3524610.3527925

  23. Kamiya T, Kusumoto S, Inoue K. A token-based code clone detection tool-ccfinder and its empirical evaluation. Technical report, Osaka University, Department of Information and Computer Sciences, Inoue Laboratory; 2000.

  24. Papadimitriou CH, Raghavan P, Tamaki H, Vempala S. Latent semantic indexing: a probabilistic analysis. J Comput Syst Sci. 2000;61(2):217–35. https://doi.org/10.1006/jcss.2000.1711.

  25. Hou C, Nie F, Li X, Yi D, Wu Y. Joint embedding learning and sparse regression: a framework for unsupervised feature selection. IEEE Trans Cybern. 2013;44(6):793–804. https://doi.org/10.1109/TCYB.2013.2272642.

  26. Baldi P, Chauvin Y. Neural networks for fingerprint recognition. Neural Comput. 1993;5(3):402–18. https://doi.org/10.1162/neco.1993.5.3.402.

  27. Weiser M. Program slicing. IEEE Trans Softw Eng. 1984;SE-10(4):352–7. https://doi.org/10.1109/TSE.1984.5010248.

  28. Alomari HW, Collard ML, Maletic JI, Alhindawi N, Meqdadi O. srcslice: very efficient and scalable forward static slicing. J Softw: Evol Proc. 2014;26(11):931–61. https://doi.org/10.1002/smr.1651.

  29. Rakić G. Extendable and adaptable framework for input language independent static analysis. PhD thesis, University of Novi Sad (Serbia); 2015.

  30. Koschke R, Falke R, Frenzel P. Clone detection using abstract syntax suffix trees. In: 2006 13th working conference on reverse engineering. IEEE; 2006. p. 253–262. https://doi.org/10.1109/WCRE.2006.18

  31. Cordy JR, Roy CK. The nicad clone detector. In: 2011 IEEE 19th international conference on program comprehension. IEEE; 2011. p. 219–220. https://doi.org/10.1109/ICPC.2011.26

  32. Koschke R, Baxter ID, Conradt M, Cordy JR. Software clone management towards industrial application (dagstuhl seminar 12071). In: Dagstuhl Reports, vol. 2. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik; 2012. https://doi.org/10.4230/DagRep.2.2.21

  33. Schiewe M, Curtis J, Bushong V, Cerny T. Advancing static code analysis with language-agnostic component identification. IEEE Access. 2022;10:30743–61. https://doi.org/10.1109/ACCESS.2022.3160485.

  34. JBoss. Javassist: Java bytecode engineering toolkit; 2020. https://www.javassist.org. Accessed 2021-06-18.

  35. Fellbaum C. WordNet and wordnets. In: Brown K, editor. Encyclopedia of Language and Linguistics. Oxford, UK: Elsevier; 2005. p. 665–70.

  36. Bishop CM, Nasrabadi NM. Pattern recognition and machine learning, vol. 4(4); 2006.

  37. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning, vol. 112; 2013.

  38. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

  39. Manning CD. Introduction to information retrieval; 2008.

  40. Lemnaru C, Potolea R. Imbalanced classification problems: systematic study, issues and best practices. In: Enterprise information systems: 13th international conference, ICEIS 2011, Beijing, China, June 8-11, 2011, revised selected papers. Springer; 2012. p. 35–50. https://doi.org/10.1007/978-3-642-29958-2_3

  41. Abu-Mostafa YS, Magdon-Ismail M, Lin H-T. Learning from data, vol. 4. New York, NY, USA: AMLBook; 2012.

  42. Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A. Experimentation in Software Engineering: An Introduction. Germany: The Kluwer International Series In Software Engineering. Springer; 2000.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1854049 and a grant from Red Hat Research.

Author information

Corresponding author

Correspondence to Tomas Cerny.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Cloud Computing and Services Science” guest edited by Donald F. Ferguson, Claus Pahl and Maarten van Steen.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abdelfattah, A.S., Rodriguez, A., Walker, A. et al. Detecting Semantic Clones in Microservices Using Components. SN COMPUT. SCI. 4, 470 (2023). https://doi.org/10.1007/s42979-023-01910-1

  • DOI: https://doi.org/10.1007/s42979-023-01910-1
