Research article
DOI: 10.1145/3569966.3570091

A systematic literature review of clone evolution

Published: 20 December 2022

Abstract

Code clones are identical or near-identical code fragments, often introduced into software systems by programmers during modification and maintenance. As a software system evolves, its code clones may undergo multiple changes, such as an increase in number, disappearance, or a change of location. These changes make clones harder to manage and may introduce bugs into the software, which raises the cost of clone management and maintenance. It is therefore necessary to study clone evolution. In this paper, we summarize research on code clone evolution over recent decades. Starting from a previous review and survey, we identified a total of 47 relevant papers and, with the help of a Latent Dirichlet Allocation (LDA) topic model, divided them into five categories. We present our analysis of the current research and discuss possible future progress. We conclude that future work is likely to proceed in two directions. On the one hand, developing clone management tools based on current results is a possible direction; on the other hand, existing tools may be developed and improved with stronger theoretical support as knowledge of the evolutionary characteristics of clones grows.
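The abstract states only that an LDA topic model was used to sort the 47 papers into five categories; it does not describe the corpus, preprocessing, or hyperparameters. As a rough illustration of how this kind of grouping works, the sketch below fits LDA with collapsed Gibbs sampling in plain Python and assigns each document to its dominant topic. Everything in it is an assumption for illustration: the helper names (`tokenize`, `lda_gibbs`, `top_words`), the stop-word set, the hyperparameters `alpha` and `beta`, and the made-up toy titles. It is not the authors' pipeline. In practice, an off-the-shelf implementation such as gensim's `LdaModel` would more typically be used.

```python
import random


def tokenize(text, stop_words):
    """Lower-case, split on non-letters, drop stop words and tokens of <= 2 letters."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in stop_words and len(w) > 2]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (vocab, ndk, nkw): the sorted vocabulary, per-document topic
    counts, and per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(num_topics)]  # occurrences of word w assigned to topic k
    nk = [0] * num_topics                       # total tokens assigned to topic k
    z = []                                      # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return vocab, ndk, nkw


def top_words(vocab, nkw, n=3):
    """The n most frequent words assigned to each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


# Hypothetical toy corpus: made-up titles, not the 47 reviewed papers.
STOP_WORDS = {"the", "and", "for", "with", "through", "code"}
abstracts = [
    "Bug propagation through code clones and fault-proneness of clones",
    "Late propagation of bugs in clone pairs with inconsistent changes",
    "Visualizing the evolution of clones with an interactive visualization tool",
    "A visualization system for clone genealogies in software maintenance",
]
docs = [tokenize(a, STOP_WORDS) for a in abstracts]
vocab, ndk, nkw = lda_gibbs(docs, num_topics=2, iterations=100, seed=42)
for d, counts in enumerate(ndk):
    print("doc", d, "-> topic", counts.index(max(counts)))
print(top_words(vocab, nkw))
```

With `num_topics=5` and the reviewed papers' titles or abstracts as input, the same loop would yield five groups analogous to the paper's categories. LDA only produces word distributions, so a person would still need to read each topic's top words to name the category.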

Cited By

  • (2025) Governing the commons: code ownership and code-clones in large-scale software development. Empirical Software Engineering 30:2. DOI: 10.1007/s10664-024-10598-7. Online publication date: 1-Mar-2025.
  • (2023) A Comprehensive Study on Code Clones in Automated Driving Software. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 1073–1085. DOI: 10.1109/ASE56229.2023.00053. Online publication date: 11-Sep-2023.

Information

Published In

ACM Other conferences
CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering
October 2022
753 pages
ISBN: 9781450397780
DOI: 10.1145/3569966
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clone evolution
  2. code clone
  3. systematic literature review

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CSSE 2022

Acceptance Rates

Overall Acceptance Rate 33 of 74 submissions, 45%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 2
Reflects downloads up to 03 Mar 2025