CloneRipples: predicting change propagation between code clone instances by graph-based deep learning

Empirical Software Engineering

Abstract

Code clones are recognized as a code smell that may require additional effort during software maintenance to change multiple clone instances simultaneously. To alleviate the quality threats caused by inconsistent changes to clone instances, developers need to decide accurately and efficiently whether a change should be propagated between code clone instances. Our exploratory study revealed that a clone class can contain both propagation-required changes and propagation-free changes, so change propagation decisions must be made at a fine granularity. Based on these findings, we propose a graph-based deep learning approach to predict the change propagation requirements of clone instances. We design a deep learning model that employs a Relational Graph Convolutional Network (R-GCN) to predict whether a clone change requires propagation. To evaluate our approach, we construct a dataset of 24,672 pairs of matched changes and 38,041 non-matched changes from 51 open-source Java projects. Experimental results show that the approach achieves high precision (83.1%), recall (81.2%), and F1-score (82.1%). We implemented an IntelliJ IDEA tool called CloneRipples that helps developers decide, seamlessly within the development environment, whether a change needs to be propagated between code clone instances. Manual inspection revealed opportunities to purify the dataset by correcting the labels of non-matched changes. Extended experiments with various data purification strategies reveal feasible ways to improve prediction effectiveness and generality.
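The abstract names the model family but not its configuration. As a rough illustration of what a Relational Graph Convolutional Network computes, here is a minimal NumPy sketch of one R-GCN layer in the formulation of Schlichtkrull et al. (2018): each node combines a self-loop transform with mean-normalized messages from its neighbors, using a separate weight matrix per relation type. Everything beyond that update rule is an assumption for illustration only: the two relation types (control and data dependence, as in a program dependence graph), the random node features, the mean-pooling readout, the single sigmoid score, and the helper names `rgcn_layer` and `predict_propagation` are hypothetical and are not taken from the CloneRipples model.

```python
import numpy as np


def rgcn_layer(h, adj_by_relation, w_self, w_rel):
    """One R-GCN layer with mean-normalized messages (Schlichtkrull et al. 2018).

    h:               (num_nodes, d_in) node features
    adj_by_relation: list of (num_nodes, num_nodes) 0/1 matrices; adj[i, j] = 1
                     means an edge of that relation type from node j into node i
    w_self:          (d_in, d_out) self-loop weight W_0
    w_rel:           list of (d_in, d_out) per-relation weights W_r
    """
    out = h @ w_self
    for adj, w in zip(adj_by_relation, w_rel):
        deg = adj.sum(axis=1, keepdims=True)  # c_{i,r} = |N_i^r|
        norm = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
        out = out + norm @ (h @ w)            # mean of W_r h_j over incoming neighbors
    return np.maximum(out, 0.0)               # ReLU activation


def predict_propagation(h, adj_by_relation, params):
    """Hypothetical pipeline: R-GCN layers -> mean-pool readout -> sigmoid score."""
    for w_self, w_rel in params["layers"]:
        h = rgcn_layer(h, adj_by_relation, w_self, w_rel)
    graph_vec = h.mean(axis=0)                # one vector for the whole graph
    logit = graph_vec @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))       # hypothetical P(propagation required)


# Usage on a toy 4-node graph with two assumed relation types.
rng = np.random.default_rng(0)
n, d, hidden = 4, 3, 5
x = rng.normal(size=(n, d))
control = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0]], dtype=float)
data = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
params = {
    "layers": [(rng.normal(size=(d, hidden)), [rng.normal(size=(d, hidden)) for _ in range(2)])],
    "w_out": rng.normal(size=hidden),
    "b_out": 0.0,
}
p = predict_propagation(x, [control, data], params)
print(f"{p:.3f}")
```

Keeping one weight matrix per relation type is what distinguishes an R-GCN from a plain GCN: a data-dependence edge and a control-dependence edge can contribute differently to a node's representation. The untrained random weights here only demonstrate the data flow, not meaningful predictions.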
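The three reported metrics are mutually consistent. Assuming the F1-score is the harmonic mean of the reported aggregate precision and recall, F1 = 2PR/(P + R) = 2(0.831)(0.812)/(0.831 + 0.812) ≈ 0.821, which matches the reported 82.1%. A short check (the function name `f1_score` is illustrative only):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.831, 0.812)
print(f"{f1:.3f}")  # → 0.821
```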

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Data Availability

All datasets used in this work are available at https://github.com/FudanSELab/CodeCloneChangesDataset.

Notes

  1. URL: https://github.com/FudanSELab/CodeCloneChangesDataset

  2. Maven: https://github.com/apache/maven

  3. Ant: https://github.com/apache/ant

  4. DBeaver: https://github.com/dbeaver/dbeaver

  5. Tomcat: https://github.com/apache/tomcat

  6. Camel: https://github.com/apache/camel

  7. In this study, we focus on a specific type of late propagation called late consistent changes (LCC), proposed by Hu et al. (2021), which consists of only a pair of matched change sets to the clones in two different commits. Such matched changes are concrete evidence that the change was actually propagated between the clone instances. More complicated types of late propagation (Barbour et al. 2013), such as one in which a change set matches only part of another change set, are difficult to detect precisely, so we leave general late propagation for future work.

  8. https://github.com/YoshikiHigo/TinyPDG
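The LCC definition in note 7 can be expressed as a simple predicate over the change history of a clone pair. The sketch below is hypothetical: it assumes each change set has already been normalized into a set of edit operations, treats two change sets as "matched" only when they are identical, and uses illustrative names (`CloneChange`, `is_late_consistent_change`) that do not come from the study's tooling. As the note points out, real change matching is harder than set equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneChange:
    commit: int         # position of the commit in the history (hypothetical)
    instance: str       # clone-instance id, e.g. "A" or "B"
    edits: frozenset    # normalized edit operations (hypothetical representation)


def is_late_consistent_change(changes):
    """True if the pair's history holds exactly one change set per instance,
    made in two different commits, and the two change sets match."""
    if len(changes) != 2:
        return False
    first, second = sorted(changes, key=lambda c: c.commit)
    return (first.instance != second.instance
            and first.commit != second.commit
            and first.edits == second.edits)


fix = frozenset({"insert null check on parameter"})
a = CloneChange(commit=3, instance="A", edits=fix)
b = CloneChange(commit=7, instance="B", edits=fix)
print(is_late_consistent_change([a, b]))                         # → True
print(is_late_consistent_change([a, CloneChange(3, "B", fix)]))  # → False (same commit)
```

The same-commit case is excluded because a matched change applied to both instances in one commit is a consistent change, not a late-propagated one.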

References

  • Aversano L, Cerulo L, Di Penta M (2007) How clones are maintained: An empirical study. In: Krikhaar RL, Verhoef C, Di Lucca GA (eds) 11th European Conference on Software Maintenance and Reengineering, Software Evolution in Complex Software Intensive Systems, CSMR 2007, Amsterdam, The Netherlands, 21–23 March 2007. IEEE Comput Soc, pp 81–90

  • Barbour L, Khomh F, Zou Y (2011) Late propagation in software clones. In: IEEE 27th International Conference on Software Maintenance, ICSM 2011, Williamsburg, VA, USA, September 25-30, 2011, IEEE Comput Soc, pp 273–282

  • Barbour L, Khomh F, Zou Y (2013) An empirical study of faults in late propagation clone genealogies. J Softw Evol Process 25(11):1139–1165

  • Barbour L, An L, Khomh F, Zou Y, Wang S (2018) An investigation of the fault-proneness of clone evolutionary patterns. Softw Qual J 26(4):1187–1222

  • Bazrafshan S (2012) Evolution of near-miss clones. In: 12th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2012, Riva del Garda, Italy, September 23-24, 2012, IEEE Comput Soc, pp 74–83

  • Cheng X, Zhong H, Chen Y, Hu Z, Zhao J (2016) Rule-directed code clone synchronization. In: 24th IEEE International Conference on Program Comprehension, ICPC 2016, Austin, TX, USA, May 16-17, 2016, IEEE Comput Soc, pp 1–10

  • Duala-Ekoko E, Robillard MP (2007) Tracking code clones in evolving software. In: 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007, IEEE Comput Soc, pp 158–167

  • Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, Zhou M (2020) CodeBERT: A pre-trained model for programming and natural languages. CoRR abs/2002.08155

  • Ferrante J, Ottenstein KJ, Warren JD (1987) The program dependence graph and its use in optimization. ACM Trans Program Lang Syst 9(3):319–349. https://doi.org/10.1145/24039.24041

  • Göde N (2009) Evolution of type-1 clones. In: Ninth IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2009, Edmonton, Alberta, Canada, September 20-21, 2009, IEEE Comput Soc, pp 77–86

  • Göde N, Harder J (2011) Oops!... I changed it again. In: Cordy JR, Inoue K, Jarzabek S, Koschke R (eds) Proceeding of the 5th ICSE International Workshop on Software Clones, IWSC 2011, Waikiki, Honolulu, HI, USA, May 23, 2011, ACM, pp 14–20

  • Göde N, Koschke R (2011) Frequency and risks of changes to clones. In: Taylor RN, Gall HC, Medvidovic N (eds) Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011, ACM, pp 311–320

  • Hata H, Mizuno O, Kikuno T (2011) Historage: fine-grained version control system for Java. In: Cleve A, Robbes R (eds) Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th annual ERCIM Workshop on Software Evolution, EVOL/IWPSE 2011, Szeged, Hungary, September 5-6, 2011, ACM, pp 96–100

  • Higo Y, Kusumoto S (2011) Code clone detection on specialized PDGs with heuristics. In: Mens T, Kanellopoulos Y, Winter A (eds) 15th European Conference on Software Maintenance and Reengineering, CSMR 2011, Oldenburg, Germany, 1–4 March 2011. IEEE Comput Soc, pp 75–84

  • Hu B, Wu Y, Peng X, Sun J, Zhan N, Wu J (2021) Assessing code clone harmfulness: Indicators, factors, and counter measures. In: 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, Honolulu, HI, USA, March 9-12, 2021, IEEE, pp 225–236

  • Hu B, Wu Y, Peng X, Sha C, Wang X, Fu B, Zhao W (2022) Predicting change propagation between code clone instances by graph-based deep learning. In: Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (eds) Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17, 2022, ACM, pp 425–436

  • Huang K, Chen B, Peng X, Zhou D, Wang Y, Liu Y, Zhao W (2018) CLDiff: generating concise linked code differences. In: Huchard M, Kästner C, Fraser G (eds) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, ACM, pp 679–690

  • Inoue K, Higo Y, Yoshida N, Choi E, Kusumoto S, Kim K, Park W, Lee E (2012) Experience of finding inconsistently-changed bugs in code clones of mobile software. In: Cordy JR, Inoue K, Koschke R, Krinke J, Roy CK (eds) Proceeding of the 6th International Workshop on Software Clones, IWSC 2012, Zurich, Switzerland, June 4, 2012, ACM, pp 94–95

  • Islam MR, Zibran MF (2018) On the characteristics of buggy code clones: A code quality perspective. In: 12th IEEE International Workshop on Software Clones, IWSC 2018, Campobasso, Italy, March 20, 2018, IEEE Comput Soc, pp 23–29

  • Jablonski P, Hou D (2010) Aiding software maintenance with copy-and-paste clone-awareness. In: The 18th IEEE International Conference on Program Comprehension, ICPC 2010, Braga, Minho, Portugal, June 30-July 2, 2010, IEEE Computer Society, pp 170–179

  • Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Software Eng 28(7):654–670

  • Kapser C, Godfrey MW (2006) “Cloning considered harmful” considered harmful. In: 13th Working Conference on Reverse Engineering (WCRE 2006), Benevento, Italy, 23–27 October 2006. IEEE Computer Society, pp 19–28

  • Kapser C, Godfrey MW (2008) “Cloning considered harmful” considered harmful: patterns of cloning in software. Empir Softw Eng 13(6):645–692

  • Kim M, Sazawal V, Notkin D, Murphy GC (2005) An empirical study of code clone genealogies. In: Wermelinger M, Gall HC (eds) Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, ACM, pp 187–196

  • Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings

  • Krinke J (2007) A study of consistent and inconsistent changes to code clones. In: 14th Working Conference on Reverse Engineering (WCRE 2007), Vancouver, BC, Canada, 28–31 October 2007. IEEE Computer Society, pp 170–178

  • Krinke J (2008) Is cloned code more stable than non-cloned code? In: Eighth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2008), Beijing, China, 28–29 September 2008. IEEE Computer Society, pp 57–66

  • Li G, Wu Y, Roy CK, Sun J, Peng X, Zhan N, Hu B, Ma J (2020) SAGA: efficient and large-scale detection of near-miss clones with GPU acceleration. In: 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020, IEEE, pp 272–283

  • Lin Y, Peng X, Xing Z, Zheng D, Zhao W (2015) Clone-based and interactive recommendation for modifying pasted code. In: Nitto ED, Harman M, Heymans P (eds) Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September 4, 2015, ACM, pp 520–531

  • Lozano A, Wermelinger M, Nuseibeh B (2007) Evaluating the harmfulness of cloning: A change based experiment. In: Fourth International Workshop on Mining Software Repositories, MSR 2007 (ICSE Workshop), Minneapolis, MN, USA, May 19-20, 2007, Proceedings, IEEE Computer Society, p 18

  • Mondal M, Roy CK, Schneider KA (2014a) Automatic ranking of clones for refactoring through mining association rules. In: Demeyer S, Binkley DW, Ricca F (eds) 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, CSMR-WCRE 2014, Antwerp, Belgium, February 3-6, 2014, IEEE Computer Society, pp 114–123

  • Mondal M, Roy CK, Schneider KA (2014b) Prediction and ranking of co-change candidates for clones. In: Devanbu PT, Kim S, Pinzger M (eds) 11th Working Conference on Mining Software Repositories, MSR 2014, Hyderabad, India, May 31 - June 1, 2014, Proceedings. ACM, pp 32–41

  • Mondal M, Roy CK, Schneider KA (2016) A comparative study on the intensity and harmfulness of late propagation in near-miss code clones. Softw Qual J 24(4):883–915

  • Mondal M, Rahman MS, Roy CK, Schneider KA (2018) Is cloned code really stable? Empir Softw Eng 23(2):693–770

  • Mondal M, Roy CK, Schneider KA (2018) Bug-proneness and late propagation tendency of code clones: A comparative study on different clone types. J Syst Softw 144:41–59

  • Nguyen HA, Nguyen TT, Pham NH, Al-Kofahi JM, Nguyen TN (2012) Clone management for evolving software. IEEE Trans Software Eng 38(5):1008–1026

  • Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi JM, Nguyen TN (2009) Clone-aware configuration management. In: ASE 2009, 24th IEEE/ACM International Conference on Automated Software Engineering, Auckland, New Zealand, November 16-20, 2009, IEEE Computer Society, pp 123–134

  • Ragkhitwetsagul C, Krinke J, Clark D (2018) A comparison of code similarity analysers. Empir Softw Eng 23(4):2464–2519

  • Rahman F, Bird C, Devanbu PT (2012) Clones: what is that smell? Empir Softw Eng 17(4–5):503–530. https://doi.org/10.1007/s10664-011-9195-3

  • Ruff L, Görnitz N, Deecke L, Siddiqui SA, Vandermeulen RA, Binder A, Müller E, Kloft M (2018) Deep one-class classification. In: Dy JG, Krause A (eds) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, PMLR, Proceedings of Machine Learning Research, vol 80, pp 4390–4399

  • Saha RK, Roy CK, Schneider KA (2011) An automatic framework for extracting and classifying near-miss clone genealogies. In: IEEE 27th International Conference on Software Maintenance, ICSM 2011, Williamsburg, VA, USA, September 25-30, 2011, IEEE Computer Society, pp 293–302

  • Sajnani H, Saini V, Svajlenko J, Roy CK, Lopes CV (2016) SourcererCC: scaling code clone detection to big-code. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016, pp 1157–1168

  • Schlichtkrull MS, Kipf TN, Bloem P, van den Berg R, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: Gangemi A, Navigli R, Vidal M, Hitzler P, Troncy R, Hollink L, Tordai A, Alam M (eds) The Semantic Web - 15th International Conference, ESWC 2018, Proceedings, Springer, Lecture Notes in Computer Science, vol 10843, pp 593–607

  • Tokui S, Yoshida N, Choi E, Inoue K (2020) Clone notifier: Developing and improving the system to notify changes of code clones. In: Kontogiannis K, Khomh F, Chatzigeorgiou A, Fokaefs M, Zhou M (eds) 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020, IEEE, pp 642–646

  • Uemura K, Mori A, Choi E, Iida H (2019) Tracking method-level clones and a case study. In: Choi E, Hou D (eds) 13th IEEE International Workshop on Software Clones, IWSC 2019, Hangzhou, China, February 24, 2019, IEEE, pp 27–33

  • Wang X, Dang Y, Zhang L, Zhang D, Lan E, Mei H (2014) Predicting consistency-maintenance requirement of code clones at copy-and-paste time. IEEE Trans Software Eng 40(8):773–794

  • de Wit M, Zaidman A, van Deursen A (2009) Managing code clones using dynamic change tracking and resolution. In: 25th IEEE International Conference on Software Maintenance (ICSM 2009), Edmonton, Alberta, Canada, September 20–26, 2009. IEEE Computer Society, pp 169–178

  • Yue R, Gao Z, Meng N, Xiong Y, Wang X, Morgenthaler JD (2018) Automatic clone recommendation for refactoring based on the present and the past. In: 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018, IEEE Computer Society, pp 115–126

  • Zhang F, Khoo S (2021) An empirical study on clone consistency prediction based on machine learning. Inf Softw Technol 136:106573

  • Zhang F, Khoo SC, Su X (2016) Predicting consistent clone change. In: 27th IEEE International Symposium on Software Reliability Engineering, ISSRE 2016, Ottawa, ON, Canada, October 23-27, 2016, IEEE Computer Society, pp 353–364

  • Zhang F, Khoo SC, Su X (2017) Predicting change consistency in a clone group. J Syst Softw 134:105–119

  • Zhang F, Khoo SC, Su X (2020) Improving maintenance-consistency prediction during code clone creation. IEEE Access 8:82085–82099

  • Zou Y, Ban B, Xue Y, Xu Y (2020) CCGraph: a PDG-based code clone detector with approximate graph matching. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020, IEEE, pp 931–942

Funding

This work was supported by the National Natural Science Foundation of China (62172099).

Author information

Corresponding author

Correspondence to Yijian Wu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Venera Arnaoudova, Sonia Haiduc, Gabriele Bavota.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, Y., Chen, Y., Peng, X. et al. CloneRipples: predicting change propagation between code clone instances by graph-based deep learning. Empir Software Eng 30, 14 (2025). https://doi.org/10.1007/s10664-024-10567-0

  • DOI: https://doi.org/10.1007/s10664-024-10567-0

Keywords