
An empirical study on the maintenance of source code clones

Empirical Software Engineering

Abstract

Code cloning is often described as a bad software development practice. However, many studies in the literature indicate that this is not always the case: either changes occurring in cloned code are consistently propagated, or cloning is used as a templating strategy in which cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four Java and C software systems, aimed at investigating to what extent clones are consistently propagated or evolve independently. The paper also investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size, and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.




Notes

  1. http://argouml.tigris.org

  2. http://www.jboss.org

  3. http://www.openssh.com

  4. http://www.postgresql.org

  5. http://www.blue-edge.bg/simscan/

  6. http://www.bauhaus-stuttgart.de/clones/

  7. http://www.rcost.unisannio.it/mdipenta/clone-evol-rawdata.zip

Acknowledgements

We would like to thank the anonymous reviewers for their very constructive comments on early versions of this manuscript. We also thank William Harris for his review comments that helped us to improve the draft. Luigi Cerulo, Lerina Aversano, and Massimiliano Di Penta are partially supported by the project METAMORPHOS (MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell’Università e della Ricerca) under grant PRIN2006-2006098097. Suresh Thummalapenta is partially supported by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443.

Author information

Corresponding author

Correspondence to Massimiliano Di Penta.

Additional information

Editor: Murray Wood

About this article

Cite this article

Thummalapenta, S., Cerulo, L., Aversano, L. et al. An empirical study on the maintenance of source code clones. Empir Software Eng 15, 1–34 (2010). https://doi.org/10.1007/s10664-009-9108-x

  • DOI: https://doi.org/10.1007/s10664-009-9108-x
