Abstract
The literature on code cloning often asserts that duplicating code within a software system is bad practice: it harms the system’s design and should be avoided. In our studies, however, we have found significant evidence that cloning is often used as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As the features mature and stabilize within the experimental subsystems, they can be migrated incrementally into the stable code base, which minimizes the risk of introducing instabilities into the stable version. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages of using them. We also examine, through a case study, the frequencies of these clones in two medium-sized open source software systems: the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.
Additional information
Editors: Massimiliano Di Penta and Susan Sim
About this article
Cite this article
Kapser, C.J., Godfrey, M.W. “Cloning considered harmful” considered harmful: patterns of cloning in software. Empir Software Eng 13, 645–692 (2008). https://doi.org/10.1007/s10664-008-9076-6
DOI: https://doi.org/10.1007/s10664-008-9076-6