Abstract
Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell (Fowler et al. 1999) and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we test this conventional wisdom empirically by analysing the relationship between cloning and defect proneness. For the four medium to large open source projects that we studied, we find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Third, we find little evidence that clones with more copies are actually more error prone. Fourth, we find little evidence to support the claim that clone groups that span more than one file or directory are more defect prone than collocated clones. Finally, we find that developers do not need to expend disproportionately more effort to fix clone-dense bugs. Our findings do not support the claim that clones are really a “bad smell” (Fowler et al. 1999). Perhaps we can clone, and breathe easily, at the same time.








References
Alkhatib G (1992) The maintenance problem of application software: an empirical analysis. J Softw Maint: Res Pract 4(2):83–104. doi:10.1002/smr.4360040203
Bachmann A, Bernstein A (2009) Data retrieval, processing and linking for software process data analysis. Technical report, University of Zurich. http://www.ifi.uzh.ch/ddis/people/adrian-bachmann/pdq/. Accessed May 2009
Baker BS (1995) On finding duplication and near-duplication in large software systems. In: WCRE ’95: proceedings of the 2nd working conference on reverse engineering. IEEE Computer Society, Washington, pp 86–95. http://portal.acm.org/citation.cfm?id=836911
Balazinska M, Merlo E, Dagenais M, Lague B, Kontogiannis K (1999) Partial redesign of Java software systems based on clone analysis. In: WCRE ’99: proceedings of the 6th working conference on reverse engineering. IEEE Computer Society, Washington, pp 326–336. http://portal.acm.org/citation.cfm?id=837061
Barbour L, Khomh F, Zou Y (2011) Late propagation in software clones
Baxter ID, Yahin A, Moura L, Sant’Anna M, Bier L (1998) Clone detection using abstract syntax trees. In: Proceedings of the international conference on software maintenance, pp 368–377. doi:10.1109/ICSM.1998.738528
Berkus J (2007) The 5 types of open source projects. http://www.powerpostgresql.com/5_types. Accessed 20 March 2007
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: ESEC/FSE ’09: proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 121–130. doi:10.1145/1595696.1595716
Bruntink M, van Deursen A, van Engelen R, Tourwe T (2005) On the use of clone detection for identifying crosscutting concern code. IEEE Trans Softw Eng 31(10):804–818. doi:10.1109/TSE.2005.114
Cai D, Kim M (2011) An empirical study of long-lived code clones. Fundamental approaches to software engineering, pp 432–446
Čubranić D, Murphy GC (2003) Hipikat: recommending pertinent software development artifacts. In: ICSE ’03: proceedings of the 25th international conference on software engineering. IEEE Computer Society, Washington, pp 408–418. http://portal.acm.org/citation.cfm?id=776816.776866
Ducasse S, Rieger M, Demeyer S (1999) A language independent approach for detecting duplicated code. In: Proc. IEEE int. conf. on software maintenance 1999 (ICSM ’99). Oxford, UK, pp 109–118
Ekoko ED, Robillard MP (2007) Tracking code clones in evolving software. In: ICSE ’07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 158–167. doi:10.1109/ICSE.2007.90
Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM ’03: proceedings of the international conference on software maintenance. IEEE Computer Society, Washington, pp 23–32. http://portal.acm.org/citation.cfm?id=943568
Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code, 1st edn. Addison-Wesley Professional. http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0201485672
Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. In: ICSE ’08: proceedings of the 30th international conference on Software engineering. ACM, New York, pp 321–330. doi:10.1145/1368088.1368132
Geiger R, Fluri B, Gall H, Pinzger M (2006) Relation of code clones and change couplings. In: Baresi L, Heckel R (eds) Fundamental approaches to software engineering. Lecture notes in computer science, vol 3922, chap 31. Springer, Berlin/Heidelberg, pp 411–425. doi:10.1007/11693017_31
Göde N, Koschke R (2011) Frequency and risks of changes to clones. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 311–320
Higo Y, Kamiya T, Kusumoto S, Inoue K (2005) Aries: refactoring support tool for code clone. SIGSOFT Softw Eng Notes 30(4):1–4. doi:10.1145/1082983.1083306
Jiang L, Misherghi G, Su Z, Glondu S (2007a) Deckard: scalable and accurate tree-based detection of code clones. In: ICSE ’07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 96–105. doi:10.1109/ICSE.2007.30
Jiang L, Su Z, Chiu E (2007b) Context-based detection of clone-related bugs. In: ESEC-FSE ’07: proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 55–64. doi:10.1145/1287624.1287634
Juergens E, Deissenboeck F, Hummel B, Wagner S (2009) Do code clones matter? In: ICSE ’09: proceedings of the 2009 IEEE 31st international conference on software engineering. IEEE Computer Society, Washington, pp 485–495. doi:10.1109/ICSE.2009.5070547
Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654–670. doi:10.1109/TSE.2002.1019480
Kan S (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., Boston
Kapser C, Godfrey M (2008) Cloning considered harmful considered harmful: patterns of cloning in software. Empir Software Eng 13(6):645–692
Kapser C, Godfrey MW (2006) “Cloning considered harmful” considered harmful. In: Working conference on reverse engineering, pp 19–28. doi:10.1109/WCRE.2006.1
Kawaguchi S, Yamashina T, Uwano H, Fushida K, Kamei Y, Nagura M, Iida H (2009) Shinobi: a tool for automatic code clone detection in the IDE. In: Working conference on reverse engineering, pp 313–314. doi:10.1109/WCRE.2009.36
Kim M, Bergman L, Lau T, Notkin D (2004) An ethnographic study of copy and paste programming practices in OOPL. In: International symposium on empirical software engineering, pp 83–92. doi:10.1109/ISESE.2004.1334896
Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. SIGSOFT Softw Eng Notes 30(5):187–196. doi:10.1145/1095430.1081737
Kim S, Zimmermann T, Pan K, Whitehead EJ Jr (2006) Automatic identification of bug-introducing changes. In: ASE ’06: proceedings of the 21st IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 81–90. doi:10.1109/ASE.2006.23
Kim S, Whitehead E, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196
Komondoor R, Horwitz S (2001) Using slicing to identify duplication in source code. In: Cousot P (ed) Static analysis, lecture notes in computer science, chap 3, vol 2126. Springer, Berlin, pp 40–56. doi:10.1007/3-540-47764-0_3
Komondoor R, Horwitz S (2003) Effective, automatic procedure extraction. In: IWPC ’03: proceedings of the 11th IEEE international workshop on program comprehension. IEEE Computer Society, Washington, pp 33–42. http://portal.acm.org/citation.cfm?id=857023
Krinke J (2007) A study of consistent and inconsistent changes to code clones. In: WCRE ’07: proceedings of the 14th working conference on reverse engineering. IEEE Computer Society, Washington, pp 170–178. doi:10.1109/WCRE.2007.7
Krinke J (2008) Is cloned code more stable than non-cloned code? In: 2008 8th IEEE international working conference on source code analysis and manipulation, pp 57–66. doi:10.1109/SCAM.2008.14
Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: OSDI ’04: proceedings of the 6th conference on symposium on operating systems design & implementation. USENIX Association, Berkeley, p 20. http://portal.acm.org/citation.cfm?id=1251274
Mäntylä M, Lassenius C (2006) Subjective evaluation of software evolvability using code smells: an empirical study. Empir Software Eng 11(3):395–431. doi:10.1007/s10664-006-9002-8
Mockus A, Votta LG (2000) Identifying reasons for software changes using historic databases. In: Proceedings international conference on software maintenance, 2000. IEEE Computer Society, Los Alamitos, pp 120–130. doi:10.1109/ICSM.2000.883028
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi JM, Nguyen TN (2009) Clone-aware configuration management. In: ASE ’09: proceedings of the 2009 IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 123–134. doi:10.1109/ASE.2009.90
Rahman F, Bird C, Devanbu P (2010) Clones: what is that smell? In: Proceedings of the 7th working conference on mining software repositories. IEEE Computer Society
Roy C, Cordy J (2007) A survey on software clone detection research. Queen’s School of Computing TR 541:115
Selim G, Barbour L, Shang W, Adams B, Hassan A, Zou Y (2010) Studying the impact of clones on software defects. In: 2010 17th working conference on reverse engineering (WCRE). IEEE, pp 13–21
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: MSR ’05: proceedings of the 2005 international workshop on mining software repositories. ACM, New York, pp 1–5. doi:10.1145/1083142.1083147
Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2009) An empirical study on the maintenance of source code clones. Empir Software Eng 15(1):1–34. doi:10.1007/s10664-009-9108-x
Toomim M, Begel A, Graham SL (2004) Managing duplicated code with linked editing. In: VLHCC ’04: proceedings of the 2004 IEEE symposium on visual languages—human centric computing. IEEE Computer Society, Washington, pp 173–180. doi:10.1109/VLHCC.2004.35
Acknowledgements
We would like to thank Adrian Bachmann and Avi Bernstein for the Univ. of Zurich bug linking data. We also thank Lingxiao Jiang, Ghassan Misherghi, Zhendong Su and Stephane Glondu for providing us with DECKARD. We extend our gratitude to the anonymous reviewers for their valuable comments on this paper. We acknowledge support from an IBM Faculty Fellowship, and a gift from Microsoft Research. Most of all we acknowledge with gratitude support from the NSF Science of Design Program, grant No. SoD-TEAM 0613949. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Additional information
Editors: Jim Whitehead and Tom Zimmermann
Cite this article
Rahman, F., Bird, C. & Devanbu, P. Clones: what is that smell?. Empir Software Eng 17, 503–530 (2012). https://doi.org/10.1007/s10664-011-9195-3
DOI: https://doi.org/10.1007/s10664-011-9195-3