Abstract
Code cloning is an active research area in the software engineering community. Researchers have conducted numerous empirical studies on code cloning and reported that 7 % to 23 % of the code in a typical software system has been cloned. However, code clones in dynamically-typed languages have received less attention, and most studies are limited to statically-typed languages such as Java, C, and C++. In addition, most previous studies did not consider different application domains such as standalone projects or web applications. As a result, very little is known about clones in dynamically-typed languages, such as JavaScript, in different application domains. In this paper, we report a large-scale clone detection experiment on a dynamically-typed programming language, JavaScript, in two application domains: web pages and standalone projects. Our experimental results show that, unlike JavaScript standalone projects, JavaScript web applications have 95 % inter-file clones and 91–97 % widely scattered clones. We observed that web application developers created clones intentionally, and such clones may not be as risky as claimed in previous studies. Understanding the risks of cloning in web applications requires further study, as cloning may stem from either good or bad intentions. We also identified unique development practices, such as including browser-dependent or device-specific code in the code clones of JavaScript web applications. This indicates that features of programming languages and technologies affect how developers duplicate code.
Acknowledgments
This work is supported in part by the Korea Ministry of Education, Science and Technology (MEST) / National Research Foundation of Korea (NRF) (Grants NRF-2014R1A2A2A01003235 and NRF-2008-0062609), Samsung Electronics, and Google.
Additional information
Communicated by: Andrea De Lucia
About this article
Cite this article
Cheung, W.T., Ryu, S. & Kim, S. Development nature matters: An empirical study of code clones in JavaScript applications. Empir Software Eng 21, 517–564 (2016). https://doi.org/10.1007/s10664-015-9368-6
DOI: https://doi.org/10.1007/s10664-015-9368-6