Development nature matters: An empirical study of code clones in JavaScript applications

Empirical Software Engineering

Abstract

Code cloning is an active research area in the software engineering community. Researchers have conducted numerous empirical studies on code cloning and reported that 7% to 23% of the code in a typical software system is cloned. However, code clones in dynamically typed languages have received less attention, and most studies are limited to statically typed languages such as Java, C, and C++. In addition, most previous studies did not distinguish between application domains such as standalone projects and web applications. As a result, very little is known about clones in dynamically typed languages, such as JavaScript, across different application domains. In this paper, we report a large-scale clone detection experiment in a dynamically typed programming language, JavaScript, for two application domains: web pages and standalone projects. Our experimental results show that, unlike in JavaScript standalone projects, 95% of the clones in JavaScript web applications are inter-file clones and 91–97% are widely scattered. We observed that web application developers created clones intentionally, and such clones may not be as risky as claimed in previous studies. Understanding the risks of cloning in web applications requires further study, as cloning may stem from either good or bad intentions. We also identified development practices unique to JavaScript web applications, such as including browser-dependent or device-specific code in code clones. This indicates that the features of programming languages and technologies affect how developers duplicate code.
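To make the intra-file versus inter-file distinction concrete, the following is a minimal sketch, not the detector used in this study. It groups code fragments whose normalized token streams are identical (roughly Type-1/Type-2 clones: exact copies, or copies with renamed identifiers and changed literals) and reports what share of clone pairs span two different files. The regex tokenizer, the keyword list, the `{ file, code }` fragment shape, and counting by clone pairs are all illustrative assumptions; the paper may measure clone proportions differently (for example per clone class or per line of code), and "widely scattered" is not modeled here.

```javascript
// Hypothetical sketch: token-based clone grouping plus an inter-file share.
// Comments are not stripped and the keyword list is deliberately small.
const KEYWORDS = new Set([
  "var", "let", "const", "function", "return", "if", "else", "for", "while",
  "new", "this", "typeof", "null", "undefined", "true", "false",
]);

// Split source text into tokens: string literals, numbers, identifiers,
// and single non-whitespace characters (punctuation/operators).
function tokenize(src) {
  const re = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|\S/g;
  return src.match(re) || [];
}

// Replace identifiers and literals with placeholders so renamed copies match.
function normalize(src) {
  return tokenize(src)
    .map((t) => {
      if (/^["']/.test(t) || /^\d/.test(t)) return "LIT";
      if (/^[A-Za-z_$]/.test(t) && !KEYWORDS.has(t)) return "ID";
      return t;
    })
    .join(" ");
}

// fragments: [{ file, code }]; returns clone classes with two or more members.
function cloneClasses(fragments) {
  const groups = new Map();
  for (const f of fragments) {
    const key = normalize(f.code);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(f);
  }
  return [...groups.values()].filter((g) => g.length > 1);
}

// Classify every unordered pair inside each clone class as intra- or inter-file.
function pairStats(classes) {
  let intra = 0;
  let inter = 0;
  for (const cls of classes) {
    for (let i = 0; i < cls.length; i++) {
      for (let j = i + 1; j < cls.length; j++) {
        if (cls[i].file === cls[j].file) intra++;
        else inter++;
      }
    }
  }
  const total = intra + inter;
  return { intra, inter, interShare: total === 0 ? 0 : inter / total };
}

// Usage with made-up fragments: three renamed copies, two of them in b.html.
const sample = [
  { file: "a.html", code: "var x = 1; if (x > 0) { run(x); }" },
  { file: "b.html", code: "var y = 2; if (y > 0) { go(y); }" },
  { file: "b.html", code: "var z = 3; if (z > 0) { fire(z); }" },
  { file: "c.js", code: "return a + b;" },
];
console.log(pairStats(cloneClasses(sample))); // intra: 1, inter: 2, interShare ≈ 0.67
```

Counting pairs rather than fragments is one simple choice; it weights large clone classes heavily, so a per-class or per-line measure could give a noticeably different percentage on the same code.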

Notes

  1. http://safe.kaist.ac.kr

  2. https://github.com/ruipgil/scraperjs/tree/v0.3.0

  3. https://github.com/trending?l=javascript&since=monthly

  4. http://plrg.kaist.ac.kr/research/material

  5. http://plrg.kaist.ac.kr/research/material

  6. https://developers.google.com/speed/pagespeed/module/filter-js-inline

  7. https://developer.yahoo.com/performance/rules.html#external

  8. https://developers.google.com/analytics/devguides/collection/gajs/

  9. https://developers.google.com/speed/articles/javascript-dom

References

  • Abd-El-Hafiz SK (2012) A metrics-based data mining approach for software clone detection. In: Proceedings of the 36th annual computer software and applications conference. IEEE, pp 35–41

  • Alexa Internet Inc (2013) Alexa top sites. http://www.alexa.com/topsites

  • Antoniol G, Casazza G, Di Penta M, Merlo E (2001) Modeling clones evolution through time series. In: Proceedings of international conference on software maintenance. IEEE, pp 273–280

  • Antoniol G, Villano U, Merlo E, Di Penta M (2002) Analyzing cloning evolution in the Linux kernel. Inf Softw Technol 44(13):755–765

  • Aversano L, Cerulo L, Di Penta M (2007) How clones are maintained: an empirical study. In: Proceedings of the 11th European conference on software maintenance and reengineering. IEEE, pp 81–90

  • Baker BS (1995) On finding duplication and near-duplication in large software systems. In: Proceedings of the 2nd working conference on reverse engineering. IEEE, pp 86–95

  • Balazinska M, Merlo E, Dagenais M, Lague B, Kontogiannis K (1999) Measuring clone based reengineering opportunities. In: Proceedings of the 6th international software metrics symposium. IEEE, pp 292–303

  • Baxter ID, Yahin A, Moura L, Sant’Anna M, Bier L (1998) Clone detection using abstract syntax trees. In: Proceedings of international conference on software maintenance. IEEE, pp 368–377

  • Bellon S, Koschke R, Antoniol G, Krinke J, Merlo E (2007) Comparison and evaluation of clone detection tools. IEEE Trans Softw Eng 33(9):577–591

  • Bettenburg N, Shang W, Ibrahim WM, Adams B, Zou Y, Hassan AE (2012) An empirical study on inconsistent changes to code clones at the release level. Sci Comput Program 77(6):760–776

  • Blanco L, Dalvi N, Machanavajjhala A (2011) Highly efficient algorithms for structural clustering of large websites. In: Proceedings of the 20th international conference on World Wide Web. ACM, pp 437–446

  • Boldyreff C, Kewish R (2001) Reverse engineering to achieve maintainable WWW sites. In: Proceedings of the 8th working conference on reverse engineering. IEEE, pp 249–257

  • Brito e Abreu F (1995) The MOOD metrics set. In: Proceedings of European conference on object-oriented programming, vol 95, p 267

  • Brixtel R, Fontaine M, Lesner B, Bazin C, Robbes R (2010) Language-independent clone detection applied to plagiarism detection. In: Proceedings of the 10th IEEE working conference on source code analysis and manipulation. IEEE, pp 77–86

  • Bulychev P, Minea M (2009) An evaluation of duplicate code detection using anti-unification. In: Proceedings of the 3rd international workshop on software clones. Citeseer, pp 22–27

  • Burd E, Bailey J (2002) Evaluating clone detection tools for use during preventative maintenance. In: Proceedings of the 2nd international workshop on source code analysis and manipulation. IEEE, pp 36–43

  • Cai D, Kim M (2011) An empirical study of long-lived code clones. In: Fundamental approaches to software engineering, pp 432–446

  • Calefato F, Lanubile F, Mallardo T (2004) Function clone detection in web applications: a semiautomated approach. J Web Eng 3:3–21

  • Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  • Chugh R, Meister JA, Jhala R, Lerner S (2009) Staged information flow for JavaScript. In: ACM Sigplan Notices, vol 44. ACM, pp 50–62

  • Cordy JR, Dean TR, Synytskyy N (2004) Practical language-independent detection of near-miss clones. In: Proceedings of the 2004 conference of the centre for advanced studies on collaborative research. IBM Press, pp 1–12

  • Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 12th annual symposium on computational geometry. ACM, pp 253–262

  • De Lucia A, Francese R, Scanniello G, Tortora G (2005) Understanding cloned patterns in web applications. In: Proceedings of the 13th international workshop on program comprehension. IEEE, pp 333–336

  • De Lucia A, Scanniello G, Tortora G (2007) Identifying similar pages in web applications using a competitive clustering algorithm. J Softw Maint Evol Res Pract 19(5):281–296

  • De Lucia A, Risi M, Scanniello G, Tortora G (2009) An investigation of clustering algorithms in the identification of similar web pages. J Web Eng 8(4):346–370

  • Di Lucca GA, Di Penta M, Fasolino AR (2002) An approach to identify duplicated web pages. In: Proceedings of the 26th annual international computer software and applications conference. IEEE, pp 481–486

  • Di Lucca GA, Fasolino AR, Tramontana P (2005) Recovering interaction design patterns in web applications. In: Proceedings of the 9th European conference on software maintenance and reengineering. IEEE, pp 366–374

  • Ducasse S, Nierstrasz O, Rieger M (2004) Lightweight detection of duplicated code–a language-independent approach. Institute for Applied Mathematics and Computer Science. University of Berne

  • Ducasse S, Nierstrasz O, Rieger M (2006) On the effectiveness of clone detection by string matching. J Softw Maint Evol Res Pract 18(1):37–58

  • Falke R, Frenzel P, Koschke R (2008) Empirical evaluation of clone detection using syntax suffix trees. Empir Softw Eng 13(6):601–643

  • Finifter M, Weinberger J, Barth A (2010) Preventing capability leaks in secure JavaScript subsets. In: NDSS

  • Fowler M, Beck K (1999) Refactoring: improving the design of existing code. Addison-Wesley, Reading

  • GitHub Inc (2013) JavaScript Projects in GitHub. https://github.com/trending?l=javascript

  • Guarnieri S, Pistoia M, Tripp O, Dolby J, Teilhet S, Berg R (2011) Saving the world wide web from vulnerable JavaScript. In: Proceedings of the 20th international symposium on software testing and analysis

  • Harris S (2013) Simian–similarity analyser. http://www.harukizaemon.com/simian/index.html

  • Hegedűs P, Bakota T, Illés L, Ladányi G, Ferenc R, Gyimóthy T (2011) Source code metrics and maintainability: a case study. In: Software engineering, business continuity, and education. Springer, Berlin Heidelberg New York, pp 272–284

  • Hill R, Rideout J (2004) Automatic method completion. In: Proceedings of the 19th international conference on automated software engineering. IEEE, pp 228–235

  • Hotta K, Sano Y, Higo Y, Kusumoto S (2010) Is duplicate code more frequently modified than non-duplicate code in software evolution?: an empirical study on open source software. In: Proceedings of the joint ERCIM workshop on software evolution and international workshop on principles of software evolution, pp 73–82

  • Islam M, Islam M, Halim T (2011) A study of code cloning in server pages of web applications developed using classic ASP.NET and ASP.NET MVC framework. In: Proceedings of the 14th international conference on computer and information technology. IEEE, pp 497–502

  • Jang J, Agrawal A, Brumley D (2012) ReDeBug: finding unpatched code clones in entire OS distributions. In: Proceedings of symposium on security and privacy. IEEE, pp 48–62

  • Jiang L, Misherghi G, Su Z, Glondu S (2007a) Deckard: scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th international conference on software engineering. IEEE, pp 96–105

  • Jiang L, Su Z, Chiu E (2007b) Context-based detection of clone-related bugs. In: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 55–64

  • Jones MC (2011) Remix and reuse of source code in software production. PhD thesis, Citeseer

  • Juergens E, Deissenboeck F, Hummel B (2009a) CloneDetective–a workbench for clone detection research. In: Proceedings of the 31st international conference on software engineering. IEEE, pp 603–606

  • Juergens E, Deissenboeck F, Hummel B, Wagner S (2009b) Do code clones matter?. In: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, pp 485–495

  • Kamei Y, Sato H, Monden A, Kawaguchi S, Uwano H, Nagura M, Matsumoto Ki, Ubayashi N (2011) An empirical study of fault prediction with code clone metrics. In: Proceedings of the joint conference of the 21st international workshop on software measurement and the 6th international conference on software process and product measurement. IEEE, pp 55–61

  • Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654–670

  • Kapser C, Godfrey M (2003) Toward a taxonomy of clones in source code: a case study. In: Proceedings of the conference on evolution of large scale industrial software architectures, pp 67–78

  • Kapser CJ, Godfrey MW (2008) “Cloning Considered harmful” considered harmful: patterns of cloning in software. Empir Softw Eng 13(6):645–692

  • Kienle HM, Müller HA, Weber A (2003) In the web of generated “clones” (position paper)

  • Kim H, Jung Y, Kim S, Yi K (2011) MeCC: memory comparison-based clone detector. In: Proceedings of the 33rd international conference on software engineering. IEEE, pp 301–310

  • Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. ACM SIGSOFT Softw Eng Notes 30(5):187–196

  • Kontogiannis K (1997) Evaluation experiments on the detection of programming patterns using software metrics. In: Proceedings of the 4th working conference on reverse engineering. IEEE, pp 44–54

  • Kontogiannis K, DeMori R, Merlo E, Galler M, Bernstein M (1996) Pattern matching for clone and concept detection. Autom Softw Eng 3(1-2):77–108

  • Koschke R (2007) Survey of research on software clones. Duplication, redundancy, and similarity in software. http://drops.dagstuhl.de/volltexte/2007/962/

  • Koschke R, Falke R, Frenzel P (2006) Clone detection using abstract syntax suffix trees. In: Proceedings of the 13th working conference on reverse engineering. IEEE, pp 253–262

  • Koschke R, Baxter ID, Conradt M, Cordy JR (2012) Software clone management towards industrial application (Dagstuhl seminar 12071). Dagstuhl Reports 2(2)

  • Kou G, Lou C (2012) Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data. Ann Oper Res 197(1):123–134

  • Kozlov D, Koskinen J, Sakkinen M, Markkula J (2010) Exploratory analysis of the relations between code cloning and open source software quality. In: Proceedings of the 7th international conference on the quality of information and communications technology. IEEE, pp 358–363

  • Krinke J (2007) A study of consistent and inconsistent changes to code clones. In: Proceedings of the 14th working conference on reverse engineering. IEEE, pp 170–178

  • Krinke J (2008) Is cloned code more stable than non-cloned code?. In: Proceedings of the 8th international working conference on source code analysis and manipulation. IEEE, pp 57–66

  • Krinke J (2011) Is cloned code older than non-cloned code?. In: Proceedings of the 5th international workshop on software clones. ACM, pp 28–33

  • Lague B, Proulx D, Mayrand J, Merlo E, Hudepohl J (1997) Assessing the benefits of incorporating function clone detection in a development process. In: Proceedings of the international conference on software maintenance, vol 97

  • Lanubile F, Mallardo T (2003) Finding function clones in web applications. In: Proceedings of the 7th European conference on software maintenance and reengineering. IEEE, pp 379–386

  • Lee H, Won S, Jin J, Cho J, Ryu S (2012) SAFE: Formal specification and implementation of a scalable analysis framework for ECMAScript. In: Proceedings of the 19th international workshop on foundations of object-oriented languages

  • Li C, Sun J, Chen H (2014) An improved method for tree-based clone detection in web applications. In: Proceedings of the 4th international conference on digital information and communication technology and its applications. IEEE, pp 363–367

  • Li J, Ernst MD (2012) CBCD: cloned buggy code detector. In: Proceedings of the 2012 international conference on software engineering. IEEE Press, pp 310–320

  • Livieri S, Higo Y, Matsushita M, Inoue K (2007) Analysis of the Linux kernel evolution using code clone coverage. In: Proceedings of the 4th international workshop on mining software repositories. IEEE, pp 22–22

  • Lozano A, Wermelinger M, Nuseibeh B (2008) Evaluating the relation between changeability decay and the characteristics of clones and methods. In: Proceedings of the 23rd international conference on automated software engineering-workshops. IEEE, pp 100–109

  • Martin D, Cordy JR (2011) Analyzing web service similarity using contextual clones. In: Proceedings of the 5th international workshop on software clones. ACM, pp 41–46

  • Martinsen JK, Grahn H, Isberg A (2011) A comparative evaluation of JavaScript execution behavior. In: Web engineering. Springer, Berlin Heidelberg New York, pp 399–402

  • Mayrand J, Leblanc C, Merlo EM (1996) Experiment on the automatic detection of function clones in a software system using metrics. In: Proceedings of international conference on software maintenance. IEEE, pp 244–253

  • Merlo E, Antoniol G, Di Penta M, Rollo VF (2004) Linear complexity object-oriented similarity for clone detection and software evolution analyses. In: Proceedings of the 20th international conference on software maintenance. IEEE, pp 412–416

  • Merlo E, Dagenais M, Bachand P, Sormani J, Gradara S, Antoniol G (2002) Investigating large software system evolution: the Linux kernel. In: Proceedings of the 26th annual international computer software and applications conference. IEEE, pp 421–426

  • Mondal M, Roy CK, Rahman MS, Saha RK, Krinke J, Schneider KA (2012) Comparative stability of cloned and non-cloned code: an empirical study. In: Proceedings of the 27th annual symposium on applied computing. ACM, pp 1227–1234

  • Monden A, Nakae D, Kamiya T, Sato S, Matsumoto K (2002) Software quality analysis by code clones in industrial legacy software. In: Proceedings of the 8th symposium on software metrics. IEEE, pp 87–94

  • Muhammad T, Zibran MF, Yamamoto Y, Roy CK (2013) Near-miss clone patterns in web applications: an empirical study with industrial systems. In: Canadian conference on electrical and computer engineering

  • Negara N, Tsantalis N, Stroulia E (2013) Feature detection in ajax-enabled web applications. In: Proceedings of the 17th European conference on software maintenance and reengineering. IEEE, pp 154–163

  • Nikiforakis N, Invernizzi L, Kapravelos A, Van Acker S, Joosen W, Kruegel C, Piessens F, Vigna G (2012) You are what you include: large-scale evaluation of remote JavaScript inclusions. In: Proceedings of the 2012 ACM conference on computer and communications security. ACM, pp 736–747

  • Ocariza F, Pattabiraman K, Zorn B (2011) JavaScript errors in the wild: an empirical study. In: Proceedings of the 22nd international symposium on software reliability engineering. IEEE, pp 100–109

  • Patenaude JF, Merlo E, Dagenais M, Laguë B (1999) Extending software quality assessment techniques to Java systems. In: Proceedings of the 7th international workshop on program comprehension. IEEE, pp 49–56

  • PLRG@KAIST (2012) SAFE: Scalable Analysis Framework for ECMAScript. http://plrg.kaist.ac.kr/redmine/projects/jsf/repository

  • PMD (2013) PMD’s copy/paste detector. http://pmd.sourceforge.net/pmd-5.0.5/cpd-usage.html

  • Rahman F, Bird C, Devanbu P (2012) Clones: what is that smell? Empir Softw Eng 17(4-5):503–530

  • Rajapakse D, Jarzabek S (2005) An investigation of cloning in web applications. In: Web engineering, pp 252–262

  • Rajapakse DC, Jarzabek S (2007) Using server pages to unify clones in web applications: a trade-off analysis. In: Proceedings of the 29th international conference on software engineering. IEEE, pp 116–126

  • Ramage D, Heymann P, Manning CD, Garcia-Molina H (2009) Clustering the tagged web. In: Proceedings of the 2nd ACM international conference on web search and data mining. ACM, pp 54–63

  • Ratanaworabhan P, Livshits B, Zorn BG (2010) JSMeter: comparing the behavior of JavaScript benchmarks with real web applications. In: Proceedings of the 2010 USENIX conference on Web application development. USENIX Association, pp 3–3

  • Richards G, Hammer C, Burg B, Vitek J (2011) The eval that men do. In: Proceedings of the 25th European conference on object-oriented programming. Springer, Berlin Heidelberg New York, pp 52–78

  • Richards G, Lebresne S, Burg B, Vitek J (2010) An analysis of the dynamic behavior of JavaScript programs. In: Proceedings of the SIGPLAN conference on programming language design and implementation, vol 45. ACM, pp 1–12

  • Rieger M, Ducasse S, Lanza M (2004) Insights into system-wide code duplication. In: Proceedings of the 11th working conference on reverse engineering. IEEE, pp 100–109

  • Roy C, Cordy J (2007) A survey on software clone detection research. Queen’s School of Computing TR 541:115

  • Roy C, Cordy J (2010a) Are scripting languages really different?. In: Proceedings of the 4th international workshop on software clones. ACM, pp 17–24

  • Roy C, Cordy J (2010b) Near-miss function clones in open source software: an empirical study. J Softw Maint Evol Res Pract 22(3):165–189

  • Roy CK, Cordy JR (2008) An empirical study of function clones in open source software. In: Proceedings of the 15th working conference on reverse engineering. IEEE, pp 81–90

  • Roy CK, Cordy JR (2009) A mutation/injection-based automatic framework for evaluating code clone detection tools. In: Proceedings of the international conference on software testing, verification and validation workshops. IEEE, pp 157–166

  • Roy CK, Cordy JR, Koschke R (2009) Comparison and evaluation of code clone detection techniques and tools: a qualitative approach. Sci Comput Program 74(7):470–495

  • Roy CK, Zibran MF, Koschke R (2014) The vision of software clone management: past, present, and future. In: Proceedings of the IEEE CSMR-18/WCRE-21 software evolution week

  • Rysselberghe FV, Demeyer S (2004) Evaluating clone detection techniques from a refactoring perspective. In: Proceedings of the 19th international conference on automated software engineering. IEEE Computer Society, pp 336–339

  • SAFE Corporation (2012) CodeMatch. http://www.safe-corp.biz/products_codematch.htm

  • Saha RK, Roy CK, Schneider KA (2011) An automatic framework for extracting and classifying near-miss clone genealogies. In: Proceedings of the 27th international conference on software maintenance. IEEE, pp 293–302

  • Schleimer S, Wilkerson DS, Aiken A (2003) Winnowing: local algorithms for document fingerprinting. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data. ACM, pp 76–85

  • Selamat A, Wahid N (2007) Code clone detection using string based tree matching technique. InTech

  • Shawky DM, Ali AF (2010) An approach for assessing similarity metrics used in metric-based clone detection techniques. In: Proceedings of the 3rd international conference on computer science and information technology, vol 1. IEEE, pp 580–584

  • Stephan M, Alalfi MH, Stevenson A, Cordy JR (2013) Using mutation analysis for a model-clone detector comparison framework. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, Piscataway, pp 1261–1264

  • Stephan M, Alalfi MH, Cordy JR (2014) Towards a taxonomy for Simulink model mutations. In: Proceedings of the 7th international conference on software testing, verification and validation workshops. IEEE, pp 206–215

  • Svajlenko J, Roy CK, Zibran MF, Cordy JR (2013) A mutation analysis based benchmarking framework for clone detectors. In: Proceedings of short/tool papers track of the ICSE 7th international workshop on software clones

  • Tairas R, Gray J (2006) Phoenix-based clone detection using suffix trees. In: Proceedings of the 44th annual southeast regional conference. ACM, pp 679–684

  • Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2010) An empirical study on the maintenance of source code clones. Empir Softw Eng 15(1):1–34

  • Van Welie M, Van der Veer GC (2003) Pattern languages in interaction design: structure and organization. In: Proceedings of interact, vol 3, pp 1–5

  • Wang T, Harman M, Jia Y, Krinke J (2013) Searching for better configurations: a rigorous approach to clone evaluation. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 455–465

  • Wikipedia (2015) List of graphical user interface builders and rapid application development tools. http://en.wikipedia.org/wiki/List_of_graphical_user_interface_builders_and_rapid_application_development_tools

  • Yamanaka Y, Choi E, Yoshida N, Inoue K, Sano T (2013) Applying clone change notification system into an industrial development process. In: Proceedings of the 21st international conference on program comprehension. IEEE, pp 199–206

  • Zibran MF, Roy CK (2012) IDE-based real-time focused search for near-miss clones. In: Proceedings of the 27th annual ACM symposium on applied computing. ACM, pp 1235–1242

  • Zibran MF, Saha RK, Asaduzzaman M, Roy CK (2011) Analyzing and forecasting near-miss clones in evolving software: an empirical study. In: Proceedings of the 16th international conference on engineering of complex computer systems. IEEE, pp 295–304

Acknowledgments

This work is supported in part by the Korea Ministry of Education, Science and Technology (MEST)/National Research Foundation of Korea (NRF) (Grants NRF-2014R1A2A2A01003235 and NRF-2008-0062609), Samsung Electronics, and Google.

Author information

Correspondence to Sukyoung Ryu.

Additional information

Communicated by: Andrea De Lucia

About this article

Cite this article

Cheung, W.T., Ryu, S. & Kim, S. Development nature matters: An empirical study of code clones in JavaScript applications. Empir Software Eng 21, 517–564 (2016). https://doi.org/10.1007/s10664-015-9368-6

  • DOI: https://doi.org/10.1007/s10664-015-9368-6
