Characterizing refactoring graphs in Java and JavaScript projects

Empirical Software Engineering

Abstract

Refactoring is an essential activity during software evolution. Practitioners frequently rely on such transformations to improve source code maintainability and quality. As a consequence, this process may produce new source code entities or change the structure of existing ones. Sometimes, the transformations are atomic, i.e., performed in a single commit. In other cases, they generate sequences of modifications performed over time. To study and reason about refactorings over time, we rely on refactoring graphs. Using this abstraction, we provide a quantitative and qualitative investigation of 20 popular open-source Java and JavaScript projects. After eliminating trivial graphs, we characterize a large sample of 1,525 refactoring graphs, providing quantitative data on their size, commits, age, refactoring composition, ownership, operations over time, and refactoring graph patterns. In addition, we contact the authors of subgraphs describing large refactoring operations to understand the reasons behind their operations. We conclude by discussing applications and implications of refactoring graphs, for example, to improve code comprehension, detect refactoring patterns, and support software evolution studies.
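To make the abstraction concrete, the following is a minimal sketch of how a refactoring graph could be represented and split into subgraphs. It assumes that nodes are code entities (for example, method signatures) and that each edge is one refactoring labeled with the commit that performed it. The `Refactoring` record, the `subgraphs` and `is_atomic` helpers, and the sample data are hypothetical names chosen for illustration; they are not taken from the authors' tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Refactoring:
    """One edge of the graph (hypothetical schema, assumed for this sketch)."""
    source: str  # entity before the refactoring, e.g. a method signature
    target: str  # entity after the refactoring
    kind: str    # e.g. "rename", "move", "extract"
    commit: str  # identifier of the commit that performed the operation


def subgraphs(edges):
    """Group edges into weakly connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Connect the two endpoints of every refactoring.
    for edge in edges:
        parent[find(edge.source)] = find(edge.target)

    # Collect the edges of each component under its representative node.
    groups = defaultdict(list)
    for edge in edges:
        groups[find(edge.source)].append(edge)
    return list(groups.values())


def is_atomic(subgraph):
    """A subgraph is atomic when all of its refactorings share one commit."""
    return len({edge.commit for edge in subgraph}) == 1


# Made-up sample data: a rename followed by a move, plus an unrelated rename.
edges = [
    Refactoring("A.foo()", "A.bar()", "rename", "c1"),
    Refactoring("A.bar()", "B.bar()", "move", "c2"),
    Refactoring("C.x()", "C.y()", "rename", "c3"),
]

components = subgraphs(edges)
for component in components:
    print(len(component), is_atomic(component))
```

In this sample, the first two refactorings touch the same entity in different commits, so they form one subgraph that evolves over time, while the third forms an atomic subgraph on its own.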

Notes

  1. https://refactoring-graph.github.io

  2. https://github.com/alinebrito/refactoring-graph-generator

  3. Section 5.1.1 details the results of a precision analysis of RefDiff: 87% for Java and 93% for JavaScript.

  4. https://insights.stackoverflow.com/survey/2020

  5. https://github.com/iluwatar/java-design-patterns

  6. https://github.com/airbnb/javascript

  7. https://git-scm.com/docs/git-log#Documentation/git-log.txt---first-parent

  8. https://github.com/vuejs/vue/tree/dev/packages/vue-server-renderer

  9. The gSpan output does not include information about the edges, such as commit or date. The algorithm only reports the occurrence of a pattern in a set of subgraphs. As a consequence, for graph patterns involving a single element (i.e., refactoring from the same source or refactoring to the same target), it is not possible to infer whether they include refactorings over time.

  10. All 9,200 subgraphs presented in Table 3 minus the 289 subgraphs with cycles.

  11. All 2,141 subgraphs presented in Table 4 minus the 47 subgraphs with cycles.

  12. https://github.com/PhilJay/MPAndroidChart/commit/13104b26

  13. https://github.com/PhilJay/MPAndroidChart/commit/063c4bb0

  14. https://github.com/PhilJay/MPAndroidChart/commit/d930ac23

  15. https://github.com/quilljs/quill/commit/aee9b867

  16. https://github.com/quilljs/quill/commit/e1d76d9f

  17. https://github.com/elastic/elasticsearch/commit/9ee492a3f07

  18. https://github.com/elastic/elasticsearch/commit/11fe52ad767

  19. https://github.com/spring-projects/spring-framework/commit/794693525f

  20. https://github.com/spring-projects/spring-framework/commit/91e96d8084

  21. https://github.com/vuejs/vue/commit/351aef3c

  22. https://github.com/vuejs/vue/commit/7a2c9867

  23. https://github.com/vuejs/vue/commit/de7764a3

  24. https://github.com/vuejs/vue/commit/df82aeb0

  25. https://github.com/facebook/fresco/commit/02ef6e0f

  26. https://github.com/facebook/fresco/commit/b76f56ef

  27. https://github.com/facebook/fresco/commit/017c007b

  28. https://github.com/parcel-bundler/parcel/commit/38d4a830

  29. https://github.com/parcel-bundler/parcel/commit/e4cee192

  30. https://github.com/parcel-bundler/parcel/commit/dd3ea464

  31. https://github.com/square/okhttp/commit/daf2ec6b9

  32. https://github.com/square/okhttp/commit/c5a26fefd

  33. https://github.com/square/okhttp/commit/a32b1044a

  34. https://github.com/facebook/react/commit/50988911

  35. https://github.com/facebook/react/commit/9fe10312

  36. https://github.com/bumptech/glide/commit/6bbe4343c

  37. https://github.com/bumptech/glide/commit/c572847b4

  38. https://github.com/spring-projects/spring-framework/commit/c43acd7675

  39. https://github.com/ReactiveX/RxJava/commit/320495fde

  40. https://docs.google.com/spreadsheets/d/1eBsZW37z1w1dt77S6DIukdgGZF9fndrsVZ2vYyIh5pg

  41. https://refactoring-graph.github.io/#/request/request/0

  42. https://refactoring-graph.github.io/#/square/okhttp/485

  43. https://refactoring-graph.github.io/#/ReactiveX/RxJava/784

  44. https://refactoring-graph.github.io/#/elastic/elasticsearch/308

  45. https://refactoring-graph.github.io/#/spring-projects/spring-framework/2820

  46. https://docs.google.com/spreadsheets/d/1eBsZW37z1w1dt77S6DIukdgGZF9fndrsVZ2vYyIh5pg
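Notes 10 and 11 exclude subgraphs with cycles. As a rough illustration of what that filter involves, the sketch below checks whether a directed subgraph contains a cycle using an iterative depth-first search. It assumes each subgraph is available as a list of (source, target) pairs; the function name `has_cycle` and the sample entity names are hypothetical and do not come from the authors' implementation.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = dict.fromkeys(successors, WHITE)

    for root in successors:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(successors[root]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                # All successors explored: the node leaves the current path.
                color[node] = BLACK
                stack.pop()
            elif color[child] == GRAY:
                # Edge back to a node on the current path closes a cycle.
                return True
            elif color[child] == WHITE:
                color[child] = GRAY
                stack.append((child, iter(successors[child])))
    return False


# Made-up examples: a rename that is later reverted, and a plain chain.
reverted = [("foo()", "bar()"), ("bar()", "foo()")]
chain = [("foo()", "bar()"), ("bar()", "baz()")]
print(has_cycle(reverted), has_cycle(chain))
```

With the counts given in the notes, removing the cyclic subgraphs leaves 9,200 − 289 = 8,911 subgraphs in the first case and 2,141 − 47 = 2,094 in the second.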

References

  • AlOmar EA, Mkaouer M, Ouni A (2021) Toward the automatic classification of self-affirmed refactoring. J Syst Softw 171:110821

  • Alves ELG, Song M, Kim M (2014) RefDistiller: A refactoring aware code review tool for inspecting manual refactoring edits. In: 22nd international symposium on foundations of software engineering (FSE), pp 751–754

  • Avelino G, Passos L, Hora A, Valente MT (2016) A novel approach for estimating truck factors. In: 24th international conference on program comprehension (ICPC), pp 1–10

  • Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: 35th international conference on software engineering (ICSE), pp 712–721

  • Bavota G, De Carluccio B, De Lucia A, Di Penta M, Oliveto R, Strollo O (2012) When does a refactoring induce bugs? an empirical study. In: 12th international working conference on source code analysis and manipulation (SCAM), pp 104–113

  • Bavota G, Lucia AD, Penta MD, Oliveto R, Palomba F (2015) An experimental investigation on the innate relationship between quality and refactoring. J Syst Softw 107(C):1–14

  • Bibiano A, Soares V, Coutinho D, Fernandes E, Correia J, Santos K, Oliveira A, Garcia A, Gheyi R, Fonseca B, Ribeiro M, Silva C, Oliveira D (2020) How does incomplete composite refactoring affect internal quality attributes? In: 28th international conference on program comprehension (ICPC), pp 149–159

  • Bibiano AC, Garcia EFDOA, Kalinowski M, Fonseca B, Oliveira R, Oliveira A, Cedrim D (2019) A quantitative study on characteristics and effect of batch refactoring on code smells. In: 13th international symposium on empirical software engineering and measurement (ESEM), pp 1–11

  • Borges H, Hora A, Valente MT (2016) Understanding the factors that impact the popularity of GitHub repositories. In: 32nd international conference on software maintenance and evolution (ICSME), pp 334–344

  • Borges H, Valente MT (2018) What’s in a GitHub star? Understanding repository starring practices in a social coding platform. J Syst Softw 146:112–129

  • Brito A, Hora A, Valente MT (2020) Refactoring graphs: Assessing refactoring over time. In: International conference on software analysis, evolution and reengineering (SANER), pp 367–377

  • Brito A, Xavier L, Hora A, Valente MT (2018) APIDiff: Detecting API breaking changes. In: 25th international conference on software analysis, evolution and reengineering (SANER), tool track, pp 507–511

  • Brito R, Valente MT (2020) RefDiff4Go: Detecting refactorings in Go. In: 14th brazilian symposium on software components, architectures, and reuse (SBCARS), pp 101–110

  • Brito R, Valente MT (2021) RAID - Refactoring aware and intelligent diffs. In: 29th international conference on program comprehension (ICPC), pp 265–275

  • Catolino G, Palomba F, Tamburri DA, Serebrenik A, Ferrucci F (2020) Refactoring community smells in the wild: The practitioner’s field manual. In: 42nd international conference on software engineering: companion proceedings (ICSE), pp 25–34

  • Cedrim D (2018) Understanding and improving batch refactoring in software systems. Ph.D. thesis, PUC-Rio

  • Chaparro O, Bavota G, Marcus A, Penta MD (2014) On the impact of refactoring operations on code quality metrics. In: 30th international conference on software maintenance and evolution (ICSME), pp 456–460

  • Chen TH, Nagappan M, Shihab E, Hassan AE (2014) An empirical study of dormant bugs. In: 11th working conference on mining software repositories (MSR)

  • Cruzes DS, Dyba T (2011) Recommended steps for thematic synthesis in software engineering. In: 5th international symposium on empirical software engineering and measurement (ESEM), pp 275–284

  • da Costa DA, McIntosh S, Shang W, Kulesza U, Coelho R, Hassan AE (2017) A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. Trans Soft Eng 43(7):641–657

  • Di Penta M, Bavota G, Zampetti F (2020) On the relationship between refactoring actions and bugs: A differentiated replication. In: 28th european software engineering conference and symposium on the foundations of software engineering (FSE), pp 556–567

  • Dig D, Comertoglu C, Marinov D, Johnson R (2006) Automated detection of refactorings in evolving components. In: 20th european conference on object-oriented programming (ECOOP), pp 404–428

  • Dig D, Johnson R (2005) How do APIs evolve? a story of refactoring. In: 22nd international conference on software maintenance (ICSM), pp 83–107

  • Fernandes E (2019) Stuck in the middle: Removing obstacles to new program features through batch refactoring. In: 41st international conference on software engineering: companion proceedings (ICSE), pp 206–209

  • Fernandes E, Uchôa A, Bibiano AC, Garcia A (2019) On the alternatives for composing batch refactoring. In: 3rd international workshop on refactoring (IWOR), pp 9–12

  • Fowler M (1999) Refactoring: Improving the design of existing code. Addison-Wesley, Reading, MA

  • Ge X, Sarkar S, Murphy-Hill E (2014) Towards refactoring-aware code review. In: 7th international workshop on cooperative and human aspects of software engineering (CHASE). ACM, pp 99–102

  • Ge X, Sarkar S, Witschey J, Murphy-Hill E (2017) Refactoring-aware code review. In: Symposium on visual languages and human-centric computing (VL/HCC), pp 71–79

  • Gómez VU, Ducasse S, D’Hondt T (2010) Visually supporting source code changes integration: the Torch dashboard. In: 17th working conference on reverse engineering (WCRE)

  • Gómez VU, Ducasse S, D’Hondt T (2015) Visually characterizing source code changes. Sci Comput Program 98(P3):376–393

  • Grund F, Chowdhury S, Bradley N, Hall B, Holmes R (2021) CodeShovel: Constructing method-level source code histories. In: 43rd international conference on software engineering: Companion proceedings (ICSE), pp 1510–1522

  • Hattori L, Lanza M (2009) Mining the history of synchronous changes to refine code ownership. In: 6th international working conference on mining software repositories (MSR), pp 141–150

  • Hayashi S, Thangthumachit S, Saeki M (2013) Rediffs: Refactoring-aware difference viewer for Java. In: 20th working conference on reverse engineering (WCRE), pp 487–488

  • Higo Y, Hayashi S, Kusumoto S (2020) On tracking Java methods with git mechanisms. J Syst Softw 165

  • Hinkle D, Wiersma W, Jurs S (2003) Applied statistics for the behavioral sciences. Houghton Mifflin, Boston

  • Hora A, Robbes R (2020) Characteristics of method extractions in Java: A large scale empirical study. Empir Softw Eng 25:1798–1833

  • Hora A, Silva D, Robbes R, Valente MT (2018) Assessing the threat of untracked changes in software evolution. In: 40th international conference on software engineering (ICSE), pp 1102–1113

  • Iammarino M, Zampetti F, Aversano L, Penta MD (2019) Self-admitted technical debt removal and refactoring actions: Co-occurrence or more? In: 35th international conference on software maintenance and evolution (ICSME), pp 186–190

  • Jiang Y, Liu H, Niu N, Zhang L, Hu Y (2021) Extracting concise bug-fixing patches from human-written patches in version control systems. In: 43rd international conference on software engineering (ICSE), pp 1–13

  • Jiau HC, Mar LW, Chen JC (2013) OBEY: Optimal batched refactoring plan execution for class responsibility redistribution. Trans Soft Eng 39(9):1245–1263

  • Kim J, Batory D, Dig D, Azanza M (2016) Improving refactoring speed by 10x. In: 38th international conference on software engineering (ICSE), pp 1145–1156

  • Kim M, Gee M, Loh A, Rachatasumrit N (2010) Ref-finder: A refactoring reconstruction tool based on logic query templates. In: 8th international symposium on foundations of software engineering (FSE), pp 371–372

  • Kim M, Zimmermann T, Nagappan N (2012) A field study of refactoring challenges and benefits. In: 20th international symposium on the foundations of software engineering (FSE), pp 50:1–50:11

  • Kim M, Zimmermann T, Nagappan N (2014) An empirical study of refactoring challenge and benefits at Microsoft. Trans Soft Eng 40(7):633–649

  • Kim S, Zimmermann T, Pan K, Whitehead EJJ (2006) Automatic identification of bug-introducing changes. In: 21st international conference on automated software engineering (ASE), pp 81–90

  • Leung C (2010) Technical notes on extending gSpan to directed graphs. Tech. rep., Singapore Management University

  • Lin B, Nagy C, Bavota G, Lanza M (2019) On the impact of refactoring operations on code naturalness. In: 26th international conference on software analysis, evolution and reengineering (SANER), pp 594–598

  • Lin Y, Peng X, Cai Y, Dig D, Zheng D, Zhao W (2016) Interactive and guided architectural refactoring with search-based recommendation. In: 24th international symposium on foundations of software engineering (FSE), pp 535–546

  • Mahmoudi M, Nadi S, Tsantalis N (2019) Are refactorings to blame? an empirical study of refactorings in merge conflicts. In: 26th international conference on software analysis, evolution and reengineering (SANER), pp 151–162

  • Mazinanian D, Ketkar A, Tsantalis N, Dig D (2017) Understanding the use of lambda expressions in Java. Program Lang 1(85):85:1–85:31

  • Meananeatra P (2012) Identifying refactoring sequences for improving software maintainability. In: 27th international conference on automated software engineering (ASE), pp 406–409

  • Meneely A, Williams O (2012) Interactive churn metrics: socio-technical variants of code churn. Softw Eng Notes 37(6)

  • Murphy-Hill E, Parnin C, Black AP (2009) How we refactor, and how we know it. In: 31st international conference on software engineering (ICSE), pp 287–297

  • Negara S, Chen N, Vakilian M, Johnson RE, Dig D (2013) A comparative study of manual and automated refactorings. In: 27th european conference on object-oriented programming (ECOOP), pp 552–576

  • Neto EC, da Costa DA, Kulesza U (2018) The impact of refactoring changes on the SZZ algorithm: An empirical study. In: 25th international conference on software analysis, evolution and reengineering (SANER), pp 380–390

  • Paixao M, Uchôa A, Bibiano AC, Oliveira D, Garcia A, Krinke J, Arvonio E (2020) Behind the intents: An in-depth empirical study on software refactoring in modern code review. In: 17th international conference on mining software repositories (MSR), pp 125–136

  • Palomba F, Zaidman A, Oliveto R, Lucia AD (2017) An exploratory study on the relationship between changes and refactoring. In: 25th international conference on program comprehension (ICPC), pp 176–185

  • Pantiuchina J, Zampetti F, Scalabrino S, Piantadosi V, Oliveto R, Bavota G, Penta MD (2020) Why developers refactor source code: A mining-based study. ACM Trans Softw Eng Methodol 37(4):1–32

  • Peruma A, Mkaouer M, Decker M, Newman C (2018) An empirical investigation of how and why developers rename identifiers. In: 2nd international workshop on refactoring (IWoR), pp 26–33

  • Rahman F, Devanbu P (2011) Ownership, experience and defects: a fine-grained study of authorship. In: 33rd international conference on software engineering (ICSE), pp 491–500

  • Rahman F, Posnett D, Hindle A, Barr E, Devanbu P (2011) BugCache for inspections: hit or miss? In: 19th international symposium on the foundations of software engineering (FSE), pp 322–331

  • Ray B, Hellendoorn V, Godhane S, Tu Z, Bacchelli A, Devanbu P (2016) On the naturalness of buggy code. In: 38th international conference on software engineering (ICSE), pp 428–439

  • Shen B, Zhang W, Zhao H, Liang G, Jin Z, Wang Q (2019) IntelliMerge: A refactoring-aware software merging technique. Program Lang 3(170):170:1–170:28

  • Silva D, da Silva JP, Santos G, Terra R, Valente MT (2021) RefDiff 2.0: A multi-language refactoring detection tool. IEEE Trans Softw Eng 1(1):1–17

  • Silva D, Tsantalis N, Valente MT (2016) Why we refactor? Confessions of GitHub contributors. In: 24th international symposium on the foundations of software engineering (FSE), pp 858–870

  • Silva D, Valente MT (2017) RefDiff: Detecting refactorings in version histories. In: 14th international conference on mining software repositories (MSR), pp 269–279

  • Sousa L, Cedrim D, Garcia A, Oizumi W, Bibiano AC, Oliveira D, Kim M, Oliveira A (2020) Characterizing and identifying composite refactorings: Concepts, heuristics and patterns. In: 17th international conference on mining software repositories (MSR), pp 186–197

  • Spadini D, Aniche M, Bacchelli A (2018) PyDriller: Python framework for mining software repositories. In: 26th software engineering conference and symposium on the foundations of software engineering (FSE), pp 908–911

  • Spinellis D (2017) A repository of Unix history and evolution. Empir Softw Eng 22(3):1372–1404

  • Szőke G, Nagy C, Ferenc R, Gyimóthy T (2016) Designing and developing automated refactoring transformations: An experience report. In: 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 693–697

  • Tenorio D, Bibiano AC, Garcia A (2019) On the customization of batch refactoring. In: 3rd international workshop on refactoring (IWOR), pp 13–16

  • Terra R, Valente MT, Miranda S, Sales V (2018) JMove: A novel heuristic and tool to detect move method refactoring opportunities. J Syst Softw 138:19–36

  • Tsantalis N, Guana V, Stroulia E, Hindle A (2013) A multidimensional empirical study on refactoring activity. In: 23rd conference of the center for advanced studies on collaborative research (CASCON), pp 132–146

  • Tsantalis N, Ketkar A, Dig D (2020) RefactoringMiner 2.0. IEEE Trans Softw Eng

  • Tsantalis N, Mansouri M, Eshkevari LM, Mazinanian D, Dig D (2018) Accurate and efficient refactoring detection in commit history. In: 40th international conference on software engineering (ICSE), pp 483–494

  • Vassallo C, Grano G, Palomba F, Gall H, Bacchelli A (2019) A large-scale empirical exploration on refactoring activities in open source software projects. Sci Comput Program 180:1–15

  • Wang Y (2009) What motivate software engineers to refactor source code? evidences from professional developers. In: International conference on software maintenance (ICSM), pp 413–416

  • Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: 2nd international conference on data mining (ICDM), pp 721–724

  • Zimmermann T, Kim S, Zeller A, Whitehead Jr. EJ (2006) Mining version archives for co-changed lines. In: 3rd international workshop on mining software repositories (MSR), pp 72–75

Acknowledgements

This research is supported by grants from FAPEMIG, CNPq, and CAPES.

Author information

Corresponding author

Correspondence to Aline Brito.

Additional information

Communicated by: Robert Feldt and Thomas Zimmermann

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Brito, A., Hora, A. & Valente, M.T. Characterizing refactoring graphs in Java and JavaScript projects. Empir Software Eng 26, 125 (2021). https://doi.org/10.1007/s10664-021-10023-3

  • DOI: https://doi.org/10.1007/s10664-021-10023-3
