Abstract
Refactoring is an essential activity during software evolution. Practitioners frequently rely on such transformations to improve source code maintainability and quality. As a consequence, this process may produce new source code entities or change the structure of existing ones. Sometimes the transformations are atomic, i.e., performed in a single commit. In other cases, they generate sequences of modifications performed over time. To study and reason about refactorings over time, we rely on refactoring graphs. Using this abstraction, we provide a quantitative and qualitative investigation of 20 popular open-source Java and JavaScript projects. After eliminating trivial graphs, we characterize a large sample of 1,525 refactoring graphs, providing quantitative data on their size, commits, age, refactoring composition, ownership, operations over time, and refactoring graph patterns. In addition, we contact the authors of subgraphs describing large refactoring operations to understand the reasons behind these operations. We conclude by discussing applications and implications of refactoring graphs, for example, to improve code comprehension, detect refactoring patterns, and support software evolution studies.
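As a rough illustration of the refactoring graph abstraction, the sketch below treats each detected refactoring as a directed edge between two code elements (e.g., methods), groups edges into connected subgraphs, and keeps only subgraphs spanning more than one commit, i.e., refactorings performed over time. The `Refactoring` record, the function names, and the "at least two commits" criterion for a non-trivial graph are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Refactoring:
    """One refactoring edge (hypothetical schema): source element -> target element."""
    kind: str    # e.g. "RENAME", "MOVE", "EXTRACT"
    source: str  # element name before the change
    target: str  # element name after the change
    commit: str  # commit in which the refactoring was detected


def refactoring_subgraphs(edges):
    """Group refactoring edges into weakly connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Every edge links its source and target into the same component.
    for e in edges:
        union(e.source, e.target)

    # Collect the edges of each component under its root element.
    groups = defaultdict(list)
    for e in edges:
        groups[find(e.source)].append(e)
    return list(groups.values())


def non_trivial(subgraphs, min_commits=2):
    """Keep subgraphs whose refactorings span several commits (assumed criterion)."""
    return [g for g in subgraphs if len({e.commit for e in g}) >= min_commits]
```

For example, an Extract Method in one commit followed by a Rename of the extracted method in a later commit forms a single subgraph spanning two commits, whereas an isolated Move Method forms a one-commit subgraph that this filter would discard.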
Notes
In Section 5.1.1, we detail the results of a precision analysis of RefDiff: 87% for Java and 93% for JavaScript.
The gSpan output does not include information about the edges, such as commit or date; the algorithm only reports the occurrence of a pattern in a set of subgraphs. As a consequence, for graph patterns involving a single element (i.e., refactorings from the same source or refactorings to the same target), it is not possible to infer whether they include refactorings performed over time.
All 9,200 subgraphs presented in Table 3 minus the 289 subgraphs with cycles.
All 2,141 subgraphs presented in Table 4 minus the 47 subgraphs with cycles.
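The two notes above exclude subgraphs with cycles. In a refactoring graph, a cycle can arise, for instance, when a method is renamed and later renamed back. A minimal sketch of such a filter, assuming each subgraph is given as a list of directed (source, target) pairs, uses Kahn's topological sort; the function names are hypothetical and this is not the authors' code.

```python
from collections import defaultdict, deque


def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle.

    Kahn's algorithm: repeatedly remove nodes with in-degree zero. If some nodes
    are never removed, they lie on a cycle or are reachable from one.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))

    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited < len(nodes)


def acyclic_only(subgraphs):
    """Drop subgraphs containing a directed cycle, e.g., before pattern mining."""
    return [g for g in subgraphs if not has_cycle(g)]
```

Self-loops (an edge from an element to itself) are also reported as cycles, because that node's in-degree never drops to zero.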
Acknowledgements
This research is supported by grants from FAPEMIG, CNPq, and CAPES.
Additional information
Communicated by: Robert Feldt and Thomas Zimmermann
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Brito, A., Hora, A. & Valente, M.T. Characterizing refactoring graphs in Java and JavaScript projects. Empir Software Eng 26, 125 (2021). https://doi.org/10.1007/s10664-021-10023-3
DOI: https://doi.org/10.1007/s10664-021-10023-3