Short Paper · DOI: 10.1145/3661167.3661183

On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT?

Published: 18 June 2024

Abstract

Code review is a critical but time-consuming process for ensuring code quality in modern software engineering. To alleviate the effort of reviewing source code, recent studies have investigated the possibility of automating the review process. Moreover, tools based on large language models such as ChatGPT are playing an increasingly important role in this vision. Understanding how these tools are used during code review can provide valuable insights for code review automation.
This study investigates the purposes for which developers use ChatGPT during code review and how they react to the information and suggestions it provides. We manually analyze 229 review comments from 205 pull requests across 179 projects. We find that developers use ChatGPT to outsource their work as frequently as to ask for references. Moreover, only 30.7% of developers' responses to ChatGPT's answers are negative. We further analyze the reasons behind these negative reactions. Our results provide valuable insights for improving the effectiveness of LLMs in code review.
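The paper's analysis was performed manually, but for readers interested in how such data might be gathered, below is a minimal sketch that queries GitHub's REST search API for pull requests whose discussion mentions ChatGPT. The query string, the search_chatgpt_prs helper, and the pagination defaults are illustrative assumptions, not the authors' actual pipeline.

import requests

# Minimal sketch (an assumption, not the authors' pipeline): find pull requests
# whose title, body, or comments mention ChatGPT via GitHub's REST search API.
API = "https://api.github.com/search/issues"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for higher rate limits

def search_chatgpt_prs(page=1):
    # type:pr restricts results to pull requests; in:comments also matches comment text
    params = {"q": "ChatGPT type:pr in:comments", "per_page": 100, "page": page}
    resp = requests.get(API, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["items"]

# Usage: print candidate pull requests for manual inspection
for pr in search_chatgpt_prs():
    print(pr["html_url"], "-", pr["title"])

Each hit would still require the manual filtering and labeling the paper describes, since a keyword search also matches pull requests that merely mention ChatGPT in passing.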


Cited By

  • (2024) RevToken: A Token-Level Review Recommendation: How Far Are We? In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), 654–659. DOI: 10.1109/ICSME58944.2024.00068. Online publication date: 6 Oct 2024.


Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ChatGPT
  2. Code Review
  3. Empirical Study

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall acceptance rate: 71 of 232 submissions (31%)

Article Metrics

  • Downloads (last 12 months): 331
  • Downloads (last 6 weeks): 60
Reflects downloads up to 05 Mar 2025

