Short Paper · DOI: 10.1145/3661167.3661183

On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT?

Published: 18 June 2024

Abstract

Code review is a critical but time-consuming process for ensuring code quality in modern software engineering. To alleviate the effort of reviewing source code, recent studies have investigated the possibility of automating the review process. Moreover, tools based on large language models such as ChatGPT are playing an increasingly important role in this vision. Understanding how these tools are used during code review can provide valuable insights for code review automation.
This study investigates the purposes for which developers use ChatGPT during code review and how they react to the information and suggestions it provides. We manually analyze 229 review comments from 205 pull requests across 179 projects. We find that developers use ChatGPT to outsource their work as frequently as to ask for references. Moreover, only 30.7% of developers' responses to ChatGPT's answers are negative. We further analyze the reasons behind these negative reactions. Our results provide valuable insights for improving the effectiveness of LLMs in code review.
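The paper's analysis was performed manually, but for readers interested in how such data might be gathered, below is a minimal sketch that queries GitHub's REST search API for pull requests whose discussion mentions ChatGPT. The query string, the search_chatgpt_prs helper, and the pagination defaults are illustrative assumptions, not the authors' actual pipeline.

import requests

# Minimal sketch (an assumption, not the authors' pipeline): find pull requests
# whose title, body, or comments mention ChatGPT via GitHub's REST search API.
API = "https://api.github.com/search/issues"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for higher rate limits

def search_chatgpt_prs(page=1):
    # type:pr restricts results to pull requests; in:comments also matches comment text
    params = {"q": "ChatGPT type:pr in:comments", "per_page": 100, "page": page}
    resp = requests.get(API, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["items"]

# Usage: print candidate pull requests for manual inspection
for pr in search_chatgpt_prs():
    print(pr["html_url"], "-", pr["title"])

Each hit would still require the manual filtering and labeling the paper describes, since a keyword search also matches pull requests that merely mention ChatGPT in passing.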


Cited By

  • (2024) RevToken: A Token-Level Review Recommendation: How Far Are We? In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), 654–659. DOI: 10.1109/ICSME58944.2024.00068. Online publication date: 6 Oct 2024.


Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ChatGPT
  2. Code Review
  3. Empirical Study

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall acceptance rate: 71 of 232 submissions (31%)

Article Metrics

  • Downloads (last 12 months): 331
  • Downloads (last 6 weeks): 60
Reflects downloads up to 05 Mar 2025

